Voices of Video

The Onion Layers of Video: A Deep Dive into WebRTC

NETINT Technologies Season 2 Episode 15

WebRTC pioneer Tsahi Levent-Levi shares his extensive knowledge on this real-time communication protocol, explaining its inner workings, challenges, and proper implementation approaches.

• WebRTC consists of both a protocol stack (standard specification) and Google's implementation (libwebrtc) used in all major browsers
• The protocol is designed specifically for real-time communication with sub-second latency requirements
• When building with WebRTC, consider using third-party solutions rather than building from scratch
• Quality challenges arise from network unpredictability, requiring compromises to maintain real-time communication
• Simulcast (creating multiple streams at different bitrates) remains more widely adopted than SVC due to hardware compatibility
• Media servers are essential for scaling WebRTC applications beyond peer-to-peer communications
• WebRTC can scale to millions of users when properly implemented, but ultra-low latency requirements dramatically increase costs
• Companies should analyze actual problems before jumping to solutions like codec changes
• AV1 works well for text-heavy content at low bitrates but requires significant CPU resources

Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.

Mark Donnigan:

Voices of Video. Good morning. We are here for another episode of Voices of Video and, as always, we are bringing you the foremost experts, and I would even say interesting people who are doing really fun and cool things in video. So this week we are not going to disappoint you, I promise. And with that I want to welcome Tsahi. Thank you for joining us; it's great to have you here. So let's just jump right into this conversation. But first of all, tell us who you are and how it is that you began to work in WebRTC.

Tsahi Levent-Levi:

How it is that I began to work in WebRTC. So, Tsahi Levent-Levi, BlogGeek.me, for those who don't know, and also Senior Director of Product Management at Cyara. So I guess I started doing voice and video over IP years ago, from the age of 21; that was 25, 26 years ago. I've been doing that as a developer, then project manager, product manager, and then CTO. I was CTO of one of the business units at RADVISION back in the day. And then WebRTC came out, and when it came out, I looked at it. That was 2011.

Tsahi Levent-Levi:

I looked at it and I said: well, something is going to change. I went back and said: we need to invest in this. I gave four different angles we could invest in; one of them was WebRTC. Give it half a year, a year, a person, and we'll decide what we want to do with it. They said: no budget. And then they said: tell us what we're going to do next year, so we can tell you what the budget is, so we can use the budget. And I said: well, that's not how you build the future as a CTO. And I left.

Tsahi Levent-Levi:

When I left, I opened BlogGeek.me as a website, just to write the things that I wanted to write about, and for one reason or another it became the place for people to read about WebRTC, unintentionally, and that's the case even today. A few years later, I co-founded testRTC with two other people, a company that does WebRTC monitoring and testing. The company was acquired by Spearline in 2021, and then Spearline was acquired by Cyara in 2023, which is where I am now, with testing and monitoring. Other than that, I've got my courses on WebRTC and a bit of consulting around WebRTC. So that's where I come from to this specific space.

Mark Donnigan:

Yeah, it's great. I've heard you tell the story before, of when Google first introduced WebRTC. And we're going to jump right in here to what WebRTC really is, because on the surface all of our listeners are saying, well, of course I know what WebRTC is. But it's a protocol; it's not actually an application, even though it gets talked about that way a lot. And I've heard you tell the story that in 2011 you read about it, you learned about it, and you said the world's going to change in terms of video, and it certainly has. So why don't we start there? Go ahead and assume that maybe not everybody is an expert: can you walk us through the WebRTC protocol and what its components are?

Tsahi Levent-Levi:

Okay, so let's start from the beginning. Actually, we need to stop, because I'm in Israel and there is a siren now.

Mark Donnigan:

Oh no. Yes. So you're looking to continue in about two or three minutes? But yeah, you take care of yourself.

Tsahi Levent-Levi:

I'm in the right room, but we need to close things up here.

Mark Donnigan:

Well, this is live, and so you know this is live. And we on Voices of Video absolutely stand with Israel. Tsahi is joining us from Israel, and so we want to take this very seriously. So while we're waiting for Tsahi to rejoin us, I might make a few comments.

Mark Donnigan:

For those that see me on video, you know that my background's a little bit different, and even blurred; you can probably tell I'm in a hotel. So yeah, I am here in San Francisco, where Demuxed just wrapped up. And if any of you were at Demuxed and we didn't get to meet or shake hands or get introduced, then apologies, because we would love to meet all of you. But what an amazing conference. And if Demuxed has been on your short list of must-attend conferences, I really can't recommend it highly enough.

Mark Donnigan:

You know, it's two days of very interesting talks. It is an engineering conference, it is for engineers, and it's a wonderful time of networking. Like I said, a lot of really great conversations. But some really interesting themes are coming out, both at Demuxed and at other venues at the moment, and it actually plays into our conversation right now. There's one very common theme, and that is: if you aren't already taking a very, very hard look, a sharp pencil, to your operational costs, well, it's coming; get ready. You might as well start now, and I think everybody's already well on the way. And then there is this whole move to live workflows, and of course ultra-low latency plays into this, and obviously that's where WebRTC comes into play here. So, this shift to live, and then reducing costs. Hey, Tsahi! All right, hello. Good, okay, we're live here.

Tsahi Levent-Levi:

Yeah, this is live.

Mark Donnigan:

I felt like a news reporter. I'm like: well, let's see, let's talk about Demuxed, and, you know, fill in there. So anyway, welcome back. And we just started, so it's perfect. WebRTC, okay.

Tsahi Levent-Levi:

So when you look at WebRTC, the first thing is that it's two separate things, by the way, not one. The first one is that there is a protocol stack, which is a standard specification of what WebRTC is. And the second thing is that it's actually Google's own implementation of that protocol stack. That implementation today is called libwebrtc, but it's just WebRTC, and this is what is currently implemented in all browsers. So if you use Chrome or Firefox or Edge or Safari, they all use the same implementation, with different wrappers around it, but at the end of the day, this is the main implementation. There are others, but this is the main one. As for what WebRTC is designed to do: from my point of view, and I come from video conferencing, WebRTC is a kind of a media engine. We had media engines before. Having a video conferencing solution means I have a media engine there somewhere that needs to handle the voice and video. That media engine is built and designed for real-time, live stuff. I don't care about stuff that goes over VOD, things that are streamed, where we can send it to you and if you get it a minute from now, that's fine. We're not talking about that. It's live. I need it below one second, or things are going to be bad.

Tsahi Levent-Levi:

Okay, and the main difference between that media engine and everything else that came before it is that it has a standardized API. That's very important, because up until then, what you had was different voice-over-IP protocols, where someone needed to go think about what the API surface needed to be, and then implement it, and whatever. WebRTC came and said: well, you know, there is an API. That API is JavaScript. We're going to plug it into the browser, and this is how developers are going to use it. It's not "this is how developers need to implement WebRTC"; it's how developers need to use WebRTC when they run inside the browser. These are the capabilities that we're giving them.

Tsahi Levent-Levi:

So, first of all, it means that you don't have access to everything. Yes, you get packets over the network. Yes, they are received over UDP. No, as an application in the browser, you don't have access to them. If you have a native application, yes, but not in the browser. Why? Because nobody gave you that kind of an API surface.

Tsahi Levent-Levi:

And if you have problems with that, go to the W3C and explain why you need it, and go through that process. That mindset changed a lot of things, because now you don't need to learn voice over IP in order to build a voice-over-IP application. You just need to be a developer, a web developer, and there are a bunch of them, a lot more than voice-over-IP developers. And they're building a lot of different things, and some of the things that they are building are not even video conferencing; they're streaming services or totally different things that nobody thought about before giving them such an API. And that, for me, is what makes it so magical and gives it all of the power that it has today.

Mark Donnigan:

Thank you for that introduction. Let's walk through the process of building with WebRTC for somebody who wants to implement it. I'm sure in our audience we have some folks who are working in video conferencing, so I don't want to dismiss anyone who is, but mostly our audience would be building, let's say, direct-to-consumer type applications: live streaming, or maybe working in a very large social network where you're trying to give your users the ability to communicate, et cetera. So how would you begin approaching building or implementing WebRTC into your solution?

Tsahi Levent-Levi:

I would start by using a third party, so I don't need to learn WebRTC too much.

Mark Donnigan:

It's too much of a hassle.

Tsahi Levent-Levi:

That's the main thing, but really, first of all, you need to decide where to use it and why. Let's say that I want to build the next Netflix. I wouldn't touch WebRTC with a long stick. Now, why is that? Because all of the content on Netflix is pre-recorded, and if I'm going to play it back, I don't care if you wait five seconds before your movie begins. I just don't care. It's fine; you won't care either, and if you care, then we can optimize it to two seconds and everyone will be happy. But there is no need for me to go down to zero. And if you go down to zero, you start losing quality.

Tsahi Levent-Levi:

Okay, and WebRTC is about compromises. What does that mean? I'm using UDP and not TCP on the network, because I don't need retransmissions. Retransmissions are bad in WebRTC. Why? If I'm sending you video and you don't get it, and then I need to retransmit it, by the time you finally get it, it might be too late to use it. So there are times and instances where we use retransmissions in WebRTC for specific things, but not for the things you use them for in streaming. Usually there is no buffering in WebRTC in the sense that you see in streaming video. So when you say "I want live," I care more about the fact that this will be live than about the quality of what I'm receiving. Let's say I'm betting on something in sports; then live is more important for me than added latency, because if I add latency, it means that someone has an edge on me when he's betting. He knows things before I do.
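
The retransmission trade-off Tsahi describes can be put into a back-of-the-envelope calculation. This is a toy sketch, not libwebrtc logic; the function name and the timing model (loss noticed after roughly half a round trip, resend taking one more round trip) are illustrative assumptions.

```python
def retransmit_arrives_in_time(rtt_ms: float, jitter_buffer_ms: float) -> bool:
    """A lost packet is noticed roughly half an RTT after it was sent
    (when the receiver sees a sequence-number gap), and the resend takes
    a full RTT more. It is only usable if that still fits inside the
    receiver's jitter buffer (its playout deadline)."""
    loss_detection_ms = rtt_ms / 2
    resend_ms = rtt_ms
    return loss_detection_ms + resend_ms <= jitter_buffer_ms

# On a nearby network (20 ms RTT) a resend can beat a 50 ms deadline:
print(retransmit_arrives_in_time(rtt_ms=20, jitter_buffer_ms=50))    # True
# Across continents (250 ms RTT) the resend is far too late:
print(retransmit_arrives_in_time(rtt_ms=250, jitter_buffer_ms=100))  # False
```

This is why streaming, with seconds of buffer, can lean on TCP retransmissions while sub-second WebRTC mostly cannot.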

Tsahi Levent-Levi:

It's like, you know, this latency of two seconds becomes important, or five seconds. I want to hear the goal along with all of my neighbors, not after them. Or I want to do this session with you, Mark, and then we want to stream it to someone else; so at least the two of us need to be in a conversation that is live. So the first thing to ask yourself is where exactly you are going to place WebRTC, because there are different places to plug it in. First: the viewers, do they need it live, yes or no? If they don't, don't use WebRTC. If they do, okay, that's a good consideration for using WebRTC. Let's go to the inverse: the broadcasters. Should they be on WebRTC? Well, maybe, if they are doing this kind of conversation, they should be, at least between themselves.

Tsahi Levent-Levi:

But then someone has to go mix that and record that.

Mark Donnigan:

And then it goes where?

Tsahi Levent-Levi:

Yeah, that's right. Or let's say, you know what, I want to have someone join from a browser. I don't want him to need to install OBS, and I don't want him to install any other application. I just want him to open a URL in his browser, and magically he is there, and he can broadcast from the browser from anywhere. The only way to actually do that today is WebRTC.

Tsahi Levent-Levi:

So when I start looking at the solution, I would look at each and every component that requires video, and I would start asking myself things like: what latency is needed? Does it need to run in the browser, or can I use something else? Would using a browser for the user, with a camera or microphone, be useful for me? Is that beneficial? Does it give me an edge? And again, if the answer is yes, I'll go do that and use WebRTC. Usually I would use a third party, either commercial or open source; there are many of them out there. Self-development is nice, but you still need to start somewhere. Nobody starts by using WebRTC directly, not in these domains.

Mark Donnigan:

When you say third party you're talking about, like someone who's developed an SDK that is supported, that I can license and they'll help me. What do you mean by that?

Tsahi Levent-Levi:

There are three different alternatives. The first one is: I am going to go and use a CPaaS vendor, a communications platform as a service.

Tsahi Levent-Levi:

I'm going to mix WebRTC and streaming. I can use Twilio to do the video, or Vonage, Daily, Dolby, and many other companies. What they do is give me a managed service that I can use, and that includes WebRTC, and I can just go and embed that into my application, with the whole experience around it. For some of them, the main focus is going to be streaming. Dolby has a solution for that. Daily goes, I think, up to 10,000 or 100,000 viewers. And then you have Cloudflare Stream; Cloudflare has their own solution.

Mark Donnigan:

Okay, yeah, there's LiveSwitch.

Tsahi Levent-Levi:

They have a solution. There's a lot of them. The other alternative is: well, I'm going to go with the streaming vendors that have WebRTC solutions, and I'm going to use them because, well, I know how to use them. There you'll find Red5 Pro, Wowza, Ant Media, nanocosmos: vendors where what they give you is the traditional or classic solution for streaming that also has WebRTC support, so they know how to mix the two. And the third alternative is to go it alone. I'm going to develop it myself, and I'm going to use an open-source platform; I'm going to use Janus or mediasoup for that.

Tsahi Levent-Levi:

Or I'll build it from scratch, or I'll use Pion: different types of, usually, media servers that would be able to get the traffic that I need and then forward it to wherever I need it, and from that I can mix and match the solution that I need around it. Now, if you go that route, you should really know WebRTC well.

Mark Donnigan:

I understand. You mentioned that quality in WebRTC is more challenging. Talk to us about the codecs that are supported, and then where those quality cliffs are. Why is there a quality issue?

Tsahi Levent-Levi:

For audio codecs, you've got Opus, mainly. There's G.711, but everyone uses Opus. For video, today there's VP8, VP9, H.264, and AV1.

Mark Donnigan:

All of them are available.

Tsahi Levent-Levi:

The most common ones are still going to be VP8 and H.264. You'll see a lot more VP9 these days, and you see the beginnings of AV1, especially on the decoding side in the browser; encoders are hardware-based and different in nature. So everything is there, or most of the ones that are interesting. HEVC: there are noises from Apple about doing that in iOS. The main challenge there is going to be all the patents and royalties around it. So it seems that most companies in the domain of WebRTC are shying away from HEVC, and they would use it only if they must. That's usually how things happen with WebRTC today.

Tsahi Levent-Levi:

The next part of your question was about quality and cliffs.

Tsahi Levent-Levi:

So I guess, if the network is great, and it never is, then the quality is going to be great, period.

Tsahi Levent-Levi:

Either that, or you've got issues with the CPU and the processing power that is needed. But mostly you'll get issues from the network, if the application is built correctly. Now, the challenge with a network is that you don't really own or control it in any way. I'm doing this here from my desktop, with my machine sitting right next to the switch, and that one is connected with an Ethernet cable to this machine, because I know the things that I need to use, and it's fiber to the home and I pay dearly for that. That's not necessarily the case with other people. This is me, because that's my job. Mostly you'll find that people will be on their Wi-Fi, located far from the access point, or trying to do video calls while they drive on the highway, over cellular. And the problem with these networks is that they have packet losses and latencies and jitter and all of the nice things that we like to say about these networks.

Tsahi Levent-Levi:

And what happens is this: again, if I throw a packet at you and you catch it, you can decode it; but if you don't catch it, what happens then? If you go the, let's call it, traditional streaming way, I'm throwing a packet at you, but I'm not throwing it directly: I'm putting it inside TCP, or HTTPS, which is TCP, and TCP will make sure that if I sent it to you, you're going to receive it, because if you don't receive it, I'm going to retransmit it again and again and again, until you either receive it or we both decide that the connection is broken. So life is going to be easy; the only thing that's going to happen is a bit of buffering. I'm going to be stuck on your screen, and then it's going to continue right where we stopped. You can't do that with live. So I'm throwing packets at you, and if you don't catch them because they didn't arrive, you need to do something about that, or we need to do something about it.

Tsahi Levent-Levi:

So one thing that you can do is say: well, you know what, I missed that packet, but I'm going to assume it was an audio packet, and I'll take the previous one that I received, just reduce the volume a bit, and, you know, hope for the best; I'll skip that one. Or: I didn't receive it on time, but I can wait 20 or 30 more milliseconds just to be sure. Or I can go and say: well, I didn't receive it, I know that it's an important packet, so I'm going to ask for it to be retransmitted.
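
The options above (conceal, wait a little longer, or ask for a retransmit) can be sketched as a toy receiver policy. The function name and thresholds are invented for illustration; a real WebRTC jitter buffer is far more sophisticated.

```python
def handle_missing_packet(is_keyframe: bool, rtt_ms: float, time_left_ms: float) -> str:
    """Toy policy for one missing media packet.
    time_left_ms: how long until the packet's playout deadline."""
    if rtt_ms * 1.5 <= time_left_ms:
        return "nack"              # a retransmit can still arrive in time, ask for one
    if is_keyframe:
        return "request_keyframe"  # can't decode without it; ask for a fresh keyframe
    if time_left_ms > 0:
        return "wait"              # the packet may just be late (jitter), hold on briefly
    return "conceal"               # deadline passed: repeat/attenuate previous audio, skip frame

print(handle_missing_packet(False, rtt_ms=30, time_left_ms=100))  # nack
print(handle_missing_packet(False, rtt_ms=200, time_left_ms=0))   # conceal
```

The point of the sketch is that every branch is a compromise made under a deadline, not a guaranteed repair.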

Tsahi Levent-Levi:

Yeah, okay, so there are a lot of mechanisms in there that are going to try and fix things, but you're fixing them while you know that they're broken, and it's like the show must go on. I cannot just wait and pause everyone. Like we had this chat between us, and then the siren came up here, and for a few minutes I was out, and that was the end of it, but you had to continue.

Tsahi Levent-Levi:

Yeah, okay, because this is live. So assume the same thing, but shrink it into like 50 or 100 milliseconds of time. I cannot just lose that audio, because then whatever I'm saying you won't understand, or my video won't arrive the way it should. And again we have a challenge, because the tools that we have must run in real time and cannot use retransmissions too much.

Mark Donnigan:

I think most of our listeners are probably familiar with scalable video codecs. But that's something that, if you haven't worked in WebRTC, is kind of an interesting concept. Why don't you explain it, both for temporal and spatial scalability? Because it is interesting how these bandwidth issues are handled.

Tsahi Levent-Levi:

I think let's start with something simple. In streaming, what you usually do is something called ABR, right? Adaptive bitrate. Which means: I'm going to receive the video on the server; I have time, so I'll take that video and transcode it into five, eight, whatever number of other streams that I want, each bitstream with a different bitrate, and keep the qualities and capabilities. I'm going to segment it into two seconds or whatever, so someone can jump from one to the next. Now, WebRTC doesn't work like that. Not because it can't, but because WebRTC, again, is for video conferencing, and in video conferencing I don't have the time to do these eight different bitrates, and also I don't need to, because this isn't going to a million people; it's going to the other three people. So I don't want to invest so much energy for so little benefit.

Tsahi Levent-Levi:

Yeah. So what happens in WebRTC is, first, I don't know what the bitrate is going to be, so I'm going to decide dynamically on the call. While we're on this call using Zoom, which isn't WebRTC but uses the same kind of mechanism, Zoom is going to check, on each side of this call, how much bitrate is available for me and for you, and then it's going to play with that throughout the call, going up or down based on the network and the CPU capabilities and everything else. Now, just this movement of my camera changes the amount of bits that need to be sent for video, and that might change how bitrates are going to be allocated for the call.
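
That "going up or down based on the network" behavior can be approximated with a classic additive-increase, multiplicative-decrease loop. This is a toy model with made-up thresholds, not WebRTC's actual congestion controller:

```python
def adapt_bitrate(current_kbps: float, loss_fraction: float,
                  min_kbps: float = 50, max_kbps: float = 2500) -> float:
    """Toy sender-side rate control: probe upward while the network is clean,
    back off sharply when packet loss signals congestion."""
    if loss_fraction > 0.10:        # heavy loss: cut the rate hard
        new = current_kbps * 0.5
    elif loss_fraction > 0.02:      # mild loss: hold steady
        new = current_kbps
    else:                           # clean network: probe for more bandwidth
        new = current_kbps * 1.08
    return max(min_kbps, min(max_kbps, new))

rate = 500.0
for loss in [0.0, 0.0, 0.15, 0.0]:  # two clean intervals, a congested one, recovery
    rate = adapt_bitrate(rate, loss)
print(round(rate))  # 315
```

Real WebRTC bandwidth estimation also uses delay gradients and receiver feedback, but the see-saw shape of the output is the same idea.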

Mark Donnigan:

Okay.

Tsahi Levent-Levi:

So the first thing that you have is a bandwidth estimator inside WebRTC that dynamically changes the bitrate. That's the first thing, which means that you don't get perfect quality at all times, because I might not have the network for that. The second thing is that you can use scalable video, SVC: scalable video coding. It's not that common today with WebRTC, by the way; it's there, but a lot of people don't use it. With scalable video coding, I can create a single encoded bitstream for video and then layer it as if it's an onion, and each layer is going to add something to the video. I can say the first layer is a temporal layer with low resolution and 15 frames per second. In the second layer I'm going to add more frames, so go to 30 frames per second.

Tsahi Levent-Levi:

The next one is going to add resolution, and the last one is going to add quality on top of that resolution. Now I'm going to take that bitstream that I create and send it to a server. The server can take that and peel the layers like an onion, and it can use as many layers as it wants when sending it to other participants. So now I'm getting something like ABR on the server, without the server needing to transcode anything.

Tsahi Levent-Levi:

So you can use less CPU but serve a lot of different participants that have different capabilities, either because of network or CPU power.
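
The onion peeling can be sketched as a toy: the sender emits one layered bitstream, and the server forwards only a prefix of layers to each subscriber, with no transcoding. The layer names and bitrates here are invented for illustration:

```python
# Each layer adds something on top of the previous ones (frames, resolution, quality).
LAYERS = [
    {"name": "base: 320p @ 15fps", "kbps": 150},
    {"name": "temporal: 320p @ 30fps", "kbps": 100},
    {"name": "spatial: 640p @ 30fps", "kbps": 350},
    {"name": "quality: 640p @ 30fps, higher fidelity", "kbps": 400},
]

def layers_for(subscriber_kbps: float) -> list:
    """Forward the longest prefix of layers that fits the subscriber's bandwidth.
    No transcoding: the server only drops the outer layers of the onion."""
    chosen, total = [], 0
    for layer in LAYERS:
        if total + layer["kbps"] > subscriber_kbps:
            break
        total += layer["kbps"]
        chosen.append(layer["name"])
    return chosen

print(layers_for(300))        # base + temporal layers only
print(len(layers_for(1200)))  # 4 (all layers fit)
```

One encode on the sender, per-viewer adaptation on the server: that is the CPU saving Tsahi describes.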

Tsahi Levent-Levi:

This is nice, but what's actually really used in WebRTC in a lot of scenarios is something called simulcast. In simulcast, what I'm going to do is something else: I'm going to create two or three different videos of the same source, each one at a different bitrate. So one of them is going to be at a hundred kilobits per second, the next one at 500, and the third one with whatever's left. I'm sending all three of these to the server, and the server will decide which one to send to whom. It's like a poor man's SVC. Why do I do that? Why do I need it? Because SVC doesn't work well with hardware encoders and decoders, which means that everything needs to be done on the CPU. But simulcast is something that hardware decoders and encoders are fine with. It makes things simpler for them, and the extra bandwidth that I'm paying for is not that high in a lot of different use cases.
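
By contrast, a simulcast server just picks one of the independent encodings per viewer. A toy selection function, with invented rung bitrates, also shows the extra upload the sender pays:

```python
SIMULCAST_RUNGS_KBPS = [100, 500, 1200]  # three independent encodings of the same source

def pick_rung(viewer_kbps: float) -> int:
    """Choose the highest simulcast rung the viewer's bandwidth can carry,
    falling back to the lowest rung rather than sending nothing."""
    best = SIMULCAST_RUNGS_KBPS[0]
    for rung in SIMULCAST_RUNGS_KBPS:
        if rung <= viewer_kbps:
            best = rung
    return best

# The sender uploads ALL rungs; that sum is the "poor man's SVC" overhead.
upload_kbps = sum(SIMULCAST_RUNGS_KBPS)
print(pick_rung(800), upload_kbps)  # 500 1800
```

The server switches rungs per viewer with no transcoding, at the cost of the sender encoding and uploading every rung.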

Mark Donnigan:

There's a lot to WebRTC. You started a company doing testing, helping companies test. Talk to us about where you see the biggest pitfalls, or the common mistakes that people make in building their workflows or implementing, even if they're turning to a third party. What are some lessons learned that you can share?

Tsahi Levent-Levi:

I think the biggest one is not knowing or understanding WebRTC. And when you don't know it, you come with unrealistic expectations. I'll give an example. I had someone come to me, this was before the pandemic even, and say: I want to build an application that allows people to travel the world. I want them to be able to go see the Himalayas. So the actual person out there goes into the Himalayas with his iPad, and the person at home can watch that from the comfort of his TV screen, and the quality would be so great that it's better than Google Meet. And then you try to explain: well, there is no network in the Himalayas.

Tsahi Levent-Levi:

And WebRTC was created by Google, which runs Google Meet. So how can you get quality that is higher than that with something that... it doesn't match. And this is an extreme, but there are a lot of these kinds of unrealistic expectations.

Mark Donnigan:

Did they actually try and build it, or did they listen to you?

Tsahi Levent-Levi:

No. My kids call me the Shatterer of Dreams, because that's my job most of the time. The other thing: most of the mistakes you'll see people make are with TURN and with bitrate calculations. They just don't understand them. Somehow, TURN servers, or NAT traversal, are black magic for a lot of people, and then when they try solving it, they solve it incorrectly.
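
One flavor of those bitrate-calculation mistakes: sizing a TURN server while forgetting that every relayed stream is both received and re-sent by it. A rough estimate with purely illustrative numbers (the 20% relayed fraction is a common rule of thumb, not a guarantee; measure your own population):

```python
def turn_bandwidth_mbps(sessions: int, stream_kbps: float,
                        relayed_fraction: float = 0.2) -> float:
    """Rough TURN server bandwidth estimate.
    Only some sessions need relaying (direct or STUN-assisted connections
    work for the rest), but each relayed stream is received AND re-sent
    by the server, so it counts twice."""
    relayed = sessions * relayed_fraction
    return relayed * stream_kbps * 2 / 1000  # in + out, kbps -> Mbps

# 1,000 concurrent 1 Mbps streams, ~20% relayed: ~400 Mbps through TURN
print(turn_bandwidth_mbps(1000, 1000))  # 400.0
```

People who skip the factor of two, or assume 100% of traffic relays, get capacity plans that are wrong in either direction.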

Tsahi Levent-Levi:

So usually I get people who come to me and say: we have a problem, it doesn't work well, we're trying to optimize, and we're checking if we can use VP9. And you go: okay, but what is the problem? Where did you start from? And the story is: well, we have one user on desktop, the other on mobile, we did a native application, and the quality isn't good. Okay, why isn't it good? Is it the network? The CPU? Is it for a specific user, in a specific location, in a specific scenario? You shouldn't rush.

Tsahi Levent-Levi:

People rush to fix things before they understand what the problem really is, because they heard from someone somewhere that VP9 has better quality than VP8, so if I've got a quality problem with my video, it must be that I need a different codec. Go explain to them that up until a few years ago, MPEG-2 was still the dominant codec running their TV shows, and they were happy with the quality there. So it might not be the codec; it might be something else. Let's just go back to basics.

Tsahi Levent-Levi:

Let's see what the problem is, analyze it, and then decide on the correct solution. So most of the time it's just figuring out what the actual problem is: understanding the problem before charging toward a solution.

Mark Donnigan:

That is gold, the advice that you just gave. Even the VP9 example: oh, I have a quality problem, so I need to implement a new codec, because that's going to fix it.

Tsahi Levent-Levi:

You know I need a better codec. We're going to use AV1.

Mark Donnigan:

Yeah, AV1, exactly. It's interesting. Obviously at NETINT we have a hardware AV1 encoder, and so we're in the middle of a lot of AV1 migrations; it's a major driver for us. But we hear consistently that one of the very first tests people do is to make sure that, if they're going to adopt AV1, it's at least on par with VP9. No, it needs to be better than VP9 in terms of bitrate efficiency and the quality metrics. In some ways that sounds obvious, right? But it's shocking how many evaluations or engineers start running down a path without even starting from those kinds of first principles.

Tsahi Levent-Levi:

Yes, and I need a lot of CPU power to decode AV1, especially if what I'm looking for is high quality at a high bitrate. Now, if you look today at video conferencing, where do you use AV1? You use it at very low bitrates, and you're starting to see companies experimenting with it for screen sharing. Why low bitrates?

Tsahi Levent-Levi:

Because it takes so much CPU that it's only easy to do at low bitrates, and there you actually see a quality improvement. And why screen sharing? Because text looks better with AV1 encoding.

Mark Donnigan:

Yes. So while you stepped away for a few minutes, I was sharing a little bit from Demuxed, since I just came back; I'm still in San Francisco. And there was a presentation yesterday from a company that showed a video conferencing application. It was screen sharing, and they showed the difference with H.264 on a Wikipedia page, which, to your point about text, was pretty much all text.

Mark Donnigan:

There were some images, but I think it was around 900, maybe slightly above 900 kilobits for H.264, and then they showed AV1 at about 170 kilobits. The difference was pretty remarkable, and not only in bitrate.

Tsahi Levent-Levi:

In a way, AV1 has the tools that make it better for text, period. Once you have that, there's no competition. But the thing is, how the hell do I know beforehand that this is going to be the content that I'm going to encode? It's not as if I'm Netflix and I've seen the movie, so I know this is a drama, or something that is drawn. How do you call that?

Mark Donnigan:

You mean, oh, animated? Yeah, an animated movie, animation.

Tsahi Levent-Levi:

I don't know that in advance, so I don't have the luxury of encoding three times with three different codecs and then deciding which one is best. Netflix can do that, and they are doing that to reduce bitrates. I can't; it's live.
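Tsahi's contrast here, offline encoding can try several codecs and keep the winner while live encoding must commit up front, can be sketched as two small functions. This is an illustrative sketch, not anyone's production logic; the codec names and bitrate numbers are made up to echo the conversation.

```typescript
// Hypothetical offline "encode everything, keep the winner" selection,
// versus a live pipeline that must commit to one codec up front.
interface EncodeResult {
  codec: string;    // e.g. "h264", "vp9", "av1"
  kilobits: number; // resulting bitrate for a target quality
}

// Offline (Netflix-style): run every encoder, keep the cheapest stream
// that hits the quality target. Only possible because content is known.
function pickBestOffline(results: EncodeResult[]): EncodeResult {
  return results.reduce((best, r) => (r.kilobits < best.kilobits ? r : best));
}

// Live (WebRTC-style): no second pass, so the codec is chosen from
// negotiation and device capability alone, before any frame is seen.
function pickForLive(supported: string[], preferred: string[]): string {
  for (const codec of preferred) {
    if (supported.includes(codec)) return codec;
  }
  return supported[0];
}

// Illustrative numbers only, roughly echoing the screen-share example.
const offline = pickBestOffline([
  { codec: "h264", kilobits: 900 },
  { codec: "av1", kilobits: 170 },
]);
const live = pickForLive(["h264", "vp9"], ["av1", "vp9", "h264"]);
```

The offline path only works because the whole asset exists before the codec decision is made; the live path has nothing to compare, so preference order and device support decide everything.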

Mark Donnigan:

Yeah, yeah, yeah, that's interesting.

Tsahi Levent-Levi:

You need it to get a lot more optimized.

Mark Donnigan:

Yeah.

Tsahi Levent-Levi:

Across a lot of different, broader use cases, to be able to work with it everywhere at all times. It's not that you can't use it today, definitely you can, and in some use cases it's the best thing ever. But you need to know what it is that you're doing if you're going to use that specific codec.

Mark Donnigan:

So, you mentioned TURN servers, and it got me thinking that we haven't explained the path that a packet goes through using WebRTC, from the moment the photons hit my camera sensor to when they show up on your screen. Why don't you real quick explain?

Tsahi Levent-Levi:

You know, it's complicated.

Mark Donnigan:

What does that look like? What is the traffic? Where are the toll booths? Tell us where the toll booths are.

Tsahi Levent-Levi:

I think the answer is: it depends. WebRTC as a whole doesn't require anything. It works peer-to-peer, from my machine to yours directly, if we want to and if it can. If it can't, we're going to use a TURN server to relay that data. There are also STUN servers, there are a lot of things out there, but at the end of the day, there are two things that are going to decide how these media packets are flowing. The first one is the architecture of your application, and that you can control, because you own it, and that's a very important piece. Do you have media servers? If you do, then you go through media servers. That's out of scope of WebRTC, although these media servers communicate using WebRTC protocols.

Tsahi Levent-Levi:

But they're not defined in the standard. You're just using them because it makes sense, and they're very common. Most sessions today, I guess, would be between a user and a media server somewhere in between, especially in large calls or in streaming. So a lot of these sessions are going to be from a client to a server, and where the server is located is important: I want it to be as close as possible to the user. And then there are TURN servers. TURN servers are there if I cannot reach the other side. The other side might be you, or it might be a media server, and if I can't reach it directly, I'm going to use a TURN server. The purpose of the TURN server is to relay the media through it to wherever it needs to go. A TURN server can relay media across UDP, TCP or TLS, and in WebRTC everything is kind of best effort. I'm going to try.

Tsahi Levent-Levi:

UDP directly. If that doesn't work, I'm going to try to relay over UDP. If relay over UDP doesn't work, I'll try to relay over TCP, and if that doesn't work, I'll try to do it over TLS. And which one is going to be used? Well, I don't know. Let's try and see what happens in this specific call, in this specific scenario.
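The fallback ladder Tsahi describes, direct UDP first, then relay over UDP, TCP, and finally TLS, shows up in practice as the ICE server list handed to a peer connection. A minimal sketch, assuming placeholder hostnames and credentials; none of these servers are real:

```typescript
// Hypothetical ICE configuration: STUN for direct connectivity checks,
// plus TURN relay candidates over UDP, TCP, and TLS. The browser's ICE
// agent gathers candidates from all of these and tries candidate pairs,
// settling on the best path that actually works for this call.
const iceServers: { urls: string | string[]; username?: string; credential?: string }[] = [
  { urls: "stun:stun.example.com:3478" },
  {
    urls: [
      "turn:turn.example.com:3478?transport=udp",  // relay over UDP
      "turn:turn.example.com:3478?transport=tcp",  // relay over TCP
      "turns:turn.example.com:5349?transport=tcp", // relay over TLS
    ],
    username: "demo-user", // placeholder credentials
    credential: "demo-password",
  },
];

// In a browser this would be passed straight to the peer connection:
//   const pc = new RTCPeerConnection({ iceServers });
```

Direct peer-to-peer is still preferred when it works; the relay entries only win when the connectivity checks for more direct paths fail.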

Mark Donnigan:

You mentioned media servers. Is there a trend towards media servers being used, and where and why? And also, explain the functionality that can live in a media server within a WebRTC construct.

Tsahi Levent-Levi:

First of all, it's not a trend, it's a reality of life. It's been that way from day one. And what do I mean? I see a lot of people who say that WebRTC isn't good and "we're going to improve it and make WebRTC better." We have some streaming companies doing that.

Mark Donnigan:

Yeah, you named some of them earlier when you were mentioning the platforms. Anyway, we won't name them again.

Tsahi Levent-Levi:

What they say is, we make WebRTC better because WebRTC is not good, it's only peer-to-peer. And the answer is: no, it's not. You can do whatever you want with it. If you want, you can use a media server. It's great that you're doing that, but everyone else in the industry is doing that too. I cannot send packets from my machine to a million machines. Impossible.

Tsahi Levent-Levi:

I can't. So what am I going to do? I'm going to send the packets to a server, and that server is going to route those packets to everyone, because that's the role of that server. Hence, in streaming, everything you're going to do is going to have media servers in there. In video conferencing for large groups, everything you're going to do is going to have media servers. If I want to record the session, most probably I want to record it and then play it back from somewhere in the cloud, so I need to record it on a server, an online media server or whatever. But I need a server.
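The server role Tsahi is describing, one upload fanned out to everyone else, is what an SFU (Selective Forwarding Unit) does. Here's a toy sketch of just the routing idea; real media servers add layer selection, congestion control, and much more, and every name here is invented for illustration.

```typescript
// Toy SFU fan-out: one inbound packet from a sender is forwarded to
// every other participant in the room, without re-encoding the media.
type PacketSink = (packet: Uint8Array) => void;

class ToyRoom {
  private participants = new Map<string, PacketSink>();

  join(id: string, sink: PacketSink): void {
    this.participants.set(id, sink);
  }

  // Route a packet from `senderId` to everyone else. This is why one
  // client upload can reach a large audience: the server multiplies it.
  forward(senderId: string, packet: Uint8Array): number {
    let delivered = 0;
    for (const [id, sink] of this.participants) {
      if (id !== senderId) {
        sink(packet);
        delivered++;
      }
    }
    return delivered;
  }
}
```

The key property is that the sender's upstream bandwidth stays constant no matter how many viewers join; the multiplication happens on the server side, which is exactly what peer-to-peer alone cannot give you.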

Tsahi Levent-Levi:

Sometimes I would say, well, I'm going to do it peer-to-peer, because it's only the two of us, I don't care about recording, and I want to be cheap and not pay for the bandwidth on the servers.

Tsahi Levent-Levi:

Yeah, that's fine, but it's just one of the use cases. It's not the major use case, it's one of them. So media servers are used everywhere. They're used to do group calls, they're used to do streaming to large numbers of people, they're used to do recording and transcription, and a lot of other things that you need when you want to host the actual meeting, the session, or whatever you want to call it.

Mark Donnigan:

Is it easy to build your own media server, or is this where you really wind up using a third party?

Tsahi Levent-Levi:

A lot of the courses that I have, I do with Philipp Hancke. He's great. He's like the person with the most bugs opened on Chrome, on WebRTC, everything. And we talked about this: why would someone in his right mind, after 2020, go and write his own media server? And the only answer is: because it's fun.

Tsahi Levent-Levi:

There is no other reason to do that. There are multiple alternatives today that are quite good: Janus, Jitsi, mediasoup, Ion. These are the main ones. They are all open source. You can download them, use them, change their code, modify them, optimize them, improve on them, whatever it is that you want, and they all have large communities around them. Each one is suitable for slightly different use cases and scenarios, and they use different coding languages, and whatever, nobody cares. You can just go and use them. So that's what I would do if I had to build something today from scratch. I wouldn't write code in C++ and start implementing WebRTC in a media server. I would go to one of the existing ones and just use it, the one that I'm most comfortable with and that the developers around me can use.

Mark Donnigan:

So we had a question come in; I'm going to ask it, so we will get to this. Let's see, Vijay, if you're still on, we will get to it. But it's appropriate that we talk about the scale of WebRTC, because I was smiling at your comment about commercial companies: "WebRTC is bad, but we've made it better." And then the next comment is "it doesn't scale, but we scale. No one else can scale."

Tsahi Levent-Levi:

Yes.

Mark Donnigan:

So why don't you talk to us again, explain where the WebRTC doesn't scale comment comes from, and then what the solution is?

Tsahi Levent-Levi:

So I'll start with the solution.

Mark Donnigan:

Okay.

Tsahi Levent-Levi:

Have you ever used Google Meet? Does it scale?

Mark Donnigan:

I mean I use it for one-to-one or six people or something, so I don't know.

Tsahi Levent-Levi:

It scales. It scales well. Whether it scales like Zoom doesn't really matter. They have millions of users across the globe, and people are complaining, but they're complaining like they do about everything else.

Mark Donnigan:

Yeah, sure, sure. They complain about Zoom, they complain about Teams. You know, we complain about Zoom too.

Tsahi Levent-Levi:

Exactly. Google Meet, at the end of the day, is pure WebRTC. WebRTC scales. Now, you started by explaining at the beginning that, look, WebRTC is a protocol. Do whatever you want with it. If you want it to scale, make it scale. If you don't want it to scale, don't invest the time in making it scale. That's it.

Tsahi Levent-Levi:

Okay, where does that comment come from? You go into a room. There's a product manager, three marketing guys and ten salespeople, and it's "give me something that I can tell the customers so they'll buy from us. What do we tell them?" And then you get these stupid comments, things that companies say because they need to say something: "It doesn't scale. We help you scale." You know, "WebRTC was never tested. If you want to test it, come use testRTC." I could say that, but that would be stupid, because WebRTC was tested. But if you're building your own application, you probably have to test your application.

Mark Donnigan:

Your application, yeah.

Tsahi Levent-Levi:

You can use testRTC, you can use someone else, or you can do it manually. It's up to you to decide how. But "we're the only ones doing testing"? Sure. So this is where it comes from. I don't think they do it on purpose. It's just how companies think and work: what they want to do is FUD, fear, uncertainty and doubt. And that's fine.

Mark Donnigan:

So explain again, how does someone go from…

Tsahi Levent-Levi:

Let me say this, I'll ask a question. It's my turn. Okay: does HLS scale?

Mark Donnigan:

Well, yes, because….

Tsahi Levent-Levi:

Oh, it's dependent upon the network, and upon the servers, and you need to buy transcoders and build the application properly.

Tsahi Levent-Levi:

You need to put CDNs in place, and then it scales. HLS on its own, written on a piece of paper somewhere in IETF documentation or whatever, doesn't scale. You can copy that document and put it somewhere else, but someone needs to build the application that would scale. Someone decided what type of servers to use, how to do transcoding, how big to make the chunks. Someone did all that. Now, WebRTC isn't any different. Does WebRTC scale on its own? No, it doesn't.

Tsahi Levent-Levi:

But if I want to build an application with WebRTC, I need to build an application that would scale for my audience. Not more than that, not less than that. If I'm going to have a target audience of 100 people in a single session, I'm going to run certain types of servers, and if what I need is to scale to millions in real time, that's going to have a very different impact on how I architect my solution and what the media servers are going to look like on the network.

Tsahi Levent-Levi:

I remember the first time I talked to a company that wanted to build a kind of social network slash messaging solution. What they said was: we want to start by thinking about 100 million users and more. When you look at that kind of scale, you cannot use the same techniques that you use when you want to build a signaling solution for a dating application for the country of Israel, which has 10 million people, of which at best 100,000 are going to use that solution. It's a totally different problem with a totally different architectural solution. And I can say that because at testRTC, for example, each year we need to change the way we architect the solution. Why? Because we need to increase our scale considerably versus the previous year, and that means taking the next step, or the next leap, in how you do things to reach that kind of scale. So does WebRTC scale? Yes, it does, but you need to put effort into that.

Mark Donnigan:

Yeah, where is the cost in scaling? Because you also hear this comment; I've certainly heard it. People who maybe have a decent understanding say they agree with you: WebRTC is great. It's perfect with 50 participants on a session, maybe a hundred, maybe a couple hundred, no problem. But you want a thousand, you want 10,000, you want 50,000? It's too expensive. I've heard that. So where is that cost?

Tsahi Levent-Levi:

So I think you can put it into three buckets. The first one is that it's expensive because we're still early in the game of WebRTC. Costs will go down, because today you pay for the investment of doing an implementation that was never there before, in order to optimize it to run at such scales, and over time we're going to reduce those price points. Like, 20 years ago, using H.264 was ridiculously expensive. Today it's cheap as hell compared to that. So give it time, five, ten more years, and the technology will be commonplace everywhere in terms of how you architect it, and all of the best practices are going to be in place.

Tsahi Levent-Levi:

The second thing is, remember when we talked about real time? When I go from 10 seconds to five, there is a cost associated with that. Now, if I'm going to spend the energy to go from five seconds down to close to zero, the price goes up, because I need to invest more energy in the tools and the solutions that I'm using to get to that level, to that sub-second latency that I'm looking for. If you look at video conferencing today, it's relatively easy to do, because you've already got 30 years or so of experience doing it. It's still hard, but it's easy compared to what? Compared to cloud gaming, which also uses WebRTC.

Mark Donnigan:

Now why.

Tsahi Levent-Levi:

Because in video conferencing I can live with 200 milliseconds and that's fine. I talk, you barge into what I'm saying, and we'll manage. 200 milliseconds of latency is fine. 300 is also okay; at 400 we will still be okay. Now, if I'm in a cloud game at 100 milliseconds and it's a first-person shooter, I'm dead. So then I don't need 200 milliseconds, I need to push it down to 50. And you see the energy that people invest in building cloud gaming, putting in tools that are not really used when you do video conferencing. So the faster you want a solution to be, the lower the latency needs to be, the more energy you need to invest, and the higher the price point is going to be, just because of the realities of life. So if you're saying, well, it's too expensive: yes, but you wanted it real time. You didn't want it in five seconds.
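Those latency numbers are worth pinning down, since they drive the whole cost argument. A small sketch using the figures from the conversation; they are conversational ballparks, not a specification:

```typescript
// Rough end-to-end latency budgets per use case, echoing the numbers in
// the conversation: conferencing tolerates a few hundred milliseconds,
// a first-person-shooter style cloud game tolerates far less, and
// classic chunked HLS lives in whole-second territory.
const latencyBudgetMs = {
  videoConferencing: 400, // 200-400 ms still feels conversational
  cloudGaming: 50,        // beyond this, the shooter scenario fails
  chunkedHls: 5000,       // several seconds is normal and acceptable
};

// Does a measured glass-to-glass latency fit the budget for a use case?
function meetsBudget(
  useCase: keyof typeof latencyBudgetMs,
  measuredMs: number
): boolean {
  return measuredMs <= latencyBudgetMs[useCase];
}
```

The point of the sketch is the asymmetry: 100 ms is comfortably inside the conferencing budget but already double the gaming one, which is why the same protocol carries very different engineering costs in the two markets.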

Mark Donnigan:

Yeah, either you pay for it. And this is an interesting observation. There's a very well-known industry analyst who, I think, some of the industry are tired of hearing say this, but it's true. His observation is that as much as we all like to talk about ultra-low latency, and he's obviously not referencing gaming, where it really matters, but just entertainment distribution. I know I'm going to get some hate mail from somebody who's building a platform for ultra-low-latency sports or whatever. That's great, and maybe we'll get there. But as you said, as you push those latencies down, the costs go up so high, and to the consumer it doesn't matter.

Tsahi Levent-Levi:

We see that needed elsewhere, though, and that would be on the production floor.

Tsahi Levent-Levi:

Yes, that's where you start seeing solutions that are interesting. Because, first of all, I want this discussion between us to be in real time and as quick as possible to actually go out there. And if we're talking about TV-level production, maybe I want to do that remotely and have the producers sit somewhere else, using WebRTC to go over the streams and decide which camera to take and how to merge the screens, while the stream goes out to the viewers, over at their houses or wherever. So it's not that there is no room for WebRTC; it's just probably not on the side of the viewers, as distribution to mass audiences.

Mark Donnigan:

Okay, well, we're coming to a close, and I promised we'd get to this question. So the question was: broadly speaking, where do you see the industry going in terms of simulcast adoption versus SVC?

Tsahi Levent-Levi:

You know, this goes back to a few minutes ago. I think we're going to stay in the world of simulcast for at least three or four more years; at the very least, it's not going to change in that time. I think we might see more SVC when AV1 becomes commonplace, but only if AV1 SVC is good enough and we have support for hardware encoders and decoders that make sense, which means it is useful for real-time solutions and not just for streaming.

Tsahi Levent-Levi:

Yeah, otherwise we'll keep on using simulcast, yeah.
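For reference, the "multiple streams at different bitrates" idea behind simulcast appears directly in the encoding parameters a WebRTC sender is configured with. A browser-side sketch; the rid names, bitrate numbers, and the toy layer-picking function are illustrative choices, not a standard:

```typescript
// Illustrative simulcast encodings: three versions of the same camera
// track at different resolutions and bitrates. An SFU then picks which
// layer to forward to each viewer based on that viewer's bandwidth.
const sendEncodings: { rid: string; maxBitrate: number; scaleResolutionDownBy: number }[] = [
  { rid: "q", maxBitrate: 150_000, scaleResolutionDownBy: 4 },   // quarter resolution
  { rid: "h", maxBitrate: 500_000, scaleResolutionDownBy: 2 },   // half resolution
  { rid: "f", maxBitrate: 1_500_000, scaleResolutionDownBy: 1 }, // full resolution
];

// In a browser this would be used as:
//   pc.addTransceiver(videoTrack, { direction: "sendonly", sendEncodings });

// A toy version of the choice an SFU makes per viewer: forward the
// highest layer the viewer's available bandwidth can afford.
function pickLayer(availableBps: number): string {
  const affordable = sendEncodings.filter((e) => e.maxBitrate <= availableBps);
  return affordable.length > 0 ? affordable[affordable.length - 1].rid : "q";
}
```

With SVC the layers live inside a single encoded stream instead of three separate ones, but the forwarding decision on the server looks much the same, which is why the simulcast-versus-SVC question is mostly about encoder and hardware support.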

Mark Donnigan:

So, working in the codec and encoding space, I can confirm what you just said about AV1. AV1 does have much more comprehensive tools, which is why you're pointing to AV1 for SVC. I can't comment on some of the open source projects, just because I'm not really on top of them, but I do know of one commercial company with an AV1 encoder that has been focusing quite a lot on the video conferencing use case and exploiting those SVC tools.

Tsahi Levent-Levi:

So what I said is slightly different, because what you need is for Intel, NVIDIA, Qualcomm, Arm, for these companies to have AV1 encoders and decoders with SVC support that are suitable for video conferencing. If you have that, then SVC in WebRTC will become commonplace. Otherwise it will be simulcast.

Mark Donnigan:

Here's another question that came in. So can you use WebRTC to transport audio only?

Tsahi Levent-Levi:

Yes, and I can also use it to transport video only, or to transport only data: no audio, no video. There's a data channel.
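Tsahi's answer maps directly onto what you request from the API: media constraints decide whether audio or video tracks exist at all, and a data channel needs neither. A small sketch; the channel label is an arbitrary made-up name:

```typescript
// Audio-only: request a microphone track and explicitly no camera track.
const audioOnlyConstraints = { audio: true, video: false };

// Video-only simply flips the two flags.
const videoOnlyConstraints = { audio: false, video: true };

// Data-only needs no media constraints at all. In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia(audioOnlyConstraints);
//   const pc = new RTCPeerConnection();
//   const channel = pc.createDataChannel("chat"); // label is arbitrary
//   channel.send("hello, no audio or video involved");
```

The signaling, ICE, and TURN machinery discussed earlier is identical in all three cases; only the tracks and channels attached to the peer connection change.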

Mark Donnigan:

Yeah. And for the person that asked the question, reach out; you can reach out directly to Tsahi or to us, and we'll put you in touch if you want to chat more about that. Well, this was a wonderful conversation. Thank you again for sharing this amazing body of knowledge that you've accumulated. So thank you to the listeners, and whether you're watching live or on the replay, as they say, feel free to reach out to us or directly to Tsahi. He's an amazing resource, and make sure you sign up for his newsletter, by the way. All right, thank you again, and thank you to all the listeners. Until next time, keep encoding and streaming video using WebRTC.

Tsahi Levent-Levi:

Thanks for having me, Mark. This episode of Voices of Video is brought to you by NETINT Technologies. If you are looking for cutting-edge video encoding solutions, check out NETINT's products at netint.com.
