
Voices of Video
Explore the inner workings of video technology with Voices of Video: Inside the Tech. This podcast gathers industry experts and innovators to examine every facet of video technology, from decoding and encoding processes to the latest advancements in hardware versus software processing and codecs. Alongside these technical insights, we dive into practical techniques, emerging trends, and industry-shaping facts that define the future of video.
Ideal for engineers, developers, and tech enthusiasts, each episode offers hands-on advice and the in-depth knowledge you need to excel in today’s fast-evolving video landscape. Join us to master the tools, technologies, and trends driving the future of digital video.
Voices of Video
What Does it Take for Video to Reach Your Screen?
Get ready to dive into the fascinating world of video streaming with our guest Alex Zambelli, Sr. Platform Manager at Dolby Laboratories and former Technical Product Manager at Warner Bros. Discovery. In this engaging episode, we unravel the complexities of video distribution at scale, highlighting how codecs underpin modern streaming experiences. Alex draws from his extensive background at Microsoft, where he worked on groundbreaking technologies that are now household names, to offer listeners unparalleled insights into the industry.
We explore the critical differences between live streaming and on-demand content, discussing the unique challenges and pressures that come with real-time broadcasting. Alex recounts his experiences from monumental live events such as the Olympics and NFL broadcasts, offering valuable lessons learned from the frontlines. He also shares the intricacies of integrating digital rights management (DRM) systems, emphasizing the balance between ensuring content security and maintaining a seamless user experience.
As technology continues to evolve, so too do the strategies around codec adoption - especially with new formats like AV1 on the horizon. Alex discusses the challenges and considerations for streaming providers in adopting new codecs based on audience demands and device capabilities. This episode promises to enrich your understanding of the streaming landscape, sparking discussions about the future of content delivery.
Join us for this deep dive where innovation meets practical experience, and don’t forget to subscribe, share, and leave a review!
Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.
Speaker 2:Welcome to Voices of Video. I'm Jan Ozer. This is where we explore critical streaming-related topics with experts in the field. If you're watching and have questions, please post them as a comment on whichever platform you're watching, and we'll answer live if time permits. Today's episode is all about distribution at scale, and Alex Zambelli, who's Technical Product Manager for Video Platforms at Warner Brothers Discovery, is our guest.
Speaker 2:I've known Alex at least 15 years, going back to his history with Microsoft, and we'll start there, where he was a codec evangelist and a producer of events like the Olympics and NFL football. We'll hear about some of the experiences there, and then we'll walk through the various points in his career on the way to Warner Brothers.
Speaker 2:There's a lot of stops that are worth chatting about. And then, you know, I'm known, I think, as a codec theorist, right? I do a lot of testing and I render conclusions, and that's useful in a lot of ways, or at least I hope it is, but it's not real world, and Alex just has a ton of real-world experience that he's going to share with us today.
Speaker 2:Things as high level as where the industry needs to go to make it simpler for publishers like Warner Brothers to focus on content as opposed to, you know, compatibility and issues, and as deep-diving as, you know, what's his percentage of VBR - is it 200% constrained VBR, 300% constrained VBR? And, particularly what I'm interested in, when does a company like Warner Brothers look at adopting a new codec? I think Alex is going to talk about a decision that they're in the process of making, which is, you know, whether to integrate AV1. So Alex has a ton of real-world experience in live event production at huge scales, as well as premium content encoding and delivery with some of the biggest names in the industry. So I'm way excited to have Alex joining us today. Alex, thanks for being here.
Speaker 1:Jan, thanks so much for having me. Real pleasure, and I'm looking forward to the next hour talking to you.
Speaker 2:Yeah, we don't get a chance to do this that often. Let's dive in. I'm not intimately familiar with your CV. Did you start in streaming at Microsoft, or was there a stop before that?
Speaker 1:I did start my career at Microsoft, so that was my very first job out of college, actually. This was back in 2002, and I started out as a software tester. I started as a software test engineer in Windows Media Player, and I worked on both Windows Media Player and then the codec team at Microsoft as a software tester for about five years. It was during that second phase of my software testing role there, working on the codecs, where I started working with the VC1 codec, which at the time was a new codec for Microsoft in the sense that it was the first codec that Microsoft had standardized. There was a codec called Windows Media Video 9, WMV9, and Microsoft took that through SMPTE and basically standardized it, and so that became VC1. Some folks may recall that it was one of the required codecs for both HD DVD and Blu-ray at the time, and so that's what put it on the map.
Speaker 1:During that time where I was testing the VC1 encoder, I started interacting a lot with Microsoft's external customers and partners. That then transitioned me into my next job at Microsoft, which was technical evangelism. I ended up doing technical evangelism for VC1 for a few years, and then my scope broadened to include really all the Microsoft media technologies that were available at the time and could be used for building large online streaming solutions. And so when I started at Microsoft working in digital media in 2002, it was still mostly dominated by physical media. We're still talking about CDs, DVDs, Blu-rays.
Speaker 1:By the time I transitioned into this technical evangelism job, which was around 2007 or so, streaming was really starting to pick up steam, and so from that point on, really, until this day, my career has been focused on streaming, because that has become the dominant method of distribution for digital media. And so, as I mentioned, starting around 2007 or so, I started doing technical evangelism for a whole bunch of different Microsoft media technologies. At the time, Silverlight was a technology Microsoft was developing that was comparable to Flash. It was seen as a solution for building rich web pages, because everything was still primarily online through websites and browsers at the time. Mobile applications hadn't even started picking up yet. Really, the primary way of delivering streaming media at the time was through the browser, and this is where Silverlight came in. It was a plugin that allowed both rich web experiences to be built but also really great premium media experiences as well, and so that included even things like digital rights management, so using PlayReady DRM to protect the content, and so on.
Speaker 2:How did that transition to actual production and your work at the Olympics and with the NFL?
Speaker 1:Yeah, so at the time, Microsoft was partnering with NBC Sports on several projects. The first one that I was involved with was the 2008 Olympics in Beijing. NBC Sports had the broadcast rights to the Olympics - still does - and they wanted to basically put all of the Olympics content online for essentially any NBC Sports subscriber to be able to access. That was, I think, a first, because before that you had to wait for it to be broadcast on either your local NBC station or one of the cable channels, and so if it wasn't broadcast in live linear, you could never see it. It wasn't available. And so NBC Sports had the idea to put all of that content online. The very first version of the NBC Olympics site that we built in 2008 was still using Windows Media for live streaming, but was starting to use Silverlight and what, at the time, was actually the very first prototype implementation of adaptive streaming at Microsoft to do on demand. And then the next project we did with NBC Sports in 2009 was supporting Sunday Night Football, and for that we built a fully adaptive streaming-based website. That was the origin of Microsoft's Smooth Streaming technology: Microsoft had taken that prototype that was built during the 2008 Olympics and essentially productized it into Smooth Streaming.
Speaker 1:We had live streams in HD, which was again a breakthrough at the time, to be able to do HD at scale. Now we take it for granted, but in 2009 that was really seen as a big deal. And then the 2010 Vancouver Olympics - that's when we really went full on with Smooth Streaming. Everything was basically available on demand and live in Smooth Streaming. So, yeah, those are some really, I would say, groundbreaking events that we did. We ended up being nominated for a few Sports Emmys, technical Emmys, at the time. I don't remember which years we won or didn't win, but I think it was recognized by the industry as pushing the envelope.
Speaker 2:I'm remembering - and I don't want to mix you up with another technology - but I'm remembering either Monday or Sunday night football with a player that had four different views that you could kind of page through. Was that you guys?
Speaker 1:That was us. Yeah, that was us. Yep, that was Sunday Night Football. Basically, you could watch multiple camera angles simultaneously, and one of the cool things about that is that we used Smooth Streaming to do it, where it was actually a single manifest that had all four camera angles in the same manifest, and so switching between the camera angles was completely seamless because it was similar to switching bitrates the way you do in DASH or HLS today. So it was a very cool solution that, actually, I don't think we've even rebuilt since then. It was a feature that we developed in 2009 and then sort of lost to history.
Speaker 2:Did you actually go to the Olympics or were you working back in the plumbing in Redmond?
Speaker 1:We were on the backend side of it, so I did get a chance to go to one Olympic event at the Vancouver Olympics, since they were close to Seattle where I live. But other than that, we spent most of those projects in windowless rooms and data centers, mostly in Redmond, some time in Las Vegas, because we were working closely with iStreamPlanet at the time as well, who were based out of Las Vegas. Spent a lot of time in New York as well, at 30 Rock, because NBC Sports was still at the 30 Rock location at the time. So, yeah, it was a fun time.
Speaker 2:What were the big takeaways? You know, if you met somebody on a plane and they asked, gosh, I'm doing a live streaming event that's huge - what did you learn from the Olympics? What are the high-level things that you took away from that, that you've implemented throughout your career?
Speaker 1:One of the perhaps obvious takeaways was that live streaming is hard in that it's not on demand. Everything you know about on-demand streaming, you kind of have to throw that out the window when you start working on live streaming, because you're dealing with very different issues. You're dealing with real-time issues. So even something as simple as packets getting lost on the way from your origin encoder to your distribution encoder and dealing with packet loss, and then dealing with segment loss on the publishing side and figuring out how to handle that, and handling blackouts and ad insertions - everything's under a lot more pressure. Because if you're doing on-demand streaming and there's something wrong with the content, or something wrong with the origin or any part of your delivery chain, you sort of have a little bit of leeway in that you've got time to address it. Hopefully you'll address it very quickly, but if the content goes down for a few hours, it's fine. People will come back later. Whereas with live you don't have that luxury. You really have to be on top of it.
Speaker 1:My memory of it is that every time we were doing these events it was all hands on deck. I mean, we had everyone from Microsoft to NBC to Akamai to iStreamPlanet - all the different companies that were involved in these projects. We would just have everyone on calls ready to go fix whatever needed to be fixed in real time, because that was the nature of it. So that was a big lesson learned there: live is not on demand. You have to really give it a lot more focus, a lot more attention, than you would necessarily give to on demand.
Speaker 2:Does live ever get easy? I mean, even events like what we're doing today - it seems like there's always something that breaks, or there's always the potential for it. You never feel comfortable with it.
Speaker 1:I think that's a great way to describe it. You're never comfortable because, yeah, something could go wrong, and then you can't just say, well, we'll fix it sometime in the next 24 hours - you have to fix it right now, right? And so it's like, yeah, if our Zoom link went down right now, we'd be in trouble, right?
Speaker 2:No backup for that, so you jumped from the frying pan into the fire. I think your next stop was iStream Planet, where you're doing live events all the time.
Speaker 1:At the very end of 2012, I left Microsoft and I joined iStreamPlanet. And iStreamPlanet, for those not familiar with the company, was a startup out of Las Vegas started by Mio Babic, and they built a reputation for themselves as being a premium live event streaming provider. At the time, they wanted to get into live linear and they wanted to also start building their own technology. And so 2012 was when Mio started a software engineering team in Redmond, and the next year I joined that software engineering team, and what I worked on was the very first live encoder that was built in-house at iStreamPlanet. One of the ideas at the time was to build it all on commodity hardware - again, something that we now take for granted, because now we're accustomed to things running in the cloud, and so we assume that, yeah, of course you can go spin up a live encoder in the cloud and it's running on just commodity hardware that's there.
Speaker 1:But in 2012, 2013, that was not the case. It was mostly hardware-based encoders that you had to actually put in a data center and maintain, and so the idea that Mio had was: let's run it on commodity hardware, let's build a cloud-based live encoder. And so I worked on that product for about four and a half years, and in 2015 - if my memory serves me correctly, I think it was 2015 or 2016 - iStreamPlanet got acquired by Turner, and Turner was part of WarnerMedia, and so iStreamPlanet became a subsidiary of WarnerMedia. That was a pretty nice ending to that story as well.
Speaker 2:Real briefly, if you can. So we had Silverlight here, then we had Flash there, and somehow we ended up with both of those going away - I guess it was the whole HTML5 thing - and that brought HLS, and Smooth is in there. But when did you transition from VC1 to H.264, and how did that work?
Speaker 1:When Silverlight launched, originally the only video codec it supported was VC1. And then I think it was the third or fourth version of Silverlight where H.264 support was added, and I think Flash added it around the same time - it was literally like one month after the other. The challenge with building any streaming solution in HTML around that time - so, again, going back to the 2007, 2008 timeframe - was that HTML was just not ready. There were basically no APIs in HTML that would allow you to do streaming with the level of control that was needed. There were some workarounds. For example, when Apple came out with HLS as their streaming protocol, they baked it into the Safari browser, and so if you used the video tag in HTML in Safari, you could basically just point it at an M3U8 playlist and it would just work. But that was the exception rather than the rule. Most other browser implementations, whether it was Chrome or Firefox or Internet Explorer at the time, did not do that, and so there was this challenge of, well, how do you stream? What Flash and Silverlight brought to the table at that time was an opportunity to really leapfrog HTML - to take the technology, even if it was a proprietary plugin, and advance it to a point where it was usable. One of the innovations that Silverlight brought was the concept of a media stream source, which today now exists in HTML. When you go build a streaming solution in HTML today, you're using the Media Source Extensions and the Encrypted Media Extensions portions of the HTML spec. At the time that was not yet in HTML5. So Silverlight had that approach of: we're not going to bake any particular streaming protocol into the plugin; we're going to basically open up an API that allows you to handle your own downloading of segments and parsing of segments, and then you essentially just pass those video and audio streams into a media buffer, and the plugin goes and decodes and renders that and handles the rest.
Speaker 1:And then another crucial part, I think, of what Silverlight brought to the table was DRM, because that was something that, again, HTML just didn't have a good solution for - content protection. The reality of the industry that we work in is that if you want to provide premium content to audiences, you have to protect it. Generally, content owners and studios will not let you go stream their content just in the clear, and so it was a big deal that Silverlight could both enable streaming and also enable protection of the content. And then Flash ended up doing the same with Flash DRM, Adobe DRM, as well.
Speaker 1:I think it was around 2011, 2012, if I remember, where both Silverlight and Flash kind of went away and were replaced by HTML, and it was because by that point HTML had matured enough that it was feasible. There were still some growing pains there - I remember there was a period where it was kind of like we were neither here nor there - but I would say by 2014, 2015, HTML5 had all the needed APIs to enable the basic stuff, like implementing DASH and HLS and streaming in the browser and protecting it with DRM. So that's where we are today. Yeah, it took a while to get there.
Speaker 2:Real quickly, what do you do at WarnerMedia? I'm hearing - were you a programmer or were you a live video producer? You started in testing. So what's your skill set?
Speaker 1:As I mentioned earlier, when I started my career I started in engineering and then transitioned to technical evangelism. By the time I moved over to iStreamPlanet, my job became product management, and so I've been a product manager since then, for the past 10 years. After iStreamPlanet I went to Hulu, and I was a product manager for the video platform at Hulu for five years, and then in my most recent job, for the past two years, I've been at Warner Bros. Discovery, also product managing the video platform here. So, what my responsibilities are as a product manager:
Speaker 1:I focus on the video platform itself - today, specifically, I focus mostly on transcoding and packaging. So for the most recent launch of Max, which is the new service that combines Discovery Plus and HBO Max - it just launched last week - I was the product manager for the VOD transcoding and packaging platform there. That involved essentially defining the requirements of what different codecs and formats we need to support, what the workflows should look like, how we get content in from the media supply chain, what all the different permutations of formats we need to produce are, what kind of signaling needs to be in the manifests so that players are able to distinguish between HDR and SDR. All those types of technical details are part of my job.
Speaker 2:Let's do a speed round of some technical encoding issues. Your answers will appear in my next book. Where are you on encoding cost versus quality? That would translate to: are you using the placebo or the veryslow preset? And I don't know if you use x264, but do you use that to get the best possible quality for the bitrate, irrespective of encoding cost, or do you do something kind of in the middle? I'm sure you're not in the ultrafast category, but real quick, where are you in that analysis?
Speaker 1:So, yeah, we currently do use x264 and x265 for the VOD transcoding at Warner Bros. Discovery. We typically use either the slow or slower presets for those encoders, though one of the things we have been discussing recently is that we perhaps shouldn't necessarily use the same preset across all bitrates or even across all content. And so that's an idea that we've been exploring, where, if you look at your typical encoding ladder, you've got, let's say, 1080p or 2160p at the top, but at the bottom of your ladder you'll have a 320 by 180, you might have a 640 by 360. And so then the question becomes, well, why use the same preset for both those resolutions? Because x264 veryslow is going to take a lot less time on your 640 by 360 resolution than on your 1080p resolution.
Speaker 1:And so that's one of the ideas that we've been looking at: okay, we should probably apply different presets for different resolutions, different complexities. And then not all content is necessarily the same, in the sense that it's not equally complex, right? So perhaps not everything requires the veryslow preset. And then not all content is equally popular. If there's a particular piece of content that's watched by 20 million viewers versus something that's watched by 10,000 viewers, the one that's watched by 20 million probably should get the more complex preset, the slower preset, because whatever extra compute you spend on that is going to be worth it, because it will hopefully translate to some CDN savings on the other side. So hopefully that answers your question.
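The preset-per-rung idea above can be sketched in a few lines. The thresholds, preset names, and viewer counts below are illustrative assumptions, not Warner Bros. Discovery's actual rules.

```python
# Illustrative sketch only: pick an x264/x265 preset per ladder rung based on
# resolution and expected audience size. Thresholds are hypothetical.

def pick_preset(height: int, expected_viewers: int) -> str:
    """Spend more CPU where it pays off: large rungs and popular titles."""
    if expected_viewers > 1_000_000 and height >= 1080:
        # Popular, high-resolution rung: extra compute is repaid by CDN savings.
        return "veryslow"
    if height >= 1080:
        return "slower"
    if height >= 540:
        return "slow"
    # Small rungs (e.g. 360p, 180p) gain little from the slowest presets.
    return "medium"

ladder_heights = [2160, 1440, 1080, 720, 540, 360, 180]
for h in ladder_heights:
    print(h, pick_preset(h, expected_viewers=20_000_000))
```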
Speaker 2:When did you - you talked about x265, that's HEVC - when did you add that and why? Or were you even there? Did Warner add it before you got there?
Speaker 1:Yeah, so HBO Max had already been using HEVC, and so we obviously continued using it for Max as well. On the Discovery Plus side, we had been using HEVC for some 4K content, but there wasn't a lot of it, and so it was really mostly all H.264 on the Discovery Plus side. But with Max we are obviously still using H.264, and we are using HEVC as well for both SDR and HDR content. So right now, for example, if you go play something on Max, on most devices it's actually going to play back in HEVC. Even if it's SDR, it will be 10-bit HEVC, and then, obviously, if it's HDR, it will definitely be HEVC.
Speaker 2:How many encoding ladders do you have for a typical piece of content?
Speaker 1:So the way we define it - when you say how many encoding ladders, do you mean sort of like different variations of encoding ladders, or do you mean steps within the ladder?
Speaker 2:Different variations of encoding ladders.
Speaker 1:I'm literally looking at the spreadsheet right now, and I think it's about six or eight different variations. What we've tried to do is build an encoding ladder where, depending on the source resolution, we don't necessarily have to have different permutations of the ladders, and so we sort of have a UHD ladder where the source resolution determines where you stop in that ladder, but it doesn't change the ladder itself. Where the permutations come in is things like frame rates. So if the source is 25p or 30p or 24p, that's going to use a different ladder than if the source is 50p or 60p, because that is one of the things we've done for Max that wasn't supported before - high frame rates. Previously everything was capped at 30 fps, and most of that was due to the fact that there wasn't really a lot of source content on HBO Max, for example, that required more than 30 fps.
Speaker 1:But now that the content libraries of Discovery Plus and HBO Max are combined, there's a lot more reality TV on the Discovery Plus side. A lot of that is shot at 50 fps if it's from abroad, or 60 fps if it's US, and so we wanted to preserve that temporal resolution as much as possible, and so we've started to support high frame rates as well, and so we have different encoding ladders for different frame rates. And then, of course, there's different encoding ladders for SDR versus HDR, and even within HDR we have different encoding ladders for HDR10 versus Dolby Vision 5, for example.
Speaker 2:What about for different devices? So if I'm watching on my smart TV and then I transition to my smartphone, am I seeing the same ladder, or do you have different ladders for different devices?
Speaker 1:At this moment they're the same ladders for all the devices. We might deliver different subsets of the ladder for certain devices, but that's typically capping on the high end of the ladder. So if, for example, some device cannot handle 60 fps, or if it cannot handle resolutions above 1080p, then we might intentionally cap the manifest itself that we're delivering to that device. But in terms of different bitrates and different encodes, we're not differentiating yet between different devices. I'll give you my personal take on that question, which is that in most cases it's not really necessary, in my opinion, to have different encoding ladders for different devices, because your 1080p should look great no matter whether you're watching it on an iPhone or an Apple TV, and so having two different 1080p encodes doesn't necessarily make sense.
Speaker 1:I've definitely heard people say, well, perhaps on the lower end of the bitrate ladder, where you have your lower bitrates, lower resolutions, that's where you need to have differentiation. But again, in my opinion, there's no harm in delivering 100, 200 kilobit per second bitrates in a manifest to a smart TV, because most likely it's never going to play them. You can put them in the manifest, you can deliver it to the TV or to a streaming stick, and in the vast majority of cases it's never even going to touch that bitrate - it's just going to skip right over it and go straight for the HD and the UHD. The only time you might ever see that low bitrate is if something catastrophic happens to your network and the player struggles so badly that it needs to drop down to that level.
Speaker 2:What's your VBR maximum rate on a percentage basis? When we started out, it was CBR, so your max was 100% of your target. Where are you now with your VBR for your premium content?
Speaker 1:So we've taken an approach with x264 and x265 of relying primarily on CRF rate control, but it's CRF rate control that uses a bitrate and buffer cap. When you are writing your command line in FFmpeg, you can set the CRF target, but you can also specify a VBV buffer size and a VBV max rate. And so we are doing that, and the reason behind that is we want to make sure that we're controlling essentially the codec level at each resolution and each period, and that we're keeping the peaks constrained that way.
Speaker 1:I can give you an example: if it's something like, let's say, HEVC at 1080p, you might want to stay at codec level 4 rather than codec level 4.1 - or maybe that one's not as big of a deal - but, for example, what if you're choosing between level 5 and level 5.1? There are certain devices that might not support 5.1, for example. And so in order to stay under codec level 5 for HEVC, you have to stay under certain buffer sizes, and that's what ends up driving a lot of the actual caps that we set.
Speaker 2:Circling back - I mean, CRF gives you a measure of per-title encoding as well, so is that intentional?
Speaker 1:Yeah, that's part of it. With CRF, really, when you specify your VBV max rate, you're just specifying the highest average bitrate for the video. And so as long as you're comfortable with that max rate, you can also count on CRF probably bringing your average bitrate below that max rate most of the time. So if we set, for example, 10,000 kilobits per second as the max rate, most of the time the CRF target is really going to bring that average bitrate in much lower, around five or six megabits. And so that is a way of getting per-title encoding, in a way, and achieving CDN savings without sacrificing quality, because depending on the complexity of your content, it's either going to be way below your max rate or it's going to hit against the max rate, and then at least you're capping the highest possible bitrate that you'll have for that video.
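Here is a minimal sketch of the capped-CRF approach described above, expressed as an FFmpeg/libx264 command built in Python. The CRF value, VBV caps, and file names are hypothetical examples rather than production settings.

```python
# Sketch of capped CRF: a quality-targeted encode whose peaks are bounded by
# a VBV max rate and buffer size (which is also what keeps the stream within
# the intended codec level). All numbers here are illustrative.
import subprocess

rung = {"height": 1080, "crf": 22, "maxrate_k": 10_000, "bufsize_k": 20_000}

cmd = [
    "ffmpeg", "-y", "-i", "source.mov",
    "-vf", f"scale=-2:{rung['height']}",
    "-c:v", "libx264", "-preset", "slow",
    "-crf", str(rung["crf"]),                 # quality target
    "-maxrate", f"{rung['maxrate_k']}k",      # VBV max rate: the hard ceiling
    "-bufsize", f"{rung['bufsize_k']}k",      # VBV buffer size
    "-an", f"out_{rung['height']}p.mp4",
]
subprocess.run(cmd, check=True)

# Easy content ends up well below the cap (the per-title effect); complex
# content presses against the cap, so the ceiling still holds.
```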
Speaker 2:That's a pretty creative way to do it. What's the impact of DRM on the encoding ladder, if anything? I know there's a difference between hardware and software DRM, and there are some limitations on content you can distribute with software-based DRM. We're running a bit short of time, but can you encapsulate that in, you know, a minute or two?
Speaker 1:The way most of the content licensing agreements are structured, typically under the content security chapter there are requirements around what security levels are required to play back certain resolutions, and then often what kind of output protection is required. And so typically what you'll see is that something like Widevine L1, which is the hardware-based security level of Widevine, or hardware-based protection, and then on the PlayReady side something like SL3000, which is the hardware-based implementation of PlayReady - those will be required for 1080p and above, for example. So a lot of the content licensing agreements will say that unless you have hardware-backed DRM on the playback client, you cannot play anything 1080p and above. Then they'll have similar requirements around each level. They'll group your resolutions, typically into SD, HD, Full HD, UHD, and each one of those will have different DRM requirements in terms of security levels, and also requirements around HDCP - whether that needs to be enforced or not, whether it's HDCP 1 or HDCP 2. What that essentially means in practice is that when you're doing your ABR ladder, you have to define those security groups based on resolution and assign different content keys to those groups. So your video streams up to, let's say, 720p might get encrypted with one content key, then between 720p and 1080p gets a different key, then everything above 1080p gets another key, and audio gets a different key. By doing that, we essentially accomplish that at playback time, when the licenses are being requested by the players for each of those bitrates, because they're using different keys, you can now associate different playback policies with each key. So you can say, well, this SD content key, for example, has a policy that doesn't require HDCP to be enforced and doesn't require a hardware level of protection, whereas the HD group or the UHD group might require those. And so that's really something that we do today in response to the way the content licensing agreements are structured, and in the future that might change.
Speaker 1:My impression is that we're actually moving in a direction of more DRM rather than less DRM. Even as recently as three, four years ago, some studios, some content owners, were still allowing certain resolutions to be delivered in the clear - like SD, for example - and a lot of that's going away, where now essentially it's like, look, if you're going to do DRM, you might as well do DRM across the board, because it actually makes things less complicated that way. And one of the things I've also noticed is that when it comes to HDR, for example, it's the strictest requirements for all of HDR. So even with HDR, where you have an encoding ladder that ranges from UHD all the way down to 360p or something, the requirements in the agreements are: well, you must use hardware-based DRM and you must use HDCP 2.3 for the whole HDR ladder. And so it seems that that's the trend of the industry - we're actually moving toward just using DRM for everything.
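A small sketch of the resolution-based key grouping described above: each rung maps to a content-key group, and each group carries a playback policy. The group boundaries, HDCP versions, and policy fields are illustrative assumptions; the real values come from each licensing agreement.

```python
# Illustrative only: map each video rung to a content-key group, and attach a
# playback policy to each group. The license server would enforce the policy
# when a player requests the key for that rung.

KEY_GROUPS = [
    # (max_height, key_label, policy)
    (576,   "SD",  {"hdcp": None,  "hardware_drm_required": False}),
    (720,   "HD",  {"hdcp": "1.4", "hardware_drm_required": False}),
    (1080,  "FHD", {"hdcp": "2.2", "hardware_drm_required": True}),
    (10000, "UHD", {"hdcp": "2.3", "hardware_drm_required": True}),
]

def key_label_for_rung(height: int) -> str:
    """Pick the key group for a video rung; audio would get its own key."""
    for max_height, label, _policy in KEY_GROUPS:
        if height <= max_height:
            return label
    return "UHD"

for h in (180, 360, 540, 720, 1080, 2160):
    print(f"{h}p -> key group {key_label_for_rung(h)}")
```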
Speaker 2:What's the difference between hardware and software DRM? Is that a browser versus mobile device thing? Or where is software DRM used and where is hardware?
Speaker 1:So the difference is in the implementation of the DRM client itself.
Speaker 1:And so if you basically want to get the highest security certificate from either Google or Microsoft for their DRM systems, you essentially have to bake their DRM client into the secure video path of the system.
Speaker 1:So that means a tight coupling with the hardware decoder as well, so that essentially, once you send a video stream to the decoder, there's no getting those bits back. At that point it's secure decryption, then secure decoding, and then it goes straight to the renderer, and there's no API call that you can make as an application that says, now that you've decrypted and decoded these bits, hand them back to me. That's typically called a secure video path or secure media path, and that's what you get with hardware-based DRM. Software-based DRM does either some or all of those aspects of decryption and decoding in software, and therefore there's a risk that somebody could essentially hack that path at some point, get those decoded bits back, and be able to steal the content.
Speaker 2:So if I'm watching H.265 on a browser without hardware support, I'm likely to be limited in the resolution I can view if it's premium content, because the publisher says, I don't want anything larger than 360p going to software DRM.
Speaker 1:Exactly, yeah. And today, for example, if you're using Chrome - Widevine DRM is available in Chrome, but only L3, which is the software-based implementation of Widevine. And so oftentimes, if you're using Chrome, you actually get worse video quality with some of the premium streaming services than if you're using Edge or Safari, for example, because both Safari on Mac and Edge on Windows do support hardware DRM. They are just more tightly integrated with the operating system, and so they're able to essentially achieve that secure video path between the browser and the operating system and the output.
Speaker 2:So let's jump to the packaging. Are you in the HLS, DASH, or CMAF camp these days?
Speaker 1:Both. At both Warner Bros. Discovery and my previous job at Hulu, we've been using both HLS and DASH, and, interestingly enough, the split between those two is almost identical. We use HLS for Apple devices and we use DASH for streaming to all other devices. What's common to them is the CMAF format. And one of the things that I get a little bit annoyed about in our industry is when people refer to CMAF as a streaming protocol, and I always feel like I need to correct them and say, no, it's not a streaming protocol. Because CMAF is really two things. CMAF is, on one hand, a standardized version of what we frequently call fragmented MP4 - the ISO base media file format - and what the CMAF spec did is basically just define: look, if you're going to use fMP4 in HLS and DASH, here are the boxes you need to have, and here's how common encryption gets applied to that, and so on. So it's really just a more buttoned-down version of what we have always called fMP4. And so in many cases, if you have been packaging either DASH or HLS in fMP4 media segments, you're most likely already CMAF compliant - you're already using CMAF.
Speaker 1:But the other thing that CMAF is - the CMAF spec also defines a hypothetical logical media presentation model, and it essentially describes what, when you read between the lines, will sound a lot like HLS or DASH without HLS or DASH. It's really defining: here's the relationship between tracks and segments and fragments and chunks, and here's how you address all those different levels of the media presentation. And so you can then think of HLS and DASH as really being the physical manifestations of that hypothetical presentation model. And there's a really great spec that CTA authored - I think it's CTA-5005 - that is the HLS-DASH interoperability spec, and it's heavily based on CMAF, using CMAF as the unifying model and then describing how both HLS and DASH plug into CMAF and how you can describe the same concepts in both. And so it's almost like HLS and DASH are just programming languages that are describing the same sort of pseudocode.
Speaker 2:I want to come back to some other topics, but one of the topics important to you is: is the CTA the organization that's going to make it simpler for publishers to publish content and just focus on the content development and not the compatibility? Because it seems like that's a pretty compelling issue for you.
Speaker 1:I hope that CTA will make some efforts in that space. I think a lot of what they've been doing is trying to improve the interoperability in the streaming industry, and so it does feel like CTA WAVE is the right arena for that. One of the issues that makes deploying streaming solutions really complex and challenging today is that we have a lot of different application development platforms. Just before this call, I went and counted the number of app platforms that we have at WBD that we developed for Max, and it's basically about a dozen to 16 different application development platforms. Now, there's overlap between some of them - Android TV and Fire TV are more or less the same thing with slight differences - but at the end of the day, you're looking at, at the very least, half a dozen different app development platforms, and then, worst case scenario, you're looking at upwards of 20 or so app development platforms, especially once you start considering set-top boxes made in Europe or Asia that might be HbbTV compatible, and so on. That's a lot of complexity, because the same app needs to be built over and over and over again, in different programming languages, using different platform APIs. And I think as an industry we're kind of unique in that sense. I'm not actually aware of any industry other than streaming that needs to develop that many applications for the same thing. If you're working in any other industry - if you're working in fintech or anything else - you typically have to develop three applications: a web app, an iOS app, and an Android app, and you're done. And so it's kind of crazy that in our industry we have to go build over a dozen different applications.
Speaker 1:But the practical challenge that then brings, when it comes to things like encoding and packaging and so on, is that it's hard to know what the devices support, because there is no spec, there is no standard, that specifies APIs, for example, that every different device platform could call and expect standardized answers. So when we talk about the media capabilities of a device, what are we talking about? We need to know what decoders are supported - for video, for audio, but also for images, for text, timed text. We need to know what different segment formats are supported.
Speaker 1:You know, is it CMAF, is it TS? What brand of CMAF? CMAF has this nice concept of brands, but nobody's really using it, because for that concept to be useful, you need to be able to query a device and say, well, what CMAF brands do you support? Manifest formats - there are different versions of HLS, there are different profiles of DASH, there are different DRM systems. All these are things that we need to know if we want to play something back on a device and play it well.
Speaker 2:So how do we standardize the playback side?
Speaker 1:Probably one of the key steps I think we need to take is standardizing device media capabilities detection APIs, and there have been some efforts in the W3C of defining those types of APIs in HTML, for example, but again, not every platform uses HTML.
Speaker 1:When it comes to Roku, when it comes to Media Foundation and other media app development platforms, we need essentially the same API to be present on every platform. And then, once we have APIs standardized in the way they detect media support, we need to also have a standardized method of signaling those capabilities to the servers, because if you want, for example, to target specific devices based on their capabilities, the next question becomes, well, how do you express that? How do you signal that to the backend? How do you take action on that? How do you do things like manifest filtering based on that? So I think there's a lot of room for standardization there, and I'm hoping that CTA WAVE or one of the other industry organizations will take some steps in that direction.
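To make the manifest-filtering idea concrete, here is a small sketch in which a device reports its capabilities and the backend filters the ladder before building a manifest. The capability fields and ladder entries are hypothetical, standing in for the standardized signaling that, as noted above, does not exist yet.

```python
# Hypothetical capability report a client might send with a playback request.
device_caps = {
    "video_codecs": {"h264", "hevc"},   # no AV1 decoder reported
    "max_height": 1080,
    "max_fps": 30,
    "hdr_formats": set(),               # SDR only
}

# A simplified ladder: one entry per encoded rendition.
ladder = [
    {"codec": "av1",  "height": 2160, "fps": 60, "hdr": "hdr10"},
    {"codec": "hevc", "height": 2160, "fps": 60, "hdr": "hdr10"},
    {"codec": "hevc", "height": 1080, "fps": 30, "hdr": None},
    {"codec": "h264", "height": 1080, "fps": 30, "hdr": None},
    {"codec": "h264", "height": 360,  "fps": 30, "hdr": None},
]

def playable(rung, caps):
    """Keep a rung only if the device can decode and display it."""
    return (rung["codec"] in caps["video_codecs"]
            and rung["height"] <= caps["max_height"]
            and rung["fps"] <= caps["max_fps"]
            and (rung["hdr"] is None or rung["hdr"] in caps["hdr_formats"]))

filtered_manifest = [r for r in ladder if playable(r, device_caps)]
print(filtered_manifest)   # only the HEVC/H.264 SDR rungs up to 1080p30 remain
```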
Speaker 2:The final topic is going to be AV1, or new codec adoption. You're in charge of choosing which technologies you're going to support. When does a technology like AV1 come on your radar screen? I mean, you've heard of it since it was announced, obviously, but when does it come on your radar in terms of actually supporting it in a Warner Brothers product?
Speaker 1:The first thing I typically will look at is device adoption, because that's really, I think, the most crucial requirement: there have to be enough devices out there that we can actually deliver media to with a new codec to make it worthwhile. Because there are going to be costs involved in deploying a new codec. The first cost comes from just the R&D associated with investigating a new codec - testing it, measuring quality, then optimizing your encoding settings and so on - and so that's both time and also either manual or automation effort that needs to be done to be able to just understand: what is this codec, is it good, do I want to use it? And then, if you decide you want to deploy that codec, there are going to be compute costs associated with that, there are going to be storage costs associated with that, and in some cases there might be licensing costs as well. If you're using a proprietary encoder, maybe you're paying them, or if you're using an open source encoder, well, you still might owe some royalties on just usage - and you're pretty familiar with that. I read one of your recent blog posts, so I know that you've spent a lot of time looking at royalties and the different business models that different codecs now have. So in order to justify those costs, in order to make those costs actually worthwhile, there need to be enough devices out there that can be reached by that new codec.
Speaker 1:And so the first question, really, is what percentage of devices - active devices on a service - are capable of using that codec. And, interestingly, this kind of goes back to that previous question that you asked about device capabilities and how we improve those things. Without good, healthy data coming back from players, coming back from these apps, that tells us what's supported on the platforms, it's hard to plan what the next codec is that you want to deploy.
Speaker 1:Right now, for example, if I wanted to estimate the number of AV1 decoders out there, my best resource would be to go study all the different hardware specs of all the different devices out there and figure out which ones support AV1, for example, or VVC, or LCEVC, and then try to extrapolate from that data: okay, what does that mean, how do we project that onto our particular active device base? So, yeah, it's not straightforward today, but I'm hoping that if we can improve the device capabilities detection and reporting, then we can also get to a point where we can just run a simple query and say, okay, tell me what percentage of devices that the service has seen in the last week support AV1 decoding - and maybe specifically AV1 decoding with DRM support, or AV1 decoding of HDR. So there are even nuances beyond just which codec is supported.
Speaker 2:What kind of pressure do you get, if any, from, you know, your bosses or your co-workers about new codecs? Because we love to talk about them, we read about them all the time. But are people pounding on you and saying, you know, where's AV1 support, where's VVC, when's VVC? Or do they not care? Is that not part of what they're thinking about?
Speaker 1:I would say there's not a lot of pressure from leadership to support specific codecs. I think they're more interested in, probably, cost savings, and looking at things like how do we lower CDN costs. But one of the things that I usually explain to them is that it's not a perfect one-to-one relationship between deploying a new codec and CDN cost savings, for example. Even if you save, for example, 20% on your encoding bitrate with a new codec, that doesn't necessarily translate into 20% of CDN cost savings.
Speaker 1:Because in some cases, if somebody's on a three-megabit connection speed, for example - somebody's on 4G and the most they can get is three megabits per second - you being able to lower your bitrate from 10 to 6 megabits per second is not really going to impact it. They're still going to be pulling the same amount of data, and so that's why it's not a clear one-to-one mapping. But, yeah, I would say most of the demand for new codecs comes from that aspect, from that direction, rather than somebody saying, well, we have to support VVC because it's the latest, greatest thing out there. Generally that's not the case. If anything, I'm usually the one that's pushing for that and saying, well, we really should be moving on from H.264 and moving on to the next generation of codecs, because at some point you just do have to leave old codecs behind and slowly deprecate them as you move on to the new technology.
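A toy back-of-the-envelope calculation illustrates why bitrate savings don't translate one-to-one into CDN savings. The session mix and bitrates below are invented purely for illustration.

```python
# Why a bitrate saving is not a one-to-one CDN saving: viewers whose
# connections cap them below the top rung pull roughly the same bits
# before and after the codec switch. All numbers are made up.

old_top_mbps, new_top_mbps = 10.0, 6.0   # e.g. top rung before and after a new codec

# (share of sessions, sustainable throughput in Mbps)
session_mix = [
    (0.7, 25.0),   # fast connections: they actually benefit from the lower top rung
    (0.3, 3.0),    # bandwidth-limited: they were pulling ~3 Mbps either way
]

def average_delivered_mbps(top_mbps):
    return sum(share * min(link, top_mbps) for share, link in session_mix)

before = average_delivered_mbps(old_top_mbps)   # 7.9 Mbps
after = average_delivered_mbps(new_top_mbps)    # 5.1 Mbps
print(f"bitrate saving: {1 - new_top_mbps / old_top_mbps:.0%}")  # 40%
print(f"CDN saving:     {1 - after / before:.0%}")               # ~35%, smaller
```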
Speaker 2:Do you have a sophisticated financial analysis for doing this, or do you, you know, do the numbers on the back of an envelope, kind of thing?
Speaker 1:It's more a back-of-the-envelope kind of thing right now. It would be something based on, again, the number of devices supported, and then comparing that to average bitrate savings, and comparing that to compute costs and potentially licensing costs associated with it. So, yeah, it is sort of a back-of-a-paper-napkin kind of calculation at this point, but I think the factors are well known. It's really coming up with the data that feeds into those different variables.
Speaker 2:A couple of questions. What about LCEVC? Are you doing enough live? Or is that even a live versus VOD kind of decision?
Speaker 1:With LCEVC, I don't think it's even a live versus VOD decision. I think what's interesting with that codec is that it's an enhancement codec. It's a codec that really piggybacks on top of other codecs and provides, you know, better resolution, better dynamic range, for example, at bitrates that would typically be associated with lower resolutions or narrower dynamic ranges. The way LCEVC works is that there's a pre-processor part of it that essentially captures the detail that is lost when the video is scaled down. So you can start with a 1080p video, scale it down to, let's say, 540p, encode that 540p, and then the LCEVC decoder on the other end can take some of that sideband data and attempt to reconstruct the full fidelity of the 1080p source signal. And that concept works the same whether the base codec you're using is H.264 or H.265 or VVC or AV1.
Speaker 1:And so I think that's what's interesting about that codec: it can always let you be a step ahead of whatever the latest generation of codecs is providing. And then the other nice thing about it is that there's a backwards compatibility option there, because if a decoder doesn't recognize that sideband data that is specific to LCEVC decoding, it'll just decode your base signal, which might be half resolution or quarter resolution. And I think it can be very applicable in ABR, because typically you have a lot of different resolutions in your ladder anyway. So if you could potentially deliver that 360p rung in your ladder at 720p, for example, to an LCEVC decoder, then, you know, why not?
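The enhancement-layer idea can be sketched very loosely in a few lines. This is only a conceptual toy under simplifying assumptions (lossless enhancement data, nearest-neighbor scaling); it is not the actual LCEVC toolset or bitstream.

```python
# Conceptual toy of base-plus-enhancement coding: a legacy decoder plays the
# downscaled base; an enhancement-aware decoder adds the residual back to
# approximate the full-resolution source. Not real LCEVC.
import numpy as np

def downscale2x(img):
    return img[::2, ::2]

def upscale2x(img):
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

source = np.random.rand(1080, 1920)         # stand-in for a 1080p luma frame
base = downscale2x(source)                  # the 540p "base layer"
residual = source - upscale2x(base)         # the enhancement (sideband) data

reconstructed = upscale2x(base) + residual  # what the enhancement decoder outputs
print(np.allclose(reconstructed, source))   # True in this lossless toy
```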
Speaker 2:We've got a technical question here. Are you able to deliver one CMAF package using one DRM, or do you have to have different packages for Apple and the rest of the delivery platforms?
Speaker 1:Yeah, that's a great question.
Speaker 1:So right now what we do is we encrypt every CMAF segment twice, once with the CBCS encryption mode and the other with the CTR (CENC) encryption mode. The CBCS-encrypted segments are the ones that we deliver with HLS to FairPlay devices, and at the moment the CTR segments are the ones that we package with DASH and that are used with both PlayReady and Widevine.
Speaker 1:That said, both Widevine and PlayReady introduced support for CBCS a while ago - it's actually, I think, been probably over five years at this point - and so theoretically we could deliver those CBCS-encrypted segments to all three DRM systems and it would work. The challenge at the moment is that not all devices that are Widevine or PlayReady clients have been updated to the latest version of PlayReady or Widevine, because in a lot of cases these are hardware implementations, and so without firmware updates from the device manufacturer, they're never going to be up to date with the latest DRM client. So we're kind of waiting to see when those last CTR-only Widevine and PlayReady clients are going to be deprecated and slowly move out of the life cycle. Once the vast majority of the PlayReady and Widevine clients out there are CBCS compatible, that opens up the path to using CBCS segments everywhere.
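A tiny sketch of the routing just described: every rendition is packaged twice, and the manifest type decides which copy a client receives. The field names are illustrative, not any real packager's API.

```python
# Illustrative routing table: CBCS copies go out via HLS to FairPlay devices,
# CTR (CENC) copies go out via DASH to Widevine and PlayReady devices.
ENCRYPTION_ROUTES = {
    "cbcs": {"manifest": "HLS",  "drm_systems": ["FairPlay"]},
    "cenc": {"manifest": "DASH", "drm_systems": ["Widevine", "PlayReady"]},  # AES-CTR
}

def packaging_jobs(rendition: str):
    """Emit one packaging job per encryption scheme for a given rendition."""
    return [
        {"rendition": rendition, "scheme": scheme, **route}
        for scheme, route in ENCRYPTION_ROUTES.items()
    ]

for job in packaging_jobs("1080p_hevc"):
    print(job)
```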
Speaker 2:Final question: AV1 this year or not? What do you think?
Speaker 1:I think probably not this year. I mean, I think we might do some experimentation, some research into encoder quality and optimization this year with AV1, but I wouldn't expect deployment of AV1 this year - not because of lack of support, because I think the support is really starting to be there in significant numbers. I know that, I think, the latest either Samsung or LG TVs, for example, now include AV1 decoders as well.
Speaker 1:I think often people will look at mobile as kind of being the indicator of codec adoption, and especially Apple - people will be like, okay, well, if Apple adopted it in iOS, then clearly it's here. But when it comes to premium streaming services - whether it's Max or Hulu or Amazon Prime or Netflix - most of that content is watched in living rooms, and so really the devices to watch are smart TVs and connected streaming sticks. Once those devices have support for a particular codec, then in my opinion that's really the big indicator that, yeah, it might be ready.
Speaker 2:We're running over, but this is a question I need the answer on. What's the HDR picture for AV1, and how clear does that have to be? Because it seems like there's a bunch of TV sets out there that we know play Dolby Vision and HDR10 or 10+ with HEVC. Do we have the same certainty that an AV1-compatible TV set will play AV1 in HDR?
Speaker 1:I don't think that certainty is there yet, and I do need to do some more research into that particular topic, because I've been curious about the same thing. I think some standardization efforts have been made - I can't remember off the top of my head if it's CTA or some other body.
Speaker 2:HDR10+ is now a standard for AV1. I just don't know if TVs out there will automatically support it - and "automatically" doesn't work for you. You've got to make sure you've tested it, yeah.
Speaker 1:And then, you know, with Dolby Vision, it's sort of like, well, until Dolby says so, it's not a standard. So, yeah, I think that's an excellent question, in that there's nothing from a technical perspective that should be stopping somebody from using AV1 or VVC or any other new codec with HDR, because there's nothing specific to the codec that HDR needs, and so it's really just a matter of standardization, a matter of companies implementing that standard. So, yeah, I'm with you on this one, in that it is sort of one of those where, yeah, it should work, but until it's been tested - and it's been tested on many different devices - it's not a real thing, right?
Speaker 2:Listen, we are way out of time, Alex. I don't think we've ever done this for an hour, but it's great. I really appreciate you spending time with us and being so open and honest about how you're producing your video, because I think that helps everybody. And thanks, this has been great.
Speaker 1:Absolutely. Thank you so much for having me. Yeah, this has been really great. I feel like we could probably keep talking for another hour or two and we'd still have plenty of topics to discuss. I was taking some notes while we were doing this, and I think I have notes for another hour at least.
Speaker 2:OK, we'll talk to you about that, and I'll see you at IBC. You're going to go to that show?
Speaker 1:Yeah, I think I'll be at IBC, so I'll most likely see you there.
Speaker 2:Cool. Take care, Alex. Thanks a lot.
Speaker 1:All right, thanks so much.
This episode of Voices of Video is brought to you by NetInt Technologies. If you are looking for cutting-edge video encoding solutions, check out NetInt's products at netint.com.