Voices of Video

The Future of Video Encoding with ASIC Technology

NETINT Technologies Season 2 Episode 4

Unlock the secrets of cutting-edge technology with Dylan from Semi Analysis as we explore the revolutionary world of application-specific integrated circuits (ASICs). Discover how Google has transformed its handling of massive video content across YouTube and Google Photos by shifting from traditional CPUs to custom silicon solutions like the Argos Video Coding Unit (VCU). This episode reveals the strategic importance of ASICs in increasing efficiency and cost-effectiveness, especially in the realm of advanced video codecs such as VP9.

Ever wondered why tech giants are investing heavily in custom silicon projects? Learn how ASICs are emerging as the optimal solution for high-quality video encoding, particularly in live content scenarios. While GPUs and FPGAs have their strengths, their limitations make ASICs more appealing for tech leaders like Meta and ByteDance, who are seeking to enhance customer service and reduce costs. We dissect the challenges and opportunities within the video encoding market, highlighting Google's success against Amazon's constraints affecting platforms such as Twitch.

Venture into the complex processes of video processing and the potential future of platforms like Twitter under Elon Musk. Our discussion expands into the evolution of Video Processing Units (VPUs) and the shift towards seamless integration with existing systems through computational storage structures. By understanding the advancements in VPUs and their role in video content management, this episode offers a glimpse into the innovative future of video processing and the strategic moves driving tech companies forward.

Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.

Speaker 1:

Voices of Video. Hello Dylan, thank you for joining me today. Can you please start by introducing yourself, telling us a little bit about yourself and your role at Semi Analysis?

Speaker 2:

I am the chief analyst at Semi Analysis. Semi Analysis is a semiconductor market, supply chain research, and consulting firm, and we focus on everything from manufacturing process technology all the way through to design IP, ASICs, and strategy, all surrounding the semiconductor industry.

Speaker 1:

So today we are going to discuss the development of ASIC projects at large technology companies and the driving factors behind that massive ASIC investment. In fact, Dylan, you came onto our radar screen when you wrote the very famous article about Google Argos. Why did that technology intrigue you, and can you give us a brief overview of your thoughts and your findings?

Speaker 2:

So Google has created a few pieces of custom silicon, ranging from the tensor processing unit for AI, which is very famous and everyone knows about. But one of the lesser-known pieces of silicon that they've created for their own in-house needs is Argos. It's a completely new kind of application-specific integrated circuit.

Speaker 2:

It's called a video coding unit, or a video processing unit, a VCU or VPU, and the main idea behind it is that Google has very unique needs, or at least a unique scale, with regard to how much video they ingest and then serve, or photos they ingest and serve. They'll get video in all sorts of formats and resolutions uploaded from their consumers to a number of their properties, like YouTube, Google Photos, Google Drive, and they have to stream it back out to these users. But you don't just store the raw file and then send the raw file, right? Because what if someone wants the highest resolution, and someone else has limited bandwidth or a limited data plan? So they have to store it in many different formats. And at the same time, storing all this data across the billions of hours of YouTube that are out there would be incredibly expensive on a cost basis. Data streaming is expensive, networking is expensive, storage is expensive.

Speaker 2:

So they created a new ASIC called the Argos VCU, and its whole purpose is to encode video. Before, they encoded video as well, but they did it with x86 Intel CPUs, with Intel Skylake and, before that, prior generations of Intel CPUs. The problem with this is that CPUs are much less efficient, especially when you start moving to more advanced video codecs, for example VP9 and AV1, that save you on storage and save you on bandwidth but involve a lot more processing to encode the video. So these CPUs start to hang up in terms of performance, and you start needing so many more CPUs. It's actually a delta of millions of CPUs that Google would need if they were to encode everything with just CPUs, which is incredibly costly.

Speaker 1:

There's also one element about innovation in products, right? So it's not only about cost, it's also about how they can serve customers with new experiences and new features. Like, of course, Stadia has already started shutting down, but without the VCU, I believe they couldn't even have started it.

Speaker 1:

Yeah, so I think you also gathered a lot of very interesting data in your article, talking about different solutions and the cost of each solution, right? Could you give us a little more insight into those findings? There's a very famous saying that software is eating the world, but in fact, I think for the video industry it's not only eating the world, it's eating the CPU too, right? So maybe you can talk a little more about that part.

Speaker 2:

There's some interesting data we can point to with regard to why the CPU is being eaten. Google's Argos VCU versus a Skylake CPU: the CPU is five times slower, and it uses way more power, to encode VP9, which is Google's video codec for the entirety of YouTube and has been for many years. So, using their performance numbers, even if you assume Google's servers are 100% utilized all the time, which is very, very hard to do, no one gets 100% utilization: take all the YouTube, Google Photos, Google Drive video, just assume it's all 1080p 30fps, and you do H.264, which is a decade-old or even older encoding technology, that's 900,000 Skylake CPUs. That's incredibly costly. Now, if you switch to VP9, which saves you a lot of capacity and bandwidth when you're streaming video, then all of a sudden you're at 4.2 million Skylake CPUs. Each Skylake server was, you know, $15,000, $20,000, $30,000, $40,000. It starts to add up to billions of dollars. And that's just 1080p 30fps, right? Most people's phones can shoot 4K 60fps, and a lot of people record at higher resolutions. So if you use 4K 60fps as the assumption, then H.264 is 7.2 million CPUs and VP9 is 33.4 million CPUs. So this is getting to the point where it's just literally impossible to get that many. If you think about it, in 2022 there were about 30 million CPUs shipped total. So we're talking about the entire capacity of the whole world just for YouTube encoding, not even serving the video, not even any of the search or the algorithms or comments or likes or any of that stuff. Just encoding the video would require the entire world's capacity of CPUs. So the situation is very dire, and that's why Google made their VCU. And as we look forward, the amount of video is just continuing to grow. In fact, more people are uploading video than ever before with the advent of short-form video content, in the form of TikTok and Instagram Reels and YouTube Shorts and so on. Or Twitch, right? More people are streaming.
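
To make the shape of that calculation concrete, here is a minimal back-of-the-envelope sketch in Python. The upload rate, rendition count, and per-CPU encode speeds are illustrative assumptions, not the figures Dylan quotes, so the outputs only show how quickly the CPU count explodes as resolutions rise and codecs get heavier, not the exact numbers from the episode.

```python
# Back-of-the-envelope structure of the "how many CPUs would this take"
# estimate discussed above. Every number here is an illustrative assumption,
# not a figure from the episode or from Google's published results.

HOURS_UPLOADED_PER_MINUTE = 500        # widely quoted YouTube ingest figure
OUTPUT_RENDITIONS = 10                 # assumed ABR ladder size per upload

# Assumed encode speed of one server CPU, as a multiple of real time
# (1.0 = one second of video encoded per second of wall-clock time).
ASSUMED_REALTIME_SPEED = {
    ("1080p30", "H.264"): 2.0,
    ("1080p30", "VP9"):   0.4,
    ("4K60",    "H.264"): 0.25,
    ("4K60",    "VP9"):   0.05,
}

# Seconds of source video arriving per second of wall-clock time.
ingest_ratio = HOURS_UPLOADED_PER_MINUTE * 3600 / 60

for (resolution, codec), speed in ASSUMED_REALTIME_SPEED.items():
    cpus = ingest_ratio * OUTPUT_RENDITIONS / speed
    print(f"{resolution:8s} {codec:6s} -> ~{cpus:>12,.0f} CPUs at 100% utilization")
```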

Speaker 2:

With this stuff rising in popularity, being able to store it becomes even more costly. You need to get that file size down, and so the industry has rallied around AV1, which is a new video encoding standard, or codec, and it dramatically reduces the size of files while maintaining the video quality. But the issue is that it's so costly to compute. I mentioned these numbers of 7 million and 33 million.

Speaker 2:

These numbers would more than double if I think about YouTube with AV1. It's hard to estimate, because current CPUs are quite efficient, but even with the most current generation of CPUs, not Skylake, which is really old, it would still be something on the order of 70 million servers. That's an incredible amount of compute. That's not even possible for the world to make. So YouTube wants to move to AV1, which they're already doing with the second-generation version of that chip. They have to use their in-house expertise to design custom silicon for this purpose.

Speaker 1:

That's a truly amazing number, and not feasible at all, so they have to find new solutions. In fact, that's also what we hear every day from our customers. They tell us that without a new solution, they couldn't even run the business they want to, or need to, run.

Speaker 1:

So it's literally not possible. In fact, when we talk about Google and YouTube, when we talk about AV1 or any new codec, we know there's a famous figure that 500 hours of video are uploaded to YouTube every minute, right? That's from a few years ago, but it only covers newly ingested video. It doesn't count how many videos they already have. Once a new codec is involved, they need to have every existing video re-encoded, right? Think about that. They themselves probably couldn't even count that number.

Speaker 2:

That's an important point, right? The numbers I was quoting are just what's uploaded today, what's uploaded each year. But we have 15 years of history, right? You're going to want to save that video and crunch down on that storage so you don't have to buy more, because storage really isn't improving much in cost. So that's a great point.

Speaker 1:

Yeah, exactly. And since you also mentioned Twitch before: last September, you were highly critical of Twitch cutting their revenue splits to creators compared to YouTube, right? Why is that? Could you provide more insight?

Speaker 2:

Last September, Twitch made a very controversial move, if you will, by changing their revenue splits with their partners. This was, I think, in late September or October: they cut the revenue split from 70-30 to 50-50, which is significantly less than YouTube, which has a 70-30 split. And these cuts targeted their larger content creators, because Twitch was in a bad place. They have all this video uploaded to them, and they have to distribute it, but they couldn't make enough money to support it. Why? Because their infrastructure is behind where Google's is. Google has superior infrastructure due to their use of the Argos chip, which enables them to give content creators 70% of the revenue they generate rather than 50%. And at the same time, YouTube also provides higher quality video, higher resolutions, higher bit rates, HDR, those sorts of things, even on live video, which Twitch cannot. Twitch doesn't offer that because they can't with their CPU architecture. So Twitch needs to move to an ASIC, but they don't have those in-house design capabilities, whereas YouTube and Google have been designing their own ASICs for a handful of years for this problem. And this move by Twitch has had some really big impacts.

Speaker 2:

Sure, they were able to cut the revenue split from 70-30 to 50-50, but some of their biggest content creators moved to YouTube. They switched to YouTube streaming, and not all of their viewers switched over, but many of their viewers did switch over to YouTube Live. And so Amazon and Twitch faced a big financial problem. They either had to go down this route of cutting the revenue splits, or keep losing money, and in the process they lost some streamers. So either way, they were in a lose-lose situation because of their lack of ASICs and their inferior hardware and server infrastructure.

Speaker 1:

Yeah, and also think about if they want to build on that base to create more unique or new experiences for their customers. That would be very challenging with their current infrastructure, right? More interactive content, higher quality video, or new formats of service, that's really challenging. So we've mainly talked about x86 and ASICs so far, but in the industry there are still other solutions, like GPUs or FPGAs. They're called solutions too, but do you have any insights about those hardware approaches that we should discuss here?

Speaker 2:

There are some other hardware approaches out there in the industry. For example, there are some Xilinx FPGAs that target this market a little bit, there are some Intel FPGAs as well, and then there are GPUs from NVIDIA, and to some extent from AMD and now Intel, that they sell into this market. They all claim they can do video encoding, and yes, they do it a bit more efficiently than CPUs, but there are some major limitations. To integrate these into your infrastructure, there are some difficulties with regard to the software. You can't just put these in and expect them to work right away, because your users send you all sorts of video, all sorts of formats, whether it's vertical or horizontal, different resolutions, different bit rates, different frame rates, and these solutions are typically a little more stringent in what they can take, or they take a lot of software work to get them working for these varied workloads and use cases. So when you look at a Xilinx FPGA or an NVIDIA GPU, you might get better throughput than a CPU, but you still have a lot of software work.

Speaker 2:

And then, furthermore, when you look at an NVIDIA GPU, how much area is dedicated to video encoding? Less than 10%, actually. Most of the area is dedicated to other forms of compute, the general-purpose GPU type of work, graphics processing or the render pipeline or AI and ML. And something similar occurs with the FPGA, which is not dedicated to video encoding; it doesn't have any area dedicated specifically to video encoding, but it is a less flexible architecture.

Speaker 2:

And so what that ends up resulting in is, yes, you get some improvements, but you have to give some of it back in software work, and you end up with a more efficient infrastructure in some respects, but you're still not bending the cost curve by an order of magnitude, right? You're still spending a ton of money on encoding video. And furthermore, the availability and cost of each GPU and FPGA is significantly higher than a CPU. Intel's average sales price for a data center CPU is somewhere in the $700 range, and AMD's is like $1,000, and that's last year's data, whereas NVIDIA's GPUs are significantly more expensive. Xilinx FPGAs, oh gosh, they're expensive; $10,000 is a more reasonable number for a high-end NVIDIA GPU or Xilinx FPGA, not a thousand. So you get a lot better throughput per chip, but then you end up paying more per chip, and you have this inflexibility.

Speaker 1:

So there are some problems on that front. I think eventually, when you talk about the best solution for the video industry, you have to consider different factors, and the cost per stream or cost per customer is extremely important. To my knowledge, I think the ASIC is the best way to drive that cost down, and that's one thing. Another one is that many people talk about video as being so interesting, so attractive, right? But industry people sometimes also say that video is an ugly animal. There are so many things that can go wrong, especially when you talk about live content.

Speaker 1:

It really needs to be a very focused area. You try to improve the quality, improve all the features, serve the customer better; that has to be your first priority, otherwise the result is mediocre, or it's not suitable for the high end of the video industry. That's why I think there needs to be a focus, and I don't see that in the GPU or FPGA companies, right? Video for them is still a small piece, and there's no focus.

Speaker 2:

Yeah, the lack of focus is important, right, because the main market for data center GPUs is not video, it's AI and machine learning. And the main market for FPGAs is, well, there isn't really a single main market for FPGAs, but video certainly isn't anywhere near the top of that list. Yes, they can be used there, but when you look at what they're adapting their next-generation FPGAs for, it's more for 5G signal processing or AI or networking; it's not for video encoding. So these products are going to make compromises. They're better than a general-purpose CPU, but they're still not changing the cost curve, as you mentioned earlier, in a significant way.

Speaker 1:

We use FPGAs a lot in our company; we use them for our design work. The FPGA is perfect for small volumes and very unique solutions. You can adapt quickly, well, not quickly enough in fact, but you still have the flexibility to adapt the hardware structure, to study new features, and to do simulations, and that's good. But once you want to run at scale, or make it economically make sense to serve real customers, it's not possible, right? It's not built for that purpose at all.

Speaker 1:

So, Dylan, you're also deep into the changes in the silicon industry, right? What insights do you have about the different strategies that companies are employing for their custom, purpose-built silicon?

Speaker 2:

I mean, in this industry there are projects, right, in video encoding specifically. Meta is known to have a project working on this, but they're not anywhere near as far along as Google is with their Argos chip. ByteDance, the owner of TikTok, also has a project in this space. It's believed they're not functional with it yet, but we're not quite sure; it's kind of a black box. But they're certainly working on it.

Speaker 2:

And you look around the industry at many of these other major companies that aren't necessarily semiconductor companies, and everyone's making their own chips, right? Apple, Google, Amazon. Microsoft is working on some. All of these companies are working on it. But in the video encoding market, only Google has really brought it to bear successfully. And you would think, hey, Amazon, they have some of the best custom silicon in the world, right? They have Graviton, they have the Nitro DPU. Their server infrastructure is really efficient because of these products.

Speaker 2:

But in the video encoding world, they haven't deployed anything that enables them, which is why Twitch still has such stringent limits that make it unattractive to some content creators and had some of them switch to YouTube. They delete your videos after a certain amount of time, because Amazon can't afford to store them; they're not encoded to a high quality at a small file size. You can't stream at a very high resolution because, again, they can't afford to encode it in real time at high quality and low cost, whereas YouTube can.

Speaker 2:

And so, you know, google has been very successful in the market, which has kept them as the leader in most video content, even today. Right, they're gaining some share versus TikTok. Tiktok is actually, if you look at the growth over the last, you know, six months, they've been effectively flat in terms of watch time, um, whereas, whereas YouTube shorts, you know, it's still smaller, but it's growing significantly. And that's you know why? Because, yes, youtube has a lot of users, but YouTube is also paying their content creators on shorts and they're paying them a significant amount, more than TikTok is.

Speaker 1:

Why? Because Google has a more efficient infrastructure, right?

Speaker 2:

And it all comes down to these custom-built chips that Google is developing. Their infrastructure is just more efficient, enabling them to do more with less, and so it's a strategy that has worked really well for them, one that others want to emulate but haven't been able to yet.

Speaker 1:

It's very interesting. So what you're saying is that the efficiency of their infrastructure also enables them to have a better business model in the layers above.

Speaker 2:

It feeds through. What a lot of people don't realize is that it always feeds through. The business will always depend on the infrastructure below it, and you might not realize it, but this is why TikTok has almost no monetization for their creators: because they have to capture it all, and it's believed that TikTok is not even making much money at all, despite YouTube being a very profitable enterprise. And even Meta has said on earnings calls that their Reels, Instagram Reels and Facebook Reels, which is their short-form video, also doesn't make money yet, which is a significant deal.

Speaker 2:

They're working on a specific video encoding ASIC, but they don't have it yet today. So between being able to monetize there, and also just the cost of each video that's uploaded and the cost of serving it, they're dealing with both inefficiencies. Meta specifically said that they hope to be profitable on Reels next year, but they're not today, and that, coincidentally, lines up with rumors about their ASIC being ready later this year. And that's if the ASIC works properly on the first shot; there's a real chance that an ASIC doesn't work on the first shot, because there are always problems in the semiconductor industry. So one could correlate the fact that their ASIC should be ready later this year with them saying they'll be profitable on Reels next year, as direct evidence that their platform, and their profitability, is gated by their lack of semiconductor expertise with their own in-house solution.

Speaker 1:

So I cannot share the customer name yet, but I can say things are going to change this year. One of the biggest short-video, social media companies is also adopting our solutions, and they are already seeing that they have cut 80% of their operating costs because of it. So things will change. And talking about the big companies and the potential they have to use ASICs to change their infrastructure, is that the same for Twitter? I'm really curious. They don't have anything for that yet, but Elon Musk is talking about building everything into one app, and about 4K live streaming or HDR on Twitter as well. Do you think it makes sense for them to have an ASIC solution too? It seems obvious, right?

Speaker 2:

This is an interesting one. If you look back at the history of short-form video, it wasn't TikTok that made it popular, it was Vine. And then Twitter bought Vine all those years ago, and then they shut it down. Why did they shut it down? Because, infrastructure-wise, it's just not profitable to serve video. And now Elon Musk has bought Twitter, and he's floated the idea that they're going to bring back Vine, and in fact they've been testing this tab with short-form video sometimes.

Speaker 2:

So the question is, what are they going to do for hardware infrastructure? They use a lot of on-premises infrastructure currently for Twitter, but the problem is that's adapted for serving text, and for doing that as efficiently as possible. How are they going to move to video, which is orders of magnitude more data volume? How are they going to solve that? Well, they just bought the company, and silicon development timelines take years, multiple years, to come to fruition. So even if they wanted to launch short-form video content, they may not be able to do it at any reasonable cost until three or four years from now if they develop their own in-house solution. So there needs to be a solution in the public market for them. And furthermore, if you think about it, just Meta and TikTok, ByteDance, and every other company serving a lot of short-form video, or Amazon with long-form video, those three companies, and there are a few more in China as well, like Tencent and so on, these companies are all serving tons of video. And then you add Google as well, and that's five companies that are already serving tons of video today. If all five of them develop their own ASIC solution, that is hundreds of millions of dollars, at least $100 million of non-recurring engineering expense at each company, right? So that's $500 million, poof.

Speaker 2:

So this is why the economics of the semiconductor industry are important. People always talk about the hype: everybody's going to make their own silicon. Yes, they're going to make their own silicon where they can, where they have the volume to support it. But what if you don't have the volume on day one? Look across the industry: there are five players, that's five hundred million dollars. OK, let's divide that by how many units you need. Maybe you only spend one hundred million dollars, and maybe you only need a million units; that's one hundred dollars per unit that you're spending on non-recurring engineering.

Speaker 2:

And that's not even talking about the cost of the chip, the cost of the memory, the cost of implementation, the cost of software. It becomes too much for each individual company to build. So this is why you need a merchant silicon solution that can say, hey, we'll do that development once, and we'll actually develop it better, because we know your needs, and your needs, and your needs; we're uniquely suited to each company's needs because we communicate with all of them. So we're more flexible, and we have more robust software.

Speaker 2:

Software that's more flexible, that can take more forms of video. Maybe today Meta's goal, or Twitter's goal, or Tencent's goal is long-form video that's shot only horizontally at certain resolutions. But what if, all of a sudden, they want to do vertical video at a different resolution? Or maybe they want to add a feature they didn't have before. Well, now they would need to go back to the beginning of the silicon timeline, implement it, wait three years, and then have the silicon come out before they can do that efficiently; otherwise they'll do it in a very expensive way. So this is where merchant silicon that's more flexible comes in. They can take this $500 million and bring it down to, hey, we're the only ones spending it, and now we can sell a million units to you, and to you, and to you, and to you.

Speaker 1:

And that all of a sudden makes more economic sense. That's a very good point. Every company faces this: they have the need, and we already discussed that, but they always have to make the decision, make or buy, right? Right now, merchant silicon seems like a good solution, a good candidate for that, but we'll see. So we've talked a lot about what exists today. Dylan, you have seen so many companies, so many technology solutions. From your perspective, what's next for custom silicon for video processing?

Speaker 2:

Of course, this is a very loaded question, because there's only one company making merchant silicon video ASICs, and that's NETINT, of course. But as far as the custom silicon market goes: OK, fine, I'm uploading video and I'm encoding it into a new format, but that's just one use case. It turns out, if I'm on YouTube, it's great because I can look at the captions. So how are they generating those captions? And it's great because they don't just let people upload videos of war or other bad things, things you wouldn't want to show kids; they prevent that. So how do they do that? A lot of this is AI algorithms. How do you do captioning? Well, you run a model that can convert voice to text, and then you'll run maybe another model that takes that text and adds the correct punctuation and capitalization, and so on and so forth. And then, to make sure it's safe for everyone, you make sure people aren't uploading illegal content and sharing it on your website, because then you're legally at fault.

Speaker 2:

If people do that, you have to scan every single video. Well, of course, there's so much video that no human can look at it all. So again you're utilizing AI, and you're doing detection, like, hey, are there guns being fired in this video? Are people dying? Is there a lot of blood? Are there illegal acts happening? Are there drugs? All these sorts of things. OK, if those things are happening, we'll review the video further, and maybe we can do a quick pass now and a deeper review later. But every single video needs to be scanned, it needs to be captioned, and it needs to be analyzed for what content is in it. Sure, the uploader puts a title on it, but there's a lot of other content, right?

Speaker 2:

Well, what if someone is searching for a video of, I don't know, whatever it is? Maybe it's a machine, maybe it's a tutorial on math, but the title doesn't have that word, and it still shows up in your search. Why? Because when the video is uploaded, they're actually running an algorithm that pulls out metadata. It sees, oh, what are the main topics of this video? Oh, this is about a car, and it's not just about a car, it's about a Toyota, and it's about the fuel economy, and it's about the reliability of the car, and so on. So now, when I search for reliable cars, the video titled, say, a review of the Toyota Camry shows up.

Speaker 2:

That's the beauty of YouTube and some of these other video platforms: making content discoverable. Or, hey, this content pulls out all this metadata; you were watching this video, by the way, so why don't we suggest this other video to you? Well, how am I generating that metadata about every single video? These are operations I'm doing on every video in addition to video encoding. Do I encode the video and then run these models on a CPU, or encode the video and then run them on a GPU? Do I have to do multiple passes?

Speaker 2:

It's very costly, right? Especially when I think about the memory and networking costs of reading and writing data multiple times. The innovations that Google is adding in their next-generation Argos, and I'm sure some of these other custom silicon projects from Meta and ByteDance, are things like adding some AI processing on the chip, adding a small amount of general-purpose CPU, so you can do some of these operations at the same time as you're encoding the video, and so you're not reading and writing the data multiple times.

Speaker 2:

You're not wasting money on networking. All of these costs are being saved because you're putting these operations on the video encoding ASIC. So now it's not just a video encoding ASIC, really, it's a video processing ASIC. And video processing involves a lot more than just encoding. It involves that detection of illegal content; what content can I advertise with this video; what are some related topics that might work with this video; what's in the video that isn't mentioned in the title; captions; all of these sorts of things also have to be processed, and that's what the next generation of video processing and encoding ASICs will do.

Speaker 1:

Yeah, that's an interesting point as well. When we had the first-generation product, we called it a video transcoder. Then with the second-generation Quadra products, we rebranded it as a VPU, a Video Processing Unit. It's really to answer a lot of the questions, the interesting points, you raised in this conversation. We have many of those features already checked off, but it's for the customer to discover them and figure out how to use them.

Speaker 1:

But there are so many things we can do for the video part, right? Like how to identify the content, how to use the content, or even to work more generally with the AI part. There's such a broad range of things we can do now.

Speaker 2:

What was the feedback from customers on your first generation, as you were designing the second generation? What was the feedback that made you decide, hey, we need to add all these features? It's obvious today, but this was at least a couple of years ago when you had this input and made that decision. So what was the feedback you got from customers that drove you toward that decision?

Speaker 1:

With the first generation, in fact, it was a partial solution; it had limitations. Compared to CPUs it was quite efficient, but compared to what an ideal VPU could be, there was still room to improve, right? So the customer feedback usually fell into certain areas. One is they wanted even higher density, higher performance, and they also wanted new codecs, so we added AV1. To answer the performance request, we increased the peak performance from 4K60 to 8K60 and added the new codec. They also wanted more scaling features, because they have things like the ABR ladder: they need one resolution in, then a split and scale down to different resolutions, so we added very powerful scalers (see the sketch below). And they also want to understand the content in the video.
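
As an illustration of what that one-in, many-out ABR ladder looks like in practice, here is a minimal FFmpeg-based sketch driven from Python. It uses the stock split and scale filters with the software libx264 encoder purely for illustration; on a VPU like the one described here, the split, scaling, and encoding would be offloaded to the hardware instead, and the rung sizes and bitrates below are arbitrary example values.

```python
# Minimal ABR-ladder sketch: one input, three scaled outputs.
# Software filters/encoders are used only to illustrate the ladder shape.
import subprocess

RUNGS = [(1920, 1080, "5M"), (1280, 720, "3M"), (854, 480, "1M")]

def abr_ladder(src: str) -> None:
    # Split the decoded input once, then scale each branch down.
    split = f"[0:v]split={len(RUNGS)}" + "".join(f"[v{i}]" for i in range(len(RUNGS)))
    scales = ";".join(
        f"[v{i}]scale={w}:{h}[out{i}]" for i, (w, h, _) in enumerate(RUNGS)
    )
    cmd = ["ffmpeg", "-y", "-i", src, "-filter_complex", f"{split};{scales}"]
    for i, (_, h, bitrate) in enumerate(RUNGS):
        cmd += ["-map", f"[out{i}]", "-c:v", "libx264", "-b:v", bitrate, f"rung_{h}p.mp4"]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    abr_ladder("input.mp4")
```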

Speaker 1:

Just like you said, they want to know what's happening in the video, to fully utilize the value of the video, and also to prevent some of the unwanted content. So that's the Quadra. And there are also some other parts we are not fully utilizing yet, like the audio processing, and we also have a relatively powerful 2D engine, and the DSP can do a lot of programming as well, giving more flexibility to add new features on the fly and serve new requirements. That's the feedback we got from the first generation and put into the second generation. In fact, we are also in the feasibility phase for the third generation, and we are open to ideas and suggestions. That's why I'm also asking you: is there anything your customers are expecting to see in the future, so we can continue to improve? Video is a really focused area for us.

Speaker 2:

That's super interesting. So your first-generation solution, yes, it encoded video, but it was very stringent in what it could encode, only certain target resolutions, which was fine at that stage. But as you went forward to the second generation, customers demanded, yes, this is an ASIC, but give us flexibility. And furthermore, you needed to add some of these functions, like a little bit of AI processing, so you could caption a video or detect illegal content, those sorts of things. So you added that with your second generation, and you're improving that in your third; you're making the chip much more flexible. You can take one video and put it in multiple formats; you can ingest it in almost any format and put it out in almost any format. You added support for AV1, which is still not deployed heavily yet, but it's going to be; everyone's adopting it. Netflix has said they're going to adopt it, I believe YouTube has said they're going to adopt it, and a lot of firms have said they're going to adopt AV1.

Speaker 2:

There's AV1 support in a lot of devices, right? Every new Intel CPU, every new AMD CPU, my new Qualcomm phone I just got, they all have support for AV1 decode. But encoding is very different. Even where there is encoding support on the newest NVIDIA GPUs or Intel GPUs, it is very limited in terms of what level it can support and how much throughput it has, because this is such an intensive operation and the main purpose of that chip is not encoding. So you've added support for that as well, and you've raised the throughput and resolutions and flexibility. Can you talk about the software implementation of this? What are some pain points? Because I've heard a lot of pain points in the industry about implementing ASICs into your workflows, into a distributed system where I'm getting video from everywhere and I'm exporting video everywhere. Can you talk about that?

Speaker 1:

In fact, there are different scales, or different layers, to that issue when we talk about a software solution, a total solution, for video or for ASIC solutions in general. The first one, which most people don't notice, is how it works with the host system, with different operating systems and different kernels. It's a really painful process for an ASIC, or not only an ASIC, for any hardware, to work with different operating systems. They keep upgrading, right? The kernel always has different versions. When you've spent so much developing a driver and there's an upgrade to a different kernel version, you have to redo that work again. That's why we started from the beginning with a different approach.

Speaker 1:

We designed a totally new approach; we call it the computational storage structure. Our VPU sits on top of the existing NVMe driver, so whatever the operating system or kernel, as long as it supports NVMe SSDs, it can support us. And to adapt from one operating system or kernel version to a new one, for us it's only a few days of work, and mainly that's for testing: we want to make sure everything is right, and we have several farms of a few hundred servers running 24/7 to test and make sure it's mature on that system. For us it's really easy to do, and we can even plug and play into the system; it can recognize the cards and start using them right away. That's the fundamental layer that's different compared to all the other solutions, and it's much more advanced.
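
To illustrate what "just another NVMe device" means from the host's point of view, here is a small Linux-only Python sketch that lists the NVMe controllers the kernel has enumerated via sysfs. It uses only generic kernel interfaces, not NETINT tooling, and the idea that a VPU card would appear in this listing alongside ordinary SSDs is an assumption based on the description above.

```python
# List NVMe controllers the Linux kernel has enumerated (generic sysfs walk).
# On a host using NVMe-attached VPUs, the cards would be expected to show up
# here alongside ordinary SSDs -- an assumption based on the discussion above.
from pathlib import Path

def list_nvme_controllers() -> None:
    root = Path("/sys/class/nvme")
    if not root.exists():
        print("No NVMe subsystem found on this host.")
        return
    for ctrl in sorted(root.iterdir()):
        model_file = ctrl / "model"
        model = model_file.read_text().strip() if model_file.exists() else "?"
        print(f"{ctrl.name}: {model}")

if __name__ == "__main__":
    list_nvme_controllers()
```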

Speaker 2:

Everyone uses NVMe server SSDs, right? NVMe being the protocol for solid-state drives; it's pretty much the dominant one, and for at least the last five to ten years it has been the dominant protocol for SSDs.

Speaker 2:

And so you're sort of piggybacking off that infrastructure, saying, hey, our ASIC is just an NVMe device: you write to us and then you read from us, and it just so happens that when you write to us, you send us the unencoded video, or video that's poorly encoded, and we output the properly encoded video. And almost every x86 and even Arm CPU supports NVMe. My laptop has NVMe. Actually, iPhones, people don't know this, but the iPhone's NAND actually communicates with the SoC over NVMe. Of course you're not going to get into an iPhone, but NVMe is everywhere. So it's an industry standard that you support, and it's in every host system. That's really cool, and it makes the ease of use a lot better.

Speaker 1:

Yeah, so we support x86, we support Arm-architecture servers, and we also support the IBM POWER series of CPUs. We support Linux, Windows, Mac, and also Android systems. So whatever you have, as long as it uses NVMe SSDs, it can support us.

Speaker 1:

So that's easy. And beyond that fundamental layer, on the software layer, we are working with open-source frameworks like FFmpeg and GStreamer to integrate fully and seamlessly with them, and in fact most customers are using these two frameworks for their workflows. So whatever they have built in software using FFmpeg or GStreamer or similar solutions, they can easily make it work with us. You just recompile FFmpeg with our library, that's it. When you run the code, you just change the encoder from the software one to ours, and then you're done; you can keep your current workflow as it is. It's just that easy. Of course, there are some challenges that we can see with FFmpeg. FFmpeg was designed around a single-threaded model.

Speaker 1:

It's not for very intensive, massive, parallel processing. So the bigger players, like the top cloud companies, some of whom you already mentioned, will skip the FFmpeg layer and use our API directly. Then they can truly utilize the whole potential of the hardware. But that's a more advanced way of using it.
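
As a rough illustration of that "swap the encoder, keep the workflow" idea, here is a small Python sketch that wraps an FFmpeg transcode and switches between a software encoder and a hardware-backed one by changing only the codec name. The hardware encoder name used here is a placeholder; the real name depends on the vendor's FFmpeg integration, so treat this as an assumption rather than the actual NETINT interface.

```python
# Sketch of swapping a software encoder for a hardware-backed one in an
# existing FFmpeg-based workflow. "h264_hw_placeholder" is NOT a real encoder
# name; substitute whatever encoder the vendor's FFmpeg build registers.
import subprocess

def transcode(src: str, dst: str, use_hardware: bool = False) -> None:
    encoder = "h264_hw_placeholder" if use_hardware else "libx264"
    cmd = [
        "ffmpeg", "-y",
        "-i", src,            # input handling stays exactly the same
        "-c:v", encoder,      # the only argument that changes, per the discussion
        "-b:v", "4M",
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Existing pipelines keep calling transcode(); only the flag flips.
    transcode("input.mp4", "output.mp4", use_hardware=False)
```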

Speaker 2:

FFmpeg is sort of the industry standard, right? Like, if I wanted to encode some video I had, I'd use FFmpeg on my Windows machine. So this isn't only for an application like the cloud; this could also proliferate down to security applications where I have hundreds of cameras, or dozens of cameras, in a building. Instead of having to encode them on so many CPUs, I could use maybe just one or two NETINT ASICs and encode all of those videos at once, and it would work even on a Windows system. So you don't have to upgrade a lot of your infrastructure. You plug in an ASIC, and again, almost every CPU, even desktop CPUs going back a decade, has supported NVMe. So it's not a big difficulty to get this up and running.

Speaker 1:

Yeah, exactly. That's also one of the use cases our customers are starting to adopt. Whatever cameras they have, old ones or new ones, they can aggregate all the video streams to a server with our cards. They can transcode them and also analyze them at the same time, and then they can compress the video and either store it locally or stream it out to the cloud data center for further analysis, or for first responders to watch the video live. That's a very typical use case.

Speaker 2:

So your second generation has some AI processing, right, and the third generation will have even more. So, as you mentioned, you could take the video from the cameras and encode it, or you could also run inference and say, hey, there's somebody on the screen, or hey, the screen is changing, OK, we'll store this video, but we won't store the rest of it. Or, hey, there's somebody who looks suspicious on the screen, so we'll alert the authorities automatically, we'll alert our reviewer automatically, rather than having to wait and say, oh no, what was stolen? Oh well, we can look back at the video. No, we can preempt this.

Speaker 1:

Yeah, and there are in fact some features that people haven't had a chance to use, or haven't realized the value of yet. Like we talked about the NVMe protocol, right? By designing it that way, we can also have a box of our cards connect to hosts through NVMe over fabrics. That means a pool of these resources can be shared not only by one host but by the whole data center; they can access the VPUs like local resources. By doing that, they can further improve the efficiency of their resource utilization. That's a pretty big thing, I'd say, for the hyperscalers, and I think it will be very valuable to the customer, especially when you have the video encoding and decoding part, and also the AI part, shared at whole-data-center scale. That is, I think, a hidden gem that nobody has really had a chance to use yet.

Speaker 2:

So the flexibility is there. There are some unique use cases, especially in retail and manufacturing, where you're going to record a lot of video and you want to take actions based on what's in the video. And you look at smart cities. We talked a lot about consumer content earlier in the show, what happens with YouTube, what happens with Instagram Reels and TikTok, and Twitch video streaming, all these sorts of things. But we didn't talk about how this has use cases beyond that, in the smart city, in manufacturing, in industrial settings. Every traffic light could potentially have cameras on it in the future, and you could stream it all and have maybe a single encoding chip serve multiple cameras, a whole store or a whole block.

Speaker 1:

Yeah, and there are in fact some other applications that most people may not notice, like virtual desktop infrastructure. People aren't using their own local machine; they're using servers a few miles away, or hundreds of miles away, and it's all video streams streamed to your device. You can work from a hotel, from home, or wherever, with the full performance of the server in the data center. All you need is a mouse, a keyboard, and maybe a bigger display. Okay, I think our time is up. So thank you, Dylan, it was great to have you here and to hear your valuable insights about the industry. Thank you very much.

Speaker 2:

Thank you as well, Alex. I look forward to chatting with you again. This episode of Voices of Video is brought to you by NETINT Technologies. If you are looking for cutting-edge video encoding solutions,

Speaker 1:

check out NETINT's products at netint.com.
