
Voices of Video
Explore the inner workings of video technology with Voices of Video: Inside the Tech. This podcast gathers industry experts and innovators to examine every facet of video technology, from decoding and encoding processes to the latest advancements in hardware versus software processing and codecs. Alongside these technical insights, we dive into practical techniques, emerging trends, and industry-shaping facts that define the future of video.
Ideal for engineers, developers, and tech enthusiasts, each episode offers hands-on advice and the in-depth knowledge you need to excel in today’s fast-evolving video landscape. Join us to master the tools, technologies, and trends driving the future of digital video.
Voices of Video
Hyperscale for Video | Stop Asking GPUs to Be Everything at Once
What if video finally got its own processor, and your streaming costs dropped while quality and features went up?
In this episode, we dig into the rise of the Video Processing Unit (VPU) - silicon built entirely for video - and explore how it’s transforming everything from edge contribution to multi-view sports. Instead of paying for general-purpose compute and GPU graphics overhead, VPUs put every square millimeter of the die to work on encoding, scaling, and compositing. The result is surprising gains in density, power efficiency, and cost.
We look at where GPUs fall short for large-scale streaming and why CPUs hit a wall on cost per channel. Then we follow encoding as it moves into the network, building ABR ladders directly at venues, pushing streams straight to the CDN, and cutting both latency and egress costs. You’ll hear real numbers from cost-normalized tests, including a VPU-powered instance delivering six HEVC ladders for about the cost of one CPU ladder, plus a side-by-side look at AWS VT1/U30 and current VPU options.
The discussion also covers multi-layer AV1 for dynamic overlays and interactive ad units, and how compact edge servers with SDI capture bring premium live workflows into portable, power-efficient form factors.
We break down practical deployment choices such as U.2 form factors that slide into NVMe bays, mini servers designed for the edge, and PCIe cards for dense racks. Integration remains familiar with FFmpeg and GStreamer plugins, robust APIs, and a simple application layer for large-scale configuration.
The message is clear: when video runs on purpose-built silicon, you unlock hyperscale streaming capabilities - multi-view, AV1 interactivity, UHD ladders - at a cost that finally makes business sense. If you’re rethinking your pipeline or planning your next live event, this is your field guide to the new streaming stack.
If this episode gives you new ideas for your workflow, follow the show, share it with your team, and leave a quick review so others can find it.
Key topics
• GPUs, CPUs, and VPUs - why video needs purpose-built silicon
• What 100% video-dedicated silicon enables for density and power
• Encoding inside the network to cut latency and egress
• Multi-layer AV1 for interactive ads and overlays
• Multi-view sports made affordable and reliable
• Edge contribution from venues using compact servers
• Product lineup: U.2, mini, and PCIe form factors
• Benchmarks comparing CPU, VPU, and AWS VT1/U30
• Cloud options with Akamai and i3D, including egress math
• Integration with FFmpeg, GStreamer, SDKs, and Bitstreams
Download presentation: https://info.netint.com/hubfs/downloads/IBC25-VPU-Introduction.pdf
Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.
The catchy title is Unleash Hyperscale Streaming. Let's jump in. Before we start talking about VPUs, I thought it might be helpful to talk about the hardware alternatives. Everybody is familiar with the GPU, but it turns out that GPUs are actually not dedicated to video. In fact, only about 15% of the chip's surface area is dedicated to video; the rest is for graphics, hence GPU, graphics processing unit. So we looked at this and said there is actually a new category of silicon that needs to be created, and we call it the VPU, the video processing unit. What's special about the VPU is that 100% of the chip's silicon is dedicated to video processing. That's what differentiates it from the GPU, which only has 15%, and the CPU, which has 0%, being general-purpose compute. So VPUs are the only silicon purpose-built for video. To date, we have 150,000 in the field, installed and working in networks, and we have encoded over a trillion minutes of video. So VPUs are here and advancing quickly into the market. Now, it's a brand-new category, but who else is innovating in this area? It turns out that Meta and Google are also building VPUs. We did beat them to market: Meta launched their chip for Facebook in 2019, and Google's chip for YouTube came in 2021. These chips, of course, are dedicated only to Meta and Google properties, so they're not available for, shall we say, the rest of us. All right. Now, before we get into the silicon itself and talk about the product, I thought it might be interesting to look at some use cases. Where would you use a VPU? How does this product fit into common streaming workflows? The first one I want to highlight is infrastructure as a service, and we have partners here in the booth.
Our partners will often deploy custom silicon in the network so that the video encoding function is actually integral to the network. The old model, we believe, where you have silos, a single location where you do all your video encoding, is shifting to the encoding happening inside the network. There are a lot of advantages here, and one of the major ones is cost. This is an example of an Akamai pricing plan, and I'll point out that the small plan is 42 cents an hour. That will give you 32 live streams, 32 channels of 1080p, in AV1, HEVC, and/or H.264, so any mix of codecs, for 42 cents an hour, which absolutely crushes any of the cloud-based alternatives out there. Another use case, one that a major service provider of ours is actually deploying in the market, is multi-layer AV1. Multi-layer AV1 is very interesting because it gives you the ability to put dynamic graphics on top of video, all as part of the codec. This is why you're going to start seeing more interactive advertising units, powered by the AV1 codec, and silicon, specifically the VPU, is the way to do that very cost-effectively. Another very common use case is live contribution encoding. We're all familiar with this function, but what's unique about this use case is that you can now create the ladders directly in the stadium, or wherever the video originates. So rather than sending the stream up to the cloud, with the additional latency and cost of getting it there, transcoding it, and then distributing it to an origin, you can distribute directly from the venue to the CDN. This is becoming a really common way that live workflows are being built, especially for broadcast, high-quality sports streaming, and premium applications.
We are showing a server over on the side here called the Quadra Mini server. It includes SDI capture and is very high-performance because it has a VPU built into it. It's very compact, very small, and it's perfect for this edge live contribution encoding. Then there's a use case that's becoming very popular in sports called multi-view. This is where you might have four channels that you're watching on a single display. It could be on a TV or a tablet, and a number of services, including YouTube and Disney's ESPN, are offering this application. This is very hard to do in software because it's very expensive: for each channel, or each viewer stream, you end up running four complete encoding workflows. But when you're doing this on a VPU, a single VPU, even at 42 cents an hour, can handle it. So multi-view becomes not only possible but economically feasible, and a lot of services are attracted to it because consumers love it. It's a pretty cool use case. Let's talk about products. The Quadra VPU family is our second generation, and we have commercialized this chip in four different form factors. Starting on the left is our T1U, which happens to be our most popular and most flexible product. It's a U.2; you can see it looks like an SSD. You can stop over at our stand, take a look, touch it, feel it, see what it looks like. That is a U.2 form factor: you just open up the front of a server with an open U.2 slot and literally plug it in. Because these all use NVMe, there's no driver configuration and no complicated setup; it's just recognized. And if you're using an FFmpeg-based workflow, GStreamer, or any other commercial application, we have plugins and all the APIs to interface.
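As a rough illustration of the FFmpeg plugin workflow just described, here is a sketch in Python that assembles an FFmpeg command line for a three-rung HEVC ABR ladder. The encoder name `h265_ni_quadra_enc`, the input URL, and the rung bitrates are illustrative assumptions, not confirmed identifiers from the NETINT plugin; check the plugin documentation for the exact encoder names, and note that a real VPU pipeline would typically use the hardware scaler rather than FFmpeg's software `scale` filter shown here for clarity.

```python
# Sketch: building an FFmpeg command for a 3-rung HEVC ABR ladder.
# "h265_ni_quadra_enc" is an assumed encoder name for the VPU plugin;
# verify the actual identifier against the vendor's FFmpeg integration docs.

def build_abr_command(input_url: str, encoder: str = "h265_ni_quadra_enc"):
    rungs = [            # (output height, target bitrate)
        (1080, "5M"),
        (720, "3M"),
        (480, "1.5M"),
    ]
    cmd = ["ffmpeg", "-i", input_url]
    for height, bitrate in rungs:
        cmd += [
            "-map", "0:v", "-map", "0:a?",   # video plus optional audio
            "-c:v", encoder,
            "-vf", f"scale=-2:{height}",      # software scale, for illustration
            "-b:v", bitrate,
            f"out_{height}p.mp4",
        ]
    return cmd

print(" ".join(build_abr_command("srt://camera.local:9000")))
```

Each rung gets its own `-map`/encoder/bitrate group, which is how FFmpeg expresses multiple outputs from one input; at the venue, the outputs would be packaged and pushed straight to the CDN.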
The next form factor is the T1M, and you can also look at those and see how small they are, very, very small. This is going into ultra-compact servers at the edge of the network; one of our service providers is building tens of thousands of these into the edge of their architectures. And then there are the T1A and the T2A. These are your traditional PCIe add-in card format, what you would know as a PCIe card. There's a one-chip version and a two-chip version, hence the 1A and the 2A. Let me just back up here and also give you some performance specs, because that's really important. On the single chip, so the T1U, T1M, and T1A, you can encode 32 simultaneous 1080p30 streams in any mix of codecs: AV1, HEVC, H.264. If you ask, what about p60? You simply divide by two, which gives you 16. What would I get for 4K? 4K, UHD, is fully supported; you just divide by four, so it'll give you eight. And 8K would give you two. So this is a very, very capable solution, extremely high density, and by the way, it draws 20 watts. It does all of this in a 20-watt power envelope, which is unheard of. All right, now servers. The beautiful thing, as I already mentioned, is that we use NVMe, which allows these to just be plugged in and work seamlessly in any NVMe-enabled server. Because the host machine isn't really doing anything more than managing data over the bus, there's no computational load on the host. You can run a very mid-level, I would even say entry-level, machine; you do not need an expensive server. This is our product called the Quadra Video Server. It's available in ARM and in x86, and we support various AMD processors. So, based on your requirements, all of this is available.
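The divide-by-pixel-rate rule of thumb above can be sketched as a small calculation. The 32-stream 1080p30 baseline is the figure quoted in the talk; the generalization to arbitrary resolutions via pixel rate is an assumption of how the rule extends.

```python
# Per-chip density rule quoted in the talk: 32 simultaneous 1080p30 streams,
# with capacity scaling inversely with pixel rate (resolution x frame rate).

BASE_STREAMS = 32
BASE_PIXEL_RATE = 1920 * 1080 * 30  # the 1080p30 reference point

def streams(width: int, height: int, fps: int) -> int:
    """Estimated simultaneous streams for one chip at the given format."""
    return BASE_STREAMS * BASE_PIXEL_RATE // (width * height * fps)

print(streams(1920, 1080, 30))  # 32
print(streams(1920, 1080, 60))  # 16  (p60: divide by two)
print(streams(3840, 2160, 30))  # 8   (4K: divide by four)
print(streams(7680, 4320, 30))  # 2   (8K: divide by sixteen)
```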
And of course, we also support whatever machine you might be using. This is the Quadra Mini server; I referenced it earlier when I was talking about the edge transcoding application, and you can look at one over here, very compact. It's ideal for mobile broadcast or any in-stadium application where a very small footprint is required. All right. Now the question is: what about the cloud? The cloud is important in modern workflows, and not everybody wants to operate their own infrastructure. So there's some very good news about the cloud. We are in Akamai, and Akamai has a booth facing the aisle over here, so you can talk to them about their support. We're also in i3D. This is a comparison of what you can do performance-wise using CPU in the cloud versus using VPU in the network. To make it apples to apples, we decided to run the testing; in fact, GNL is featured over here on the side. GNL ran all this data and did the study for a project they were working on. The question was: for the same, or similar, cost per hour, how much of a benefit can you get running VPUs? On the left here, you see at 52 cents an hour there is a CPU instance, a kind of upper-middle configuration: 32 gigs of RAM, 16 CPU cores, a very capable server. In software, this workload is HEVC, so they were running x265, and you could get one ladder, four ABR renditions, on that whole machine. So you're basically paying 52 cents an hour per channel; that's how to think about this. Now, the VPU-accelerated instance is just one penny more an hour, 53 cents, so effectively the same cost. But with it, you could get six ladders of HEVC. So there's a six-times, 6x, advantage.
And again, what's important to note here is that this is cost-normalized. Obviously, it would be possible to pay a whole lot more and get a CPU that could do maybe two ladders, maybe three if you're lucky, but you'd be paying a lot more than 52 cents. So this is cost-normalized, and you get a 6x advantage. And here's a further breakdown of how the costs play out. Again, this particular study is on Akamai, but we encourage you to talk to i3D; we have other partners who'd be very happy to talk about their cost advantages. There are a couple of ways to read this. There's a CPU plan, and these are just single channels, not an ABR ladder. So think of it as a single screen, 1080p. On this particular CPU instance, it would cost $72 per month, and you could get two channels. So if you're just running two live channels in software, you're going to pay $72 a month. Now compare that to the VPU instance: you would get 18 channels, a 9x advantage, and you pay only $15. So you have more than a 4x cost advantage here and a 9x advantage on density, just absolutely tremendous. Now the question is, okay, but what about AWS? AWS has an accelerator as well, built on the Xilinx FPGA board; it's called the U30, in the VT1 instance. Well, you can look here and see that it definitely beats software: you get eight channels compared to two, and it's less money, roughly ten dollars less. But it's not as good as the VPU. Here we give you 18 channels; you get eight. Here we're $15; you're $62. So really, really compelling. The advantages of VPUs are just so overwhelming that most people who test and evaluate move immediately into adoption.
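The cost-per-channel arithmetic behind that comparison can be reproduced directly from the figures quoted in the talk (monthly price and 1080p channel count per instance):

```python
# Cost-per-channel math from the quoted comparison table:
# (monthly price in USD, 1080p channels per instance)

plans = {
    "CPU (software)": (72, 2),
    "VPU instance": (15, 18),
    "AWS VT1 (U30)": (62, 8),
}

for name, (monthly_usd, channels) in plans.items():
    print(f"{name}: ${monthly_usd / channels:.2f} per channel/month")

# Density advantage of VPU over CPU:  18 / 2  = 9x
# Instance-cost advantage:            72 / 15 = 4.8x
```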
Now, a couple of things on egress, because the question can often be: okay, but I'm running portions of my workflow on other public clouds or somewhere else, so I do have to take the egress costs into account. That's 100% correct. The good news is that on Akamai it's five dollars per terabyte, while on AWS it's 90, so a whole lot more there. When you move to Akamai, the savings are just overwhelming. All right, wrapping up now. We talked about the hardware, and hopefully I've given you a really compelling reason to test, evaluate, and explore it further. But then the question is: what about software? These are engines, and that's often how we think of them. They're engines, but at the end of the day you need a car, all the other parts of the car. So we have a solution. First of all, we have plugins for FFmpeg and GStreamer, we have a very robust API, and we have an SDK for people doing really low-level integration. All of that is available, but we also have an application called Bitstreams. You can get a demo of Bitstreams; it's set up over here, and our FAEs will be happy to talk to you further about it. Bitstreams is just a really nice, easy-to-use interface. I pulled a couple of screenshots to give you an idea of how beautiful it is. It has very easy configuration, it has templates, it has everything you would expect and need to make it really easy to use VPUs in your workflow. So, with that, I'll wrap up and say thank you for joining.
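A back-of-the-envelope version of the egress math, using the per-terabyte rates quoted here. The 5 Mbps delivered bitrate and the 30-day month are illustrative assumptions, not figures from the talk:

```python
# Egress cost comparison at the quoted rates: $5/TB (Akamai) vs $90/TB (AWS).
# Assumes one viewer watching a 5 Mbps stream continuously for a 30-day month.

MBPS = 5
SECONDS_PER_MONTH = 30 * 24 * 3600

# terabytes delivered per viewer-month (decimal TB)
tb = MBPS * 1e6 * SECONDS_PER_MONTH / 8 / 1e12

for cloud, rate_per_tb in [("Akamai", 5), ("AWS", 90)]:
    print(f"{cloud}: {tb:.2f} TB -> ${tb * rate_per_tb:.2f} per viewer-month")
# prints: Akamai: 1.62 TB -> $8.10 per viewer-month
#         AWS: 1.62 TB -> $145.80 per viewer-month
```

The 18x rate gap ($90 vs $5 per terabyte) carries straight through to the bill, regardless of the bitrate assumed.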
Mark Donnigan: This episode of Voices of Video is brought to you by NETINT Technologies. If you are looking for cutting-edge video encoding solutions, check out NETINT's products at netint.com.