Voices of Video

Cores Galore: Video Processing Without the Computational Gymnastics

NETINT Technologies Season 3 Episode 15

ARM architecture is revolutionizing video processing with power-efficient processors that deliver predictable performance without the computational gymnastics required by traditional x86 systems.

• Ampere builds ARM-based processors with massive core counts (up to 192 cores) focused on sustainable computing
• Traditional x86 architecture struggles with video workloads due to multi-threading causing unpredictable performance
• Single-threaded cores in ARM processors provide predictable execution crucial for video processing
• ARM processors consuming just 1 watt per core enable 320 simultaneous 1080p30 live transcodes in a single 1RU server
• When paired with VPUs for encoding, ARM cores can handle complex tasks like de-interlacing, MPEG-2 decoding and AI captioning
• Real-world implementations show dramatic improvements in density and efficiency for video streaming platforms
• Ampere processors are now available in over 25 cloud providers worldwide including Oracle, Google, and Microsoft Azure
• The video industry is seeing rapid adoption of ARM architecture due to performance, predictability, and power savings
• Software compatibility has significantly improved with modern compilers optimized for ARM instruction sets
• The combination of VPUs and ARM CPUs enables entirely new workflow capabilities previously impossible or prohibitively expensive

Discover more about NetInt's video processing solutions at netint.com.

Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.

Mark Donnigan:

Voices of Video. Voices of Video. Voices of Video.

Sean Varley:

Voices of Video.

Mark Donnigan:

Okay, well, good morning everybody. Welcome to this new, exciting edition of Voices of Video. Sean, I was speechless, I don't know what happened there in my intro. Hey, that's right. My coffee's still kicking in.

Sean Varley:

No in all.

Mark Donnigan:

Yeah, that's right. In all seriousness, though, really happy that you're all here and listening for this super exciting edition of Voices of Video. I'm here with Sean Varley, who is Chief Evangelist at Ampere. So, Sean, welcome.

Sean Varley:

Thanks, Mark. It's really great to be here. Thanks for inviting me onto the Voices of Video podcast.

Mark Donnigan:

Yeah, that's right. You know, we're still trying to figure out what to call these things. I mean, it's a podcast, but yet we don't publish it as an RSS feed. Does that make it a podcast? It's a videocast. I think that's appropriate. Oh, that's great. Hey, well, you know, you and I have had a chance to talk on a couple of other occasions. I believe we have done, is it two or three webinars now?

Sean Varley:

together.

Mark Donnigan:

Yeah, exactly, with NetInt and Ampere.

Mark Donnigan:

And so, you know, this is going to be a chance to sort of break away from the slides and just have a conversation, you know, two guys talking shop. But I'm really excited about it, because ARM in the data center, and of course I won't spill the beans, you're going to get to talk about all that in great detail, but we are seeing it come on super strong in the video industry, in the ecosystem: video streaming platforms, anybody who's operating video platforms at scale. They are either seriously looking at switching from x86 to ARM, or they're in the planning stages, or they're on their way to doing it, or they've done it.

Mark Donnigan:

It just feels like those are the options there, and it's not to say there aren't workloads that will stay on the other architecture, fine, just like there are workloads that stay on software encoders and don't go to ASICs. But the percentage that isn't shifting is getting pretty small. So, with that rambling intro, why don't you tell us first who you are, and dive in and, for those who don't know Ampere, give an intro to what you guys are doing and why we're even talking today.

Sean Varley:

Sounds good, yeah, and thank you, because I think you hit on a trend in the industry, obviously, but we'll talk about that, I'm sure, in great detail. So I am the Chief Evangelist of Ampere, and I've been at Ampere for over five years. The company is just on the other side of six years old, so most of the span of our existence. And I came in after a long stint with Intel, right, did 25 years with Intel, and that's kind of common around these parts. Ampere does have quite a few Intel people, but we are really not Intel. We have a lot of people here that have been gifted, frankly, with experience from being at Intel, but who are also very fully aware of how we wanted to be different, essentially, and so that's kind of a nice atmosphere.

Sean Varley:

Ampere was really founded to begin addressing what we thought was a major issue in data center computing.

Sean Varley:

It was the fact that we had this sort of runaway power consumption situation, right, where every generation, gen on gen, you were getting processors that were pushing the envelope of power consumption to heights that are forcing, and now we are firmly in that range, forcing people to rethink how they actually cool their data centers, how they cool their processors, how they build their servers. So we are really about sustainable computing. We wanted to provide high performance but low power, and, on the way, we wanted to make sure that we started to address what we saw as the new echelon of applications: applications that are highly elastic, very microservice-based. Cloud-native is the term, and that's what we ended up labeling the products as well, cloud-native processors. And so that kind of architecture that serves those kinds of applications was what we also wanted, because that was, and is, the future, and really even the current present of code and how software is being written.

Mark Donnigan:

Yeah, it's fascinating. You know, I think we all, and certainly you, coming from your former employer, lived through that era where processors were just getting faster, right? The number of cores on the die was increasing as well, but it was all about speed: how fast, what's the clock rate of the new part. And it's been super clear, for anybody who studies it or works in the space, that when you have more cores, you can do a whole lot more work. And so I think what we're excited about, and what our customers are excited about, and what the hyperscalers are waking up to, actually that's not even a fair statement, because they were never asleep to this, but when you have 96 cores, when you have 128 cores, when you have what I think your latest is up to, was it 192 cores? Yeah, 192.

Mark Donnigan:

I mean, the work that you can do is just breathtaking. And then with the VPU, when you're offloading the heavy lifting of the video encoding, the transcoding, well, now you've got all this CPU just sitting there saying, use me, I'm here, do something with me. So it's an architecture for video, anyway, that is exciting and really unlocks a lot of value.

Sean Varley:

That's exactly right. Yeah, you hit on a number of things. You know, going back a couple of decades, Moore's Law was around, and we had this era of Dennard scaling, is what they call it, frequency scaling, right, and that's how you were getting this sort of 2x performance at half the price. The geometries of silicon fabrication technology were decreasing, and this was allowing this Dennard scaling, where frequency was able to go up and you were getting this massive performance gain. But the problem that really came out of that, if we can get really techie for a minute, is that the power consumption goes by the frequency squared, and, actually, I think it's voltage squared, so that actually gives you voltage squared times frequency.
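
For reference, the first-order relationship Sean is reaching for here is the standard CMOS dynamic-power model (a textbook formula, not something stated on the show):

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
```

where α is the activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency. Dennard scaling let V_dd drop as transistors shrank, so f could rise at roughly constant power density; once voltage stopped scaling, pushing frequency meant runaway power, which is exactly the wall described next.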

Sean Varley:

So then, essentially, what you're getting eventually is this runaway power situation, and you're starting to reach the limits of what can thermally be wicked off of the processor without going to the more exotic cooling technologies, right? And actually that phase started to tail off somewhere around 2012-ish. Now we're still decreasing geometries, but the frequency scaling has really slowed down significantly, and the way that we're getting performance now is, as you stated, to actually build processors with more cores. Those cores can be more performant per core, or you can scale out, which is the way that we view the industry: add more cores that are very power-efficient. That is the evolution of processing that we are pushing. And to maybe address the video question, how does this really benefit video? Well, we are building these processors with single-threaded cores, right, so one thread per core. This means that everything runs to completion, and its neighbor in the next core over, or across the die, is running to completion on its own thread as well. And they're all doing these things predictably, right, because all of the resources are fairly balanced within the processor; there's not a lot of noisy-neighbor effect that comes in to disrupt the execution of any given process. Now you contrast that to the x86 ecosystem.

Sean Varley:

I saw a question from the audience about how that compares and contrasts to AMD and Intel. AMD and Intel employ multi-threading, right? Multi-threading is a really interesting feature for non-cloud-native workloads, workloads that don't need predictability, that don't need to have low tail latencies, that don't need to have low jitter in the case of video. Because in multi-threading, what you're doing is interrupting the processing of any given core, and you're inserting a second process to go take advantage of maybe some wait state that got stuck into the code, right? And that could be a feature for efficiency.

Sean Varley:

But in practice, what that really ends up doing is making everything herky-jerky, right? Especially for these cloud-native workloads that are dependent on predictable execution. And that's where this sort of difference comes in. Now, there are many other differences, but that is one fundamental difference. And I think that's one of the things that is actually really good for video, because, as you know, Mark, all of the video processing out there wants to run with very, very low jitter, right? Yes, high quality, low jitter, and so this type of processing is really highly suited to that type of application outcome.
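
A minimal sketch of the run-to-completion pattern this enables, assuming a Linux host and a hypothetical per-stream worker: pin one transcode worker to each core so every thread owns its execution resources.

```python
import os
from multiprocessing import Process

def process_stream(core: int) -> None:
    # Hypothetical stand-in for a per-stream transcode loop.
    print(f"worker on core {core}: transcoding...")

def transcode_worker(core: int) -> None:
    # Pin this worker to a single core (Linux only), so it runs
    # to completion without migrating between cores or competing
    # with an SMT sibling for the core's execution resources.
    os.sched_setaffinity(0, {core})
    process_stream(core)

if __name__ == "__main__":
    workers = [Process(target=transcode_worker, args=(c,))
               for c in range(os.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

On a single-threaded-core part, `os.cpu_count()` is the physical core count, so "100% utilization" simply means every one of these workers is busy.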

Mark Donnigan:

So if I'm listening to what you just said, I'm going to interpret it back, call it in layman's language, or simpler terms, and please confirm if this is correct. What you basically just said is that the benefit for me as a video engineer would be that today I have to design all of my software applications with headroom. I have to be very mindful of what machine I'm going to run this code on in production, and I'm going to say, hey, we really don't want to load that processor more than about 60%, or sometimes we're even seeing where it's like 50%. But what you're saying is that with ARM, because of the architecture of this massively parallel number of cores, you probably don't load the machine to 100%, although maybe you'd say, well, you could.

Mark Donnigan:

But you can load it up a whole lot more. Is that a layman's understanding of a benefit there?

Sean Varley:

Absolutely, yeah. Thank you for putting it in layman's terms. What that means in our parlance, when you have single-threaded cores, is that 100% utilization means you're using all the cores that we put in the processor, which is a good thing, right? That's where you're going to get the most efficiency.

Sean Varley:

Yeah, and so that kind of loading paradigm is very different in the x86 industry, and, as you said, the x86 industry has sort of trained people to not load their processors.

Mark Donnigan:

Exactly, it's like best practice, right? Oh, I don't want to load it because something bad could happen, which, frankly, as you described the way the architecture works, there's absolute truth in that. You know, this is super fascinating. I'm an avid podcast listener, that's why I do things like this, I might as well create one too. But I was listening to a founder of a company that basically developed a way to measure and then optimize, with the net result of reducing public cloud computing bills. They can take your GCP data, your Azure data, your AWS data, whatever, and not only tell you where you might be running machines that you shouldn't be, where you're wasting, but they also can control it.

Mark Donnigan:

But what just jumped out at me was that the founder made this statement. He said, you know, what we're finding actually is that people have now largely optimized their compute instance footprint. In the old days, five, six, seven years ago, you'd hear of situations where people had machines that were just on, running and doing nothing, and no one knew about it. Those days are over. DevOps has done a pretty darn good job of making sure that I only spin up a machine when I need it, and then I spin it down as soon as I don't, and all that. But he said, we still can optimize up to 50%, because people are only utilizing 30 to 40% of an instance.

Mark Donnigan:

He didn't go into detail as to all the whys. Perhaps one reason is this buffer, and there are probably some other architectural things, but that just really jumped out at me, like, wow, so then how do you optimize? You can't turn off the machine, because it is being used. It's almost like modernization from a processor perspective. Clearly, energy consumption, power, net-zero-by-2030 initiatives, all of these things across the industry must be good for your business and driving a whole lot of interest. But what else are you guys seeing? As you look across the ecosystem of these massive computing architectures, what are some of the trends? What's happening? What are people worried about, thinking about, even beyond the green and the energy savings? Are there other things?

Sean Varley:

Yeah, definitely. So I want to come back to one point you made there with the optimization in the cloud, because this comes back to the architecture that we were talking about, multi-core architecture and the way that cloud is run. It's sort of on purpose that people are only using 30 percent of an instance, right? And part of that's because of the way instances are provisioned. They're provisioned in binary chunks, right? You can get a two-core, a four-core, an eight-core, a 16-core, a 32-core, and that's very limiting, especially if your workload really is optimal at, say, 18 cores, but you've got to buy a 32-core instance.

Sean Varley:

You're leaving a lot of capacity on the table, and this is very bad from an industry standpoint, from a sustainability perspective, an efficiency perspective, all of that. And so I think, as you were saying, you'll see more and more of this happening, where there are companies that are helping people to optimize, and then you will also see service providers starting to offer features to let you tune. For example, Oracle is actually pretty famous for this: it allows you to essentially dial your actual instance up by one core, or one OCPU is what they call it, so you can have an 18-core instance if that's what your workload most optimally needs.
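
As a back-of-the-envelope illustration of the stranded capacity being described here (the numbers are just the 18-core example from the conversation):

```python
import math

def stranded_capacity(needed_cores: int) -> tuple[int, float]:
    """Round a core requirement up to the next power-of-two
    instance size and report the fraction left idle."""
    size = 2 ** math.ceil(math.log2(needed_cores))
    idle = (size - needed_cores) / size
    return size, idle

size, idle = stranded_capacity(18)  # workload is optimal at 18 cores
print(f"must provision {size} cores; {idle:.0%} stranded")
# -> must provision 32 cores; 44% stranded
```

With per-OCPU sizing of the kind Sean attributes to Oracle, that stranded 44% goes away, which is the efficiency argument being made.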

Mark Donnigan:

Yeah, yeah.

Sean Varley:

Wow, yeah, there's cool features coming around this.

Mark Donnigan:

That's right, that's right.

Sean Varley:

But, to come back to your question, what else is happening beyond the core architectural thing? What you see is this recognition, and you hit on it in the intro, right, that ARM architecture is very power-efficient, but it's also got a lot of qualities that are needed in many different types of workloads, and some of those qualities are this predictability of execution. In our case, we also have very pipelined cores. So what does that mean? Again, contrasting to x86, it used to be:

Sean Varley:

If you bought an Intel processor, you wanted to get the biggest L3 cache you could get. That was a thing: if you really wanted to optimize for performance, you got the biggest L3 cache. Well, these processors have a very small L3 cache. Why? Because we put all the cache close to the core. We put it in L2, right? L2 is right next to the core, and we get big caches, and we come straight out of memory and we pipeline all the data, all the instructions, into those caches. This is again an optimization for execution, and it really ends up being a higher IPC gain, instructions per cycle, for these types of workloads. And so that's also contributing to that predictability value that we talked about earlier. And for some work, like databasing, this is a boon.

Sean Varley:

You know, our most recent product that just came out, the AmpereOne processor, which is our own custom core, it's still ARM instruction-compatible, right, but we actually increased the L2 caches even more, from a ratio perspective, versus the Altra generation. And what we've seen is, for things like databases, this is a huge performance boost. We're getting 60, 70% better results over the latest AMD processors in these workloads, because of those types of features. So there's a lot of innovation still in silicon. I think a lot of people are coming back now and saying, oh, silicon's getting exciting again.

Mark Donnigan:

Yeah, we're seeing that. You know, NetInt was founded in 2015, and it's not that there was never interest in silicon in video encoding, video transcoding, but certainly, let's just say that in 2016, '17, and in late 2018, when our very first chip shipped and we were actually in market, it was sort of like, in certain circles, oh, good luck guys, that's cute what you're doing. And it has just, I mean, wholesale shifted, as a result of a lot of the trends we've already touched on in this discussion. The need for higher densities, the cost aspect, the operational cost, energy savings, and, let's be candid, generative AI and the focus now on all these massive inferencing workloads, which, by the way, we're going to talk about, because I know that you've got some cool features around AI, but we'll get there. That has now shone a spotlight on silicon in a whole new way. And so, just across the ecosystem, it's no longer sort of, hey, that's cute. It's, hey, we really need to now have a serious conversation. We've been talking for a couple of years; let's get some of this stuff in a POC, let's talk about how we're going to deploy it.

Mark Donnigan:

So I assume you guys are experiencing a lot of the same, because you also have this incumbency issue, right, with the other architecture, which, frankly, software has been written on for 50 years now, or however long. I mean, a long time, right? And so you've got this incumbency, just like we have the incumbency of software solutions running on commodity CPUs. That's what we have to compete against. So how is that switch going? Are people finding it easy, medium effort, hard effort? Are there any workloads that you find are easier or faster to transition to ARM than others? I'm sure listeners have that question right now. They're listening, they're going, yep, we're interested, we would love it, but what about all this software I've got that runs on x86?

Sean Varley:

Yeah, this is sort of the rub, right? In any architectural transition, you have this body of prior art, I like to call it, and I like to call it that because it is more than just the code, right? It's also compatibilities between various packages.

Sean Varley:

If you go look at the Java ecosystem as one microcosm: the Java ecosystem grew leaps and bounds over the last 15, 20 years and became a very preferable way to write programs, because it was very portable, right? So Java has always run, even from almost day one, on the ARM architecture. But the problem was that the old JDKs, the just-in-time compiler that actually executes Java-written code, were not optimized whatsoever for the ARM architecture. You could run some Java program, say Cassandra or Solr or Elastic, any of these types of open-source, Java-based, very popular applications, and it would run poorly on an older JDK, right? And there were people scratching their heads going, well, you know, ARM sucks.

Sean Varley:

But the problem was that the JDK hadn't been tuned, right? And all of these things we've been talking about for the last half hour require tuning in lower-level code, right?

Sean Varley:

And compilers, and the way that the compilers are built, have to be built up as well. Luckily, we've gotten to a point now where the compilation of ARM instruction-set code has gotten very, very good. The GCC and LLVM compilers of most modern versions are now very, very good at rendering this code, this machine code, I should say. But the thing is that there's even another layer of optimization above that, and so what you can think about is these layers of optimization, and as the layers of optimization improve, the execution of the whole package gets better and better. So, to answer your question most directly, what can people assume? It's hard to say, right? If you start with something newer, you should have a pretty good experience. But when I say newer, I mean, has it been compiled or has it been released in the last year and a half? We're at that level of granularity here.

Mark Donnigan:

If it was released four years ago, all bets are off right, it's not going to necessarily run the best on this architecture.

Sean Varley:

That goes for almost every element of the stack, whether you're talking about some low-level library, some math library that is used in compression or transcode or anything. If you get into the transcode industry, transcode algorithms and software, this incumbency you were talking about fighting against, those transcode algorithms have been optimized with inline assembly, right? And that's something Intel enabled very early on. And so what we've had to do is catch up to that, and actually optimize with assembly for the ARM architecture, and it took three to four years for us to get those kinds of optimizations into the very common transcode libraries.

Sean Varley:

Yeah, you know, there's all of these layers of optimization that have to come and filter through for that top-level application to get its best environment to run.

Mark Donnigan:

Yeah, you know, I heard an interview that your founder, Renee James, gave, I don't know, three or four months ago, I think this last spring, and something she said really stuck out. The interviewer was asking her about the whole journey of starting Ampere, and of course she came out of Intel, obviously. Well, not obviously. She came out of Intel, so she knows how to build processors. And she said something to the effect of: hardware is hard, and we have the best hardware engineers, the best hardware team in the world that knows how to build silicon, but 50 percent of our engineers are software engineers.

Mark Donnigan:

The whole focus of her comment was that we could build the absolutely amazing, world's best chip out there, but if we don't enable the application side of it, it's going to be hard, right? And so clearly you guys have, even by her saying that 50% of your engineering team is focused on the software side, really shown your commitment to not just building this incredible processor and this architecture and innovating there, but making it actually possible to use it. And, again, we know that struggle very well, because we find ourselves even now expanding more into the application side out of necessity. People want to use the VPU, but they're saying, guys, I need the full framework that this can plug into, so bring me that and I'm in.

Sean Varley:

So yeah, compatibility, and the way two things actually work together, has been the bane of technology from day one, right? Like when you wanted to fit a wheel to an axle.

Mark Donnigan:

Yeah, right, oh man, this thing doesn't really work.

Sean Varley:

You know, people had to go and start working with it. They had to start changing a few things around so that things actually did fit together. Same thing in modern technology, and what you see is companies, by necessity, having to really work on software as a core competency.

Sean Varley:

Yeah, that's right. A well-known fact is that Intel employed a vast, large team of software engineers. And there was actually even another dynamic; you described the first dynamic very well, this necessity that you had to do to make code run best on your architecture. But the other thing that happens with that is it ended up becoming a research and development boon, and Intel was such a large force in technology and computing for so long that they were able to invest in research and development projects that advanced the art of many, many open-source projects, essentially underpinned by Intel for decades.

Sean Varley:

Right. Now, fast forward to now, where you've got more architectures coming in to start to take on some of this market share in computing, right? The ARM ecosystem doesn't employ nearly as many software engineers as Intel has or had, and partially it's able to do that because of Intel, right, because Intel did all of that enabling. ARM is kind of a fractious industry, frankly. You've got ARM, a licensing company primarily; we could get into subtopics around that, but they're probably not relevant for our discussion. The licensees are all maybe focused on smaller markets, right? So you might get this fractured picture where Marvell went and did something, or Broadcom went and did something.

Sean Varley:

Yeah, and everyone's trying to do something, and they're all sort of not core, right?

Mark Donnigan:

Yeah, yeah.

Sean Varley:

I understand. So that's sort of the picture of the state of moving code. That's right.

Mark Donnigan:

That's right, moving code. Well, you're really going to like this update, because the news is fresh, breaking in the last, I don't know, maybe seven days or so. We have a customer who, and I don't want to say it's our first customer who's moving their entire stack to ARM, because maybe there's somebody out there that we're not even totally aware of, but it's the first that we're aware of who has moved their entire transcoding framework, all the open-source components, everything that's needed in their application to stream video, onto ARM and Ampere.

Mark Donnigan:

And we're really excited about it because, obviously, our products work on both architectures; we have to play in both domains, right? But the advantages are so overwhelming on the ARM architecture, because of all of the things that I referenced, and that we've talked about at a couple of different junctures, that you can do on the CPU when your video encoding is being offloaded to the VPU. So now, all of a sudden, you need to do de-interlacing at scale. This particular service, one of their requirements, is they operate basically a virtual MVPD service, so like a virtual pay-TV linear service, just think of it that way, and I think they have on the order of about 300 channels, somewhere in that range. That was their benchmark. And not only did they need, very obviously, high quality, this is broadcast content, it's licensed, there's a lot of sports. They operate in Latin America primarily, but I think they're serving their content in other regions, so you can imagine how important soccer, or football, is.

Mark Donnigan:

Soccer to Americans, football to the rest of the world. You know that quality is very important. But they have a lot of interlaced video coming in. Well, our VPUs do not support de-interlacing. So what do you do about that? Well, you can do it on the CPU, but oh boy, that's 300 channels. I don't think they need all 300 de-interlaced, but that's a compute load.

Mark Donnigan:

And then what do you do if there's a need for an MPEG-2 decoder, which is often the case, because this is coming off satellite? The contribution feed is off satellite, and that's even why it's interlaced, because that is the standard: it's old MPEG-2 and it's interlaced video. Well, now you need to run a software decoder. And then you start thinking about other things, like what about closed captioning? Well, we're going to get to talking about the very well-received demo that Ampere and NetInt did at NAB, running Whisper, OpenAI's model. And so this company looked at all of these technical requirements, and on x86 it was sort of, dare I say, a dream that you could do all this on the CPU, maybe theoretically possible if you were willing to buy a $50,000 machine or something.
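
To make the CPU-plus-VPU split concrete, here is a minimal, hypothetical sketch of the kind of pipeline being described: MPEG-2 decode and yadif de-interlacing run in software on the ARM cores, and the outbound encode is handed to a NETINT device. The encoder name is an assumption, not a detail confirmed on the show; check what your FFmpeg build exposes with `ffmpeg -encoders`.

```python
import subprocess

# Hypothetical sketch: decode the interlaced MPEG-2 contribution feed
# and de-interlace on the CPU, then offload the encode to a VPU.
cmd = [
    "ffmpeg",
    "-i", "satellite_feed.ts",          # interlaced MPEG-2 input
    "-vf", "yadif=mode=send_frame",     # software de-interlace on the ARM cores
    "-c:v", "h264_ni_quadra_enc",       # assumed NETINT Quadra encoder name
    "-b:v", "5M",
    "-f", "mpegts", "out.ts",
]
subprocess.run(cmd, check=True)
```

One such process per channel stays cheap on the CPU side precisely because the encode, the expensive part, never touches the general-purpose cores.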

Mark Donnigan:

But all of a sudden, with this massive core count, not only is it possible, it's incredibly cost-effective, and so they're super excited about this. Hopefully we'll be able to come back and do a case study with them; maybe that'd be a cool webinar. We should do something: we'll bring them on, you guys can come on, and we'll do a case study about it, because I think the industry really is looking for those signposts that say, hey, it's safe to switch. Safe meaning this can really work; it can work in production at scale. So let's talk, since I just referenced it, about what you can do with AI, and I do want you to share with everybody your very interesting paradigm of inferencing on CPU instead of automatically going to GPU. But let's start with the Whisper demo that we ran at NAB, so I'll let you describe what we showed and talk a little bit about that.

Sean Varley:

Yeah, thank you. This is an interesting dynamic where we put together the best of several different products. We have this very, very efficient transcoding that's coming out of the NetInt VPUs, delivering high-quality, very low-latency transcoding. And that, offloaded from the main general-purpose processor, is a boon, because you now can take the headroom that you've cleared out of the main general-purpose processor and use it for something else. You described this customer that's been doing that with de-interlacing, coming off the satellite and so on. So that's one thing that you could do. But in the demo that we did at NAB, we said, well, what if we actually offer closed captioning for some number of the streams that are being transcoded live with this system? And so we put together this demo, and we got in the neighborhood of 30 to 40 streams that can then be closed-captioned in real time, by utilizing the optimized inference technology that we've created for Ampere CPUs.
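
For a rough sense of the CPU-side inference in play here, this is a minimal sketch using OpenAI's open-source whisper package; the model size, thread budget, and file name are illustrative, not details from the demo.

```python
import torch
import whisper  # pip install openai-whisper

# Give this stream's worker an illustrative slice of cores; on a
# many-core part you would run many such workers in parallel.
torch.set_num_threads(8)

model = whisper.load_model("base", device="cpu")

# Hypothetical audio chunk pulled from a live stream's audio track.
result = model.transcribe("segment.wav")
print(result["text"])
```

Chunking a live audio feed into short segments and looping them through a worker like this is one common way to approximate real-time captioning on CPU.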

Sean Varley:

And, if we can dive a little bit into that: we have a lot of vector units in our processors. There are two vector units per core, which is different again from the x86 ecosystem. The x86 ecosystem has AVX, right? AVX-512 was the standard for many years that Intel created. That's a much larger width, bit-wise: 512 bits is what that refers to, per vector unit. Well, ours are smaller, right, we have 128 bits, but we've got two of them per core. So that means what you do is spread the load, like we were talking about earlier.

Sean Varley:

You now have many more units that you spread the computational work across. To do that, you again need software, right? This allows you to do more computationally with that architecture, and the same thing exists for AI. So if you're breaking up a graph, a model that has been built, like Whisper has for voice-to-text, you can spread that across all these vector units and get very, very good performance for inference. And this allowed us to provide that demonstration that we did at NAB, where we were live transcribing video that we were taking live in the room, right, but it could also be streamed off of satellite, in the case that we talked about earlier, or wherever, whatever the source is. And this is sort of another example of the way to combine these technologies together to get the best outcome from a cost perspective.

Sean Varley:

Really, right? Because, as you touched on earlier, we can all geek out on some of this technology, and we can get really excited about how things are done, but at the end of the day, the business people who are counting the pennies are happier, right? Because it's all getting done for a lot less cost. When you consume less power, it basically adds up to less cost, and the same with upfront acquisition and the performance of these things. Like you were saying, 300 channels, 30 streams, live transcoded in one box; that's a lot of performance in a little package.

Mark Donnigan:

That's right, that's right.

Sean Varley:

Versus racks of them, right? So there's a capex piece too, and that all adds up to money at the end of the day.

Mark Donnigan:

Yeah, absolutely. Well, it's just super exciting, and we're really happy to be partnered with Ampere, to be working with you guys, and to be bringing this transformational... you know, the words get thrown around: innovation, transformation, all these.

Mark Donnigan:

You know, it's cool, right? But I think we've all been in environments where the words are used, and you're kind of like, actually, not so much; it's actually not so transformational, or it's not really so innovative. In this case, the words innovation and transformation are apropos, and then some. We really are creating a whole new way to build video workflows, to build them more efficiently, to be able to do more work with less, and at the same time geek out on the tech. It is exciting and it's cool. I've had the opportunity a few times to hear Renee talk about just her genuine passion and excitement about this whole new way to architect the CPU. And it always strikes me, because I think, wow, here's someone who, I think she was president of Intel, right, and really could have just retired and run off into the sunset, I guess. But yet she's so driven by seeing a better way, certainly a new way, a more efficient way, and that's how we operate at NetInt. We wake up every day so excited to really bring breakthroughs to the market.

Mark Donnigan:

So let's end on this, Sean. A couple of questions have come in, so we'll get to those, but let me give you an opportunity to just put a bow on this whole discussion and share your thoughts. Totally open-ended. What do you wish for the video industry? What do you see? Trends? Just leave us with a parting gift here.

Sean Varley:

Yeah, okay, thank you, because I think this is super interesting, right? I mean, I come from a background of networking, and in networking, essentially, you're really just moving data from one place to another, right? Networking is very, very sensitive to latency. If you don't get something done on time, then it holds up the whole chain. And when you're processing video, or even audio, because these are real-time events that are being recorded and then transmitted over the wire or wherever, it's all real time, right, and you're interacting with people. And so this is really interesting and super exciting, because that's exactly the paradigm that I have spent the vast majority of my career in.

Sean Varley:

And so one of the things that struck me, when I started to talk to engineers and customers that were doing video processing, was the gyrations, the gymnastics, that's a good way to put it, to get the computing platforms that they had been using for years to actually behave themselves for the workloads. And I said, well, say more, I'm interested: how does this work? And they said, oh well, there are entire companies that have essentially built their lives curating images and containers and things like this for video and audio processing, that essentially prep the platform for use. And what do they do? They turn off hyper-threading first. Right away, they turn off all of the power management features. Why? Because those introduce a lack of predictability.

Sean Varley:

Uh, you know these delays, that's right yeah, that's right and they turn off turbo, right, um, because turbo is going to have like one core operating at certain frequency and another core operating at a lower frequency and those two things are going to operate in, you know, miss, synchronized right and um, and you know many other uh features that we could talk about. But and then I kind of like the light bulb for me went on. I'm like, well, you don't have to do any of that, that's exactly how our processors build right, exactly yeah yeah, yeah, exactly you get it.

Sean Varley:

You get it for free with us, exactly. You don't have to do those gymnastics anymore, right? And it gets even better. We were talking to id3as.

Mark Donnigan:

The Norsk folks, yeah, that's right. They're kind of very popular in video processing.

Sean Varley:

Norsk is written in Erlang. Another kind of geek-out thing.

Mark Donnigan:

Erlang is a language that, like, 10 people in the world know. Adrian Roe is one of them.

Sean Varley:

Yeah, exactly.

Mark Donnigan:

Shout out to Adrian there.

Sean Varley:

Shout out to Adrian, exactly. But it's a language that is built for parallel processing, built for things that run side by side. That is the video and audio workload in spades, and probably one of the reasons why they chose it for their application. This kind of foundation for this industry is just fundamental to being able to advance the state of the art, and I'm excited. This is what gets me up every day. I'm excited because you're going to be able to do this while you actually do the planet a solid. The more efficiently we can do these things, you know, Greening of Streaming and what that organization is doing about lowering the power consumption per stream; our technology is basically built to do that. You're doing the planet a solid by doing this and eliminating all of this wasteful computational gymnastics.

Mark Donnigan:

Yeah, a hundred percent. That's awesome, love it. Okay, well, Sean, we're even a little bit over; we could have kept going. This has been awesome.

Mark Donnigan:

A little bit of a plug: it's a couple of months out still, but IBC is coming up. It'll be here before you know it. And I know that Ampere is sponsoring a panel session with NetInt, so we're participating. Also, speaking of Norsk, Norsk is participating, and G&L, a really fabulous systems integrator doing a lot of very large projects across Europe, they're going to be hosting. Alex is going to be moderating the whole thing, so it's going to be good.

Mark Donnigan:

I think Supermicro is going to put in a little appearance. I don't know if they're on the panel, but I know that their name will come up a few times too. So, let's transition real quick here to a couple of the questions. We've actually hit on most of these, so just for time's sake, I won't go over them again, but I'll throw this one out. We sort of answered it already, and it's very broad, but: how is the future of streaming going to be simplified? I'm going to throw that to you, because I think you have some thoughts on that, which you even touched on in your last conclusion there.

Sean Varley:

Yeah, actually I see it as a very evolutionary thing. It's simplified by many of those things I talked about: getting a platform that you don't have to micromanage, and that is built to do this parallel processing, because a lot of things are happening, whether you're doing video tracks or audio tracks plus video tracks. This is the game in video processing, right? So it's inherently parallel processing, and that type of workload fits on top of this architecture very, very well. And as the software libraries and things like that come up, and maybe the video industry is slightly behind some other industries in this regard, but I don't think very far behind, you'll get to where you're doing a lot of this work through already-optimized binaries, right? The applications have been compiled, the optimization has been done for you. All of this is going to make video processing that much simpler, and that much cheaper at the end of the day, to actually render to a customer.

Mark Donnigan:

Yeah, that's awesome; I totally agree, and I would have answered in a similar way. Next question: what is the average power consumption? By the way, this person actually wants to know per one gigabit per second of streaming. I know that you have a lot of data, so I'll let you answer from the CPU perspective, and I can give some numbers on the VPU.

Sean Varley:

Yeah, good question. I think I'll start from a very foundational place. Our cores consume about one watt per core. Full tilt, right, that means all things on, you're probably going to consume in the neighborhood of about a watt for that core. So you kind of add them up: how many cores does it take to process your video stream?

Sean Varley:

And most of them are going to take probably one core, or maybe some small number of cores that you can count on one hand, depending on how you're doing other things. Like your transcode: are you doing it in software? Is the library and the algorithm that you're using newer, more complex? All these things. AV1 takes more processing than, yeah, H.264.

Mark Donnigan:

Yeah, exactly. Absolutely. I mean, that's incredibly impressive, if you're saying that, sort of on average, the benchmark could be one watt per core. Wow, that's pretty impressive. So now, for us on the VPU side, one benchmark that I think is easy to understand is we have a Quadra video server, which is a 1RU server. We have an Ampere edition, so an Ampere Altra Max 96-core, and that comes with 10 of our VPUs.

Mark Donnigan:

It happens to be the T1U, the little U.2 form factor. That server draws about 450 watts, roughly, somewhere in that range, give or take. Now, what's super fascinating is that that configuration, again in this little 1RU, it's not even really a full-size pizza box, but it's kind of the pizza box, as people affectionately call them, can do 320 1080p30 live encodes or transcodes. Some format comes in, VP9, H.264, or HEVC; it would decode it and then transcode into any combination of AV1, HEVC, or H.264. So that's the benchmark, and that whole process, including the host CPU, will draw 450 watts. So slightly over one watt per stream. Now, if you look at just our VPU, the T1U draws 27 watts and it will do 32 simultaneous streams, so it's less than one watt. But I prefer to look at the loaded system, because you have to take that into account; the server's there, it's drawing power.
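
The arithmetic behind those figures, using only the numbers quoted above:

```latex
\frac{450\ \text{W (loaded 1RU server)}}{320\ \text{streams}} \approx 1.4\ \text{W per 1080p30 stream},
\qquad
\frac{27\ \text{W (single T1U VPU)}}{32\ \text{streams}} \approx 0.84\ \text{W per stream}.
```

So "slightly over one watt per stream" is the whole-system figure, and the VPU-only figure comes in under a watt.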

Sean Varley:

Yeah, exactly.

Mark Donnigan:

So that gives some benchmarks there. Okay, and then the final question, at least one that we haven't already touched on: is this available as a cloud service? Maybe they're wondering if there's a video platform; I'll also give an answer. But first, Sean, do you guys have public cloud instances? Can I go out and use ARM, sorry, use Ampere, in my favorite public cloud?

Sean Varley:

Yeah, the answer is: in abundance. Many, many clouds worldwide, over 25 cloud providers actually, at this point in time. The largest ones, United States-based companies I should say, obviously are Oracle, Google, and Azure. And worldwide, we have many, many other partners. You've got large providers in Europe like Scaleway, Hetzner, IONOS, and then the more regional ones are also really picking up on this. But yes, you can get access to our instances worldwide, and if you go on our website, we've got a nice summary of where to get them.

Mark Donnigan:

Okay, great, that's awesome. And then, to answer from the NetInt VPU perspective: sadly, today we don't have any instances in the public clouds. However, there are two options for utilizing our hardware if you don't just buy it and run it yourself. One is that we are building a platform, and that work is in very active development. We have a charter customer already beginning to test, a major brand that a lot of people would know, sports content, very high quality. So we're really excited about that.

Mark Donnigan:

The second is that, in fact, at IBC there's going to be a very significant announcement, one of just many to come, but this is going to really kick things off: a platform that pretty much everyone in the industry utilizes in one way, shape, or form to distribute content is going to be integrating, and in fact has already begun integrating, NetInt VPUs into its network. So that's going to be another option for those that maybe today are running everything virtually in the cloud and love what we're talking about here, but say, I've got a big problem, I don't have anywhere to put these things, so how do I do this? So solutions are coming, and we think it's going to be, again, to use the word, transformational. It is going to be very transformational as to how workflows can get built, so we're really excited. Well, Sean, again, thank you so much for joining Voices of Video. We will have you back. We're going to be doing a whole lot more of these things, and we'll do some case studies. Maybe the next one will be a case study discussion. I think that'd be great.

Sean Varley:

Beautiful. Yeah, I love it. Then we can really tear open the box and get in there and look at more of what's happening on the inside. But, Mark, thank you for having me. It's been an absolute pleasure, and I'm looking forward to doing it again.

Mark Donnigan:

Awesome. Well, thank you, and thank you to all the listeners. Without you, we wouldn't have a show, so we appreciate you coming back week after week. All right, till next time. Have a great day.

Sean Varley:

Thank you. This episode of Voices of Video is brought to you by NetInt Technologies. If you are looking for cutting-edge video encoding solutions, check out NetInt's products at netint.com.
