NVIDIA Corporation (NVDA) Management on Bank of America 2019 Global Technology Conference (Transcript)

by: SA Transcripts

NVIDIA Corporation (NASDAQ:NVDA) Bank of America 2019 Global Technology Conference Call June 5, 2019 12:30 PM ET

Company Participants

Ian Buck - VP, Accelerated Computing Business Unit

Conference Call Participants

Vivek Arya - Bank of America Merrill Lynch

Vivek Arya

Good morning. Welcome to this session. I'm Vivek Arya. I cover Semiconductors and Semi-Cap Equipment at BofA Merrill Lynch. And I'm absolutely delighted and honored -- I can't express what a treat it is -- to have Ian Buck, VP of Accelerated Computing, and for those keeping track, the inventor of CUDA, with us. So, really appreciate his being with us. He is joined by Simona Jankowski, whom we all know and love, Vice President of Investor Relations at NVIDIA. So, a real pleasure to have you, Ian, and Simona at our conference, and I look forward to the session.

Question-and-Answer Session

Q - Vivek Arya

So, maybe Ian, as a start, just to kind of level set everyone, what is accelerated computing? What's happening in semiconductor technology or the kind of workloads in data center enterprise to drive the need for accelerated computing?

Ian Buck

Yes. So, it's a good question. About 15 years ago, myself and many other researchers around the community noticed this interesting thing happening with these new kinds of processors called GPUs. In the computer gaming world, graphics was becoming more and more programmable to allow for more and more realism, and we started looking at whether these GPUs could be used for something other than playing video games and rendering.

What was interesting is that by taking those calculations and focusing on just the computational parts of high-performance computing codes, simulation codes, data analytics, and math, the GPUs were actually improving in performance at a rate that was ahead of Moore's Law, because they were focusing not on how fast they can run a single thread of execution, but how fast they can run a massively parallel computation -- the kinds of computation that you see in oil and gas discovery for seismic processing, for weather simulation, and now today for simulating some of the world's largest neural networks.

It's accelerated computing, meaning the GPU is paired with a really fast CPU. You need both, a CPU and a GPU. The CPU handles the single-thread execution, the operating system, the orchestration. The GPU handles the heavy lifting.

By focusing on those areas, we can build architectures that are designed to provide not the fastest single thread of latency, but the optimal throughput for floating-point calculations, mixed-precision calculations, AI calculations.

So, we focus our target on certain markets. We build very fast, efficient accelerators for them. And the other thing that was interesting -- this also came from the graphics world -- was that it's not just about the silicon, not just about the processor and how many flops it had. To make this new programming model work, we had to build a software stack on top of it, and you had to help the world's developers take advantage of this new kind of programming model.

So, we started CUDA. I joined in 2004. We launched it in 2006. We've been at it for 15 years -- I just had my 15th anniversary, at building up that software ecosystem. It started with HPC and the markets I just talked about: supercomputing, weather simulation, oil and gas, data processing, and has evolved, of course, in the last three -- five years to include AI, which has helped -- taken it on to a whole new level.

As a result, the workload in data centers is changing. It used to be you looked at a hyperscale data center, and it was a big Hadoop cluster. I/O was king. The compute was very thin. If you looked at those servers, there were the cheapest, simplest CPUs, a decent amount of memory, and a ton of flash storage or even disk with lots of I/O. It was an I/O-centric data center.

Fast-forward to today, and what you have is a compute-centric data center. They're becoming data center-wide compute engines for doing AI, for predicting and understanding their data, and they're even learning from the HPC and supercomputing world how to build compute-rich, compute-capable, at-scale data centers for AI, HPC, and data science.

As a result, today, if you look at the servers that are being deployed in hyperscale and in enterprises now, they're compute-rich. They often have four, eight or 16 GPUs per server because the more compute that you put inside the box, the more performance that you get for your dollar. And they're starting to really dial up the interconnect. You're seeing much more growth in InfiniBand technologies, doing -- figuring out the -- not just the hardware, but the software stacks at scale. We've been very active in distributed computing, distributed training for training these neural networks.

Also, as you deploy these new neural networks, you're seeing more compute happening at the edge, putting the point of presence with some GPUs, some acceleration there to run these neural networks. You can't just rely on the CPU to provide that real-time latency.

So, we're seeing a sort of a scale-up happen inside the data centers themselves, inside of hyperscale. And we're starting to see that acceleration technology pushing it all the way to the edge with edge servers with smaller form factor, single-GPU or two to four-GPU configurations.

Vivek Arya

Got it. So, on the [Indiscernible] acceleration -- you also mentioned that it's becoming a lot more pervasive on the supercomputing side, on the enterprise side, in the cloud data centers. Is there a difference in what you need to optimize in these different situations? Because from the outside looking in, it seems like, oh, it's all the same AI or the same acceleration. But do you think the optimization differs whether you're talking about supercomputing or enterprise or cloud data centers?

Ian Buck

It's a function of scale. I think the supercomputers -- they want one application across literally tens of thousands of GPUs. And it's literally one program. In hyperscale, you're not at that level, right? You're hosting a website, you're hosting a service, you often run one client at a time. So, for those edge kinds of calculations, they don't need the same levels of interconnect technology for those use cases.

What we're seeing now is that they are starting to expand from single-server solutions to more pod-like architectures, especially for AI training. You may train on a single node for a while, but to really get to production levels of accuracy and scale, you train across many nodes.

People are training -- we get reports of 1,000 GPUs being used to train a single neural network, and typically, it's in the hundreds. And obviously, you can't do that on a single node. You're doing that across multiple nodes with either InfiniBand or RoCE, which is a high-speed Ethernet kind of interconnect.

So, you're starting to see that. Amazon and their P3dn instance, which they launched late last year, is exactly for that market. They dialed up their interconnect to 100 gigabit using their network and technology. They're offering racks and collections of servers now. Instead of just renting a single server, you rent a pod or a constellation to do some of these workloads.

So, it's a matter of scale and timing. It all started in HPC. It started with supercomputing. Now, you go to these conferences and all the HPC guys are at the AI conferences, all the AI guys are at the HPC conferences because there's a natural cross-pollination.

Vivek Arya

Got it. The other thing that we have seen is the number of silicon options that are coming up. I think there are close to 40-plus start-ups working on all different versions of silicon, all attempting to overtake NVIDIA's lead in the market. We have FPGA competitors. So, let me ask the question this way. If we were starting a company today to work on AI, would the product be a GPU, an ASIC, an FPGA, some other kind of accelerator? And why?

Ian Buck

Yes, it's really hard to say -- as someone who has lived and experienced this for 15 years, I had a lot more hair when I started. So, you have to build a platform that can do accelerated computing, a platform that is programmable. Remember, we're just at the beginning of the AI revolution. The neural networks we talked about two years ago are no longer relevant. They're being replaced by newer, more complicated, richer, deeper, larger neural networks that wouldn't even have been thought of then.

BERT, the model being talked about for natural language processing, has over 350 million parameters. It's thousands of layers. It has all-new operations in it that no one had even thought of a few years ago. That was only made possible -- not by NVIDIA. It was built by the people and developers using our platform.

So, building platforms is obviously a challenging task. We have 1.2 million developers on our platform, over 600 major HPC applications, all of the AI frameworks. And I think we're approaching 1 billion CUDA-capable GPUs shipped. So, I think that is where we play.

And the other thing is we're also moving very fast. We're on our fifth generation of AI hardware. The original work was done on Kepler, which was designed for HPC. But then we have done Maxwell, Pascal, Volta, and Turing. And we continue down this path.

So, I think if you're a new player, a new start-up, you have to find your niche -- your area where you think you can perhaps specialize or do something different -- and it's up to them to answer how they're going to play in that space.

In the data center space, especially since we're talking about data center, utility is really important. I can't stress that enough. Hyperscale -- their economics work because they deploy at scale. They typically have one IT engineer for every 100,000 servers. So, they're not in the business of standing up lots of bespoke or different things. One of the values that we bring to the data center is that we have utility across a wide variety of use cases.

We can run every neural network out there. Obviously, they've trained on us. They can deploy on us as well. It's still a GPU, so it can be used for VDI or graphics capabilities as well, which they like. It can be used for cloud gaming. It can also be used for data science, for things like XGBoost and some of those big data analytics applications as well. And of course, HPC and simulation still runs CUDA.

So, that utility makes the investment logical, because for public cloud, they can service all those markets with a single GPU, or a set of GPUs at different price points or capabilities, and still capitalize on all of it.

Vivek Arya

Got it. So, it's essentially, instead of being stuck with a niche solution that can do one thing extremely well, but once they go beyond that, utilization drops off, to something that is a lot more multipurpose and can extend across workloads.

Ian Buck

And by the way, I'm not saying that NVIDIA is necessarily going to go after every single market in AI. We believe that every device should leverage and use AI in the future. It has proven to be an amazing way of writing new software and of deploying and understanding data. In the IoT world, our doorbells, our kitchen countertops, our shoelaces, I don't know, are going to have some kind of processing of the data that's coming into them. And they're going to need something all the way out at the IoT edge.

In that space, the economics are obviously different. Cost is a huge factor. So we've open-sourced some of our DLA, or AI, technologies for those industries to take advantage of. So if they want to put a small ASIC that can recognize the word Alexa or whatever into that hockey puck, that's great. I think it's going to drive more utilization inside of data centers, whether it be training those neural networks or even doing real-time inferencing.

Vivek Arya

Got it. So, I remember the time when NVIDIA was an upstart, right, challenging Intel's domination in CPUs. And at that time, people did not realize, right, the broad range of applications that the GPU could be used for. So, what are you doing to make sure that there is no other upstart that kind of puts you in the background -- that you always stay ahead in technology?

Ian Buck

We're constantly investing in our platform and in our community. I mean, I think what's important to note is we innovate extremely rapidly. We probably have new architectures every year to 18 months now, which iterate -- we're on our second generation of Tensor Cores built explicitly for AI.

So, we build great GPUs and continue to learn from the market as the AI and HPC use cases move and pivot and new workloads become more important. Because we're working with the community, we're intimate with all of the applications, obviously, and the users; the engineering engagement is extremely high. We can project forward to make sure we're building the right architectures in the future.

The second thing we do is invest in the software stack. It's not just CUDA as a programming language. Actually, the language itself is conceptually very simple. It's all the libraries, all the different domain-specific libraries that we've built on top of it.

Whether it be linear algebra libraries, signal processing, AI libraries, sparse algebra, data science -- I think we have over 20 different CUDA libraries, if you will, on top of that. And that's really what a lot of our ecosystem uses. And then on top of that, you have the 600-some-odd HPC applications and all the AI frameworks improving performance on top of that.

So, what you see is this compounding of performance that gets us that 10x every generation, because that 10x comes not just from a faster chip. It comes from all the investment we're making in the algorithms and software for doing things faster, and the work that our ecosystem is doing.

Even on the same chip, in one year's time -- going from 2018 to 2019 on Volta -- we improved HPC performance by almost 2x. And it's the same hardware. It's just new versions of our CUDA stack, new versions of our libraries, new compilers, and all the work being done by the community. So, keeping that platform vibrant and engaged -- our users love it, our developers love it, because they get this sort of free performance, if you will, from that software stack and that investment as it evolves over time.
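The compounding Buck describes is simple multiplication: per-generation hardware gains multiply with software-stack gains on the same silicon. A toy sketch, where the 2x software figure is the Volta example above and the 5x chip factor is purely an illustrative assumption, not an NVIDIA number:

```python
# Rough model of compounded per-generation speedup.
chip_gain = 5.0      # hypothetical raw hardware speedup per generation
software_gain = 2.0  # e.g., CUDA stack/library updates on the same GPU

# The two sources of gain multiply rather than add.
total = chip_gain * software_gain
print(total)  # 10.0 -- the ~10x-per-generation figure he cites
```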

So, it's exhausting for sure. But it's in the culture of NVIDIA and what we do. We're a highly software-centric, developer-focused company. It came from our original roots in the graphics world, with game developers. We love developers. We have our GTC conference, which is a developer conference. We're now taking it around the world. And as a result, that provides that compounded growth even between architectures.

Vivek Arya

Got it. So, I'm glad you brought up CUDA. And I think that's been seen as a very differentiating feature for NVIDIA and part of that maintaining and sustaining that differentiation. Maybe for the audience, just explain what is CUDA?

And when we hear from a lot of the other accelerator options that -- look, all the cloud data centers are really optimizing at the framework level or all the complexities going to the framework level, will there even be a role for CUDA, right, beyond just NVIDIA's hardware overall?

Ian Buck

That's a great question. So, when people say CUDA, they tend to mean different things. What it started as was a way of programming our GPUs in a massively parallel way. You could take anyone who understood C, C++ or FORTRAN, and in about an hour, I could sit them down and show them how to use C, C++ or FORTRAN on a GPU in a way that was straightforward to program. You just needed to understand the concept of a thread, and that when you called a function on the GPU, it ran not once but millions of times.

Conceptually very simple. They would take the inner loops, the compute portion of the code, and instead of looping over the data on the CPU, they would just call that function in parallel over all the data. Conceptually very simple.
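The loop-to-kernel shift he describes can be sketched in plain Python. This is a conceptual illustration only, not CUDA itself: a real CUDA kernel would be a C/C++ function marked `__global__` and launched over a grid of threads, and the thread pool here merely simulates the one-call-per-element model.

```python
from concurrent.futures import ThreadPoolExecutor

# CPU style: one thread loops over every element.
def saxpy_cpu(a, xs, ys):
    out = []
    for x, y in zip(xs, ys):
        out.append(a * x + y)
    return out

# CUDA style, conceptually: the loop body becomes a "kernel" that
# computes a single element i, launched once per element in parallel.
def saxpy_kernel(i, a, xs, ys, out):
    out[i] = a * xs[i] + ys[i]

def saxpy_parallel(a, xs, ys):
    out = [0.0] * len(xs)
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda i: saxpy_kernel(i, a, xs, ys, out),
                      range(len(xs))))
    return out

xs = [0.5 * i for i in range(1000)]
ys = [1.0] * 1000
assert saxpy_parallel(2.0, xs, ys) == saxpy_cpu(2.0, xs, ys)
```

Both versions compute the same result; only the expression of the work changes, which is what made the model quick to teach.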

But what it's evolved into, what CUDA is today -- and we sometimes call it CUDA-X -- is an entire platform: not just a way to program a GPU, but all the libraries and all the different ways to get access to it.

Specifically the domain-specific libraries -- I think we're on our eighth version of our AI library, cuDNN; there's cuBLAS; cuFFT; cuSPARSE for sparse algebra; cuRAND for random number generation, which is used in the finance community; and all sorts of things like DeepStream, which combines our video capabilities with our inference so you can stream -- I think it's up to 60 HD 1080p frames -- and run a full neural network on every pixel of every frame for IVA kinds of applications. So, it's become much bigger than just a programming language. It's that entire platform.

Now, it extends as well -- we've become a critical partner for many of those applications. You mentioned frameworks. I have a huge team of AI framework engineers. We work hand in hand with Google on TensorFlow, with Facebook on PyTorch, with Amazon on MXNet, with Baidu on PaddlePaddle, with Microsoft on ONNX, because while they focus on the AI interface for their specific frameworks, we're focusing on the performance. And they love it, right?

So, instead of having them write the optimized kernels that do that math -- the right convolution or that RNN or LSTM layer type -- we have abstracted all that for them. We can provide that sort of performance and computational base while they focus on the programmability, especially as AI is moving to new capabilities, new layer support, all of the things that they need to do.

We've also partnered with them on distributed training at scale. So, we have done multi-node libraries, communication collective libraries. When you're training a single neural network across many servers, they frequently have to communicate and share data. You're basically training in parallel, and they're keeping in sync all along the way, so they learn in concert as one neural network. That requires tight integration with the network stack, doing all-reduces and gather/scatters across it. So, we provide all those libraries to the framework teams as well, so they can integrate them.
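The all-reduce step he mentions can be sketched in pure Python. This is an illustrative toy with three simulated workers and made-up gradient values; production stacks run this as a collective over NCCL on InfiniBand or RoCE.

```python
# Three simulated workers, each holding the gradient computed from its
# own data shard (values are illustrative).
worker_grads = [
    [1.0, 2.0, 3.0],  # worker 0
    [3.0, 2.0, 1.0],  # worker 1
    [2.0, 2.0, 2.0],  # worker 2
]

def all_reduce_mean(grads):
    """Element-wise average across workers; every worker receives the
    same result, so all replicas apply the same update and stay in
    sync as one neural network."""
    n = len(grads)
    avg = [sum(col) / n for col in zip(*grads)]
    return [list(avg) for _ in grads]

synced = all_reduce_mean(worker_grads)
assert synced == [[2.0, 2.0, 2.0]] * 3  # identical on every worker
```

After the all-reduce, each replica applies the same averaged gradient, which is what keeps the distributed copies learning "in concert."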

So, as a result, it's also a partnership. They trust us. They trust our developers. I can check things into their code. We've established that. And of course, customers as well -- they're motivated.

Vivek Arya

Good. Now, specifically, your data center business has been an extremely important growth driver for the company. It's close to a $3 billion business, grown 70% compounded over the last five years, nearly a quarter of the company now.

So, we hear about the high-performance computing or the supercomputing part of it. There is the cloud part of it. There is the enterprise and other parts of it. And I assume that they are at different places in their adoption of AI and hence at different kind of growth rates. Could you help us understand what is the growth rate potential of this kind of a franchise?

Ian Buck

We're definitely still--

Vivek Arya

Just a number would be just sufficient.

Ian Buck

2023, I think we're seeing total TAM of about $50 billion. It's split between HPC, which may be about $10 billion. I would say cloud and cloud AI is probably another $20 billion of the $50 billion. And then enterprise is another $20 billion. So, roughly 10-20-20, with a total of $50 billion for 2023.

Vivek Arya

All right. So, over these next three or four years, which of these do you think will grow the fastest?

Ian Buck

I'm excited about the enterprise one. I mean, they're all very exciting. In HPC, we're well entrenched. They see the value. While AI was originally not understood there, they now see the value of it. In fact, there's a major supercomputing award every year called the Gordon Bell Prize.

And this year, it went to a neural network that was developed to predict atmospheric rivers -- the weather patterns that are bringing all this rain to California and the strange weather. It's actually one of the world's largest neural networks. It consumed over an exaflop of computation on the Summit supercomputer, which is the number one supercomputer in the world, and it won the major award in HPC, in supercomputing, this last year.

So, as a result, I'm excited. They're now all talking about making sure that every supercomputer they build is an AI supercomputer, which is obviously very exciting. And it is spurring investment. People want to build -- even at the national level, whether it's your AI strategy or your national AI supercomputing strategy.

The cloud side will continue to grow as they adopt AI. The new thing that's exciting in cloud, I think, is that the models are getting bigger quickly. I think OpenAI published a report that said the compute to train the largest neural networks that are of value has grown 300,000x in five years, going back from the original ImageNet to today's BERT or GPT-2.

The thing that's driving that is conversational AI, or speech AI. I'm going to talk to my phone, and it has to understand what I'm saying -- that's ASR, automatic speech recognition. It then has to interpret those words and come up with an answer -- that's natural language processing. And then once it comes up with an answer, it has to speak back to me in a human-sounding, natural voice. One of the reasons why that's gotten so good lately is because all three stages of that pipeline have started to move to AI. And those neural networks are some of the world's largest ones.

And we're seeing that drive a lot of engineering activity, a lot of investment in building up not just bigger training clusters and pods at scale, but also acceleration for inference. For those inference use cases, you just cannot run these neural networks on CPUs. The vast majority of the inference market today is running these lightweight recommender things; the speech has to run accelerated because it's a real-time use case.

So, I think that one is probably the next wave of investment and growth, and it's driving some of the growth that we're seeing on the inference side for sure. The enterprise is still new, but they're now learning how to use AI. We're starting to see those success cases come out -- things like predictive maintenance, retail analytics. It's a combination of traditional machine learning -- think algorithms like XGBoost, which are now ported over to the GPU -- and figuring out how to build those AI technologies for those kinds of use cases. I would say that's the smallest among all the markets today, but the one that obviously could see rapid growth as businesses figure out how to adopt and use AI.

Vivek Arya

Got it. Don't you think businesses will just go to the cloud for these kind of workloads? Like what is the need for building this on-prem?

Ian Buck

We're seeing both. It's a choice of economics. Certainly, if they have the capacity -- the engineering ability, the IT ability -- to build dedicated clusters, they can make the economics work in that use case. We often see them do that for capacity they can confirm is going to be utilized 24/7, and then burst into the cloud.

So, today you see a hybrid strategy, a very active hybrid strategy. And the cloud guys, too, acknowledge it, whether with Azure Stack, or Amazon's on-site PRADA [ph] offering, or with Google and Atos and what they're doing there.

Businesses have an IT infrastructure that they want to use and control; in some cases, that's paramount to them. We just saw Google have an outage, for example. But they can still burst into the cloud for capacity. So, having that is critical -- as we grow, as the cloud guys want to grow, but also, as businesses want to serve their own businesses, they need some level of on-prem capability.

I think with the consumer Internet companies -- the Snapchats of the world, the Pinterests -- a lot of those are cloud-based first and foremost. So a lot of the new, exciting stuff in the start-up community obviously starts in the cloud. Why not? It costs $3 an hour to rent a Volta GPU, less than $1 to rent a T4 inference GPU. It's a logical place for them to start and scale up. And eventually, they'll turn into the multibillion-dollar conglomerates, and they can figure out if they want to have it on-prem or move it to the cloud.

Vivek Arya

Got it. So, I mentioned data center has had very impressive growth for NVIDIA over the last five years. Recently, we have seen a slowdown. A lot of it perhaps is because of just the cloud CapEx slowdown and a lot of macro issues and so forth. But sometimes, Ian, from the outside looking in, the perception is, look, NVIDIA has just training and inference. Training is very mature, and inference is going to be very competitive, okay?

So, could you just help address -- we talked about inference, but first, the training part. Is there growth still left in training? Aren't the frameworks already all optimized? Why do we need to train them further?

Ian Buck

Yes, fair question. I think one of the consequences of the size of our business now is that we are exposed to the broader CapEx influences of the hyperscalers. And as a result, we did see that pause. When they pause, their AI teams have to continue to use the infrastructure that they have. Demand doesn't slow down, so they get angrier. These are some of the most important people inside of those companies. And so they're naturally going to want more of their AI infrastructure back.

And as you can see, there's the job market -- the market for these data scientists, people developing AI. The universities are pumping out new talent. They're being gobbled up by the Facebooks and Amazons and Googles of the world. They need the infrastructure to continue to develop those neural networks for those services. So, we fully expect that to just be cyclical and come back in the latter half of the year.

And with respect to training being mature, I would say we're still at the beginning of AI use cases. It started in image processing, which is actually a relatively small portion of the use cases today. If you look inside of what the market's really doing, it's all about natural language processing -- understanding content, either posts or Web content -- and it's about speech. So, ASR, NLP, TTS; neural networks like Jasper for ASR, BERT for natural language processing, WaveNet and Tacotron for TTS.

Actually, to create a human-sounding voice -- if you look at what Microsoft has done with Cortana, they actually have human-or-nonhuman testing. We've tested it around the company, and it's 50/50 whether you can tell. And you need that neural network to create all the right breath notes, all the right inflections, to just make it sound human. It's quite amazing.

BERT now is a language model which basically can take a question in as text -- you give it a body of text like a Wikipedia page, and it produces the answer out. There was an ImageNet kind of milestone passed last year. Stanford created this question-and-answer test; they tested humans, and they tested BERT, and BERT was actually more accurate than humans for the first time.

So, we saw -- the last time we saw that was the original ImageNet that was recognizing what's inside this picture where ResNet-50 and others surpassed human recognition and created the explosion of AI. We just hit that in natural language processing. It took that long because these neural networks are huge, 350 million parameters. It literally takes weeks to train even on our biggest DGX-2 servers with 16 of our fastest GPUs.

So, the computational intensity is only going up for these new neural networks. And what's driving it is the new AI services, the new capabilities -- being able to have that conversational AI to do speech, which everyone sees as the future for search, for how we interact with our devices, whether it be our phones, the hockey pucks on our countertops, or in our car. You're starting to see it: you will talk to the computers. We'll talk to the cloud through our voice. The way you and I are talking is how we're going to talk to it in the future, too.

So, that workload, those use cases are driving huge amounts of computational complexity and the capacity necessary to make that work. So, while the frameworks are all well optimized for GPUs, we're continuing to expand them to do more multi-node, pod-like training. And the neural networks are not getting any smaller. They're just getting bigger. The applications and use cases are getting more amazing.

And as a result, the computational workload to develop them and train them is going up exponentially. And the requirement to run them -- i.e., requiring acceleration to provide that real-time latency -- is there.

To do BERT -- even just the BERT pass of that speech pipeline might take up to a second. And when you add the ASR and the TTS, if you're trying to run that on the CPU, it's like seconds; you ask it a question and wait. And if you've ever been on a phone call around the world and you get the satellite delay, it's awful. No one's going to use that. So, that is driving the accelerated use cases, where we can do that BERT pass in, literally, I think, less than 20 milliseconds.
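A rough latency-budget sketch of that serial pipeline: the ~1 s CPU BERT pass and the <20 ms accelerated figure come from the discussion above, while the ASR and TTS numbers are hypothetical placeholders chosen only to show why stage latencies add up.

```python
# Illustrative latency budget (ms) for the ASR -> NLP (BERT) -> TTS
# speech pipeline. Only the BERT figures come from the discussion;
# the ASR and TTS figures are assumed placeholders.
cpu_ms = {"asr": 1000, "bert": 1000, "tts": 1000}  # CPU: ~a second per stage
gpu_ms = {"asr": 50, "bert": 20, "tts": 50}        # accelerated

def end_to_end(stages_ms):
    """Total pipeline latency is the sum of its serial stages."""
    return sum(stages_ms.values())

print(end_to_end(cpu_ms))  # 3000 ms -- ask a question and wait
print(end_to_end(gpu_ms))  # 120 ms -- feels conversational
```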

Vivek Arya

And then on the inference side, where instead of raw performance, the optimization is more around latency -- is that where you think it could be a much more fragmented market? We have seen, for example, even Intel adding more capabilities in the CPU with the new Cascade Lake; they have this DL Boost feature coming out later for more inference. They have shown a lot of data around that.

And I remember on the traditional graphics side, in PC gaming, for example, they were able to add a lot more integrated graphics capability, which kind of cut into the room for discrete graphics. So, why isn't there more competition in inference, which could actually be a bigger market? Why aren't ASICs a lot more competitive, since they can be optimized to improve latency for very, very specific workloads?

Ian Buck

Yes. So, 90%-plus of inference today is still done on CPUs. What's driving the growth of our inference workloads -- and it's double digits as a percentage of our data center revenues now -- is the new workloads that simply require acceleration. And it's not like they need a little bit. We're orders of magnitude more efficient and lower latency because of the computational workloads they have to do.

So, I fully expect CPUs, and Intel, it's logical, will continue to add AI capabilities. But fundamentally, what they have built is a single-threaded execution CPU, whereas we've optimized for executing AI end to end: the architecture, the software stack, and the numerical recipes to run these things efficiently with an accelerator that sits beside that CPU. As a result, we're between 10 and 50x lower latency and higher throughput than what you can get on a CPU today, and we'll continue to innovate down that path.

The other important part of this is mixed precision. Often, when you're doing inference, you can take advantage of not running things with the standard 32-bit floating-point arithmetic. You can do things with 16-bit floating point or even 8-bit integers. There's even work being done in 4-bit integers, so literally four zeros and ones to execute inference.

While it's easy to make, I don't want to say easy, but it's simple to make a 16-bit floating-point multiplier; there's nothing secret about that. The numerical recipes and calculations to do that with high accuracy, which is critical, are hard, very hard. And one of the things that we've done is invest in all of those numerical recipes, so that you can give it a full 32-bit floating-point neural network that's been trained.

And we compile it to FP16 or 8-bit integer as part of our inference stack, so that you don't have to be a numerical scientist to figure that out. We take all of the 100 or so neural networks that we're specifically optimizing and tuning for, pass them through our compilers, and provide that publicly. That's what our TensorRT software does. And people are deploying that today in real time across all those different workloads.
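The FP32-to-INT8 conversion Buck describes can be sketched generically. This is not TensorRT's actual API; it is a minimal NumPy illustration of one common recipe, symmetric per-tensor post-training quantization, where a single scale factor maps trained FP32 weights onto the 8-bit integer range:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of trained FP32 weights to INT8.

    Returns the INT8 tensor plus the scale needed to recover
    approximate FP32 values: w ~= q * scale.
    """
    scale = float(np.abs(weights).max()) / 127.0  # map the largest |w| to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the INT8 tensor and scale."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    q, scale = quantize_int8(w)
    err = np.abs(dequantize(q, scale) - w).max()
    print(f"max abs error: {err:.6f} (scale = {scale:.6f})")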

Vivek Arya

Got it. What role does the manufacturing process play in the performance of the accelerator? Because from the products NVIDIA has announced, you're still on the 12-nanometer node. We have seen your graphics competitor go to 7-nanometer and plan to bring out a lot of 7-nanometer products later this year. An FPGA competitor is also planning to move to 7-nanometer. So, what role does the manufacturing process play in competitiveness and performance?

Ian Buck

Certainly. Manufacturing is only one step of the process of building a great accelerator. The more important thing is building an architecture that is efficient, that can utilize that computation, what those transistors can do, to keep the memory and the compute flowing without any hiccups, to do things at the right precision and numerics, and then, of course, the software stack on top of it.

So, as a result, what the end users and developers see is the output of the entire stack and the productivity of the entire stack. Architectural efficiency and delivered performance matter way more than whether or not you're on 16 or 12 or 7 nanometers. If you just look at ResNet-50, we're four times faster in training ResNet-50 than AMD's 7-nanometer chip. And yes, we're on a custom 12-nanometer FinFET technology.

So, it's really that delivered performance across the whole stack that makes you efficient and performant in AI. And that's where the work is done. I mean, NVIDIA has more software engineers than hardware engineers for good reason. We're at a semiconductor conference here, but my background is in software, and we're as much, or even more, a software company than a hardware company.

Vivek Arya

Got it. Just the last question in the few minutes we have left. So, NVIDIA announced a decision to buy Mellanox. Tell us, how does that fit into your overall data center business?

Ian Buck

Well, we are seeing growth in data center-scale computing, like I talked about at the beginning. It's not just how fast a server you can make or how many GPUs you can fit inside of a server. We've done a lot of work there with NVLink, adding new capabilities and building these high-performance servers. The pivot, and it was always there in HPC with MPI and supercomputing, is that we're now seeing it broaden across the whole market toward data center-scale computing. And data center-scale computing needs data center-scale networking.

Our solution today to deliver fast simulation or fast AI needs a confluence of amazing compute infrastructure, rich GPU servers, and a strong networking backbone that can run distributed calculations across multiple nodes and communicate.

And the third pillar would be the storage solution. There are lots of different storage providers out there doing great work, and we've partnered with all of them. Mellanox has always been a great partner with NVIDIA; we've worked with them for over a decade in HPC and now in the AI space. And we felt we could move even faster together on data center-scale networking. What people are building today is truly amazing. So, I'm excited, as things close, to work with them and define that future together.

Vivek Arya

Got it. Since we still have a minute left, I actually did want to ask you about China. China is obviously an important customer for your products but potentially also a competitor down the line. They have access to a lot of data. There's a lot of silicon development going on there. They do have access to foundries. So, how do you see China as both a customer and a competitor going forward?

Ian Buck

The products we're building for AI are not just for China. It is a worldwide market. Whether it be in the U.S., in Europe, the amazing work happening in Japan, or China, we provide that platform to all of our customers. Each of them has to make their own build-versus-buy decision. I think you can hopefully understand from this conversation the cost and the challenge of building a platform and all the work that you have to do. And certainly, we've seen that with some other providers, and they have to make their own decisions about that.

Today, the Chinese hyperscalers, Alibaba, Baidu, Tencent, have been great partners and enjoy our platform. In the end, they're trying to provide a service to their customers, whether it be a public cloud, Internet services, or social media platforms. So, they have to decide whether they want to invest in their own thing and detract from that, or just leverage our platform and continue to move forward.

They've been great customers. The platform we're building for all of our hyperscaler customers has really let them build those products and those new services and be less distracted by their own engineering efforts.

Vivek Arya

Got it. Perfect. Thank you, Ian. Thank you, Simona.

Ian Buck

Thank you.

Vivek Arya

Really appreciate your taking the time. Thanks, everyone, for joining us.