Nvidia Corporation (NASDAQ:NVDA) impressed again with a beat this quarter and a raise for next quarter. However, that wasn’t enough to move the stock price. It was during the earnings call that we saw the stronger price action when management discussed the Blackwell architecture. The first question on the earnings call was a direct question on when Blackwell will be in production:

Q: “So this year, we will see Blackwell revenue, it sounds like?”

A: The CEO offered one simple sentence in a measured tone: “We will see a lot of Blackwell revenue this year.”

There were other bullish remarks about Blackwell ramping this year, such as:

“We will be shipping [Blackwell]. Well, we've been in production for a little bit of time. But our production shipments will start in Q2 and ramp in Q3, and customers should have data centers stood up in Q4.”

This was strong language to use, as it’s apparent that Hopper has runway left given the beat/raises we saw in this quarter. To have the two architectures merge seamlessly in terms of timing in H2 is quite ideal.

Nvidia is the world’s leading GPU design company, which bears repeating since so little emphasis on Wall Street is placed on what the designs intend to solve. We’ve covered Nvidia extensively for years, with our inaugural articles on data center GPUs and Nvidia as an AI stock published in Seeking Alpha in November 2018 and March 2019. Since then, I’ve written over 25 analyses on the free side and premium side that highlight why Nvidia was poised to be the next King of Tech and would even surpass Apple. At the time, this was inconceivable.

Now, for those paying close attention, there are clues that the company’s fast and furious data center growth will see a second wind with Blackwell. The new architecture is at the forefront of training and inference for trillion+ parameter models. The analysis below is in-depth, but well worth the time in understanding how Nvidia plans to pass the baton from the Hopper architecture (2023-2024) to the Blackwell architecture (2024-2025). More than five years ago, I called CUDA the moat for Nvidia’s AI data center story, yet should that moat become breached, the company’s rapid product road map is the first line of defense.

Nvidia is Hitting Peak Growth: The Hopper Impact

Revenue of $26 billion was up 18% QoQ and up 262% from the year ago quarter. This means Q4 was officially the peak quarter for revenue growth, which we covered previously on Tech Insider Research. Revenue beat expectations by 5.9% with analysts expecting $24.6 billion in revenue for growth of 242% YoY.

Nvidia will now face tougher comps as it laps Hopper’s impact from last year. The company is off to a decent start by forecasting next quarter revenue of $28 billion. Analysts were expecting $26.84 billion. This represents growth of 107% up from growth of 98.8% expected.
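As a quick sanity check on the guide, the two growth rates above should point back to the same year-ago revenue base. Here is a minimal sketch using only the figures quoted in this section; the helper names are ours and purely illustrative.

```python
# Figures quoted above, in billions of dollars and percent YoY growth.
guide = 28.0              # management's next-quarter revenue forecast
consensus = 26.84         # analyst estimate going into the report
guide_growth = 107.0      # YoY growth implied by the guide
consensus_growth = 98.8   # YoY growth implied by the consensus


def implied_base(revenue: float, growth_pct: float) -> float:
    """Back out the year-ago revenue implied by a revenue figure and its YoY growth."""
    return revenue / (1 + growth_pct / 100)


def pct_above(actual: float, expected: float) -> float:
    """Percentage by which `actual` exceeds `expected`."""
    return (actual / expected - 1) * 100


base_from_guide = implied_base(guide, guide_growth)
base_from_consensus = implied_base(consensus, consensus_growth)
raise_vs_consensus = pct_above(guide, consensus)

print(f"Year-ago base implied by the guide:     ${base_from_guide:.2f}B")
print(f"Year-ago base implied by the consensus: ${base_from_consensus:.2f}B")
print(f"Guide sits above consensus by:          {raise_vs_consensus:.1f}%")
```

Both routes land on a year-ago base of roughly $13.5 billion, and the guide comes in a little over 4% above what analysts had modeled.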

The intra-quarter revisions are particularly strong. However, regardless of ongoing upward revisions, it’s unlikely we return to the peak growth we saw in Q4 and Q1 (current quarter).

Typically, a growth investor should be cautious when a company hits its peak growth rate after a drastic rise in the stock price.

Organic Growth

However, Nvidia’s margins and earnings expansion are creating an outlier of a stock. Margins may be a bit more muted as we go along as there are rumors Blackwell GPUs will be priced starting at $30,000 to $40,000 but will have more expensive memory components with HBM3e. As long as margins remain within range, this will not be consequential considering Nvidia is posting organic growth.

This is drastically different from a stock that relies on growth at any cost, which is where rapid growth is bought rather than earned. The quality of Nvidia’s growth is much better than what tech investors are used to, and this is predominantly why Nvidia stock is resilient (within reason; there will always be selloffs in the market). As supply/demand becomes more balanced, it will be Nvidia’s aggressive product road map, which often is designed to compete with itself, that will keep pricing power stable, starting with Blackwell.

For example, there are recent reports that Amazon's (AMZN) AWS is pausing orders on Hopper GPUs ahead of Blackwell GPUs. The market may interpret this as weakness, but this is actually a sign of immense strength. Nvidia needs to pass the baton from the H100s and H200s to the Blackwell architecture for the stock price to extend. We are less concerned with what happens in the immediate-term, and in fact, we have stated a few times that Nvidia is a buy on dips, implying the stock won’t go up forever. Instead, we are encouraged to see early signs of a careful transition to the next architecture to help inform our next buy.

Nvidia’s $150B to $200B Data Center: The Blackwell Effect

There is nothing quite like rapid earnings revisions intra-quarter to determine the quality of a position. For example, consider that Nvidia sold off directly after the November report, yet has rapidly gone up 91% since. The earnings revisions are why Nvidia is so strong intra-quarter:

This quarter was expected to report growth of 242%. Last August, the growth for the April quarter was expected to be 91.6%. Only three months ago, the estimates for the April quarter were for growth of 197.5%. Stated in terms of revenue, this quarter’s estimates have nearly doubled from $13.8 billion in August to $24.5 billion. The company reported $26 billion, up 262%.

Next quarter, the company is expected to report growth of 107%. This was expected to be growth of 44.6% last November. Stated in terms of revenue, next quarter’s revenue has gone up $7 billion from $19.5 billion in November to $28 billion in May. In the past three months alone, the estimates went up $8.5 billion.

Below, we discuss why margins, cash flow and strong earnings support our decision to buy on dips. However, there is also a decent probability that FY2026 and FY2027 revenue estimates are too low. The most bullish analyst from KeyBanc is calling for a $200 billion data center segment by 2025. HSBC believes Nvidia’s FY26 revenue could be as high as $196 billion, which implies about a $192 billion data center segment. Loop Capital foresees a $150 billion data center segment as soon as this year, while Wells Fargo has estimates for a $150 billion data center segment by 2027. The exact timing from these analysts has a range, but the conclusion is very similar.

There are additional data points in the supply chain and on the demand side that support Blackwell seeing an increase in orders over Hopper. For example, Taiwan Semi’s (TSM) CoWoS capacity, which is essential for Blackwell’s architecture, is estimated to rise to 40,000 units/month by the end of 2024, which is more than a 150% YoY increase from ~15,000 units/month at the end of 2023. Applied Materials (AMAT) has boosted its forecast for HBM packaging revenue from a prior view for 4X growth to 6X growth this year. According to Wells Fargo, Taiwanese export data rose 360% year-over-year and 33% quarter-over-quarter, and is often correlated to Nvidia data center revenue.

Notably, the premier component for the H200 and Blackwell is HBM3e memory, which is currently supply constrained. Samsung (OTCPK:SSNLF) and SK Hynix are both re-allocating ~20% of DRAM production capacity to HBM to meet high demand, while HBM4 roadmaps are being accelerated.

CEOs of major companies in AI acceleration are in agreement the total addressable market is much, much larger than today’s market size. Lisa Su of Advanced Micro Devices (AMD) has stated the AI chip market will reach $400B by 2027. Intel’s (INTC) CEO has stated AI chips will become a $1T opportunity by 2030, which is almost twice the size of the entire chip industry in 2023.

Big Tech capex is supporting this growth. Our firm has been especially strong on correlating capex to AI investments for our paid research members, where we held a 1-hour webinar in April discussing our expectation that capex increases would support AI stocks. We followed this up with free analysis on Seeking Alpha that tracked a 35% YoY increase to $200 billion across Big Tech companies. A disproportionate amount of this will go to Nvidia.

Nvidia’s Blackwell will Answer to Hopper’s Excellence

The product road map is the single most important thing investors should be focused on. A good chunk of the AI accelerator story is understood at this point. What is not understood is how aggressive Nvidia is becoming by speeding up to a one-year release cycle for its next generation of GPUs instead of a two-year release cycle.

This means Nvidia is competing with itself by putting Blackwell dangerously close to Hopper’s product cycle. This move is bold, it’s daring, and it’s absolutely necessary.

Here is the very ambitious eight-month schedule Nvidia has set for itself:

The H200 with HBM3e is shipping now.

The B100 and GB200 are shipping in late 2024.

The B200 will be released in early 2025.

The Blackwell architecture remains on 4nm dies, similar to the Hopper architecture. What is different is that Blackwell has 2 reticle-sized GPU dies. Reticle size refers to the limit in the chip surface that can be exposed by a single mask. The limit is set by the lithography equipment. At one point it was expected Blackwell would be on 3nm dies, yet due to reasons unknown, Nvidia is moving forward with 4nm. Since Nvidia cannot offer a more advanced process node, the company is instead doubling the silicon. The Blackwell architecture is rumored to be priced between $30,000 and $40,000, which is higher than the H100’s reported $25,000 cost. This is competitive considering B200 will offer nearly 30X better performance (benchmarks are provided by Nvidia).

B100 & B200

The B100 is a replacement chip, which means customers can remove the H100 and place the B100 in the same rack. The B100 is air-cooled and doubles NVLink speeds from the H100 and H200. The B100 will ship in Q3 and upgrade memory to 192GB, up from 80GB in the H100 and 141GB in the H200.

The B200 GPU chipset due in Q1 of next year will deliver a 2.5X training improvement and 5X inference improvement over the H100. This is due to the B200 having 208 billion transistors compared to the H100’s 80 billion transistors.

The B200 will also have 20 petaflops of FP4 compared to the H100’s 4 petaflops of FP8, reaching 32 petaflops of FP8 in the DGX H100 systems. The difference is that the smaller bit size allows for an economical way to achieve more speed when giving up a small amount of accuracy doesn’t make a critical difference. This also helps in the face of a slowing Moore’s Law. Following the release of the Hopper H100, Intel released Gaudi2, which supports FP8. About two years back, chipmakers Graphcore, AMD and Qualcomm pushed for an industry standard for the FP8 floating-point format. However, the recent B200 will have a second-generation transformer engine that supports 4-bit floating point (FP4) with the goal of doubling the performance and size of models the memory can support while maintaining accuracy.
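The headline ratios in the last two paragraphs can be reproduced from the quoted specs. This is a back-of-the-envelope sketch and not a benchmark; the one input not stated in this article is the eight-GPU count of a DGX H100 system, which we flag as an assumption in the code.

```python
# Per-GPU figures quoted in this section.
h100_fp8_pflops = 4         # H100 peak FP8 throughput, in petaflops
b200_fp4_pflops = 20        # B200 peak FP4 throughput, in petaflops
h100_transistors_bn = 80    # billions of transistors
b200_transistors_bn = 208   # billions of transistors

# Assumption (not stated in the article): a DGX H100 system holds eight H100 GPUs.
gpus_per_dgx_h100 = 8

dgx_h100_fp8_pflops = h100_fp8_pflops * gpus_per_dgx_h100
low_precision_ratio = b200_fp4_pflops / h100_fp8_pflops
transistor_ratio = b200_transistors_bn / h100_transistors_bn

print(f"DGX H100 FP8 throughput:  {dgx_h100_fp8_pflops} petaflops")
print(f"B200 FP4 vs. H100 FP8:    {low_precision_ratio:.1f}X")
print(f"B200 vs. H100 transistors: {transistor_ratio:.1f}X")
```

Note that the 5X figure compares FP4 on the B200 to FP8 on the H100, so it is not a like-for-like precision comparison. The 2.6X transistor ratio sits close to the 2.5X training improvement cited above.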

Part of the secret sauce of the H100 is the transformer engine. The A100 lacked support for FP8 compute by default, whereas the H100 leveraged a transformer engine to switch between FP8 and FP16, depending on the workload. The second-generation transformer engine in the Blackwell architecture will offer FP4. This is helpful because AI models are moving toward neural nets that lean on the lowest precision and yet still yield an accurate result. In this case, 4-bit units double the throughput of 8-bit units, compute faster and more efficiently, and require less memory and memory bandwidth.

The main feature of the Transformer Engine is the ability to choose what precision is needed for each layer in the neural network at each step, transitioning between 4-bits, 8-bits, 16-bits, or 32-bits. The H100 can do matrix math with two forms of 8-bit numbers, with either 5 bits or 4 bits as the exponent: E5M2 and E4M3. This is important because E5M2, with its wider dynamic range, may be favored for back propagation, while E4M3, with its extra bit of precision, may be favored for inferencing.
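The trade-off between the two 8-bit formats comes down to range versus precision. The sketch below derives both from the bit layouts. It assumes the commonly published FP8 encoding conventions, which this article does not spell out: E5M2 reserves its top exponent code for infinity and NaN in the IEEE style, while E4M3 keeps that exponent code for normal numbers and gives up only a single bit pattern for NaN. The function name is ours.

```python
def fp8_properties(exp_bits: int, man_bits: int, ieee_style_specials: bool):
    """Return (largest finite value, smallest positive subnormal, spacing just above 1.0)."""
    bias = 2 ** (exp_bits - 1) - 1
    top_exp_code = 2 ** exp_bits - 1
    if ieee_style_specials:
        # Assumed for E5M2: the all-ones exponent code is reserved for inf/NaN.
        max_exponent = (top_exp_code - 1) - bias
        max_significand = 2 - 2 ** -man_bits
    else:
        # Assumed for E4M3: the all-ones exponent code is usable, and only the
        # all-ones mantissa under it is NaN, so the largest mantissa is one step lower.
        max_exponent = top_exp_code - bias
        max_significand = 2 - 2 ** -(man_bits - 1)
    largest = max_significand * 2.0 ** max_exponent
    smallest_subnormal = 2.0 ** (1 - bias - man_bits)
    spacing_above_one = 2.0 ** -man_bits
    return largest, smallest_subnormal, spacing_above_one


e4m3 = fp8_properties(exp_bits=4, man_bits=3, ieee_style_specials=False)
e5m2 = fp8_properties(exp_bits=5, man_bits=2, ieee_style_specials=True)

for name, (largest, smallest, spacing) in (("E4M3", e4m3), ("E5M2", e5m2)):
    print(f"{name}: max={largest:g}, min subnormal={smallest:g}, spacing above 1.0={spacing:g}")
```

Under these assumptions, E5M2 reaches far larger magnitudes while E4M3 resolves values twice as finely, which is why a transformer engine would pick one or the other depending on the tensor being computed.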

Building on the first-gen transformer engine, the B200’s second-gen transformer engine will support double the compute and model sizes with new 4-bit floating-point AI inference capabilities.
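To see why halving the bit width matters for model size, consider the memory needed just to hold a model's weights. This is a rough sketch that ignores activations, KV caches and optimizer state. It uses the 1.8 trillion parameter model size cited elsewhere in this article and the B100's 192GB capacity, with decimal gigabytes assumed.

```python
import math

PARAMS = 1.8e12        # 1.8 trillion parameters, the model size cited in this article
GPU_MEMORY_GB = 192    # B100 memory capacity quoted above
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_memory_gb(params: float, bits: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def gpus_needed(params: float, bits: int, gpu_memory_gb: float) -> int:
    """Minimum GPU count whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(params, bits) / gpu_memory_gb)


for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_memory_gb(PARAMS, bits)
    count = gpus_needed(PARAMS, bits, GPU_MEMORY_GB)
    print(f"{name}: {gb:,.0f} GB of weights, at least {count} GPUs at {GPU_MEMORY_GB} GB each")
```

Each step down in precision halves the footprint, so the same memory pool can hold a model twice the size, which is the sense in which FP4 doubles the model sizes supported.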

GB200

According to the current product road map, the GB200 will be released before the B200 GPUs. The real fireworks will begin with the GB200 NVL36/NVL72 systems in late 2024 and then continue with the B200 GPUs in early 2025.

The GB200 Grace Blackwell chip connects two Blackwell Tensor core GPUs with the Nvidia Grace CPU. The GB200 NVL72 rack-scale exascale supercomputer connects 36 Grace CPUs with 72 Blackwell GPUs in a rack-scale design with liquid cooling. We’ve written in depth about liquid cooling for our premium research members.

According to HSBC, the average sales price of NVL36/NVL72 server racks will be $1.8 million and $3 million, respectively. Notably, it is expected the GB200 systems will have strong margins due to using an in-house CPU.

Here are the stats provided by Nvidia on how it will compare:

30X faster real-time trillion-parameter LLM inference

4X LLM training

25X energy efficiency

18X data processing.

Figure: The GB200 system, due to ship in Q4 this year (Source: Nvidia)

The GB200 will provide 4X faster training performance than the H100 HGX systems and will include a second-generation transformer engine with FP4/FP6 Tensor core. As stated above, the 4nm process integrates two GPU dies connected with 10 TB/s NVLink with 208 billion transistors.

NVLink Switch is a major component to the Blackwell upgrade. Fifth-generation NVLink enables multi-GPU communication at high speed, reaching 1.8 TB/s bidirectional throughput or 14X the bandwidth of PCIe for a single GPU.

For the NVL72 systems, NVLink Switch can reach 130 TB/second, which is “more than the aggregate bandwidth of the internet.” Therefore, it’s the compute and the communication capabilities of the upcoming GB200 release that are important to consider. The 72 GPUs in the NVL72 can be used as a single accelerator for 1.4 exaflops of AI compute power.
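The NVLink and compute figures above hang together arithmetically. The sketch below only recombines numbers quoted in this section and introduces no new specifications.

```python
# Figures quoted in this section.
nvlink_per_gpu_tb_s = 1.8   # fifth-generation NVLink, bidirectional, per GPU
gpus_in_nvl72 = 72
pcie_multiple = 14          # NVLink bandwidth stated as a multiple of PCIe
nvl72_exaflops = 1.4        # AI compute when the rack acts as one accelerator

aggregate_tb_s = nvlink_per_gpu_tb_s * gpus_in_nvl72
implied_pcie_gb_s = nvlink_per_gpu_tb_s * 1000 / pcie_multiple
per_gpu_petaflops = nvl72_exaflops * 1000 / gpus_in_nvl72

print(f"Aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")
print(f"Implied PCIe baseline:      {implied_pcie_gb_s:.0f} GB/s")
print(f"Compute per GPU:            {per_gpu_petaflops:.1f} petaflops")
```

The 72 GPUs at 1.8 TB/s each account for the 130 TB/s rack figure, and 1.4 exaflops spread across the rack works out to a little under the 20 petaflops of FP4 quoted for a single B200. The implied PCIe baseline of roughly 129 GB/s is in the neighborhood of a PCIe Gen5 x16 link counted in both directions, though which PCIe generation Nvidia used as its baseline is our assumption.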

Why GB200s and B200s will drive more demand:

To scale up a model, AI departments utilize a Mixture of Experts (MoE) approach. MoE distributes a computational load across “multiple experts” (or neural networks) and trains across thousands of GPUs using what is called model and pipeline parallelism. This enables more compute-efficient pretraining, yet the parameters still need to be loaded in RAM, so the memory requirements remain high.
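For readers who want to see the mechanics, here is a toy sketch of the routing idea behind MoE. It is a simplified illustration and not how any production model is implemented: the experts are stand-in functions, the gate scores are hard-coded rather than produced by a learned router, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts for one token and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    weights = route_top_k(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical stand-ins for expert networks; real experts are neural network layers.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]   # hard-coded here; a learned router would compute these

chosen = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
active_fraction = len(chosen) / len(experts)

print(f"Experts used for this token: {sorted(chosen)}")
print(f"Share of experts doing compute: {active_fraction:.0%}")
print(f"Blended output: {output:.3f}")
```

Only half of the experts do any work for this token, which is where the compute savings come from. All four still have to be resident in memory because the next token may be routed elsewhere, which is why memory requirements stay high.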

For inference, GB200 will deliver “a 30X speedup” for 1 trillion+ parameter models by leveraging FP4 precision and fifth-generation NVLink. This is what the leap in real-time throughput for inference looks like for a 1.8 trillion parameter model:

Chart: Real-time inference throughput for a 1.8 trillion parameter model (Source: Nvidia Blog)

Blackwell is for the trillion+ parameter era of generative AI. The architecture is designed to support the largest language models today and is future-proofed with the GB200 NVL72 rack-scale solution, which is an exascale computer that contains up to 5,000 NVLink cables that total 2 miles. You also have to consider that AMD came to market with its first release offering nearly 2X the memory of the H100. Nvidia is remaining competitive with HBM3e and soon HBM4 to help models run in memory.

The GB200 also has a new decompression engine that allows GPUs to process and decompress compressed data sets to speed up database queries. Coupled with 8 TB/s of high memory bandwidth and high speed NVLink, the GB200 systems deliver up to 18X faster database queries. In addition to this, there are up to 13X faster physics-based simulations compared to CPUs and 22X faster simulations for computational fluid dynamics (CFD).

More on Memory:

High bandwidth memory (HBM) offers higher bandwidth, capacity, performance, and lower power by vertically stacking up to twelve DRAM memory chips to shorten how far data has to travel, while also allowing for smaller form factors. Stacked memory chips are connected through something called “through silicon vias” or TSVs. HBM is increasingly being used to power machine learning, high-performance data centers, and more recently, generative AI models.

CoWoS (chip-on-wafer-on-substrate) architecture refers to 3D stacking of memory and processor modules layer by layer to create chiplets. The architecture leverages through-silicon vias (TSVs) and micro-bumps for shorter interconnect length and reduced power consumption compared to 2D packaging.

The advanced CoWoS packaging that is needed to combine logic system-on-chip (SoC) with high bandwidth memory will take longer, and thus, it’s expected that Blackwell will be able to fully ship by Q4 this year or Q1 next year. How management guides for this will be up to them, but commentary should be fairly informative by the Q3 time frame.

GPUs will move from 8Hi configurations to 12Hi HBM3e configurations by 2025. These upgrades are needed to train and deploy large models with trillions of parameters in the near future. What Nvidia’s product road map intends to accomplish is a way forward for real-time inference that is computationally efficient, cost-effective and energy efficient.

My firm has covered HBM3e in the past, when we stated in a premium research report six months ago:

“The recent surge in generative AI and AI GPUs, spurred by the success of OpenAI’s ChatGPT and development of hundreds of other large language models, are forecast to bring about a new DRAM market, underpinned by high-bandwidth memory (HBM) and DDR5 […] Nvidia is rapidly moving forward with its GPU roadmap, as it aims to launch its next-gen H200 and B100 GPUs next year followed by the X100 GPU in 2025 – each GPU will accelerate AI inference times along an exponential curve, thus creating a need for more memory and more bandwidth.”

Post Q1 Revenue and EPS:

Revenue of $26 billion is up 18% QoQ and up 262.1% from the year ago quarter. This means Q4 was officially the peak quarter for revenue growth. Revenue beat expectations by 5.9% with analysts expecting $24.6 billion in revenue for growth of 242% YoY.

Nvidia will now face tougher comps as it laps Hopper’s impact from last year. The company is off to a decent start by forecasting next quarter revenue of $28 billion, representing a 107.3% YoY growth at the midpoint. Analysts were expecting $26.84 billion.

The intra-quarter revisions are particularly strong. However, regardless of ongoing upward revisions, it’s unlikely we return to the peak growth we saw in Q4 and Q1 (current quarter).

GAAP EPS of $5.98 compares to EPS of $4.93 last quarter. This represents QoQ earnings growth of 21.3% and YoY earnings growth of 629%.

Adjusted EPS of $6.12 beat estimates of $5.58. This represents growth of 18.6% QoQ and 461% growth YoY.
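The EPS growth rates can be checked directly from the per-share figures. This is a minimal sketch using only the numbers in the two paragraphs above; the year-ago GAAP EPS is backed out from the stated growth rate rather than taken from a filing.

```python
# Per-share figures quoted above, in dollars.
gaap_eps = 5.98
prior_quarter_gaap_eps = 4.93
adjusted_eps = 6.12
adjusted_eps_estimate = 5.58
gaap_yoy_growth_pct = 629


def growth_pct(current: float, previous: float) -> float:
    """Percentage change from `previous` to `current`."""
    return (current / previous - 1) * 100


gaap_qoq_growth = growth_pct(gaap_eps, prior_quarter_gaap_eps)
adjusted_beat = growth_pct(adjusted_eps, adjusted_eps_estimate)
implied_year_ago_gaap_eps = gaap_eps / (1 + gaap_yoy_growth_pct / 100)

print(f"GAAP EPS growth QoQ:         {gaap_qoq_growth:.1f}%")
print(f"Adjusted EPS beat:           {adjusted_beat:.1f}%")
print(f"Implied year-ago GAAP EPS:   ${implied_year_ago_gaap_eps:.2f}")
```

The quarter-over-quarter figure matches the 21.3% stated above, the adjusted beat works out to just under 10%, and the 629% YoY growth implies GAAP EPS of about $0.82 in the year ago quarter.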

Margins:

As expected, margins have expanded across the board.

GAAP gross margin of 78.4% compares to 64.6% in the year ago quarter, up 13.8 points YoY and up 240 bps from last quarter. This represents gross profit of roughly $20.4 billion.

We will see a softening in gross margin due to a deceleration from peak revenue. Management is guiding for GAAP gross margin of 74.8% for next quarter, with added color that the full-year gross margins “are expected to be in the mid-70% range.”

GAAP operating margin of 64.9% compares to 50.3% in the year ago quarter. This represents operating profit of $16.9 billion. For next quarter, GAAP OPM is expected to soften to 60.5%, according to management’s guidance.

Net margin this quarter was 57.1% compared to 28.4% in the year ago quarter, and was up 150 basis points QoQ. This represents a net profit of $14.9 billion.

Cash Flow:

Cash flow was strong (unsurprisingly) with some of the highest free cash flow margins among the Mag 7:

Operating cash flow of $15.35 billion represents a margin of 58.9%, which expanded 690 bps QoQ from 52% and expanded 18.4 points from the year ago quarter.

Free cash flow of $14.94 billion represents a margin of 57.3%, which was up 660 bps and is up 20.5 points YoY.

The company has $31.4 billion in cash and $9.71 billion in debt.

Nvidia announced a ten-for-one stock split, which will be effective June 6th, 2024. Trading will commence on a split-adjusted basis at market open Monday, June 10th, 2024.

Nvidia is increasing its cash dividend by 150% from $0.04 per share to $0.10 per share of common stock. The increased dividend is equivalent to $0.01 per share on a post-split basis. This quarter, the company utilized cash of $7.8 billion towards shareholder returns, including $7.7 billion in share repurchases and $98 million in cash dividends.
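The split and dividend math is simple, but worth laying out since per-share figures change on a split-adjusted basis. Here is a small sketch using the figures above; the $1,000 share price is a round, hypothetical number used only to show the adjustment.

```python
# Figures quoted above.
old_dividend = 0.04        # dollars per share, pre-split
new_dividend = 0.10        # dollars per share, pre-split
split_ratio = 10           # ten-for-one split
buybacks_bn = 7.7          # share repurchases, in billions
dividends_paid_bn = 0.098  # cash dividends, in billions ($98 million)


def split_adjust(per_share_value: float, ratio: int) -> float:
    """Restate a pre-split per-share value on a post-split basis."""
    return per_share_value / ratio


dividend_increase_pct = (new_dividend / old_dividend - 1) * 100
post_split_dividend = split_adjust(new_dividend, split_ratio)
total_returned_bn = buybacks_bn + dividends_paid_bn

# Hypothetical share price, for illustration only.
example_post_split_price = split_adjust(1000.0, split_ratio)

print(f"Dividend increase:        {dividend_increase_pct:.0f}%")
print(f"Post-split dividend:      ${post_split_dividend:.2f} per share")
print(f"Total shareholder return: ${total_returned_bn:.1f}B")
print(f"A $1,000 share becomes:   ${example_post_split_price:.0f}")
```

A split changes the share count and the per-share figures, not the total cash returned, so the $7.8 billion is unaffected.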

Key Segments:

Data center revenue of $22.6 billion was up 427% YoY and up 23% QoQ. This marks an annualized run rate of $90 billion. Before earnings, we updated our investment thesis to state that we will see a $200 billion data center segment revenue by the close of FY2026 based on the Blackwell architecture, which would represent 65% upside from current analyst data center estimates. This requires speculation, of course, but management did state this in the call:

“Blackwell will be available in over 100 OEMs at launch nearly double compared to Hopper, and will support broad and fast deployments.”

Management’s Q2 guide implies data center revenue of about $24 billion next quarter. This is assuming $4 billion from the other four segments, which reported a combined $3.5 billion this quarter. The CFO stated all segments would be up in Q2 on QoQ basis: “We expect sequential growth in all market platforms.”
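The data center figures in this section can be tied together with a few lines of arithmetic. This is a sketch of our own reasoning using only the numbers above; the analyst estimate is backed out from the 65% upside figure and is not a published consensus number.

```python
# Figures quoted in this section, in billions of dollars.
data_center_revenue = 22.6
q2_guide_total = 28.0
assumed_other_segments = 4.0   # the article's assumption for the other four segments
thesis_data_center = 200.0     # our FY2026 data center thesis
thesis_upside_pct = 65

annualized_run_rate = data_center_revenue * 4
implied_q2_data_center = q2_guide_total - assumed_other_segments
implied_qoq_growth = (implied_q2_data_center / data_center_revenue - 1) * 100
implied_analyst_estimate = thesis_data_center / (1 + thesis_upside_pct / 100)

print(f"Annualized run rate:        ${annualized_run_rate:.1f}B")
print(f"Implied Q2 data center:     ${implied_q2_data_center:.1f}B")
print(f"Implied QoQ growth:         {implied_qoq_growth:.1f}%")
print(f"Implied analyst estimate:   ${implied_analyst_estimate:.0f}B")
```

The guide implies roughly 6% sequential growth in the data center segment, while a $200 billion thesis with 65% upside means analysts are modeling about $121 billion.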

Gaming reported revenue of $2.65 billion, which was up 18% YoY yet is down 8% QoQ. The company said the following in the opening remarks: “GeForce RTX GPUs, now with over 100 million installed base, gamers, creators and AI enthusiasts, unmatched performance for Gen AI on PCs.”

ProViz revenue of $427 million was up 45% YoY and down 8% QoQ.

Automotive was up 11% YoY and up 17% QoQ.

OEM and other revenue of $78 million was up 1% YoY, but down 13% QoQ.

Earnings Call:

One of the key points in the earnings call was the ROI that cloud service providers will see from renting GPUs. This may have been provided to help shine some light on why capex budgets continue to grow (emphasis added):

“For every $1 spent on NVIDIA AI infrastructure, cloud providers have an opportunity to earn $5 in GPU instance hosting revenue over four years. NVIDIA's rich software stack and ecosystem and tight integration with cloud providers makes it easy for end customers up and running on NVIDIA GPU instances in the public cloud. For example, using Llama 3 with 70 billion parameters, a single NVIDIA HGX H200 server can deliver 24,000 tokens per second, supporting more than 2,400 users at the same time. That means for every $1 spent on NVIDIA HGX H200 servers at current prices per token, an API provider serving Llama 3 tokens can generate $7 in revenue over four years.”
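To unpack the throughput claim in that quote, here is a rough sketch of the token math. Management did not disclose the server price, the price per token, or the utilization behind the $7 figure, so those are left as inputs to a hypothetical helper rather than filled in. The full-utilization default is our simplifying assumption.

```python
# Figures from the quote above.
tokens_per_second = 24_000
concurrent_users = 2_400
service_years = 4

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

tokens_per_user_per_second = tokens_per_second / concurrent_users

# Assumption: the server runs flat out for the full four years.
lifetime_tokens = tokens_per_second * SECONDS_PER_YEAR * service_years


def revenue_multiple(server_cost: float, price_per_million_tokens: float,
                     utilization: float = 1.0) -> float:
    """Revenue earned per $1 of server cost; every argument is a user-supplied assumption."""
    revenue = lifetime_tokens * utilization / 1_000_000 * price_per_million_tokens
    return revenue / server_cost


print(f"Tokens per user per second: {tokens_per_user_per_second:.0f}")
print(f"Tokens served over {service_years} years: {lifetime_tokens:,}")
```

At 10 tokens per second per user, a single server would generate about 3 trillion tokens over four years at full utilization. Plugging a server price and a per-token price into `revenue_multiple` gives the return per dollar, which is the calculation behind management's claim.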

The company also went out of its way to highlight that they are well diversified beyond major cloud providers by pointing out that:

“Large cloud providers continue to drive strong growth as they deploy and ramp NVIDIA AI infrastructure at scale and represented the mid-40s as a percentage of our Data Center revenue.”

They highlighted that enterprises like Tesla and consumer internet companies like Meta are also strong growth verticals. Management also emphasized that it’s not only companies they have as customers, but also countries like Singapore and Japan.

Conclusion

As stated on Making Money with Charles Payne before earnings, the Q1 earnings report is only one piece to the story, whereas the ultimate fireworks will be when the Blackwell architecture begins to ship in Q3-Q4. The product road map is communicating that AI accelerators are secular, not cyclical.

We have seen peak growth this quarter – even with the beat/raise that Nvidia is becoming known for, H2 will certainly see a slowdown. This is normally a great jumping off point for investors, but those who stick with Nvidia will be rewarded for a few reasons:

This is an organic growth company, which is very rare in tech, where most growth is bought. That means Nvidia is likely to remain strong on margins and EPS, even in the face of slowing revenue growth.

The supply chain is providing hints that analyst estimates for the data center are too low – there could be up to 65% upside on those estimates in the next 6–7 quarters.

The reason I side with Keybanc, Loop and others in thinking the estimates are too low – and this last point is critical – is because Nvidia is speeding up its product road map and introducing the Blackwell architecture to address the trillion+ parameter models that Big Tech will compete to create and train.

Nvidia has sold off 10% or greater about 9 times since the 2022 low. We see any dips as buying opportunities as we brace for Blackwell toward the end of this year.

