Nvidia Ada Lovelace and GeForce RTX 40-Series: Everything We Know
By the end of the year, Nvidia’s Ada architecture and the rumoured GeForce RTX 40-series graphics cards should be available, most likely in September or October. That’s two years after the Nvidia Ampere architecture and, given Moore’s ‘Law’ slowdown (or, if you prefer, death), perfectly on pace. We have a lot of information about what to expect thanks to the Nvidia hack earlier this year. Everything we know and expect from Nvidia’s Ada architecture and the RTX 40-series family has been gathered into this primary hub.
There are numerous rumours circulating right now, and Nvidia has revealed very little about its plans for Ada, also known as Lovelace. What we do know is that Nvidia has detailed its data centre Hopper H100 GPU, and we expect consumer products to follow in the not-too-distant future, much as they did after the Volta V100 and Ampere A100.
That last example is perhaps the most representative of what to expect. The A100 was unveiled in May 2020, with consumer Ampere GPUs following four months later in the shape of the RTX 3080 and RTX 3090. If Nvidia’s Ada Lovelace GPUs follow a similar release timetable, the RTX 40-series should be available in August or September. Let’s start with a high-level summary of the Ada series’ alleged specifications.
First and foremost, these specifications should be taken with a grain of salt. For the GPUs, we’ve estimated clock speeds of 1.6 to 2.0 GHz, which is consistent with Nvidia’s prior Ampere, Turing, and even Pascal designs. It’s entirely possible that Nvidia will exceed those clocks, so treat these figures as conservative estimates.
We’re assuming that Nvidia will use TSMC’s 4N process (often read as “4nm Nvidia”) for all Ada GPUs, which may not be strictly accurate. We know Hopper H100 uses TSMC’s 4N node, which appears to be a tweaked version of TSMC’s N5 node. N5 has long been rumoured as the node Nvidia would use for Ada, and it is also what AMD will use for Zen 4 and RDNA 3.
To be honest, the name of the node isn’t nearly as important as the GPU specifications and performance. In other words, “a rose by any other name would smell as sweet.” We’ve long since passed the stage when the names of process nodes corresponded to physical features on a chip. Whereas 250nm (or 0.25 micron) chips had elements you could point at and measure at 0.25um width, physical scaling of chips has slowed down with the past several process nodes, and the node names are now largely marketing labels.
For the time being, transistor counts are a best estimate. We do know that the Hopper H100 has 80 billion transistors (which is a rounded figure, but let’s go with it). The A100 GPU has 54 billion transistors, nearly twice as many as the GA102 consumer halo chip, but there are hints that Nvidia may “go big” with the AD102 GPU, which could be closer in size to the H100 than the GA102 was to the GA100. If and when credible information becomes available, we’ll update the tables, but for now, any claimed transistor counts are simply someone else’s estimates rather than ours.
Ada appears to be a monster in principle, based on the “leaked” information we’ve seen thus far. It will have many more SMs and related cores than existing Ampere GPUs, which should result in a significant performance improvement. Even if Ada turns out to be less powerful than the leaks suggest, we can expect performance from the top GPU (maybe an RTX 4090, though Nvidia may change the name again) to be a significant improvement over the RTX 3090 Ti.
At launch, the RTX 3080 was around 30% faster than the RTX 2080 Ti, and the RTX 3090 added another 15%, at least when the GPU was pushed to its limits at 4K ultra. That caveat matters. Even at 1440p ultra, if you’re using a less powerful processor than one of the absolute best for gaming, such as the Core i9-12900K or Ryzen 7 5800X3D, you can find yourself CPU limited. To get the most out of the fastest Ada GPUs, a major system upgrade will almost certainly be required.
Let’s go into the details now that the high-level overview is out of the way. The number of SMs on Ada GPUs will be the most visible difference from the present Ampere generation. At the top of the stack, the AD102 might have 71 percent more SMs than the GA102. Even if nothing else in the architecture changes, we expect a considerable gain in performance.
This will apply not only to graphics but to other workloads as well. For Tensor core performance, we’re applying Ampere’s per-core figures, and a fully enabled AD102 running at close to 2GHz would provide deep learning/AI compute of up to 590 TFLOPS in FP16. By comparison, the GA102 in the RTX 3090 Ti reaches roughly 321 TFLOPS FP16 (using Nvidia’s sparsity feature). Based on core counts and clock speeds, that’s an 84 percent gain. Ray tracing hardware should benefit from the same theoretical 84 percent improvement.
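The arithmetic behind those Tensor figures can be sketched in a few lines. This is only an illustration under stated assumptions: the SM counts (84 for GA102, 144 for a full AD102), the 1.86 GHz boost clock for the RTX 3090 Ti, and the per-SM rate of 2048 sparse FP16 FLOPs per clock are our inputs rather than numbers given in the text, and the function name is made up for this example.

```python
# Rough theoretical FP16 Tensor throughput, scaled from Ampere-style figures.
# Assumption: each SM delivers 2048 FP16 Tensor FLOPs per clock with sparsity
# (4 Tensor cores x 256 sparse FMAs x 2 FLOPs per FMA).
FLOPS_PER_SM_PER_CLOCK = 2048


def tensor_tflops(sm_count: int, clock_ghz: float) -> float:
    """Peak sparse FP16 Tensor TFLOPS for a given SM count and clock speed."""
    return sm_count * FLOPS_PER_SM_PER_CLOCK * clock_ghz * 1e9 / 1e12


GA102_SMS = 84    # assumed: full GA102, as used in the RTX 3090 Ti
AD102_SMS = 144   # assumed: rumoured full AD102

ga102 = tensor_tflops(GA102_SMS, 1.86)  # assumed 1.86 GHz boost clock
ad102 = tensor_tflops(AD102_SMS, 2.0)   # assumed 2.0 GHz estimate

print(f"GA102: {ga102:.0f} TFLOPS")
print(f"AD102: {ad102:.0f} TFLOPS")
print(f"SM increase: {AD102_SMS / GA102_SMS - 1:.0%}")
print(f"Throughput increase: {ad102 / ga102 - 1:.0%}")
```

With these inputs the GA102 result lands at about 320 TFLOPS, within rounding of the figure quoted above. The 84 percent gain is simply the 71 percent SM increase multiplied by the assumed clock bump from 1.86 to 2.0 GHz.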
That is, unless Nvidia reworks the RT cores and Tensor cores for third- and fourth-generation implementations, respectively. Though we could be wrong, we believe that large upgrades to the Tensor cores are unnecessary, since the big gains in deep learning hardware will matter more for Hopper H100 than for Ada AD102. Meanwhile, the RT cores could well see enhancements that boost per-core RT speed by another 25–50% over Ampere, just as Ampere was roughly 75% faster per RT core than Turing.
Even in the worst-case scenario, where Nvidia simply ports the Ampere architecture from Samsung Foundry’s 8N process to TSMC’s 4N (or N5 or whatever) process and changes little else, adding more cores and maintaining similar clocks should provide more than enough of a generational performance increase. Nvidia may well do far more than the bare minimum, but even the entry-level AD107 chip should be a good 30% or more faster than the current RTX 3050.
Remember that the SM counts indicated are for the full chip; Nvidia will most likely use partially disabled chips to improve yields. For example, the Hopper H100 has 144 potential SMs, but only 132 are enabled on the SXM5 model, and 114 are enabled on the PCIe 5.0 card. Nvidia will most likely release a top-end AD102 solution (i.e. RTX 4090) with between 132 and 140 SMs, with lower-tier variants having fewer SMs. Of course, this leaves the door open for a future card (such as an RTX 4090 Ti) with a fully enabled AD102 once yields have improved.
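To put those harvested configurations in perspective, here is a small sketch of the share of SMs that stays enabled in each case. The H100 counts come from the paragraph above; treating a full AD102 as a 144 SM chip is an assumption based on the rumours, and the helper name is purely illustrative.

```python
# Share of SMs left enabled on partially disabled ("harvested") chips.

def enabled_fraction(enabled_sms: int, total_sms: int) -> float:
    """Fraction of a chip's SMs that are active in a given product."""
    return enabled_sms / total_sms


H100_TOTAL_SMS = 144   # full Hopper H100, per the text above
AD102_TOTAL_SMS = 144  # assumption: rumoured full AD102 SM count

h100_sxm5 = enabled_fraction(132, H100_TOTAL_SMS)
h100_pcie = enabled_fraction(114, H100_TOTAL_SMS)

# Speculated RTX 4090 range of 132 to 140 SMs.
rtx4090_low = enabled_fraction(132, AD102_TOTAL_SMS)
rtx4090_high = enabled_fraction(140, AD102_TOTAL_SMS)

print(f"H100 SXM5: {h100_sxm5:.1%} of SMs enabled")
print(f"H100 PCIe: {h100_pcie:.1%} of SMs enabled")
print(f"RTX 4090 (speculated): {rtx4090_low:.1%} to {rtx4090_high:.1%}")
```

Under these assumptions, the SXM5 part keeps roughly 92 percent of its SMs and the PCIe card roughly 79 percent, so a fully enabled AD102 would have at most a single-digit percentage advantage in SM count over the speculated launch card.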