You have a graphics workstation, so which graphics do you need?
If you need a mobile workstation with the highest GPU performance, then you will gravitate toward options that include the fastest, most powerful GPU that you can find. If that is your requirement, then you will likely also need plenty of RAM, storage, a fast CPU, and a 4K display. It’s easy to see your choices coming down to a fully configured 17-inch mobile workstation.
If your workflow is essentially limited to 3D CAD and 3D modeling, then you might be looking at a well configured 15-inch system with a fast, but basic, GPU.
The difficulty in making the right choice is this: many professional workflows lie somewhere between these two endpoints.
Over the past 18 months, NVIDIA has driven the Turing architecture into nearly every make and model of mobile workstation on the market. The mobile workstation GPUs include the Quadro T1000, T2000, RTX 3000, RTX 4000, and the RTX 5000.
Octane Bench pushes NVIDIA GPUs to 100% capacity with CUDA-based rendering workloads
All of these GPUs are based on NVIDIA’s Turing architecture. The Quadro RTX versions are not only larger and faster, but they contain NVIDIA Tensor Cores for artificial intelligence applications and NVIDIA RT Cores for raytracing applications. The Quadro T1000 and T2000 do not have these cores. The Tx000 models are smaller GPUs, have less graphics memory, and consume less power.
This makes choosing between a Quadro T2000 and a Quadro RTX 3000 one of the more critical choices.
It’s time for a closer look
Let’s start at the top and look at the specs for the Quadro T2000 and the Quadro RTX 3000 mobile GPUs. I had two similar mobile workstations for testing, one with each GPU. Both systems used NVIDIA’s “Max-Q” version of the mobile GPU, so the comparison is direct and relevant.
Specs | Quadro T2000 | Quadro RTX 3000 |
---|---|---|
Transistors | 4.7 billion | 10.8 billion |
Die Size | 200 mm² | 445 mm² |
Memory / Type | 4 GB GDDR5 | 6 GB GDDR6 |
Memory Bus | 128 bit | 192 bit |
Memory Bandwidth | 112.1 GB/s | 288.0 GB/s |
GPU Base Clock | 1200 MHz | 600 MHz |
GPU Boost Clock | 1620 MHz | 1215 MHz |
Power (TDP) | 40 W | 60 W |
CUDA Cores | 1024 | 1920 |
TMUs (Texture Units) | 64 | 120 |
ROPs | 32 | 64 |
Tensor Cores | 0 | 240 |
RT Cores | 0 | 30 |
What stands out is that the Quadro RTX 3000 is more than twice the size of the Quadro T2000. But that doesn’t translate to twice the memory, twice the power, twice the performance, or twice the price.
The Quadro RTX 3000 has almost twice the number of CUDA cores and TMUs, and it does have twice as many Render Output Units (ROPs). The GPU also adds 240 Tensor Cores and 30 RT Cores, which help account for the more-than-double transistor count.
The Quadro RTX 3000 consumes 50% more power than the Quadro T2000. This last point, along with the size, is a difference that significantly impacts the mobile workstation design.
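To put numbers on these comparisons, here is a small Python sketch that divides each Quadro RTX 3000 spec by the matching Quadro T2000 spec. The values are copied from the table above. The names `SPECS`, `spec_ratios`, and `RATIOS` are my own, chosen only for illustration.

```python
# Spec values copied from the comparison table above.
# Each entry is (Quadro T2000, Quadro RTX 3000).
SPECS = {
    "Transistors (billions)": (4.7, 10.8),
    "Die size (mm^2)": (200, 445),
    "Memory (GB)": (4, 6),
    "Memory bandwidth (GB/s)": (112.1, 288.0),
    "Power, TDP (W)": (40, 60),
    "CUDA cores": (1024, 1920),
    "TMUs": (64, 120),
    "ROPs": (32, 64),
}


def spec_ratios(specs):
    """Return {spec name: RTX 3000 value divided by T2000 value}."""
    return {name: rtx / t for name, (t, rtx) in specs.items()}


RATIOS = spec_ratios(SPECS)

# Print one line per spec, for example "ROPs ... 2.00x".
for name, ratio in RATIOS.items():
    print(f"{name:26s} {ratio:.2f}x")
```

Run as written, this puts the transistor count and die size above 2x, while memory and power both come in at 1.5x.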
Performance differences between the Quadro T2000 and the Quadro RTX 3000
Performance is the key. Let’s first look at SPEC Viewperf 2020 performance. Viewperf is useful for us here because it is very good at isolating the GPU performance.
The most GPU-intensive test, Energy, shows the greatest performance difference of nearly 60%. The least difference is found in one of the CAD tests, Siemens NX, with a performance gain of just over 20%. In most of the tests, the performance delta lies between 30% and 40%.
Viewperf is excellent at isolating the GPU performance because it strips out the application-related overhead while using typical geometry from each set of software. What this implies is that you will likely see a smaller performance difference when benchmarking the applications themselves because the general application overhead is part of your real workflow.
The points to take away from this benchmark result are:
1. The Quadro RTX 3000 has a larger performance advantage when running GPU-heavy workloads, and
2. Interactive 3D application performance will be noticeably more responsive with the Quadro RTX 3000 even if the overall application performance increase is not the 20% to 60% seen in these tests.
Octane Bench measures NVIDIA GPU performance running CUDA applications. The tests in the benchmark are GPU-based rendering computations. The benchmark loads the GPU to its maximum capacity. GPU utilization essentially remains at 100% for the duration of the benchmark.
The CPU is barely used in this test. CPU load spikes to 51% when each test loads. After that, it stays in the single digits.
The Quadro RTX 3000’s performance advantage ranges from 66% to 87%. The version of Octane Bench used here is not specifically using the specialized Tensor Cores and RT Cores. This means that these results are a one-to-one GPU-computing comparison.
What is apparent from these results is that GPU-computing-centric applications benefit significantly if you choose a Quadro RTX 3000 over a Quadro T2000.
In Premiere Pro rendering tests, the result is normalized against the video sequence duration. In this benchmark, a one-minute video sequence that renders in one minute will have a score of 100%. If it renders in 30 seconds, it scores 50%. Therefore, lower scores are better.
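The scoring described above is simple enough to sketch in a few lines of Python. This is my own illustration of the arithmetic, not code from the benchmark, and the function name `normalized_render_score` is hypothetical.

```python
def normalized_render_score(render_seconds: float, sequence_seconds: float) -> float:
    """Render time as a percentage of the sequence duration (lower is better)."""
    if sequence_seconds <= 0:
        raise ValueError("sequence_seconds must be positive")
    return 100.0 * render_seconds / sequence_seconds


# The two cases described in the text: a one-minute sequence that
# renders in 60 seconds, and the same sequence rendered in 30 seconds.
print(normalized_render_score(60, 60))  # 100.0
print(normalized_render_score(30, 60))  # 50.0
```

A score above 100 means the render took longer than the sequence itself runs.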
These tests avoid using a highly compressed, long-GOP video format. Rendering a complex sequence with multiple video streams using a long-GOP format chokes the CPU and rendering times go through the roof whether you have a fast GPU or not.
In the last test using a single 4K video sequence with color correction, the result is 30% faster with the Quadro RTX 3000. With more complicated video sequences, the performance difference increased to above 40% and in one test, the Quadro RTX 3000 is more than twice as fast as the Quadro T2000.
The result? The faster GPU is a must-have configuration choice if you are doing anything beyond the most basic video editing, or if your system’s primary workload is full-time, repetitive video editing and rendering.
What about those Tensor Cores and RT Cores?
This is an interesting point. The testing above doesn’t use either of these new features in the RTX boards. That’s a benefit for a head-to-head performance comparison.
But reality is different. Here is why.
If you use your mobile workstation in product design, then visualization tools are also important to your workflow. High-end rendering means raytracing. And for real time raytracing, software vendors are using both RT Cores and Tensor Cores.
The RT Cores are dedicated to accelerating raytracing. But raytracing is very hard work. A trick that allows software to deliver real time raytracing is to combine GPU-accelerated raytracing with AI.
With NVIDIA’s help, application developers have implemented GPU-accelerated raytracing to quickly generate a decent, but still grainy ray-traced image. Then they apply an AI to clean up the image correctly for a final frame. The process of cleaning up a grainy image is called de-noising. With Tensor Cores, the RTX GPU is able to perform the de-noising process much faster than using brute-force calculations – even with RT Cores.
Artificial intelligence is such a useful technology that it is applied similarly to many other problems. Video applications like Blackmagic Design’s DaVinci Resolve take advantage of fast GPU-computing. Using their Neural Engine, many features are also accelerated using AI.
If you are a product design engineer, the extra computing power and graphics memory will make a difference in one of the new real time simulation applications like Ansys Discovery Live. As with rendering, video, and special effects, simulation programs are leveraging artificial intelligence to deliver results faster.
Because AI can be applied across many workstation applications, you may already be able to take advantage of the Quadro RTX 3000’s Tensor Cores. It is very likely that additional features in your core applications will become AI-accelerated during the three year lifecycle of your mobile workstation.
Pricing differences: how much more does a Quadro RTX 3000 cost?
Here is the good news. If you are doing serious work on your mobile workstation, then the rest of the system has to be a reasonable configuration. Let’s forget about anything below a Core i9 CPU, 32 GB of RAM, 1 TB of SSD storage, and a 4K display.
I looked at one workstation vendor’s pricing for comparable mobile workstations. Going from Intel integrated graphics to a Quadro T2000 added $265 without tax. Going from a Quadro T2000 to a Quadro RTX 3000 added another $160.
Then I specified a system that has an i9, 64 GB of RAM, a 2 TB SSD drive, and a 4K display. With the Quadro T2000, the mobile workstation was just over $5000. Upgrading to a Quadro RTX 3000 increased the price to just under $5200.
That equals a 3% increase in cost before tax.
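Here is that arithmetic as a short Python sketch. Note the assumptions: I round the base price to $5000 (the quote was "just over $5000") and reuse the $160 GPU upgrade price from the first comparison, which fits the "just under $5200" total. The names `percent_increase`, `BASE_PRICE`, and `GPU_UPGRADE` are hypothetical.

```python
def percent_increase(base_price: float, upgrade_cost: float) -> float:
    """Upgrade cost expressed as a percentage of the base system price."""
    if base_price <= 0:
        raise ValueError("base_price must be positive")
    return 100.0 * upgrade_cost / base_price


BASE_PRICE = 5000.0   # assumption: rounded from "just over $5000"
GPU_UPGRADE = 160.0   # assumption: same T2000 -> RTX 3000 delta as the first quote

INCREASE = percent_increase(BASE_PRICE, GPU_UPGRADE)
print(f"{INCREASE:.1f}% increase before tax")
```

With these rounded inputs the increase works out to a little over 3%, in line with the figure above.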
A Final Perspective
How does a 3% increase sound? Your workstation’s raw GPU performance will increase at least 20% and usually much more. Your workstation will have more graphics memory for complicated workloads. And your workstation will be equipped with RT Cores and Tensor Cores which will accelerate your workflows today and into the future.
My guess is that you already see my point. Looking at the choice from this point of view, I would take the Quadro RTX 3000 every time.