TensorFlow on the Scan 3XS Data Science Workstation

Information is everything. Turning mountains of data into valuable information requires a workstation fully configured with exactly the right hardware and software. Scan has the answer for you.

The world produces enormous amounts of data every 24 hours. And the volume of data is growing. One estimate predicts that we will produce over 450 million terabytes of data every day by 2025.

Data is not information. For companies to make decisions, data must become information. Until recently, processing such vast quantities of data was very difficult, if not impossible. In the last few years, converging hardware and software technologies have made the impossible possible.

Artificial intelligence (AI) and big data analytics are key technologies in the world of data science.

The Data Science Workstation

Data Science Workstations accelerate AI and data science projects.

Enter the Data Science Workstation from Scan Computers

Dual Quadro RTX 8000

The GPU is a Highly Parallel Computing Engine

In AI, applications are driven by neural networks and deep learning. Training a neural network so that it can perform its task is an inherently parallel computing problem which requires large data sets and a large number of training iterations. This massively parallel, computationally intensive problem was unrealistic until GPU-computing could be applied.

The GPU has always been a massively parallel computing processor. Parallelism is exactly the nature of graphics processing. It’s natural to apply GPU power to other parallel, processing-intensive problems.

These processors were developed and grew in the shadows. In the 1980s and early 1990s, graphics was accelerated by ASICs (application-specific integrated circuits). These chips were not programmable, and they were referred to simply as graphics chips.

Initially, graphics chips executed very low-level 2D drawing functions: line drawing and illuminating individual pixels. They later gained basic 3D functions like triangle shading and texture mapping. The computationally intensive graphics functions were still processed on the CPU.

Transistor counts increased and more of the 3D computations were absorbed by the graphics chip. Further acceleration required the graphics chip to become programmable. Initially, only simple programmability with limited flexibility was possible.

But the quest for faster, more realistic graphics meant that the push to make the graphics chip programmable was unstoppable. The graphics chip became a complex parallel computing processor. Initially, developers loaded and ran general purpose programs on the GPU via graphics APIs such as OpenGL. When NVIDIA married the graphics processor with CUDA, its C-like programming platform, the GPU as a general-purpose parallel processor was finally born.

GPUs are now the key computing technology driving AI training applications. The Scan 3XS Data Science Workstation uses two Quadro RTX 8000 graphics cards. They are among the most powerful GPUs on the market.

The 3XS Data Science Workstation in this review has been conceived for this kind of project. It’s more than just powerful hardware. The company has years of experience in AI and data science. Scan has analyzed the requirements of a data scientist. Scan has multiple data scientists in the company who add their expertise to the final product. The result is a data science workstation that completely redefines the meaning of “turnkey solution”.

The test system is configured with two very powerful GPUs, dual processors, 192 GB of main memory, and fast storage. What makes a Data Science Workstation particularly interesting? Like the hardware, the software stack requirements have been analyzed, well-defined, pre-installed, and thoroughly tested. In addition to delivering a platform with well-defined hardware and software, the solution includes enterprise support from Scan and from NVIDIA.

Extreme GPU performance

First, the hardware specification is critical. Whether in AI or in analytics, data science problems are accelerated with GPU computing. Choosing the GPU is arguably the most important decision.
 
At the heart of the 3XS Data Science Workstation are two Quadro RTX 8000 graphics boards. Each hosts 48 GB of GDDR6 graphics memory, and the boards are connected with NVLink.

Performance also scales well with multiple GPUs. NVLink is a high-speed interconnect developed by NVIDIA that allows the GPUs to communicate and share data efficiently, and it is critical to multi-GPU performance.

The Quadro RTX 8000 is the flagship GPU of the Turing architecture from NVIDIA. The board requires two slots and a lot of power: 295 W total dissipated power (TDP).

Scan 3XS | Technology | Benefit
GPU | Dual Quadro RTX 8000 graphics boards connected via NVLink | GPU acceleration is critical to data science application performance. We found that application performance scales linearly with GPU performance, including with multiple GPUs.
CPU | Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz – Max Turbo of 3.20 GHz, 16 cores, Hyper-Threading, and supports up to 1 TB of memory | Dual Xeon CPUs provide 64 threads, and the CPUs provide the processing performance and memory capacity required to feed the high-performance GPUs. These CPUs include Intel DL Boost technology to accelerate AI applications.
Samsung SSD 970 EVO Plus 2TB | M.2 SSD using V-NAND 3-bit MLC | 2 TB capacity storage with high performance and high reliability.
ST8000NE001-2M7101 | 8 TB SATA 6 Gb/s storage with a 7200 RPM spindle speed and a 1.2 million hours MTBF rating | Enterprise-level, high-capacity storage.
Memory | 192 GB of 2666 MHz memory (12 DIMMs) | Extensive memory capacity to process large data sets.

Options are plentiful on this workstation. There is an option to select one CPU and one GPU. With two CPUs there is an option for either one or two GPUs. The maximum memory configuration is 768 GB. A single NVMe SSD with up to 4 TB of storage is possible. Up to 10 SATA drives can be configured, either as SSDs or as HDDs.

GPU options include one or two Quadro RTX 6000 boards, one or two Quadro RTX 8000 boards or one or two GV100 boards.   

The Quadro RTX 8000 GPU sports 18.6 billion transistors and is built on a 12 nm process. All of those transistors deliver impressive specs. The GPU has 4,608 CUDA cores and 576 Tensor cores. The board is manufactured with 48 GB of GDDR6 memory. The large memory capacity allows for larger data sets to be kept on the graphics card. 

Tensor cores are designed to accelerate Deep Learning. They accelerate specific mathematical operations which are commonly found in Deep Learning applications. One example is a fused matrix multiply-accumulate operation.
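As a rough illustration of that multiply-accumulate operation, the sketch below computes D = A x B + C in plain Python. It is not how Tensor cores are programmed. It assumes half-precision (FP16) inputs, which it imitates with the standard struct module's "e" format, and it uses an ordinary Python float as a stand-in for the higher-precision accumulator. The function names and the small matrices are hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def matmul_accumulate(a, b, c):
    """Return D = A x B + C with FP16-rounded inputs and a wider accumulator."""
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = c[i][j]  # start from the accumulator matrix C
            for k in range(inner):
                # Inputs are rounded to FP16; the running sum keeps full precision.
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d

# Hypothetical 2 x 2 matrices chosen so every value is exact in FP16.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 0.0], [0.0, 0.5]]
C = [[1.0, 1.0], [1.0, 1.0]]
D = matmul_accumulate(A, B, C)
print(D)
```

The hardware performs this whole operation on small matrix tiles in one step, which is where the acceleration comes from. The loops here only show the arithmetic.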

RT cores are specific to casting rays in raytracing. The accelerated raytracing pipeline is combined with NVIDIA’s AI-based de-noising function. The combination uses both RT cores for graphics and Tensor cores for AI to enable real-time raytracing. It is one example in which AI is able to accelerate processes by orders of magnitude compared to alternative methods.

The Data Science Workstation

Data Science Workstations combine specialized hardware, software, and professional support.

Each Quadro RTX 8000 draws nearly 300 W and occupies two slots. The boards are connected via NVLink, a high-speed interconnect between the GPUs and their memory that makes multi-GPU calculations more efficient. NVLink allows for transfers of up to 100 GB/s between the GPUs. It also lets each GPU address the other board's memory directly, so applications that support it can treat the two 48 GB memories as one larger pool.

Quadro RTX 8000
CUDA Parallel-Processing Cores | 4,608
NVIDIA Tensor Cores | 576
NVIDIA RT Cores | 72
GPU Memory | 48 GB GDDR6 ECC memory
FP32 Performance | 16.3 TFLOPS
Max Power Consumption / Total Dissipated Power (TDP) | Board: 295 W; GPU: 260 W
Graphics Bus | PCI Express 3.0 x16
Display Connectors | DP 1.4 (4), VirtualLink (1)
Form Factor | 4.4” (H) x 10.5” (L), Dual Slot

Processors with Deep Learning Performance

The 3XS Data Science Workstation has dual Intel Xeon processors. Each has 16 cores and hyper-threading to provide the workstation with 64 logical processors. These CPUs run from a base clock speed of 2.10 GHz up to 3.20 GHz. 

Along with the 64 threads, these Cascade Lake processors include Intel’s DL Boost technology. DL Boost is designed to accelerate the performance of common AI functions.

No matter what computing problem is to be solved, the accuracy of the final result is critical. Therefore, in general purpose computing, the push has been towards higher precision computing at faster speeds.

Accelerating AI functions is sometimes counterintuitive. Research in AI training and inference has shown that computing at lower precision can yield results in training and in inference that are equally accurate. This research lies at the heart of Intel’s DL Boost technology. 

Reducing the computing precision for deep learning has two benefits. One benefit is to reduce memory bandwidth requirements. A second benefit is to reduce the transistors and power needed to perform operations. 
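The bandwidth benefit can be made concrete with a small sketch. The text does not name a specific precision, so this example assumes 8-bit integers (INT8) with a single shared scale factor per vector. The weights and inputs are made-up illustrative values, and quantize_int8 is a hypothetical helper, not part of any Intel or NVIDIA library.

```python
from array import array

def quantize_int8(values):
    """Map floats onto signed 8-bit integers using one shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    return array("b", (round(v / scale) for v in values)), scale

# Illustrative values only, not taken from a real model.
weights = [0.12, -0.53, 0.98, -0.27, 0.44, 0.05, -0.81, 0.66]
inputs = [0.9, 0.1, -0.4, 0.7, -0.2, 0.3, 0.5, -0.6]

q_w, w_scale = quantize_int8(weights)
q_x, x_scale = quantize_int8(inputs)

# Integer multiply-accumulate, followed by a single rescale to floating point.
int_acc = sum(w * x for w, x in zip(q_w, q_x))
approx = int_acc * w_scale * x_scale
exact = sum(w * x for w, x in zip(weights, inputs))

# Storage needed for the same eight weights at each precision.
fp32_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = q_w.itemsize * len(q_w)

print(f"exact={exact:.4f} approx={approx:.4f}")
print(f"FP32 bytes={fp32_bytes} INT8 bytes={int8_bytes}")
```

The quantized dot product stays close to the full-precision one while each value occupies a quarter of the memory, so the same memory bandwidth moves four times as many values.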

 

Intel Xeon Silver 4216 with DL Boost

Intel Xeon Silver 4216 CPUs support DL Boost for AI

In other words, the same number of transistors and amount of power can process more data. This optimization is executed with fused multiply-add (FMA) units. The FMA function resembles the operations of the Quadro RTX 8000 Tensor Cores.

DL Boost builds on Intel Advanced Vector Extensions 512 (Intel AVX-512), an instruction set originally developed for Xeon Phi co-processors. Today, the Intel AVX-512 instruction set is also available on Intel Xeon Scalable processors like the two Intel Xeon Silver 4216 processors in Scan’s 3XS Data Science Workstation.

Pre-configured for AI and Data Science

The Scan 3XS Data Science Workstation is pre-configured with a thoroughly tested, GPU-accelerated, software stack. The pre-configured software is tailored to AI and data science.

The core of the GPU-acceleration for AI and data science comes from RAPIDS. A properly configured AI & data science workstation needs RAPIDS, CUDA-X, Python, multiple deep learning frameworks, and GPU-optimized libraries. NVIDIA Docker containerization, which simplifies development and deployment with GPU acceleration, is included in the Scan DSW software stack as well.

The benefit of Scan doing all of this configuration work is simple.

I pushed the 3XS power button and, in fewer than 15 minutes, I was logged into the NVIDIA GPU Cloud (NGC) and had downloaded my first dataset. Compare that to the hours, if not days, needed to properly configure Python, Docker, CUDA, RAPIDS, and the required frameworks.

Scan Computers 3XS Data Science Workstation

Boot and go. Fifteen minutes from start up to workload.

Given the completeness of the pre-installed software stack, it is very likely that you will be running existing projects in a similarly short timeframe. This makes adding the performance that the Scan system offers into running projects very simple, if not seamless.
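One quick sanity check before moving an existing project over is to confirm that TensorFlow can see both GPUs. The sketch below assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available. The helper name count_visible_gpus is hypothetical, and the import is guarded so that the script also runs where TensorFlow is not installed.

```python
def count_visible_gpus():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))

gpu_count = count_visible_gpus()
if gpu_count is None:
    print("TensorFlow is not installed in this environment.")
else:
    print(f"TensorFlow sees {gpu_count} GPU(s).")
```

On a dual-GPU configuration like the one reviewed here, a correctly configured environment would be expected to report two devices.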

Data scientists can access tools and interfaces like Python with which they are already familiar. Many deep learning frameworks are included. The core software accelerating data science and AI lies in RAPIDS, an open-source collection of GPU-accelerated libraries and APIs for data science and artificial intelligence.

RAPIDS focuses on GPU acceleration for the entire AI and data science workflow from end-to-end. Importantly, it interfaces to Python above it and uses the CUDA libraries below it. This combination delivers the familiar Python environment on top of the GPU-optimized CUDA platform.

The Data Science Workstation Software Stack

The 3XS Data Science Workstation software stack. Source: NVIDIA

RAPIDS adds the GPU acceleration to the software stack and Dask adds the scaling. Dask scales the software stack to multi-GPU and multi-threaded CPU environments. Dask is able to scale Python, RAPIDS, and a whole host of libraries.
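The scaling idea can be sketched without Dask itself: split the data into partitions, run the same function on each partition, then combine the partial results. The example below uses only the Python standard library and does not use the Dask or RAPIDS APIs. The function names and the delay values are hypothetical, and threads stand in for the GPU or CPU workers that a real scheduler would manage.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Cut a list into `parts` nearly equal contiguous partitions."""
    size, extra = divmod(len(data), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        out.append(data[start:end])
        start = end
    return out

def partial_stats(chunk):
    """Per-partition work: return (sum, count) so partial results can be merged."""
    return sum(chunk), len(chunk)

def parallel_mean(data, workers=2):
    """Compute a mean by mapping over partitions and reducing the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, split(data, workers)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Made-up sample of delay values in minutes, for illustration only.
delays = [12, 0, 45, 7, 3, 0, 18, 30, 5, 10]
print(parallel_mean(delays, workers=2))
```

Because each partition returns a sum and a count rather than a finished mean, the partial results combine correctly no matter how many workers share the job. This is the property that lets the same code scale from one device to many.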

This software stack is complete.

  • It includes programming languages preferred for AI and data science. 
  • It contains deep learning frameworks. 
  • It includes a GPU accelerated core. 
  • It scales from a single workstation to data center clusters.

Linear Performance

Since this is not a typical workstation, I was curious to test its performance. I tested two AI workloads. The first used NVIDIA’s TensorFlow container for image recognition. The training processes images, and performance is measured in images per second. The second was a natural language workload that uses a large database. Finally, I used OmniSci as an application test, with databases for commercial flight analysis and for city resource management.

Performance testing had two targets in this review. The first was to simulate different GPU memory sizes, which seemed possible. The second, and the one that interested me most, was performance scaling with multiple GPUs. Let’s look at both.

Each of the Quadro RTX 8000 boards has 48 GB of memory. It is possible to change the batch size for the workloads. A GPU with less memory requires a smaller batch size to run successfully. A GPU with much more memory, like the Quadro RTX 8000, can run workloads with larger batch sizes.

I wanted to see how changes in the batch size affected performance. Since GPUs with less memory require smaller batch sizes, I ran tests with a batch size parameter from 32 to 768. The result? As the batch size value increased, so did performance.

The TensorFlow ResNet results increased by thirty percent. The performance increased with batch sizes from 32 to 256. After 256, the performance remained at the same level.
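A batch size sweep like this one can be summarized with a few lines of code. The sketch below reports the total gain over the smallest batch size and the first batch size at which throughput stops improving. The throughput values are placeholders shaped like the result described above, not the numbers measured in this review. The function name and the 2% tolerance are assumptions.

```python
def summarize_sweep(throughput_by_batch, tolerance=0.02):
    """Return (total gain in percent, first batch size on the plateau)."""
    batches = sorted(throughput_by_batch)
    baseline = throughput_by_batch[batches[0]]
    best = max(throughput_by_batch.values())
    gain_pct = (best / baseline - 1.0) * 100.0
    # The plateau starts at the first batch size within `tolerance` of the best rate.
    plateau = next(
        b for b in batches if throughput_by_batch[b] >= best * (1.0 - tolerance)
    )
    return gain_pct, plateau

# Placeholder images-per-second values, NOT measurements from this review.
example = {32: 1000.0, 64: 1150.0, 128: 1250.0, 256: 1300.0, 512: 1300.0, 768: 1300.0}
gain, plateau = summarize_sweep(example)
print(f"gain={gain:.0f}% plateau starts at batch size {plateau}")
```

Since the batch size that fits is bounded by GPU memory, the plateau point gives a rough indication of how much memory the workload can put to use.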

TensorFlow Dual vs Single GPU

Dual GPU performance gains stayed above 100%

TensorFlow Dual GPU

Batch size increases mimic smaller to larger graphics memory capacity.

The Big LSTM workload from the NVIDIA examples increased performance three-fold. It showed performance gains for each incremental increase in the batch size parameter. 

I was particularly interested to test single GPU vs dual GPU performance using this system. A simple parameter change allowed me to run tests in both single and dual GPU modes. 

I expected to see good scaling, perhaps around 90%. In fact, the scaling in this system consistently showed a performance gain just over 100%. At this time, I can only speculate, but it would seem that the dual GPU configuration, with NVLink’s ability to create a larger, flat memory model addressable by both GPUs, may contribute to the better-than-double results.
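To keep the terms straight, a gain of 100% means the dual GPU run is exactly twice as fast as the single GPU run, while a gain of 90% would mean 1.9 times as fast. The sketch below shows that arithmetic. The function name and the throughput figures are placeholders, not values measured in this review.

```python
def scaling_report(single_gpu_rate, multi_gpu_rate, gpu_count=2):
    """Return (speedup, gain in percent, scaling efficiency) for a multi-GPU run."""
    speedup = multi_gpu_rate / single_gpu_rate
    gain_pct = (speedup - 1.0) * 100.0
    efficiency = speedup / gpu_count  # 1.0 means perfectly linear scaling
    return speedup, gain_pct, efficiency

# Placeholder throughputs in images per second, NOT measurements from this review.
speedup, gain_pct, efficiency = scaling_report(1000.0, 2020.0)
print(f"speedup={speedup:.2f}x gain={gain_pct:.0f}% efficiency={efficiency:.0%}")
```

An efficiency above 1.0, as in this placeholder case, corresponds to the better-than-double behavior described above.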

Big LSTM Dual GPU

More graphics memory should provide significant performance gains

Big LSTM Dual vs Single GPU

Big LSTM performance also benefits from dual GPUs

For application testing, I used OmniSci’s Immerse platform. Immerse provides interactive investigation of large data sets. OmniSci develops software for making queries on extremely large data sets in real-time by using the parallelism of GPUs.

I began with a data set on flight delays. This data set is not so large and the Scan 3XS could handle it easily. I then tested a larger data set of New York City. Interactively searching and visualizing data from the city stressed the system more. It loaded a single Quadro RTX 8000 up to approximately 25-30% while moving smoothly through the data.

This data set has seven million rows. The company specializes in tools that allow interactive investigation and analysis of very large data sets with billions of rows of data. By leveraging highly parallel GPU computing, OmniSci provides interactive data query responses which normally require hours of wait-time. 

The 3XS Data Science Workstation can certainly handle larger data sets. The capacity of the system, as configured, already takes workloads that would normally be destined for a rack of GPU server systems. 

OmniSci Immerse on the Scan 3XS Data Science Workstation

OmniSci Immerse provides interactive analysis of large data sets.

In this test we can see that a reasonably sized data set barely stresses a single Quadro RTX 8000. This data science workstation still has excess performance to apply to very tough problems in data analysis.

How does this compare to CPU-only performance? With dual Quadro RTX 8000 GPUs, end-to-end AI workflows run ten times faster than with CPU-only processing. In some cases, it is not even a question of how much faster the performance is. It’s a question of being able to do the job or of not being able to do the job. Massive database query applications like that from OmniSci simply cannot exist without GPU acceleration.

Professional Support

The first point that I must make: support at Scan has been consistent for years.

Scan Computers has been involved in developing high-end workstations for data science and AI for years. The company has invested and built a team to support customers at the IT level and also at the project level.

Scan was the first certified NVIDIA Elite Solution Provider in Europe. They support your projects with their own expertise as well as that of NVIDIA.

The company’s dedication to this scientific market includes a team with multiple data scientists. Scan delivers hardware and software, of course. But with the experienced team at Scan, the company can also accompany your projects as you progress through each stage.

The Data Science Workstation

The Scan Data Science Workstation comes with professional support by a team with years of experience in AI and Data Science.

Evolving with Your Computing

Scan delivers a turnkey solution. Scan has in-house experts to accompany your projects. And Scan also can support your evolving needs in computing.

When projects scale up and you need GPU servers, then Scan can help. When you want a cloud workstation, then Scan can help. When you need HPC servers, then Scan can help. When you need specialized applications, then Scan can help.

The point here is not that Scan Computers is a one-stop-shop for your IT. No. The company is more than that. They have the years of experience, the in-house expertise, and the product lines that you might need as your computing requirements in AI and Data Science grow. As your computing evolves, Scan supports your computing at each step.

Why do you need a Data Science Workstation?

If you don’t have ongoing projects in AI or data science, you may wonder how other companies are using these technologies. Imagine processes in your own organization which rely on teams with expert knowledge to perform repetitive and time-consuming tasks. These could be good candidates for AI.

Workstation applications in CAD, design, and special effects address specific users in manufacturing, product development, film, etc. AI and data science, on the other hand, span broad ranges of industry. Here are a few examples.

In manufacturing, AI is being used to take computer vision and quality assurance to new levels. Financial institutions are using AI to detect fraudulent transactions. Telecommunication companies use AI to increase call clarity with AI-based noise reduction. Transmission infrastructure networks can be optimized for coverage and cost.

Retail companies face traditional challenges in inventory management, forecasting, and logistics. They apply technology to manage hundreds of thousands of products in tens of thousands of stores. Healthcare companies are creating field-operable devices for DNA sequencing to allow discovery of pathogens more rapidly. And data science techniques are being applied to the development of drugs and vaccines.

A Final Perspective

Data Science and AI technology is permeating many business sectors and is changing the competitive relationships between companies. These technologies impact domains as diverse as image recognition, natural language processing, visualization, video, smart cities, telecommunications, finance, and transportation.

Scan’s long-term investment in AI and data science means that they have built a team of experts to support your technology journey. And the company’s product range gives you a partner who can deliver workstation solutions for the office, servers for the data center, and virtualized computing for efficient computing access.

The data science workstation from Scan is designed with optimized, enterprise-grade hardware. The technology is balanced to deliver performance in machine learning and big data analytics.

The software stack includes the major tools needed for AI and data science. It is pre-installed and tested so that your team can be productive minutes after booting the system.

Scan Computers 3XS Data Science Workstation

The Scan Data Science Workstation comes with experts pre-installed.

More than just fast hardware and preinstalled software, the Scan 3XS Data Science Workstation is a solution that combines Scan’s hardware, a complete set of software tools, and experts in AI and data science. It is a turnkey solution complete with experts to support you. Your team will be off to a fast start as well as having a partner for the rest of the AI and data science journey.
