This article is part of a VB Special Issue called “Fit for Purpose: Tailoring AI Infrastructure.” Catch all the other stories here.
Data centers are the backbone of the internet as we know it. Whether it’s Netflix or Google, all major companies rely on data centers, and the computer systems they host, to deliver digital services to end users. As enterprises shift their focus toward advanced AI workloads, data centers’ traditional CPU-centric servers are being augmented with new specialized chips, or “co-processors.”
The idea behind these co-processors is to add hardware that enhances the computing capacity of the servers, enabling them to handle the computational demands of workloads like AI training, inference, database acceleration and network functions. Over the last few years, GPUs, led by Nvidia, have been the go-to co-processor thanks to their ability to process large volumes of data at unmatched speeds. Driven by this demand, GPUs accounted for 74% of the co-processors powering AI use cases within data centers last year, according to a study from Futurum Group.
According to the study, the dominance of GPUs is only expected to grow, with revenues from the category surging 30% annually to $102 billion by 2028. But here’s the thing: while GPUs, with their parallel processing architecture, make a strong companion for accelerating all sorts of large-scale AI workloads (like training and running massive, trillion-parameter language models or genome sequencing), their total cost of ownership can be very high. For example, Nvidia’s flagship GB200 “superchip,” which combines a Grace CPU with two B200 GPUs, is expected to cost between $60,000 and $70,000. A server with 36 of these superchips is estimated to cost around $2 million.
While this may work in some cases, like large-scale projects, it is not for every company. Many enterprise IT managers are looking to incorporate new technology to support select low- to medium-intensity AI workloads, with a specific focus on total cost of ownership, scalability and integration. After all, most AI models (deep learning networks, neural networks, large language models, etc.) are in the maturing stage, and needs are shifting toward AI inferencing and enhancing performance for specific workloads, such as image recognition, recommender systems or object identification, while remaining efficient.
This is exactly where the emerging landscape of specialized AI processors and accelerators, being built by chipmakers, startups and cloud providers, comes in.
What exactly are AI processors and accelerators?
At their core, AI processors and accelerators are chips that sit within servers’ CPU ecosystem and focus on specific AI functions. They commonly revolve around three key architectures: Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) and the newer Neural Processing Units (NPUs).
ASICs and FPGAs have been around for quite some time, with programmability being the main difference between the two. ASICs are custom-built from the ground up for a specific task (which may or may not be AI-related), while FPGAs can be reconfigured after manufacturing to implement custom logic. NPUs, for their part, differ from both as specialized hardware built solely to accelerate AI/ML workloads such as neural network inference and training.
“Accelerators tend to be capable of doing any function individually, and sometimes with wafer-scale or multi-chip ASIC design, they can be capable of handling a few different applications. NPUs are a good example of a specialized chip (usually part of a system) that can handle a number of matrix-math and neural network use cases as well as various inference tasks using less power,” Futurum Group CEO Daniel Newman tells VentureBeat.
The best part is that accelerators, especially ASICs and NPUs built for specific applications, can prove more efficient than GPUs in terms of cost and power use.
“GPU designs mostly center on Arithmetic Logic Units (ALUs) so that they can perform thousands of calculations simultaneously, whereas AI accelerator designs mostly center on Tensor Processor Cores (TPCs) or Units. In general, the AI accelerators’ performance versus GPUs performance is based on the fixed function of that design,” Rohit Badlaney, the general manager for IBM’s cloud and industry platforms, tells VentureBeat.
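The distinction Badlaney draws comes down to where each chip spends its silicon. Most of the arithmetic in neural network inference reduces to matrix multiplication, which is precisely the fixed function that tensor cores and NPUs are built around. A minimal NumPy sketch (layer sizes and the two-layer network here are made up for illustration) shows how thoroughly matrix multiplies dominate a forward pass:

```python
import numpy as np

# Hypothetical two-layer MLP forward pass: virtually all of the
# arithmetic sits in the two matrix multiplies — the fixed function
# that tensor cores / NPUs are designed to accelerate.
rng = np.random.default_rng(0)

x = rng.standard_normal((32, 512))     # batch of 32 input vectors
w1 = rng.standard_normal((512, 1024))  # layer 1 weights
w2 = rng.standard_normal((1024, 256))  # layer 2 weights

h = np.maximum(x @ w1, 0.0)  # matmul followed by ReLU
y = h @ w2                   # matmul

# Rough FLOP count: the matmuls dwarf the elementwise ReLU
matmul_flops = 2 * 32 * 512 * 1024 + 2 * 32 * 1024 * 256
relu_flops = 32 * 1024
print(matmul_flops / (matmul_flops + relu_flops))  # ~0.999
```

A general-purpose ALU array can run this workload, but a chip whose datapath is hardwired for the matmul step spends far less silicon and power on everything else, which is where the efficiency claims for fixed-function accelerators come from.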
Currently, IBM follows a hybrid cloud approach and uses multiple GPUs and AI accelerators, including offerings from Nvidia and Intel, across its stack to provide enterprises with choices to meet the needs of their unique workloads and applications — with high performance and efficiency.
“Our full-stack solutions are designed to help transform how enterprises, developers and the open-source community build and leverage generative AI. AI accelerators are one of the offerings that we see as very beneficial to clients looking to deploy generative AI,” Badlaney said. He added that while GPU systems are best suited to large model training and fine-tuning, there are many AI tasks that accelerators can handle equally well, and at a lower cost.
For instance, IBM Cloud virtual servers use Intel’s Gaudi 3 accelerator with a custom software stack designed specifically for inferencing and heavy memory demands. The company also plans to use the accelerator for fine-tuning and small training workloads via small clusters of multiple systems.
“AI accelerators and GPUs can be used effectively for some similar workloads, such as LLMs and diffusion models (image generation like Stable Diffusion) to standard object recognition, classification, and voice dubbing. However, the benefits and differences between AI accelerators and GPUs entirely depend on the hardware provider’s design. For instance, the Gaudi 3 AI accelerator was designed to provide significant boosts in compute, memory bandwidth, and architecture-based power efficiency,” Badlaney explained.
This, he said, directly translates to price-performance benefits.
Beyond Intel, other AI accelerators are also drawing attention in the market. This includes not only custom chips built for and by public cloud providers such as Google, AWS and Microsoft but also dedicated products (NPUs in some cases) from startups such as Groq, Graphcore, SambaNova Systems and Cerebras Systems. They all stand out in their own way, challenging GPUs in different areas.
In one case, Tractable, a company developing AI to analyze damage to property and vehicles for insurance claims, was able to leverage Graphcore’s Intelligence Processing Unit-POD system (a specialized NPU offering) for significant performance gains compared to the GPUs it had been using.
“We saw a roughly 5X speed gain,” Razvan Ranca, co-founder and CTO at Tractable, wrote in a blog post. “That means a researcher can now run potentially five times more experiments, which means we accelerate the whole research and development process and ultimately end up with better models in our products.”
AI processors are also powering training workloads in some cases. For instance, the AI supercomputer at Aleph Alpha’s data center is using Cerebras CS-3, the system powered by the startup’s third-generation Wafer Scale Engine with 900,000 AI cores, to build next-gen sovereign AI models. Even Google’s recently introduced custom ASIC, TPU v5p, is driving some AI training workloads for companies like Salesforce and Lightricks.
What should be the approach to picking accelerators?
Now that it’s established there are many AI processors beyond GPUs for accelerating AI workloads, especially inference, the question is: how does an IT manager pick the best option to invest in? Some of these chips may deliver good performance with efficiencies but be limited in the kinds of AI tasks they can handle due to their architecture. Others may handle a broader range of tasks, but their TCO advantage over GPUs may not be as large.
Since the answer varies with the design of the chips, all the experts VentureBeat spoke to suggested the selection should be based on the scale and type of workload to be processed, the data involved, the likelihood of continued iteration or change, and cost and availability needs.
According to Daniel Kearney, the CTO at Sustainable Metal Cloud, which helps companies with AI training and inference, it is also important for enterprises to run benchmarks to test for price-performance benefits and ensure that their teams are familiar with the broader software ecosystem that supports the respective AI accelerators.
“While detailed workload information may not be readily available in advance or may be inconclusive to support decision-making, it is recommended to benchmark and test thoroughly with representative workloads, real-world testing and available peer-reviewed real-world information where available to provide a data-driven approach to choosing the right AI accelerator for the right workload. This upfront investigation can save significant time and money, particularly for large and costly training jobs,” he suggested.
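The benchmarking Kearney describes ultimately reduces to a price-performance comparison: measure each candidate’s throughput on a representative workload, then normalize by its price. A sketch of that calculation (every throughput and pricing figure below is a made-up placeholder; real numbers must come from actual benchmarks and vendor quotes):

```python
# Hypothetical price-performance comparison for an inference workload.
# All throughput and pricing figures are illustrative placeholders —
# real values come from benchmarking representative workloads.

candidates = {
    # name: (measured tokens/sec on the workload, $/hour to run)
    "gpu_server": (9_000, 12.00),
    "accelerator_a": (6_500, 5.50),
    "accelerator_b": (4_000, 2.75),
}

def cost_per_million_tokens(tokens_per_sec: float, dollars_per_hour: float) -> float:
    """Dollars spent to process one million tokens at the measured rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000

# Rank candidates from cheapest to most expensive per unit of work
ranked = sorted(candidates.items(), key=lambda kv: cost_per_million_tokens(*kv[1]))

for name, (tps, price) in ranked:
    print(f"{name}: ${cost_per_million_tokens(tps, price):.3f} per 1M tokens")
```

With these invented numbers, the slowest chip wins on cost per token despite losing on raw throughput, which is exactly why raw performance alone is a poor basis for the decision.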
Globally, with inference jobs on track to grow, the total market for AI hardware, including AI chips, accelerators and GPUs, is estimated to grow 30% annually to reach $138 billion by 2028.