After years of development, Meta may finally roll out its homegrown AI accelerators in a meaningful way this year.
The Facebook empire confirmed this week its desire to supplement deployments of Nvidia H100 and AMD MI300X GPUs with its Meta Training and Inference Accelerator (MTIA) family of chips. Specifically, Meta will deploy an inference-optimized processor, reportedly codenamed Artemis, based on the Silicon Valley giant’s first-gen parts teased last year.
“We’re excited about the progress we’ve made on our in-house silicon efforts with MTIA and are on track to start deploying our inference variant in production in 2024,” a Meta spokesperson told The Register on Thursday.
“We see our internally developed accelerators to be highly complementary to commercially available GPUs in delivering the optimal mix of performance and efficiency on Meta-specific workloads,” the rep continued. Details? Nope. The spokesperson told us: “We look forward to sharing more updates on our future MTIA plans later this year.”
We’re taking that to mean the second-gen inference-focused chip is rolling out widely, following a first-gen, lab-only version for inference, and that we may hear later about parts aimed primarily at training, or at both training and inference.
Meta has become one of Nvidia and AMD’s best customers as its deployment of AI workloads has grown, increasing its need and use of specialized silicon to make its machine-learning software run as fast as possible. Thus, the Instagram giant’s decision to develop its own custom processors isn’t all that surprising.
In fact, the mega-corp is, on the face of it, relatively late to the custom AI silicon party in terms of real-world deployment. Amazon and Google have been using homegrown components to accelerate internal machine-learning systems, such as recommender models, as well as customer ML code, for some years. Microsoft, meanwhile, revealed its homegrown accelerators only last year.
But beyond the fact that Meta is rolling out an MTIA inference chip at scale, the social network hasn’t disclosed the chip's precise architecture, nor which workloads it’s reserving for in-house silicon and which it’s offloading to AMD’s and Nvidia’s GPUs.
It’s likely Meta will run established models on its custom ASICs to free up GPU resources for more dynamic or evolving applications. We’ve seen Meta go down this route before with custom accelerators designed to offload data- and compute-intensive video workloads.
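To illustrate the kind of placement logic that implies — and this is purely a hypothetical sketch, with made-up model names, not anything Meta has described — a serving layer might pin mature, stable models to MTIA and keep fast-moving experimental ones on GPUs:

# Hypothetical sketch of workload placement: route mature, stable models
# to in-house accelerators and keep evolving ones on general-purpose GPUs.
# Nothing here reflects Meta's actual scheduler; names are illustrative.

MATURE_MODELS = {"ads-ranking-v7", "feed-recsys-v12"}  # assumed examples

def pick_backend(model_name: str, is_experimental: bool) -> str:
    """Route inference traffic to MTIA or GPU based on model maturity."""
    if not is_experimental and model_name in MATURE_MODELS:
        return "mtia"   # established model: frees up GPU capacity
    return "gpu"        # dynamic or experimental model: stays on GPUs

print(pick_backend("ads-ranking-v7", is_experimental=False))  # -> mtia
print(pick_backend("llama-exp-003", is_experimental=True))    # -> gpu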
As for the underlying design, the industry watchers at SemiAnalysis tell us that the new chip is closely based on the architecture in Meta’s first-gen parts.
Stepping stones
Announced in early 2023 after three years of development, Meta’s MTIA v1 parts, which our friends at The Next Platform looked at last spring, were designed specifically with deep-learning recommender models in mind.
The first-gen chip was built around a RISC-V CPU cluster and fabbed using TSMC’s 7nm process. Under the hood, the component employed an eight-by-eight grid of processing elements, each equipped with two RV CPU cores, one of which was outfitted with vector math extensions. Those cores were fed by a generous 128MB of on-chip SRAM and up to 128GB of LPDDR5 memory.
As Meta claimed last year, the chip ran at 800 MHz and topped out at 102.4 trillion operations per second (TOPS) of INT8 performance, or 51.2 teraFLOPS at half precision (FP16). By comparison, Nvidia’s H100 is capable of nearly four petaFLOPS of sparse FP8 performance. While nowhere near as powerful as Nvidia’s or AMD’s GPUs, the chip did have one major advantage: power consumption. The part had a thermal design power of just 25 watts.
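Those headline figures square with the layout described above. A quick back-of-the-envelope check in Python — noting that the per-PE throughput is our derivation, not a number Meta has published:

# Back-of-the-envelope check of MTIA v1's quoted peak throughput.
# Grid size, clock, and peak figures come from Meta's disclosures;
# the per-PE ops/cycle values are derived here, not published numbers.

pes = 8 * 8                 # eight-by-eight grid of processing elements
clock_hz = 800e6            # 800 MHz

int8_ops = 102.4e12         # quoted INT8 peak, ops per second
fp16_flops = 51.2e12        # quoted FP16 peak, FLOPS

print(int8_ops / (pes * clock_hz))    # -> 2000.0 INT8 ops per PE per cycle
print(fp16_flops / (pes * clock_hz))  # -> 1000.0 FP16 ops per PE per cycle

# The efficiency angle: at a 25 W TDP, that works out to roughly
# 4.1 INT8 TOPS per watt for the first-gen part.
print(f"{int8_ops / 25 / 1e12:.1f} INT8 TOPS/W")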
According to SemiAnalysis, Meta’s latest chip boasts improved cores and trades LPDDR5 for high-bandwidth memory packaged using TSMC’s chip-on-wafer-on-substrate (CoWoS) tech.
Another notable difference is that Meta’s second-gen chip will actually see widespread deployment across its datacenter infrastructure. According to the Facebook titan, while the first-gen part was used to run production advertising models, it never left the lab.
Chasing artificial general intelligence
Custom parts aside, the Facebook and Instagram parent has dumped billions of dollars into GPUs in recent years to accelerate all manner of tasks ill-suited to conventional CPU platforms. However, the rise of large language models, such as GPT-4 and Meta’s own Llama 2, has changed the landscape and driven the deployment of massive GPU clusters.
At the scale Meta operates, these trends have necessitated drastic changes to its infrastructure, including the redesign of several datacenters to support the immense power and cooling requirements associated with large AI deployments.
And Meta’s deployments are only going to get larger over the next few months as the company shifts focus from the metaverse to the development of artificial general intelligence. Supposedly, work done on AI will help form the metaverse or something like that.
According to CEO Mark Zuckerberg, Meta plans to deploy as many as 350,000 Nvidia H100s this year alone.
The biz also announced plans to deploy AMD’s newly launched MI300X GPUs in its datacenters, and Zuckerberg claimed his corporation would end the year with the equivalent computing power of 600,000 H100s.
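Taken together, those two figures imply that roughly 250,000 H100s’ worth of compute will come from silicon other than the H100 itself — a rough tally using only the numbers quoted above; the split of that remainder hasn’t been disclosed:

# Rough tally of Meta's stated 2024 fleet, using only the figures quoted
# in the article; how the non-H100 remainder breaks down is not disclosed.

h100_units = 350_000              # H100s Meta plans to deploy this year
total_h100_equiv = 600_000        # claimed year-end total, H100 equivalents

other = total_h100_equiv - h100_units
print(f"Non-H100 compute: ~{other:,} H100 equivalents")  # ~250,000

So clearly Meta’s MTIA chips won’t be replacing GPUs anytime soon. ®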