Open source AI champion Hugging Face is making $10 million in GPU compute available to the public in a bid to ease the financial burden of model development faced by smaller dev teams.
The program, called ZeroGPU, was announced by Hugging Face CEO Clem Delangue via Xitter on Thursday.
“The open source community doesn’t have the resources available to train and demo these models that the big tech have at their disposal, which is why ChatGPT remains the most used AI application today,” he wrote.
“Hugging Face is fighting this by launching ZeroGPU, a shared infrastructure for indie and academic AI builders to run AI demos on Spaces, giving them the freedom to pursue their work without financial burden.”
Founded in 2016, Hugging Face has become a go-to source of open source AI models which have been optimized to run on a wide variety of hardware – thanks in part to close partnerships with the likes of Nvidia, Intel, AMD and others.
Delangue regards open source as the way forward for AI innovation and adoption, so his biz is making a bounty of compute resources available to whoever needs them. ZeroGPU will be made available via Spaces, its application hosting service, and run atop Nvidia’s older A100 accelerators – $10 million worth of them – on a shared basis.
This setup differs from the way many cloud providers rent GPU resources. Providers often require long-term commitments before customers get the best deals, which can be limiting for smaller players who can’t predict the success of their models ahead of time. The Big Cloud model is also problematic for larger players trying to commercialize the models they already have.
Stability AI’s GPU commitments were reportedly so large that the British model builder behind the wildly popular Stable Diffusion image generator actually defaulted on its AWS bills.
The shared nature of Hugging Face’s approach means that – at first at least – it will be limited to AI inferencing, rather than training. Depending on the size of the dataset and model, training even small models can require thousands of GPUs running flat out for extended periods of time. Hugging Face’s admittedly thin support docs state that GPU functions are limited to a maximum of 120 seconds, which is clearly not sufficient for training.
The Register contacted Hugging Face for clarification on the applications of ZeroGPU, and a spokesperson replied that it is “mostly inferencing, but we have exciting ideas for the others.” So watch this space.
As for how Hugging Face has gotten around having to dedicate entire GPUs to individual users, there’s no shortage of ways to achieve this, depending on the level of isolation required.
According to Delangue, the system is able to “efficiently hold and release GPUs as needed” – but how that actually plays out under the hood isn’t clear.
Techniques like time slicing, which lets multiple workloads take turns on the same GPU, and Nvidia’s multi-instance GPU (MIG) tech – which allows the chip to be partitioned into as many as seven logical GPUs – have previously been employed by cloud providers like Vultr to make GPU compute more accessible to developers.
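To make the time-slicing idea concrete, here is a toy round-robin scheduler in Python. It is a sketch only: the `time_slice` function, the job names, and the abstract time units are all made up for illustration, and nothing here reflects how Nvidia's drivers or Hugging Face's ZeroGPU actually schedule work.

```python
from collections import deque


def time_slice(jobs: dict[str, int], quantum: int) -> list[str]:
    """Return the order in which jobs get the GPU, one quantum at a time.

    `jobs` maps a hypothetical job name to its remaining work, measured in
    arbitrary time units. Each turn, the job at the head of the queue runs
    for at most `quantum` units, then rejoins the back if it has work left.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)          # this job holds the GPU for one slice
        remaining -= quantum
        if remaining > 0:              # unfinished: go to the back of the line
            queue.append((name, remaining))
    return schedule


if __name__ == "__main__":
    # Three made-up workloads sharing one GPU in slices of one time unit.
    print(time_slice({"a": 3, "b": 1, "c": 2}, quantum=1))
```

The short job finishes after a single turn while the longer ones keep cycling, which is the basic trade-off of time slicing: everyone gets a go, but nobody has the chip to themselves.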
Another way of going about it is by running the workloads in GPU-accelerated containers orchestrated by Kubernetes. Or Hugging Face could be running serverless functions similar to how Cloudflare’s GPU services work.
However, it’s worth noting there are practical limits to all of these approaches – the big one being memory. Based on the support docs, Hugging Face appears to be using the 40GB variant of the A100. Even running 4-bit quantized models, at half a byte per parameter, that’s only enough memory to hold the weights of a single 80 billion parameter model. Due to key-value cache overheads, the practical limit will be less.
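The back-of-the-envelope math behind that figure can be sketched in a few lines of Python. The helper names and the `reserved_fraction` parameter are hypothetical, and the 25 percent reserve in the usage example is an arbitrary illustration rather than a measured key-value cache figure. The sketch counts weights only and treats 1GB as one billion bytes.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


def max_params_billions(gpu_memory_gb: float, bits_per_weight: int,
                        reserved_fraction: float = 0.0) -> float:
    """Largest model, in billions of parameters, whose weights fit in memory.

    `reserved_fraction` is an assumed share of GPU memory set aside for
    everything that is not weights, such as the key-value cache.
    """
    usable_gb = gpu_memory_gb * (1.0 - reserved_fraction)
    return usable_gb / (bits_per_weight / 8)


if __name__ == "__main__":
    # 4-bit weights on a 40GB A100 with nothing held back: the 80B ceiling.
    print(max_params_billions(40, 4))
    # The same card with a quarter of its memory reserved (illustrative only).
    print(max_params_billions(40, 4, reserved_fraction=0.25))
    # At 16 bits per weight, the ceiling drops sharply.
    print(max_params_billions(40, 16))
```

Whatever the real overhead turns out to be, the ceiling falls in direct proportion to the memory held back, which is why the practical limit lands below 80 billion parameters.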
We’ve asked Hugging Face for clarification on how it’s going about sharing those compute resources. We’ll update if and when there’s new information.
At a time when GPUs are a scarce resource – so much so that bit barns like Lambda and CoreWeave are using their hardware as collateral to acquire tens of thousands of additional accelerators – Hugging Face’s offering may come as a relief for startups looking to build AI-accelerated apps based on popular models.
It probably doesn’t hurt that Hugging Face raised $235 million in a Series D funding round backed by all of the AI heavyweights you might expect – including Google, Amazon, Nvidia, AMD, Intel, IBM and Qualcomm.
However, this is also somewhat ironic, in that several of Hugging Face’s biggest supporters are the ones developing the kinds of proprietary models Delangue worries could end up squeezing out smaller AI startups.
ZeroGPU Spaces is in open beta now. ®
Copyright for syndicated content belongs to the linked Source : The Register – https://go.theregister.com/feed/www.theregister.com/2024/05/17/hugging_face_nvidia_zerogpu/
