
Hugging Face challenges Nvidia with the new HUGS service
Reduced costs and expanded hardware compatibility for AI models
Isabella V, 25 October 2024

Hugging Face takes on Nvidia with its new HUGS offering, promising lower costs and broader hardware compatibility for AI models. Thanks to containerization, users can deploy models on a variety of hardware without having to optimize the system manually.

Key points:

  • Hugging Face launches HUGS, a competitor to Nvidia's NIM.
  • HUGS offers lower costs and support for a range of hardware.
  • Users can deploy preconfigured containers without complex optimization.
  • Popular open-source models are available now, with further expansion planned.

Hugging Face has recently introduced HUGS, a new service that challenges Nvidia's ambitions in the AI software sector. The service is positioned as an alternative to Nvidia's Inference Microservices (NIM), offering a way to run and deploy large language models (LLMs) on a variety of hardware while significantly cutting costs for customers. HUGS essentially consists of containerized model images, ready to use and designed to simplify deployment. Rather than wrestling with the complexities of vLLM or TensorRT-LLM to optimize performance, users can simply start a pre-packaged container via Docker or Kubernetes and interact with it using the familiar OpenAI API calls (see the example below). This approach, built on open-source technologies such as Text Generation Inference (TGI) and Transformers, offers the flexibility to run on different hardware platforms, including Nvidia and AMD GPUs, with plans to also support specialized accelerators such as Amazon Inferentia or Google TPUs in the future. For now, however, Intel Gaudi does not appear to be supported.

It is worth noting that, although HUGS is built on open-source technologies, it is not a free service: running it on cloud platforms such as AWS or Google Cloud costs about 1 dollar per hour per container. By comparison, Nvidia charges a similar figure per cloud GPU, but demands a substantial annual fee for on-premises deployment. For larger models such as Meta's Llama 3.1 405B, the Hugging Face solution is the more economical option, since a model spread across several GPUs still counts as a single container, and it can run on a wide range of hardware, avoiding the constraints of the Nvidia ecosystem. For those who want to test HUGS on a small scale, the images will also be available on DigitalOcean, although compute costs still apply: DigitalOcean recently launched virtual machines powered by Nvidia H100 GPUs, priced between 2.50 and 6.74 dollars per hour. In addition, Hugging Face Hub subscribers, paying 20 dollars per month, will be able to deploy HUGS on their own infrastructure.

Hugging Face remains cautious in its choice of models, focusing on some of the most popular in the open-source landscape, such as Meta's Llama 3.1 and the Mistral AI models. A future expansion is also planned to include further models, such as Microsoft's Phi series. Despite the Hugging Face offering, anyone interested can still containerize models on their own, using tools such as vLLM or TensorRT-LLM. What you really pay for, with both HUGS and NIM, is the time and effort it would otherwise take to configure the containers to run at their best.
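To make the deployment flow concrete, here is a minimal sketch of the pattern described above: start a pre-packaged container, then talk to it with the standard OpenAI Python client. The image name, port, and model identifier are illustrative assumptions, not official HUGS values; the only load-bearing idea is that the container exposes an OpenAI-compatible /v1 endpoint, as TGI-based servers do.

```python
# Minimal sketch, assuming a HUGS-style container that exposes an
# OpenAI-compatible endpoint. The image name, port, and model id below
# are hypothetical placeholders, not official values.
#
# Step 1 (shell, illustrative):
#   docker run --gpus all -p 8080:80 <hugs-image-for-llama-3.1>
#
# Step 2: point the standard OpenAI client at the local container
# instead of api.openai.com.

from openai import OpenAI

# A local container does not check the key, but the client requires one.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model id
    messages=[
        {"role": "user", "content": "Explain containerized LLM serving in one paragraph."}
    ],
    max_tokens=200,
)

print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI protocol, an application can move between a hosted API and a self-hosted container by changing only the base_url, which is precisely the lock-in-avoidance argument made above.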
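For a rough sense of scale, the back-of-the-envelope sketch below combines the prices quoted above, assuming one container running around the clock. Whether the per-container fee stacks on top of DigitalOcean's compute pricing is left open here, so the DigitalOcean line counts compute only.

```python
# Back-of-the-envelope monthly cost comparison from the figures above.
# Assumes continuous 24/7 operation (~730 hours/month); real workloads
# and provider billing details will shift these numbers.

HOURS_PER_MONTH = 730

hugs_on_cloud = 1.00 * HOURS_PER_MONTH   # HUGS on AWS/GCP at ~$1/hour/container
do_h100_low = 2.50 * HOURS_PER_MONTH     # cheapest DigitalOcean H100 VM
do_h100_high = 6.74 * HOURS_PER_MONTH    # priciest DigitalOcean H100 VM
hub_plan = 20.00                         # Hub subscription, own infrastructure

print(f"HUGS on AWS/GCP:      ~${hugs_on_cloud:,.0f}/month per container")
print(f"DigitalOcean H100 VM: ~${do_h100_low:,.0f} to ${do_h100_high:,.0f}/month (compute only)")
print(f"Hub subscription:     ${hub_plan:,.0f}/month plus your own hardware")
```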

In the constantly evolving AI landscape, the competition between Hugging Face and Nvidia could mark a significant turning point.