Nvidia Becomes a Major Model Maker With Nemotron 3

by Amelia Forsyth


Nvidia has made a fortune supplying chips to companies working on artificial intelligence, but today the chipmaker took a step toward becoming a more serious model maker itself by releasing a series of cutting-edge open models, along with data and tools to help engineers use them.

The move, which comes at a moment when AI companies like OpenAI, Google, and Anthropic are developing increasingly capable chips of their own, could be a hedge against these firms veering away from Nvidia’s technology over time.

Open models are already a crucial part of the AI ecosystem, with many researchers and startups using them to experiment, prototype, and build. While OpenAI and Google offer small open models, they do not update them as frequently as their rivals in China. For this reason and others, open models from Chinese companies are currently much more popular, according to data from Hugging Face, a hosting platform for open-source projects.

Nvidia’s new Nemotron 3 models are among the best that can be downloaded, modified, and run on one’s own hardware, according to benchmark scores shared by the company ahead of release.

“Open innovation is the foundation of AI progress,” CEO Jensen Huang said in a statement ahead of the news. “With Nemotron, we’re transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale.”

Nvidia is taking a more transparent approach than many of its US rivals by releasing the data used to train Nemotron, which should help engineers modify the models more easily. The company is also releasing tools to help with customization and fine-tuning. The release also includes a new hybrid latent mixture-of-experts model architecture, which Nvidia says is especially good for building AI agents that can take actions on computers or the web. The company is also launching libraries that allow users to train agents using reinforcement learning, which involves giving models simulated rewards and punishments.

Nemotron 3 models come in three sizes: Nano, which has 30 billion parameters; Super, which has 100 billion; and Ultra, which has 500 billion. A model’s parameter count loosely corresponds to how capable it is as well as how unwieldy it is to run. The largest models are so cumbersome that they need to run on racks of expensive hardware.

Model Foundations

Kari Ann Briski, vice president of generative AI software for enterprise at Nvidia, said open models are important to AI builders for three reasons: Builders increasingly need to customize models for particular tasks; it often helps to hand queries off to different models; and it is easier to squeeze more intelligent responses from these models after training by having them perform a kind of simulated reasoning. “We believe open source is the foundation for AI innovation, continuing to accelerate the global economy,” Briski said.

The social media giant Meta released the first advanced open models under the name Llama in February 2023. As competition has intensified, however, Meta has signaled that its future releases might not be open source.

Meta’s shift is part of a larger trend in the AI industry. Over the past year, US firms have moved away from openness, becoming more secretive about their research and more reluctant to tip off their rivals about their latest engineering tricks.


