Cerebras and Neural Magic Unlock the Power of Sparse LLMs for Faster, More Power Efficient, Lower Cost AI Model Training and Deployment

Business Wire

70% Sparse Models Are 3x Faster with No Loss of Accuracy

SUNNYVALE, Calif. & CAMBRIDGE, Mass.: Cerebras Systems, the pioneer in accelerating generative AI, and Neural Magic, a leader in high-performance enterprise inference servers, today announced the groundbreaking results of their collaboration on sparse training and deployment of large language models (LLMs). By achieving an unprecedented 70% parameter reduction with full accuracy recovery, the combination of training on Cerebras CS-3 systems and deploying on Neural Magic inference server solutions enables significantly faster, more efficient, and lower-cost LLMs, making them accessible to a broader range of organizations and industries.

“For the first time ever, we achieved up to 70% sparsity for a foundational model, such as Llama, with full accuracy recovery for challenging downstream tasks,” said Sean Lie, CTO and co-founder of Cerebras. “This breakthrough enables scalable training and accelerated inference – our CS-3 system provides near theoretical acceleration for training sparse LLMs, and Neural Magic’s inference server, DeepSparse, delivers up to 8.6x faster inference than dense, baseline models.”

With native hardware support for unstructured sparsity, the Cerebras CS-3 system accelerates training for models at 70% sparsity and higher, far exceeding the as-yet-unrealized peak on GPUs like the H100 and B100. This is because GPU sparsity support is limited and rigid: it covers only 50% sparsity, in a fixed ratio. With the CS-3 system, purpose-built for sparse models and offering the industry's highest memory bandwidth, AI practitioners can apply novel techniques from Neural Magic, such as sparse pretraining and sparse fine-tuning, to their own datasets to create highly sparse LLMs without sacrificing accuracy. The results are faster, smaller models that retain the full accuracy of their slower, dense counterparts.
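The difference between unstructured and fixed-ratio sparsity can be illustrated with a small sketch. It is not the method used by Cerebras or Neural Magic: the magnitude-based pruning criterion, the function names, and the random weights are all assumptions made for illustration, and the fixed ratio is assumed here to be the 2-out-of-4 pattern supported by NVIDIA sparse tensor cores. The `ideal_speedup` function gives only a theoretical upper bound, assuming every zeroed weight saves its full share of compute.

```python
import random


def unstructured_mask(weights, sparsity):
    """Zero the smallest-magnitude weights anywhere in the list (illustrative criterion)."""
    n_prune = int(round(len(weights) * sparsity))
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def two_of_four_mask(weights):
    """Fixed-ratio pattern: in every group of 4 weights, keep the 2 largest magnitudes."""
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        ranked = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:2])
        mask.extend(1 if i in keep else 0 for i in range(len(group)))
    return mask


def sparsity_of(mask):
    """Fraction of weights that the mask sets to zero."""
    return 1 - sum(mask) / len(mask)


def ideal_speedup(sparsity):
    """Upper bound on speedup if all zeroed weights are skipped at no cost."""
    return 1 / (1 - sparsity)


random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]  # hypothetical layer weights

mask_70 = unstructured_mask(weights, 0.70)
mask_2_4 = two_of_four_mask(weights)

print("unstructured sparsity:", sparsity_of(mask_70))
print("2:4 sparsity:", sparsity_of(mask_2_4))
print("ideal speedup at 70%:", ideal_speedup(0.70))
print("ideal speedup at 50%:", ideal_speedup(0.50))
```

Under these assumptions, the unstructured mask can target any sparsity level and place zeros wherever the weights are smallest, while the fixed-ratio mask is pinned to exactly 50% and must zero two weights in every group of four regardless of their importance. The ideal bound at 70% sparsity is about 3.3x, against 2x at 50%.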

“Together with Cerebras and their purpose-built AI hardware, we created sparse, foundational models that deliver lightning-fast inference through our sparsity-aware software platform,” said Mark Kurtz, CTO of Neural Magic. “This paradigm shift provides enterprises and researchers alike with much more efficient, cost-effective, and accessible deployment of LLMs across a wide range of industries and real-world applications.”

To facilitate the adoption and further development of sparse LLMs, Cerebras and Neural Magic have released the models, recipes, implementations, and documentation of this sparsity breakthrough. For more information, please visit https://neuralmagic.com/blog/unlocking-affordable-and-sustainable-ai-through-sparse-foundational-llms/.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world’s largest and fastest AI processor, our Wafer-Scale Engine-3. CS-3s cluster together quickly and easily to form the largest AI supercomputers in the world, and they make placing models on those supercomputers dead simple by avoiding the complexity of distributed computing. Leading corporations, research institutions, and governments use Cerebras solutions to develop pathbreaking proprietary models and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on premises. For further information, visit https://www.cerebras.net.

About Neural Magic

Neural Magic accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As a software-delivered solution, Neural Magic optimizes open-source models, such as large language models, to run efficiently on commodity hardware. Organizations can spend less to advance AI initiatives to production without sacrificing the performance or accuracy of their models. Founded by an MIT professor and an AI research scientist who were challenged by the constraints of existing hardware, Neural Magic enables a future where developers and IT can tap into the power of state-of-the-art, open-source AI with none of the friction.

Source: Business Wire
