Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1 405B Model

Cerebras Systems today announced that it has set a new performance record for Llama 3.1 405B – a leading frontier model released by Meta AI. Cerebras Inference generated 969 output tokens per second...

Business Wire

World Record 969 Tokens per Second is up to 75 times Faster than GPUs Offered by Hyperscalers

SUNNYVALE, Calif.: Cerebras Systems today announced that it has set a new performance record for Llama 3.1 405B – a leading frontier model released by Meta AI. Cerebras Inference generated 969 output tokens per second. Data from third party benchmark firm Artificial Analysis shows Cerebras up to 75 times faster than GPU based offerings from hyperscalers. Cerebras Inference is running multiple customer workloads on Llama’s 405B model at 128K full context length and 16-bit precision.

Not only was Cerebras’s performance nearly two orders of magnitude faster than the leading GPU offerings from hyperscalers, but the time to first token, the definitive measure of latency and user experience, was the fastest measured in the world at 240 milliseconds – a fraction of leading GPU clouds.

These breakthroughs represent a significant milestone in AI inference capabilities, enabling instant responses from large language models that typically require several seconds to generate outputs. For the first time, the largest of language models has real time performance and this is only available on Cerebras.

“Scaling inference is critical for accelerating AI and open source innovation. Thanks to the incredible work of the Cerebras team, Llama 3.1 405B is now the world’s fastest frontier model running at a rapid-fire pace of 969 tokens/sec. Through the power of Llama and our open approach, super fast and affordable inference is now in reach for more developers than ever before. We are thrilled to support Cerebras’s latest breakthrough as they continue to push the boundaries in compute for AI,” said Ahmad Al-Dahle, GenAI VP at Meta.

“Today, we enabled one of the largest models in the world -- one of a few select frontier models -- to achieve instant performance,” said Andrew Feldman, co-founder and CEO of Cerebras. “Cerebras holds the world record in Llama 3.1 8B and 70B performance, and with today’s announcement, we're extending our lead to Llama 3.1 405B – delivering 969 tokens per second. By running the largest models at instant speed, Cerebras enables real-time responses from the world’s leading open frontier model. This opens up powerful new use cases, including reasoning and multi-agent collaboration, across the AI landscape.”

As recent work by OpenAI and others has shown, greater inference compute directly translates into greater model accuracy and capability. Traditional LLMs print out their answers without deliberation or reflection, often resulting in incorrect responses for complex problems. Run time techniques that use search and chain-of-thought reasoning, on the other hand, explore different possibilities and provide the best possible answer. This "think before answering" approach provides dramatically better performance on demanding tasks like math, science, and code generation, fundamentally boosting the intelligence of AI models without additional training. By running up to 75x faster than GPU solutions, Cerebras Inference allows AI models to "think" far longer and return more accurate results.

Cerebras Inference is powered by the Cerebras CS-3 system and its industry-leading AI processor, the Wafer Scale Engine 3 (WSE-3). Unlike graphics processing units that force customers to make trade-offs between speed and capacity, the CS-3 delivers best-in-class per-user performance and the lowest latency while delivering high throughput. The massive size of the WSE-3 enables many concurrent users to benefit from blistering speed. With 7,000x more memory bandwidth than the Nvidia H100, the WSE-3 solves Generative AI's fundamental technical challenge: memory bandwidth. Developers can easily access the Cerebras Inference API, which is fully compatible with the OpenAI Chat Completions API, making migration seamless with just a few lines of code.

Availability

Cerebras Inference Llama 3.1 405B with context lengths up to 128K is in customer trials today with general availability in Q1 2025. It is offered at $6 dollars per million input tokens and $12 dollars per million output tokens.

Llama 3.1 8B and 70B are available as free and paid versions on our website.

To learn more visit: www.cerebras.ai.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world's largest and fastest commercially available AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing. Cerebras Inference delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on premise. For further information, visit www.cerebras.ai or follow us on LinkedIn or X.

Fonte: Business Wire

Last News

RSA at Cybertech Europe 2024

Alaa Abdul Nabi, Vice President, Sales International at RSA presents the innovations the vendor brings to Cybertech as part of a passwordless vision for…

Italian Security Awards 2024: G11 Media honours the best of Italian cybersecurity

G11 Media's SecurityOpenLab magazine rewards excellence in cybersecurity: the best vendors based on user votes

How Austria is making its AI ecosystem grow

Always keeping an European perspective, Austria has developed a thriving AI ecosystem that now can attract talents and companies from other countries

Sparkle and Telsy test Quantum Key Distribution in practice

Successfully completing a Proof of Concept implementation in Athens, the two Italian companies prove that QKD can be easily implemented also in pre-existing…

G11 Media Networks

InnovationOpenLab is a channel of BitCity, a newspaper registered at the court of Como ,
n. 21/2007 del 11/10/2007- Registration ROC n. 15698

G11 MEDIA S.R.L. Registered office Via NUOVA VALASSINA, 4 22046 MERONE (CO) - P.IVA/C.F.03062910132 Como business register n. 03062910132 - REA n. 293834 CAPITALE SOCIALE Euro 30.000 i.v.

WiSA Technologies Inks Definitive Agreement to Acquire CompuSystems, Inc.

Pitch in to Provide: Donate to the Holiday Food Drive hosted by Peraton and Partners

Middle East & Africa Data Center Colocation Market Industry Outlook & Forecast 2024-2029: New Entrants Include Agility, Compass Datacenters, Cloudoon, EDGNEX Data Centres, Kasi Cloud, NED and Nxtra - ResearchAndMarkets.com

i-ESG Accelerates Expansion into the Middle East with Abu Dhabi Entity Establishment

Axcelead DDP Enters into Drug Discovery Service Agreement with Astellas for Targeted Protein Degraders

Dareesoft Accelerates Middle East Expansion With AI Road Hazard PoC Success

Canada RegTech Business and Investment Report 2024-2029: Proactive Compliance Solutions, Integration of Advanced Technologies, Collaboration with Regulatory Bodies Impact on the $877 Million Market - ResearchAndMarkets.com

Volato Group Announces Resolution of Non-Compliance with the NYSE American Rules

Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1 405B Model

Related news

Last News

RSA at Cybertech Europe 2024

Italian Security Awards 2024: G11 Media honours the best of Italian cybersecurity

How Austria is making its AI ecosystem grow

Sparkle and Telsy test Quantum Key Distribution in practice

Most read

Mutual of Omaha and Workday to Help Companies Enhance Employee Benefits…

Sei Labs Releases New “Giga” Roadmap That Will Bring 50x Improvement to…

Swoop Celebrates Triple Recognition in PM360’s 13th Annual Innovations…

Mastercard Finalizes Acquisition of Recorded Future

G11 Media Networks

Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1 405B Model

Related news

Last News

Most read

Newsletter signup

G11 Media Networks