News

LMArena's Ranking Method

Since launching the platform, developing a rigorous and scientifically grounded evaluation methodology has been central to our mission. A key component of this effort is providing proper statistical uncertainty quantification for model scores and rankings. To that end, we have always reported confidence intervals alongside Arena scores and surfaced any

The Next Stage of AI Coding Evaluation Is Here

Introducing Code Arena: live evals for agentic coding in the real world. AI coding models have evolved fast. Today’s systems don’t just output static code in one shot. They build. They scaffold full web apps and sites, refactor complex systems, and debug themselves in real time. Many now

New Product: AI Evaluations

Today, we’re introducing a commercial product: AI Evaluations. It offers enterprises, model labs, and developers comprehensive evaluations grounded in real-world human feedback, showing how models actually perform in practice.