LMArena Team

Created by researchers from UC Berkeley’s SkyLab, LMArena is an open platform where everyone can easily access, explore and interact with the world’s leading AI models.

Catch me if you can! How to beat GPT-4 with a 13B model

Authors: Shuo Yang*, Wei-Lin Chiang*, Lianmin Zheng*, Joseph E. Gonzalez, Ion Stoica

Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM-8K/HumanEval)! To ensure result validity, we followed OpenAI’s decontamination method and found no evidence of data contamination. What’s the trick behind it? Well, rephrasing

Copilot Arena

Copilot Arena has been downloaded 2.5K times on the VSCode Marketplace, served over 100K completions, and accumulated over 10K code completion battles.

Chatbot Arena Categories

By grouping tasks into categories, we can assess models’ strengths and weaknesses in a more granular way.

Preference Proxy Evaluations

Most LLMs are optimized using an LLM judge or reward model to approximate human preference. These training processes can cost hundreds of thousands or millions of dollars. How can we know whether to trust an LLM judge or reward model, given its critical role in guiding LLM training?

Agent Arena

With the growing interest in Large Language Model (LLM) agents, there is a need for a unified and systematic way to evaluate agents.

Statistical Extensions of the Bradley-Terry and Elo Models

Chatbot Arena uses the Bradley-Terry model for statistical inference on model strength. Recently, we have developed some extensions of the Bradley-Terry model, and the closely related Elo model, for binary-comparison inference problems.
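
For readers new to the Bradley-Terry model, here is a minimal sketch of how strengths can be estimated from pairwise battle counts. It is not Chatbot Arena's implementation and does not cover the extensions mentioned above. The function names, the toy win counts, and the fixed-point (minorization-maximization) fitting procedure are all assumptions chosen for illustration. Under the model, the probability that model i beats model j is p_i / (p_i + p_j).

```python
import math


def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from a win-count matrix.

    wins[i][j] is the number of times model i beat model j (hypothetical data).
    Uses the classic fixed-point update p_i = W_i / sum_j n_ij / (p_i + p_j).
    Assumes every model has at least one win and one loss.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        # Strengths are only identified up to scale, so fix their sum to n.
        norm = sum(new_p)
        p = [x * n / norm for x in new_p]
    return p


def win_probability(p_i, p_j):
    """Bradley-Terry probability that a model with strength p_i beats p_j."""
    return p_i / (p_i + p_j)


def to_elo_scale(p, scale=400.0, base=1000.0):
    """Map strengths to an Elo-like scale: rating = base + scale * log10(p)."""
    return [base + scale * math.log10(x) for x in p]


# Toy battle counts for three hypothetical models.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
ratings = to_elo_scale(strengths)
print(strengths, ratings)
```

The log transform in `to_elo_scale` reflects the close relationship between the two models: with ratings defined this way, the Bradley-Terry win probability equals the familiar Elo expression 1 / (1 + 10^((R_j - R_i) / 400)).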

RedTeam Arena

We are excited to launch RedTeam Arena, a community-driven redteaming platform, built in collaboration with Pliny and the BASI community!