Does Style Matter? We controlled for the effects of length and markdown, and indeed, the ranking changed. This is just a first step toward our larger goal of disentangling substance and style in the Chatbot Arena leaderboard.
Chatbot Arena Conversation Dataset Release Since its launch three months ago, Chatbot Arena has become a widely cited LLM evaluation platform that emphasizes large-scale, community-based, and interactive human evaluation. In that short time span, we collected around 53K votes from 19K unique IP addresses for 22 models. In this blog post, we are releasing an…
The Multimodal Arena is Here! You can now chat with your favorite vision-language models from OpenAI, Anthropic, Google, and most other major LLM providers to help discover how these models stack up against each other. Contributors: Christopher Chou*, Lisa Dunlap*, Wei-Lin Chiang, Ying Sheng, Lianmin Zheng, Anastasios Angelopoulos, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez
Introducing Hard Prompts Category in Chatbot Arena Introducing Hard Prompts, a new and challenging category in the Chatbot Arena Leaderboard. Contributors: Tianle Li, Wei-Lin Chiang, Lisa Dunlap Background Over the past few months, the community has shown a growing interest in more challenging prompts…
What's up with Llama 3? Arena data analysis Authors: Lisa Dunlap, Evan Frick, Tianle Li, Isaac Ong, Joseph E. Gonzalez, Wei-Lin Chiang On April 18th, Meta released Llama 3, their newest open-weight large language model. Since then, Llama 3-70B has quickly risen to the top of the English Chatbot Arena leaderboard with over 50,000 battles. This remarkable…
LMSYS Chatbot Arena Kaggle Competition Predicting Human Preference with $100,000 in Prizes Overview LMSYS and Kaggle are launching a human preference prediction competition! You are challenged to predict which responses users will prefer in head-to-head battles between Large Language Models (LLMs). You'll work with a dataset from the Chatbot Arena, containing conversations and…
From Live Data to High-Quality Benchmarks - The Arena-Hard Pipeline Authors: Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should 1) robustly separate model capability, 2) reflect human preference in real-world use cases, and 3) frequently…