LMArena Team

Created by researchers from UC Berkeley’s SkyLab, LMArena is an open platform where everyone can easily access, explore and interact with the world’s leading AI models.

Introducing the Search Arena: Evaluating Search-Enabled AI

Authors: Mihran Miroyan*, Tsung-Han Wu*, Logan King, Tianle Li, Anastasios N. Angelopoulos, Wei-Lin Chiang, Narges Norouzi, Joseph E. Gonzalez

TL;DR

1. We introduce Search Arena, a crowdsourced in-the-wild evaluation platform for search-augmented LLM systems based on human preference. Unlike LM-Arena or SimpleQA, our data focuses on current events and

LMArena Community Updates: Looking Ahead

Today, we’re excited to begin sharing community updates on our blog as we continue to make progress toward long-term growth.

WebDev Arena: A Live LLM Leaderboard for Web App Development

WebDev Arena allows users to test LLMs in a real-world coding task: building interactive web applications.

RepoChat Arena

RepoChat lets models automatically retrieve relevant files from a given GitHub repository. It can resolve issues, review PRs, implement code, and answer higher-level questions about the repository, all without requiring users to provide extensive context.

Arena Explorer

We developed a topic modeling pipeline and the Arena Explorer. This pipeline organizes user prompts into distinct topics, structuring the text data hierarchically to enable intuitive analysis. We believe this tool for hierarchical topic modeling can be valuable to anyone analyzing complex text data.

Code Editing in Copilot Arena

Copilot Arena supports not only paired code completions but also paired code edits. Unlike code completions, which appear automatically after short pauses, code edits are triggered manually by highlighting a code snippet and then writing a short task description.

Catch me if you can! How to beat GPT-4 with a 13B model

Authors: Shuo Yang*, Wei-Lin Chiang*, Lianmin Zheng*, Joseph E. Gonzalez, Ion Stoica

Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM8K/HumanEval)! To ensure result validity, we followed OpenAI’s decontamination method and found no evidence of data contamination. What’s the trick behind it? Well, rephrasing