Nano Banana (Gemini 2.5 Flash Image): Try it on LMArena
“Nano-Banana” is the codename used on LMArena during testing of what is now known as Gemini 2.5 Flash Image. Try it for yourself directly on LMArena.ai.
Introducing BiomedArena.AI: Evaluating LLMs for Biomedical Discovery
LMArena is honored to partner with the team at DataTecnica to develop BiomedArena.ai, a new domain-specific evaluation track.
A Deep Dive into Recent Arena Data
Today, we're excited to release a new dataset of recent battles from LMArena! The dataset contains 140k conversations from the text arena.
Search Arena & What We’re Learning About Human Preference
Search Arena goes live on LMArena today. Read more about what we've learned so far about human preference from the search-augmented data.
Hello from LMArena: The Community Platform for Exploring Frontier AI
At LMArena, everything starts with the community. A lot of new members have joined us in the past few months, so we thought it would be a good time to reintroduce ourselves! Created by researchers from UC Berkeley’s SkyLab, LMArena is an open platform where everyone can…
LMArena and The Future of AI Reliability
About a month ago, we announced that LMArena was becoming a company to better support our growing community platform. As we take this next step, we're staying true to our original mission of rigorous, neutral, and community-driven evaluations. Today, we’re excited to share that we’ve raised…
Celebrating Community Impact: 3M+ votes, 400+ models, and 300+ pre-release tests
To date, the community has evaluated more than 400 public models on LMArena, along with 300+ pre-release tests. Tens of millions of battle pairings have been served to users across the world, and each vote has shaped real-world AI performance and development. Around this time two years ago, the community…
Does Sentiment Matter Too? Introducing Sentiment Control: Disentangling Sentiment and Substance
Contributors: Connor Chen, Wei-Lin Chiang, Tianle Li, Anastasios Angelopoulos
Introduction: You may have noticed that recent models on Chatbot Arena appear more emotionally expressive than their predecessors. But does this added sentiment actually improve their rankings on the leaderboard? Our previous exploration revealed…
How Many User Prompts are New?
We investigate 355,575 LLM battles from May 2024 to Dec 2024 to answer the following questions:
1. What proportion of prompts have never been seen before (aka “fresh”)?
2. What are common duplicate prompts?
3. How many prompts appear in widely used benchmarks?
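The freshness question above can be sketched in a few lines. This is a minimal illustration, not the post's actual methodology: it assumes a prompt is "fresh" if its normalized text has not appeared earlier in the stream, and it normalizes by lowercasing and stripping whitespace, which is our assumption for the example.

```python
from collections import Counter

def fresh_prompt_stats(prompts):
    """Return (fresh_ratio, top_duplicates) for a stream of prompts.

    A prompt counts as "fresh" if its normalized form has not appeared
    earlier in the stream. Lowercasing and stripping whitespace is an
    assumed normalization for illustration only.
    """
    seen = set()
    dup_counts = Counter()  # counts repeat occurrences, not totals
    fresh = 0
    for p in prompts:
        key = p.strip().lower()  # assumed normalization rule
        if key in seen:
            dup_counts[key] += 1
        else:
            seen.add(key)
            fresh += 1
    return fresh / len(prompts), dup_counts.most_common(3)

ratio, top = fresh_prompt_stats(
    ["hello", "Hello", "write a poem about cats", "hello"]
)
# ratio == 0.5: two of the four prompts had not been seen before
```

Checking benchmark overlap (question 3) would follow the same pattern, with `seen` pre-populated from the benchmark's prompts instead of starting empty.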
LMArena is Growing to Support our Community Platform
LMArena started as a scrappy academic project from UC Berkeley: just a handful of PhD students and undergrads working day and night on a research prototype. Today, we have two announcements: 1. We are starting a company to support LMArena! LMArena will stay neutral, open, and accessible to everyone. We…
Introducing the Search Arena: Evaluating Search-Enabled AI
Authors: Mihran Miroyan*, Tsung-Han Wu*, Logan King, Tianle Li, Anastasios N. Angelopoulos, Wei-Lin Chiang, Narges Norouzi, Joseph E. Gonzalez
TL;DR: 1. We introduce Search Arena, a crowdsourced in-the-wild evaluation platform for search-augmented LLM systems based on human preference. Unlike LM-Arena or SimpleQA, our data focuses on current events and…
LMArena Community Updates: Looking Ahead
Today, we’re excited to begin sharing community updates on our blog as we continue to make progress toward long-term growth.
WebDev Arena: A Live LLM Leaderboard for Web App Development
WebDev Arena lets users test LLMs on a real-world coding task: building interactive web applications.
RepoChat Arena
RepoChat lets models automatically retrieve relevant files from a given GitHub repository. It can resolve issues, review PRs, implement code, and answer higher-level questions about the repository, all without requiring users to provide extensive context.