Software
LMArena
About
An open-source research project for evaluating large language models using a blind, Elo-based leaderboard.
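As a rough illustration of what an Elo-based ranking involves, the sketch below applies the textbook Elo update to a single pairwise vote. It is not taken from the project's code: the function names, the K-factor of 32, and the 400-point scale are assumptions chosen for illustration, and the leaderboard's actual rating computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; a voter prefers model A.
    print(update_elo(1000.0, 1000.0, 1.0))
```

Because each vote moves the two ratings by equal and opposite amounts, the total rating across models stays constant, and an upset win against a higher-rated model shifts the ratings more than an expected win.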
Key Features
- Blind side-by-side model comparisons
- Dynamic community-driven leaderboard
- API access for researchers
Pros
- Unbiased, human-preference metrics
- Broad selection of frontier models
Cons
- Subjective evaluation criteria
- High variance in user prompts
