CHAI PRIZE
THE LLM COMPETITION
$1 MILLION CASH PRIZE
STARTS JUNE 19TH 2023
The world's first open community challenge with real-user evaluations. Submit your model and compare how you rank against other teams.
Accelerating community AGI.
Partners
How we will be evaluating model performance
Language models are difficult to evaluate, and it is hard to condense model performance into a single metric. This is why we are launching the world’s first community-based evaluation method: user activity, measured by deploying your model directly to millions of users. We believe that by combining online user activity from interactions with your model with a suite of offline evaluation metrics, the community will be able to accelerate the path towards open AGI.
Prize contenders
#   Team           Model            Medal    Score   Entries
1   Stability AI   AlphaChat        Gold     2.78    108
2   Together       INCITE-Chat-3B   Gold     1.34    212
3   Nomic          GPT4ALL          Gold     1.33    82
4   Pygmalion      Pygmalion 6B     Silver   1.02    23
5   Mosaic         MPT-7B-Chat      Silver   0.98    49
6   UC Berkeley    Koala 13B        Silver   0.81    63
7   Lmsys          Vicuna 13B       Silver   0.80    56
8   Meta           LLaMA 7B         Bronze   0.68    14
9   EleutherAI     GPT-J 6B         Bronze   0.67    97
(Leaderboard for illustration purposes only)
Guanaco Competition Format
Chai Reward Model (Small)
We will be open-sourcing Chai’s reward model (a GPT2 classifier). It is trained directly on 170M user-generated signals to predict whether a conversation is likely to continue after a given message completion. You can use this model for offline evaluation or integrate it into your RLHF pipeline.
Chai AI Reward Model: GPT2 classifier (250M), trained on 170M supervised targets.
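As a rough illustration of the offline-evaluation use, the sketch below reranks candidate replies with a reward function and averages rewards over a sample set. It assumes a reward model can be wrapped as a function from (conversation history, candidate reply) to a score in [0, 1]. The names `RewardFn`, `toy_reward`, `best_of_n`, and `mean_reward` are hypothetical, and `toy_reward` is a hand-written stand-in, not Chai's reward model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (conversation so far, candidate reply) -> score in [0, 1],
# read as the predicted chance that the user keeps chatting.
RewardFn = Callable[[Sequence[str], str], float]


def toy_reward(history: Sequence[str], reply: str) -> float:
    """Stand-in scorer for illustration only; NOT Chai's reward model.

    It favours replies that ask a question and contain at least five words.
    """
    score = 0.2
    if "?" in reply:
        score += 0.5
    if len(reply.split()) >= 5:
        score += 0.3
    return min(score, 1.0)


def best_of_n(history: Sequence[str], candidates: List[str], reward_fn: RewardFn) -> str:
    """Return the candidate reply with the highest reward (best-of-N reranking)."""
    if not candidates:
        raise ValueError("need at least one candidate reply")
    return max(candidates, key=lambda reply: reward_fn(history, reply))


def mean_reward(samples: List[Tuple[Sequence[str], str]], reward_fn: RewardFn) -> float:
    """Average reward over (history, reply) pairs, usable as a simple offline metric."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(reward_fn(h, r) for h, r in samples) / len(samples)


if __name__ == "__main__":
    history = ["Hi! I spent the weekend hiking."]
    candidates = ["Ok.", "That sounds great, where do you usually go hiking?"]
    print(best_of_n(history, candidates, toy_reward))
```

In practice, `toy_reward` would be replaced by a call into whatever classifier you are evaluating with; keeping the scorer behind a plain function makes it easy to swap.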
You will be training
Language models are expensive to train. To ensure that the competition is accessible to everyone, we will be experimenting with a range of base models; the 3B model from together.xyz will have the fastest iteration speed.

Model Evaluation
Once your model has been uploaded, we will run an internal AI safety classifier to ensure it is safe to deploy. Depending on the number of submissions, we will then select the top-performing models, ranked by offline evaluation metrics, for real-user A/B testing.
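The two-stage flow described above (safety screening, then selection by offline metrics) might be sketched as follows. This is only an illustration under stated assumptions: the `Submission` fields, the `max_unsafe_rate` threshold, and the `top_k` cut-off are all hypothetical and are not taken from the competition rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    name: str
    unsafe_rate: float    # hypothetical: share of sampled replies flagged by a safety classifier
    offline_score: float  # hypothetical: aggregate offline metric, higher is better


def select_for_ab_test(
    submissions: List[Submission],
    max_unsafe_rate: float = 0.01,  # assumed threshold, for illustration only
    top_k: int = 3,                 # assumed number of A/B-test slots
) -> List[Submission]:
    """Drop submissions that fail the safety screen, then keep the top_k by offline score."""
    safe = [s for s in submissions if s.unsafe_rate <= max_unsafe_rate]
    ranked = sorted(safe, key=lambda s: s.offline_score, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    pool = [
        Submission("model-a", unsafe_rate=0.002, offline_score=0.71),
        Submission("model-b", unsafe_rate=0.050, offline_score=0.93),  # fails safety screen
        Submission("model-c", unsafe_rate=0.000, offline_score=0.64),
        Submission("model-d", unsafe_rate=0.008, offline_score=0.80),
    ]
    for s in select_for_ab_test(pool, top_k=2):
        print(s.name, s.offline_score)
```

Filtering before ranking means a high offline score cannot compensate for a failed safety check, which matches the order of the two steps in the text.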
Chai AI Safety Classifier
[Figure: Chai AI Network Effect. 1M+ active users; real-user evaluation.]
Public Leaderboard
© 2023 CHAI RESEARCH CORP. ALL RIGHTS RESERVED