CHAI PRIZE
THE LLM COMPETITION
$1 MILLION CASH PRIZE
STARTS JUNE 19TH 2023
The world's first open community challenge with real-user evaluations.
Submit your model and compare how you rank against other teams.
Accelerating community AGI.
How we will be evaluating model performance
Language models are difficult to evaluate, and it is therefore difficult to condense model performance into a single evaluation metric.
This is why we are launching the world’s first community-based evaluation method:
user activity, measured by deploying your model directly to millions of users.
We believe that by combining online user activity based on interactions with your model,
together with a suite of offline evaluation metrics,
the community will be able to accelerate the path towards open AGI.
Prize contenders
#   Team           Model            Scores   Entries
1   Stability AI   AlphaChat        2.78     108
2   Together       INCITE-Chat-3B   1.34     212
3   Nomic          GPT4ALL          1.33     82
4   Pygmalion      Pygmalion 6B     1.02     23
5   Mosaic         MPT-7B-Chat      0.98     49
6   UC Berkeley    Koala 13B        0.81     63
7   Lmsys          Vicuna 13B       0.80     56
8   Meta           LLaMA 7B         0.68     14
9   EleutherAI     GPT-J 6B         0.67     97
(Leaderboard for illustration purposes only)
Guanaco Competition Format
Chai Reward Model (Small)
We will be open-sourcing Chai’s reward model, a GPT2 classifier trained directly on 170M user-generated signals to predict whether a conversation is likely to continue given a message completion. You can use this model for offline evaluation or integrate it into your RLHF pipeline.

170M: Supervised-target trained
250M: GPT2 Classifier
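
As a rough illustration of the two uses mentioned above (offline evaluation and ranking completions), here is a minimal sketch. It assumes the reward model can be wrapped as a function that takes a conversation history and a candidate reply and returns an estimated probability that the conversation continues. That interface, the names pick_best_reply and mean_reward, and the toy_reward stand-in are all hypothetical; they are not Chai's published API.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: (conversation history, candidate reply) -> estimated
# probability that the user keeps chatting. The real reward model would be
# wrapped to match this signature.
RewardFn = Callable[[Sequence[str], str], float]


def pick_best_reply(
    history: Sequence[str], candidates: Sequence[str], reward_fn: RewardFn
) -> str:
    """Best-of-N selection: return the candidate with the highest reward."""
    if not candidates:
        raise ValueError("need at least one candidate reply")
    return max(candidates, key=lambda reply: reward_fn(history, reply))


def mean_reward(
    pairs: Sequence[Tuple[Sequence[str], str]], reward_fn: RewardFn
) -> float:
    """Offline metric: average reward over (history, reply) pairs."""
    if not pairs:
        raise ValueError("need at least one (history, reply) pair")
    return sum(reward_fn(history, reply) for history, reply in pairs) / len(pairs)


def toy_reward(history: Sequence[str], reply: str) -> float:
    """Stand-in scorer for illustration only: favors replies ending in a question."""
    return 0.9 if reply.strip().endswith("?") else 0.2


if __name__ == "__main__":
    history = ["Hi there!"]
    print(pick_best_reply(history, ["Bye.", "How was your day?"], toy_reward))
```

Passing the scorer in as a function keeps the sketch independent of any particular model checkpoint, so the same helpers could be reused once the actual reward model is available.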
You will be training
Language models are expensive to train. To ensure that the competition is accessible to everyone, we will be experimenting with a range of base models. The 3B model from together.xyz will have the fastest iteration speed.

Model Evaluation
Once your model has been uploaded, we will run an internal AI safety classifier to ensure it is safe to deploy. Depending on the number of submissions, we will then select the top-performing models, ranked by offline evaluation metrics, for A/B testing with real users.
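
The selection flow described above can be sketched as a filter followed by a ranking. This is only an illustration under stated assumptions: the Submission fields, the single aggregate offline_score, and the top_k cutoff are hypothetical, since the page does not specify how the safety classifier or the offline metrics are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    team: str
    offline_score: float  # hypothetical aggregate offline metric (higher is better)
    is_safe: bool         # hypothetical verdict from the safety classifier


def select_for_ab_test(submissions: List[Submission], top_k: int) -> List[Submission]:
    """Drop unsafe submissions, then keep the top_k by offline score."""
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    safe = [s for s in submissions if s.is_safe]
    ranked = sorted(safe, key=lambda s: s.offline_score, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    pool = [
        Submission("team-a", 0.71, True),
        Submission("team-b", 0.93, False),  # filtered out before ranking
        Submission("team-c", 0.82, True),
    ]
    for s in select_for_ab_test(pool, top_k=2):
        print(s.team, s.offline_score)
```

Applying the safety filter before ranking means an unsafe model can never reach real users, no matter how well it scores offline.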

Chai AI Safety Classifier

1M+ Active Users
Real-user evaluation
Public Leaderboard
© 2023 CHAI RESEARCH CORP. ALL RIGHTS RESERVED