CHAI AI Cluster

1.4 EXAFLOP CLUSTER

POWERING 1.2T TOKENS PER DAY

51K
LLMS DEPLOYED
1.2T
TOKENS PER DAY

CHAI has seen demand for its AI models grow exponentially, outpacing the capacity of off-the-shelf inference providers. Out of necessity, we verticalized and brought inference in-house. Starting small in 2023 with a cluster of A5000s rented on-demand from CoreWeave, we have grown to thousands of GPUs spread across 4 regions. Multi-region inference brings its own challenges and has pushed CHAI to the cutting edge of large-scale serving.
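As a rough illustration of one such challenge, the sketch below shows how a scheduler might route an incoming request across regions. It is a minimal, hypothetical example: the Region fields, the load-per-GPU scoring rule, and every number in it are assumptions made for illustration, not CHAI's actual routing logic.

```python
# Hypothetical multi-region request routing sketch (not CHAI's implementation).
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    gpus: int               # GPUs available in this region
    inflight: int           # requests currently being served
    p50_latency_ms: float   # recent median latency to this region

def pick_region(regions: list[Region]) -> Region:
    # Route to the region with the most spare capacity per GPU,
    # breaking ties toward lower observed latency.
    def score(r: Region) -> tuple[float, float]:
        load_per_gpu = r.inflight / max(r.gpus, 1)
        return (load_per_gpu, r.p50_latency_ms)
    return min(regions, key=score)

regions = [
    Region("us-east", gpus=2000, inflight=14000, p50_latency_ms=38.0),
    Region("us-west", gpus=1500, inflight=9000, p50_latency_ms=55.0),
    Region("eu-west", gpus=1000, inflight=8000, p50_latency_ms=90.0),
    Region("ap-south", gpus=500, inflight=2000, p50_latency_ms=120.0),
]
print(pick_region(regions).name)  # -> "ap-south" (lowest load per GPU)
```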

[ Chart: tokens processed per day, rising from 0.1T to over 1.2T ]
[ GPU Cluster ]

1.4 EXAFLOPS GPU CLUSTER
FOR AI INFERENCE


At CHAI, we serve hundreds of in-house trained LLMs across several GPU chip types from both AMD and Nvidia. While open-source solutions such as vLLM work well for simple workloads, we have found we can improve serving performance over vLLM by almost an order of magnitude through several optimizations, such as custom kernels and compute-efficient attention approximations; one such approximation is sketched below.
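The snippet below is a minimal sketch of one kind of compute-efficient attention approximation: sliding-window attention, where each token attends only to the most recent W keys instead of the full prefix. It illustrates the general idea only; CHAI has not published its kernels, and the function name and shapes here are assumptions.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: (batch, heads, seq, dim). Each query attends to at most
    `window` most recent keys (causal), cutting attention cost from
    O(seq^2) to O(seq * window)."""
    b, h, s, d = q.shape
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5   # (b, h, s, s)
    i = torch.arange(s).unsqueeze(1)                # query positions
    j = torch.arange(s).unsqueeze(0)                # key positions
    # Mask out future keys and keys older than the window.
    mask = (j > i) | (j <= i - window)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 128, 64)
out = sliding_window_attention(q, k, v, window=32)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

Note that this reference version still materializes the full score matrix for clarity; a production kernel would compute only the banded scores to actually realize the compute savings.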

NUMBER OF GPUS
5000 GPUs
NUMBER OF TOKENS SERVED
1.2T Tokens / day
NUMBER OF UNIQUE LLMS SERVED
51K LLMs
CLUSTER COMPUTE PERFORMANCE
>1.4 Exaflops
NVIDIA A100
NVIDIA L40S
AMD MI325X
AMD MI300X
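As a back-of-envelope check, the headline figures above are mutually consistent; the arithmetic below is illustrative only, and the A100 comparison point is public spec data, not CHAI's.

```python
tokens_per_day = 1.2e12
gpus = 5000
cluster_flops = 1.4e18          # 1.4 exaFLOPS

tokens_per_second = tokens_per_day / 86_400
print(f"{tokens_per_second:,.0f} tokens/s")          # ~13,888,889 tokens/s

flops_per_gpu = cluster_flops / gpus
print(f"{flops_per_gpu / 1e12:.0f} TFLOPS per GPU")  # 280 TFLOPS
# 280 TFLOPS per GPU is in the range of dense low-precision throughput
# for the accelerators listed above (e.g. A100: ~312 TFLOPS BF16).
```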