lmms-eval 0.6

Trusted evaluation
for multimodal models

Run lmms-eval on any HuggingFace model and dataset with verified, reproducible results. Evaluation as a service, from CLI to leaderboard.

Evaluation as a service

Submit your model, pick your benchmarks, and get verified results.

Trusted Verification

Run evaluations in a controlled environment. Results are cryptographically verified and reproducible.
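What does that look like on your side? As a minimal sketch only (the actual verification scheme, file names, and manifest fields are not documented here and are assumed), a downloaded results file could be checked against its published digest like this:

# Minimal sketch: recompute a results file's SHA-256 digest and compare it to the
# digest recorded when the run was produced. File names and the "sha256" field are assumptions.
import hashlib
import json

def verify_results(results_path: str, manifest_path: str) -> bool:
    with open(results_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(manifest_path) as f:
        expected = json.load(f)["sha256"]
    return digest == expected

print("verified" if verify_results("results.json", "manifest.json") else "digest mismatch")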

CLI & API

Use our CLI tool with your API key, or submit evaluations directly from the web interface.
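For programmatic submissions, the request below is an illustrative sketch only: the endpoint URL and payload fields are assumptions rather than a documented API, and only the model and task values mirror the CLI example further down.

# Hypothetical API submission; the endpoint and fields are illustrative assumptions.
import os
import requests

api_key = os.environ["LMMS_EVAL_API_KEY"]  # the key you get after authenticating

resp = requests.post(
    "https://example.com/api/v1/evaluations",   # placeholder endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "hf/your-org/your-model",       # HuggingFace repo, as in the CLI example
        "tasks": ["mme", "mmmu_val"],            # benchmarks to evaluate
        "submit_to_leaderboard": False,          # assumed optional flag
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())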

Web Interface

Configure and launch evaluation jobs from your browser. Monitor progress in real time.

Modal Inference

Powered by Modal's serverless GPUs. No infrastructure setup required — just point to your model.
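To give a feel for what serverless GPU execution on Modal looks like, here is a generic, self-contained Modal sketch; it is not this service's actual backend, and the model name is a placeholder.

# Generic Modal sketch: a serverless GPU function that loads a HuggingFace model
# and generates a response. Illustrative only, not this service's backend code.
import modal

app = modal.App("lmms-eval-demo")
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.function(gpu="A10G", image=image, timeout=600)
def generate(prompt: str) -> str:
    from transformers import pipeline
    # "gpt2" stands in for whatever HuggingFace repo you point the service at.
    pipe = pipeline("text-generation", model="gpt2", device=0)
    return pipe(prompt, max_new_tokens=32)[0]["generated_text"]

@app.local_entrypoint()
def main():
    # `modal run this_file.py` provisions a GPU container, runs, and tears it down.
    print(generate.remote("Describe the benchmark in one sentence."))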

Leaderboard

Compare your model against others on the nano-VLM leaderboard and community benchmarks.

Extensible

Bring any HuggingFace model and dataset. Supports custom evaluation configurations.
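Concretely, "any HuggingFace model and dataset" means standard Hub artifacts. The repo names below are placeholders in the same spirit as the CLI example; swap in your own.

# Anything that loads from the HuggingFace Hub can be brought to an evaluation.
# The repo names are placeholders for your own model and dataset.
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor

dataset = load_dataset("your-org/your-eval-set", split="test")
processor = AutoProcessor.from_pretrained("your-org/your-model")
model = AutoModelForVision2Seq.from_pretrained("your-org/your-model")

print(dataset.column_names)
print(model.config.model_type)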

How it works

01

Authenticate

Sign in with your GitHub account and get your API key.

02

Configure

Specify your model's HuggingFace repo and the datasets to evaluate.

03

Run

Submit via web or CLI. We run inference on Modal's GPU infrastructure.

04

Verify

Get verified, reproducible results. Optionally publish to the leaderboard.

Works from your terminal

Integrate evaluation into your workflow with a single command.

# Install the CLI
$ pip install lmms-eval
# Authenticate
$ lmms-eval auth --api-key YOUR_API_KEY
# Run evaluation
$ lmms-eval --model hf/your-org/your-model \
    --tasks mme,mmmu_val \
    --submit