Trusted evaluation
for multimodal models
Run lmms-eval on any dataset and get verified, reproducible results. Evaluation as a service, from CLI to leaderboard.
Evaluation as a service
Submit your model, pick your benchmarks, and get verified results.
Trusted Verification
Run evaluations in a controlled environment. Results are cryptographically verified and reproducible.
CLI & API
Use our CLI tool with your API key, or submit evaluations directly from the web interface.
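For instance, authenticating the CLI could look like the sketch below; the command and flag names are hypothetical stand-ins, not the service's actual interface:

    # Store your API key once; later submissions reuse it (hypothetical names)
    lmms-eval-cli auth --api-key "$LMMS_EVAL_API_KEY"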
Web Interface
Configure and launch evaluation jobs from your browser. Monitor progress in real time.
Modal Inference
Powered by Modal's serverless GPUs. No infrastructure setup required — just point to your model.
Leaderboard
Compare your model against others on the nano-VLM leaderboard and community benchmarks.
Extensible
Bring any HuggingFace model and dataset. Supports custom evaluation configurations.
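As a sketch, bringing your own model and dataset might look like this, assuming hypothetical --model, --tasks, and --config options:

    # Evaluate any HuggingFace model on a custom dataset (all names illustrative)
    lmms-eval-cli submit \
        --model my-org/my-vlm \
        --tasks my-org/my-benchmark \
        --config custom_eval.yaml  # hypothetical custom-configuration file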
How it works
Authenticate
Sign in with your GitHub account and get your API key.
Configure
Specify your model's HuggingFace repo and the datasets to evaluate.
Run
Submit via web or CLI (sketched after these steps). We run inference on Modal's GPU infrastructure.
Verify
Get verified, reproducible results. Optionally publish to the leaderboard.
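Put together, the four steps above might look like this from the terminal. Every command, flag, and placeholder here is illustrative, not the service's actual interface:

    # 1. Authenticate with the API key from your GitHub sign-in
    lmms-eval-cli auth --api-key "$LMMS_EVAL_API_KEY"
    # 2-3. Configure and run: HuggingFace repo plus datasets; inference on Modal
    lmms-eval-cli submit --model my-org/my-vlm --tasks mme,mmbench
    # 4. Verify: fetch reproducible results; optionally publish them
    lmms-eval-cli results JOB_ID --publish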
Works from your terminal
Integrate evaluation into your workflow with a single command.
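For example, one hypothetical command (names are illustrative) could submit a job from a script or CI pipeline and block until verified results come back:

    # One-shot evaluation from a script or CI job (illustrative command and flags)
    lmms-eval-cli submit --model my-org/my-vlm --tasks mme --wait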