LLM-Evals-Catalogue
Catalogue of evaluation metrics for large language models
Pricing: See website (flat rate)
Adoption: Stable
License: Open Source
Data freshness: —
Overview
What is LLM-Evals-Catalogue?
A comprehensive catalogue of evaluation metrics and benchmarks for assessing the performance of large language models, aiding in model selection and improvement.
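Applying a catalogued metric typically comes down to looking up a scoring function and running it over model outputs. The sketch below is a minimal hypothetical illustration in Python; the catalogue is not assumed to ship this API, and every name here is invented for the example.

```python
# Hypothetical illustration: applying one catalogued metric (exact match)
# to model outputs. All names below are invented for this sketch.

def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 if the normalized prediction equals the reference."""
    return float(prediction.strip().lower() == reference.strip().lower())

# A toy "catalogue": metric name -> scoring function.
CATALOGUE = {"exact_match": exact_match}

outputs = [("Paris", "paris"), ("Berlin", "Munich")]
metric = CATALOGUE["exact_match"]
scores = [metric(pred, ref) for pred, ref in outputs]
print(f"exact_match mean: {sum(scores) / len(scores):.2f}")  # -> 0.50
```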
Key differentiator
“LLM-Evals-Catalogue stands out by providing a standardized set of evaluation metrics tailored to large language models, enabling more informed model selection and improvement decisions.”
Capability profile
[Strength radar chart]
Fit analysis
Who is it for?
✓ Best for
Data science teams looking for a standardized way to evaluate and compare large language models
Researchers who need a comprehensive set of benchmarks for their studies
Machine learning practitioners aiming to improve model performance through systematic evaluation
✕ Not a fit for
Teams needing real-time streaming evaluations (batch-only architecture; see the sketch after this list)
Projects with limited computational resources, as some evaluations may be resource-intensive
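To make the batch-only point concrete, the hypothetical sketch below scores a full set of outputs in one pass and aggregates results per metric. Nothing here is taken from the project's actual API; both metrics are simple stand-ins.

```python
# Hypothetical sketch of a batch evaluation pass: the full set of model
# outputs is scored in one run, which is why real-time streaming use
# cases are a poor fit. All names and metrics are illustrative only.
from statistics import mean

def length_ratio(prediction: str, reference: str) -> float:
    """Crude verbosity proxy: predicted length relative to reference length."""
    return len(prediction.split()) / max(len(reference.split()), 1)

def token_overlap(prediction: str, reference: str) -> float:
    """Fraction of reference tokens that appear in the prediction."""
    ref = set(reference.lower().split())
    return len(ref & set(prediction.lower().split())) / max(len(ref), 1)

METRICS = {"length_ratio": length_ratio, "token_overlap": token_overlap}

batch = [
    {"prediction": "The capital of France is Paris",
     "reference": "Paris is the capital of France"},
    {"prediction": "Berlin",
     "reference": "The capital of Germany is Berlin"},
]

# Score every example against every metric, then aggregate per metric.
report = {
    name: mean(fn(ex["prediction"], ex["reference"]) for ex in batch)
    for name, fn in METRICS.items()
}
print(report)
```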
Cost structure
Pricing
Free tier: None
Starts at: See website
Model: Flat rate
Enterprise: None
Performance benchmarks
How Fast Is It?
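Evaluation cost depends heavily on which metrics you select, so the most reliable numbers come from measuring on your own hardware. The hypothetical sketch below times a batch run to estimate throughput; the `score` function is a stand-in for a real catalogued metric.

```python
# Hypothetical sketch: timing a batch evaluation run to estimate
# throughput (examples scored per second) on your own hardware.
import time

def score(example: dict) -> float:
    """Stand-in for a real catalogued metric; replace with your own."""
    return float(example["prediction"] == example["reference"])

batch = [{"prediction": "a", "reference": "a"}] * 10_000

start = time.perf_counter()
scores = [score(ex) for ex in batch]
elapsed = time.perf_counter() - start
print(f"{len(scores) / elapsed:,.0f} examples/sec over {len(batch)} examples")
```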
Next step
Get Started with LLM-Evals-Catalogue
Step-by-step setup guide with code examples and common gotchas.
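As a hypothetical first step after cloning the repository, you can build a quick index of the documented metrics from a local checkout. The directory layout and file types below are assumptions for illustration, not the project's confirmed structure; adjust paths to match the real repo.

```python
# Hypothetical first step: walk a local checkout and list the documented
# metrics. The layout is assumed, not taken from the project itself.
from pathlib import Path

repo = Path("LLM-Evals-Catalogue")  # path to your local clone
for doc in sorted(repo.rglob("*.md")):
    # Print each document's first heading as a quick table of contents.
    for line in doc.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            print(f"{doc.relative_to(repo)}: {line.lstrip('# ')}")
            break
```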