FELM
Meta-benchmark for evaluating factuality in large language models.
Pricing: See website (flat rate)
Adoption: Stable
License: Open Source
Data freshness: —

Overview
What is FELM?
FELM is a meta-benchmark designed to evaluate how well different tools assess the factuality of outputs from large language models, aiding developers and researchers in understanding model reliability.
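In practice, a meta-benchmark like this compares an evaluator's per-segment true/false judgments against human annotations. The sketch below is illustrative only (it is not FELM's official scoring script): it computes class-wise F1 over segment labels, the kind of summary statistic such a benchmark reports.

```python
# Sketch of segment-level scoring in the spirit of FELM's setup
# (illustrative only -- not FELM's official scoring code).
# Each LLM response is split into segments; annotators mark each segment
# True (factual) or False (contains an error). An evaluator under test
# produces the same kind of per-segment predictions.

def f1(gold, pred, positive):
    """F1 for one class over parallel lists of boolean labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: gold labels from annotators, predictions from an evaluator.
gold = [True, True, False, True, False, False]
pred = [True, False, False, True, True, False]

true_f1 = f1(gold, pred, positive=True)    # F1 on factual segments
false_f1 = f1(gold, pred, positive=False)  # F1 on erroneous segments
print(f"factual F1: {true_f1:.2f}, non-factual F1: {false_f1:.2f}")
```

Reporting the non-factual class separately matters because factual errors are typically the rarer class, so overall accuracy alone can hide a weak evaluator.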
Key differentiator
“FELM stands out as the only meta-benchmark specifically designed to evaluate the factuality of outputs from large language models, providing a unique tool for researchers and developers focused on model reliability.”
Fit analysis
Who is it for?
✓ Best for
Researchers and developers who need to evaluate the factual accuracy of LLMs.
Teams working on improving model reliability through benchmarking.
✕ Not a fit for
Users looking for real-time factuality evaluation in production environments.
Projects that require a graphical user interface (GUI) for interaction.
Cost structure
Pricing
Free Tier: None
Starts at: See website
Model: Flat rate
Enterprise: None
Next step
Get Started with FELM
Step-by-step setup guide with code examples and common gotchas.
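As a starting point, benchmark records of this kind are typically JSON objects pairing response segments with factuality labels. The sketch below parses one such record; the field names (`prompt`, `segmented_response`, `labels`) are modeled on the format the FELM authors describe and may differ from the actual release, and the record content itself is invented toy data.

```python
import json

# Parse one FELM-style record (assumption: JSON objects with
# segment-level labels; field names here are illustrative and may
# not match the released files exactly).
record = json.loads("""
{
  "prompt": "Who wrote 'Pride and Prejudice'?",
  "segmented_response": [
    "'Pride and Prejudice' was written by Jane Austen.",
    "It was first published in 1913."
  ],
  "labels": [true, false]
}
""")

# Pair each segment with its factuality label for inspection.
for segment, label in zip(record["segmented_response"], record["labels"]):
    status = "OK " if label else "ERR"
    print(f"[{status}] {segment}")
```

Iterating this loop over every record, and comparing an evaluator's predicted labels against the gold `labels` field, is the core of a FELM-style run.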