FELM

Meta-benchmark for evaluating factuality in large language models.

Established · Open Source · Low lock-in

Pricing

See website

Flat rate

Adoption

Stable

License

Open Source

Overview

What is FELM?

FELM is a meta-benchmark designed to evaluate how well different tools assess the factuality of outputs from large language models, aiding developers and researchers in understanding model reliability.
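In practice, meta-evaluation means scoring a factuality evaluator against gold segment-level labels. The snippet below is a minimal sketch of that loop: the record fields (`segments`, `labels`) and the `naive_evaluator` function are illustrative assumptions, not FELM's actual schema or API.

```python
# Hypothetical FELM-style records: each LLM response is split into segments,
# and each segment carries a gold factuality label (True = factual).
records = [
    {"segments": ["Paris is in France.", "It lies on the Danube."],
     "labels": [True, False]},
    {"segments": ["2 + 2 = 4."], "labels": [True]},
]

def naive_evaluator(segment: str) -> bool:
    """Toy stand-in for a factuality evaluator: flags nothing as an error."""
    return True

# Score the evaluator on the error class (factuality benchmarks commonly
# report precision/recall/F1 on *detected* factual errors).
tp = fp = fn = 0
for rec in records:
    for seg, gold in zip(rec["segments"], rec["labels"]):
        pred_error = not naive_evaluator(seg)  # evaluator says "not factual"
        gold_error = not gold
        tp += pred_error and gold_error
        fp += pred_error and not gold_error
        fn += (not pred_error) and gold_error

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"error-detection F1: {f1:.2f}")
```

A do-nothing evaluator misses the one labeled error here, so its error-detection F1 collapses to zero, which is exactly the kind of failure a meta-benchmark is built to expose.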

Key differentiator

FELM stands out as a meta-benchmark purpose-built for factuality evaluation of LLM outputs: rather than scoring the models themselves, it scores the evaluators, giving researchers and developers a standardized way to compare factuality-detection tools on common data.

Capability profile

Strength Radar

Axes: factuality evaluation · standardized evaluator comparison · model-reliability insight

Honest assessment

Strengths & Weaknesses

↑ Strengths

Evaluates the factuality of outputs from large language models.

Provides a standardized benchmark for comparing different evaluators.

Aids in understanding model reliability and accuracy.
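The "standardized benchmark" point is the key one: fixing the data and the metric makes otherwise incomparable tools directly comparable. A minimal sketch, assuming two hypothetical evaluators scored on the same labeled segments (the segments, labels, and evaluator functions are all invented for illustration):

```python
# Fixed evaluation set: (segment, gold factuality label).
segments = [
    ("The Pacific is the largest ocean.", True),
    ("The Great Wall is visible from the Moon.", False),
    ("Water boils at 100 °C at sea level.", True),
]

def optimist(text: str) -> bool:   # toy evaluator: calls everything factual
    return True

def pessimist(text: str) -> bool:  # toy evaluator: calls everything an error
    return False

def accuracy(evaluator) -> float:
    """Segment-level agreement with the gold labels."""
    hits = sum(evaluator(text) == label for text, label in segments)
    return hits / len(segments)

for name, ev in [("optimist", optimist), ("pessimist", pessimist)]:
    print(f"{name}: {accuracy(ev):.2f}")
```

Because both evaluators see identical segments and are scored with the same metric, the resulting numbers rank them meaningfully, which is the core service a meta-benchmark like FELM provides.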

Fit analysis

Who is it for?

✓ Best for

Researchers and developers who need to evaluate the factual accuracy of LLMs.

Teams working on improving model reliability through benchmarking.

✕ Not a fit for

Users looking for real-time factuality evaluation in production environments.

Projects that require a graphical user interface (GUI) for interaction.

Cost structure

Pricing

Free Tier: None

Starts at: See website

Model: Flat rate

Enterprise: None

Performance benchmarks

How Fast Is It?

Next step

Get Started with FELM

Step-by-step setup guide with code examples and common gotchas.

View Setup Guide →