LegalEvalHub offers curated leaderboards that aggregate performance across multiple tasks. An aggregate leaderboard may group tasks from the same benchmark, tasks over the same type of document, or tasks that exercise the same type of legal reasoning.

Models are evaluated on each individual task, and the per-task scores are then aggregated into an overall leaderboard score; we calculate three metrics for each leaderboard. To appear on an aggregate leaderboard, a model must be evaluated on every task within that leaderboard. This ensures fair comparison and prevents selective reporting.
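
As a rough illustration of this policy, the sketch below uses hypothetical model and task names and a simple mean as the aggregation step (the actual metrics LegalEvalHub computes may differ). It averages per-task scores and excludes any model that has not been evaluated on every task in the leaderboard.

```python
from statistics import mean

# Hypothetical per-task scores: model name -> {task name: score}.
# Names and values are illustrative, not real LegalEvalHub data.
task_scores = {
    "model-a": {"task_1": 0.82, "task_2": 0.74, "task_3": 0.91},
    "model-b": {"task_1": 0.88, "task_2": 0.79},  # missing task_3
}

leaderboard_tasks = {"task_1", "task_2", "task_3"}


def aggregate_leaderboard(task_scores, leaderboard_tasks):
    """Average per-task scores, keeping only models evaluated on every task."""
    leaderboard = {}
    for model, scores in task_scores.items():
        # Models missing any task are excluded to prevent selective reporting.
        if not leaderboard_tasks <= scores.keys():
            continue
        leaderboard[model] = mean(scores[t] for t in leaderboard_tasks)
    # Sort models by aggregate score, best first.
    return sorted(leaderboard.items(), key=lambda kv: kv[1], reverse=True)


print(aggregate_leaderboard(task_scores, leaderboard_tasks))
# [('model-a', 0.8233333333333334)]  -- model-b is dropped for missing task_3
```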

Available Aggregate Leaderboards

LegalBench (Full)

161 LegalBench tasks.

LegalBench (Reasoning)

A subset of LegalBench tasks focused on more complex reasoning.

HousingQA (Knowledge)

Tests a model's knowledge of housing law.

HousingQA (Statute Comprehension)

Tests a model's ability to read and interpret housing law statutes.