supply_chain_disclosure_disclosed_audits
- Task Description: Given a disclosure, determine whether the statement discloses to what extent, if any, the retail seller or manufacturer conducts audits of suppliers to evaluate supplier compliance with company standards for trafficking and slavery in supply chains.
- Task Type: Binary classification
- Document Type: corporate disclosure
- Number of Samples: 387
- Input Length Range: 107-5764 tokens
- Evaluation Metrics: accuracy (maximize), balanced_accuracy (maximize), f1_macro (maximize), f1_micro (maximize), valid_predictions_ratio (maximize)
- Tags: corporate law, interpretation
- Paper: LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Dataset Download: https://hazyresearch.stanford.edu/legalbench/
7 submissions
Rank | Model | accuracy | balanced_accuracy | f1_macro | f1_micro | valid_predictions_ratio | Date |
---|---|---|---|---|---|---|---|
1 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 0.955 | 0.885 | 0.871 | 0.955 | 1.000 | 2025-08-04 |
2 | google/gemma-2-27b-it | 0.884 | 0.859 | 0.751 | 0.884 | 1.000 | 2025-07-24 |
3 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 0.844 | 0.863 | 0.710 | 0.844 | 1.000 | 2025-07-25 |
4 | claude-3-haiku-20240307 | 0.781 | 0.648 | 0.581 | 0.781 | 1.000 | 2025-07-25 |
5 | gpt-4o-mini | 0.770 | 0.848 | 0.644 | 0.770 | 1.000 | 2025-07-02 |
6 | claude-3-5-haiku-20241022 | 0.758 | 0.841 | 0.634 | 0.758 | 0.992 | 2025-08-01 |
7 | gpt-4.1-nano | 0.522 | 0.724 | 0.459 | 0.522 | 1.000 | 2025-07-03 |
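
The five evaluation metrics listed above can be sketched in plain Python as follows. This is a minimal illustration, not the leaderboard's actual scoring code: the `Yes`/`No` label set, the `evaluate` function name, and the rule that an unparseable prediction counts as wrong are all assumptions made for the example.

```python
from collections import Counter

LABELS = ("Yes", "No")  # assumed label set for this binary task


def evaluate(gold, pred, labels=LABELS):
    """Compute the five leaderboard metrics from parallel lists of labels.

    Assumptions: every gold label is in `labels`, and a prediction outside
    `labels` is treated as invalid and scored as an error.
    """
    n = len(gold)
    valid = sum(p in labels for p in pred)
    correct = sum(g == p for g, p in zip(gold, pred))

    tp, fp, fn = Counter(), Counter(), Counter()  # per-label counts
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1          # the gold label was missed
            if p in labels:
                fp[p] += 1      # a valid but wrong label was predicted

    recalls, f1s = [], []
    for label in labels:
        support = tp[label] + fn[label]
        recalls.append(tp[label] / support if support else 0.0)
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1s.append(2 * tp[label] / denom if denom else 0.0)

    # Micro F1 pools the counts over all labels before computing F1.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_denom = 2 * tp_sum + fp_sum + fn_sum

    return {
        "accuracy": correct / n,
        "balanced_accuracy": sum(recalls) / len(labels),  # mean per-class recall
        "f1_macro": sum(f1s) / len(labels),               # unweighted mean of per-class F1
        "f1_micro": 2 * tp_sum / micro_denom if micro_denom else 0.0,
        "valid_predictions_ratio": valid / n,
    }


if __name__ == "__main__":
    gold = ["Yes", "Yes", "Yes", "No"]
    pred = ["Yes", "Yes", "No", "No"]
    for name, value in evaluate(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

In single-label classification, micro F1 equals accuracy whenever every prediction is a valid label, which matches the identical accuracy and f1_micro columns in the table. Accuracy and balanced accuracy can diverge when the two classes are unevenly represented, so the two columns are best read together.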