insurance_policy_interpretation


7 submissions

Rank  Model                                          accuracy  balanced_accuracy  f1_macro  f1_micro  valid_predictions_ratio  Date
1     claude-3-5-haiku-20241022                      0.556     0.631              0.526     0.556     1.000                    2025-08-01
2     gpt-4.1-nano                                   0.474     0.407              0.405     0.474     1.000                    2025-07-03
3     gpt-4o-mini                                    0.451     0.557              0.392     0.451     1.000                    2025-07-02
4     google/gemma-2-27b-it                          0.436     0.448              0.354     0.436     1.000                    2025-07-24
5     meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo   0.429     0.515              0.361     0.429     1.000                    2025-07-25
6     meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo  0.421     0.444              0.320     0.421     1.000                    2025-07-30
7     claude-3-haiku-20240307                        0.323     0.352              0.251     0.323     1.000                    2025-07-25
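
The metric columns above can be illustrated with a small sketch. The leaderboard does not state how its scores are computed, so the following assumes the common single-label definitions: balanced accuracy as the mean per-class recall, macro F1 as the unweighted mean of per-class F1, and micro F1 from pooled counts. The function name `classification_metrics`, the toy labels, and the reading of valid_predictions_ratio as the share of predictions that fall in the known label set are all assumptions made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, labels):
    """Sketch of the leaderboard columns for single-label classification.

    Assumes standard definitions; not the leaderboard's actual scoring code.
    """
    n = len(y_true)
    # Assumed meaning: fraction of predictions that are one of the known labels.
    valid_ratio = sum(p in labels for p in y_pred) / n

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class was wrong

    accuracy = sum(tp.values()) / n

    # Balanced accuracy: mean recall over classes that occur in y_true.
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels if tp[c] + fn[c] > 0]
    balanced_accuracy = sum(recalls) / len(recalls)

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    # Macro F1: every class counts equally, however rare it is.
    f1_macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: pool the counts over all classes before computing F1.
    f1_micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1_macro": f1_macro,
        "f1_micro": f1_micro,
        "valid_predictions_ratio": valid_ratio,
    }


# Toy, imbalanced example (hypothetical labels, not leaderboard data).
metrics = classification_metrics(
    y_true=["a", "a", "a", "b"],
    y_pred=["a", "a", "b", "b"],
    labels=["a", "b"],
)
print(metrics)
```

Under these definitions, micro F1 equals accuracy whenever every example receives exactly one predicted label, which is consistent with the identical accuracy and f1_micro columns in the table. Balanced accuracy weights each class equally, so it can sit above or below plain accuracy when classes are imbalanced.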