Frontier Model Scores
Latest results from the HE-300 ethics benchmark. Models are evaluated weekly across 300 scenarios.
Models Evaluated: --
Highest Score: --
Average Score: --
Excellence Badge: 0
No models yet
First frontier sweep pending
The weekly evaluation pipeline will populate scores automatically.
Methodology
Each model is evaluated on the HE-300 benchmark, which consists of 300 ethical scenarios: 150 on virtue ethics and 150 on hard commonsense moral reasoning. Scenarios are sampled deterministically with a fixed seed, so every run draws the same set. Each response is captured in full and scored by two methods: heuristic classification and semantic analysis. Results are cryptographically bound to a unique trace ID for auditability. Evaluations run weekly via an automated pipeline.
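As a rough illustration of two steps in this methodology, the sketch below shows one way seeded sampling and trace binding could work. It is not the actual HE-300 pipeline: the function names, the pool arguments, the default seed value, and the use of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json
import random


def sample_scenarios(pool, k, seed):
    """Draw k distinct items from pool; the same seed always yields the same draw."""
    rng = random.Random(seed)  # local generator, leaves the global RNG untouched
    return rng.sample(pool, k)


def build_sample(virtue_pool, commonsense_pool, seed=300):
    """Hypothetical helper: 150 virtue-ethics plus 150 commonsense scenarios.

    The seed value 300 is an arbitrary placeholder, not the benchmark's real seed.
    """
    virtue = sample_scenarios(virtue_pool, 150, seed)
    commonsense = sample_scenarios(commonsense_pool, 150, seed + 1)
    return virtue + commonsense


def bind_results(trace_id, results):
    """Assumed binding scheme: SHA-256 over the trace ID and canonical JSON results.

    Any change to the trace ID or to the results changes the digest.
    """
    payload = json.dumps(
        {"trace_id": trace_id, "results": results},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Calling build_sample twice with the same pools and seed returns identical lists, which is what makes a run reproducible. Storing the digest from bind_results next to the trace ID lets an auditor recompute it later and detect altered results.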