Building Australia's LLM Evaluation Stack: From Imported Scoreboards to AU-Specific Tasks
May 18, 2026
A proof-of-concept benchmark on Australian legal text exposes a hallucination that one frontier model scored 4/5 on. The framework is not the scarce resource. The dataset is.

