UK's AI Security Institute Reveals AI Benchmarks Underestimate Capabilities by Limiting Compute

The UK's AI Security Institute finds that tight compute budgets cause standard AI benchmarks to understate agent capabilities, with frontier progress appearing up to roughly 60 percent steeper once token limits are relaxed.

The UK's AI Security Institute has found that common AI benchmarks systematically understate AI agent performance because they impose strict compute limits, according to The Decoder. In practice, those limits cap the number of tokens an agent may use on a task, so a run that would have succeeded with more tokens is scored as a failure.

In tests across seven benchmarks, increasing the token budget tenfold led to a roughly 25 percent rise in success rates on software engineering tasks. The result suggests that earlier assessments understated AI progress: measured with larger budgets, advances at the frontier are roughly 60 percent steeper than previously reported, with the exact figure depending on the token budget allowed.
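The effect of a token cap on a measured success rate can be sketched in a few lines of Python. This is a toy illustration only, not the institute's methodology: the function name `success_rate`, the per-task token costs, and both budgets are invented for the example and are not drawn from the study.

```python
# Toy illustration: every number here is invented, not AISI data.

def success_rate(tokens_needed, budget):
    """Share of tasks finished when each run is cut off at `budget` tokens."""
    solved = sum(1 for need in tokens_needed if need <= budget)
    return solved / len(tokens_needed)

# Hypothetical tokens an imaginary agent needs to solve each of ten tasks.
tokens_needed = [40_000, 90_000, 150_000, 300_000, 600_000,
                 900_000, 1_500_000, 4_000_000, 8_000_000, 20_000_000]

low_budget = 500_000            # hypothetical benchmark default
high_budget = 10 * low_budget   # the same benchmark with a tenfold budget

low = success_rate(tokens_needed, low_budget)
high = success_rate(tokens_needed, high_budget)

print(f"success rate at {low_budget:,} tokens: {low:.0%}")
print(f"success rate at {high_budget:,} tokens: {high:.0%}")
```

In this made-up case the same agent scores very differently on the same tasks purely because of the cap, which is the kind of gap the institute's findings point to.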

For Japanese markets, where AI-driven trading and automated systems are increasingly prevalent, these findings highlight the importance of revising evaluation standards to better capture AI potential and inform investment strategies.


By Mansa Kane

Tokyo Overnight
