
ARC-AGI-3
Interactive reasoning benchmark to measure human-like intelligence in AI agents

About ARC-AGI-3
ARC-AGI-3 is an interactive reasoning benchmark designed to challenge AI agents to explore novel environments, acquire goals on the fly, build adaptable world models, and learn continuously. Rather than solving static puzzles, agents must learn from experience inside each environment by perceiving what matters, selecting actions, and adapting their strategy, all without natural-language instructions. A 100% score means an AI agent can beat every game as efficiently as a human, so the benchmark measures skill acquisition over time and the remaining gap between AI and human learning.
Key Features
- Interactive reasoning benchmark
- Replayable runs for transparent evaluation
- Developer toolkit for agent integration
- Interactive UI for testing and iteration
- API for agent integration
- 100% human-solvable environments
- Experience-driven adaptation
- Long-horizon planning with sparse feedback
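The core loop these features describe (perceive the environment, select an action, adapt from sparse feedback) can be sketched in miniature. This is a hypothetical toy environment and a random-exploration baseline, not the real ARC-AGI-3 API; all class and function names here are illustrative assumptions.

```python
import random

# Toy stand-in environment (hypothetical; the real ARC-AGI-3 API is not
# shown here). The agent must reach a goal cell it is never told about,
# receiving feedback only on success -- a sparse-reward setting.
class ToyGridEnv:
    ACTIONS = ["up", "down", "left", "right"]

    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = goal
        self.pos = (0, 0)

    def observe(self):
        # The agent perceives raw state, not natural-language instructions.
        return self.pos

    def step(self, action):
        # Apply the move, clamped to the grid bounds.
        x, y = self.pos
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        self.pos = (min(max(x + dx, 0), self.size - 1),
                    min(max(y + dy, 0), self.size - 1))
        return self.pos == self.goal  # sparse reward: True only at the goal


def run_agent(env, max_steps=500, seed=0):
    """Random-exploration baseline: act, observe, repeat until the
    (unknown) goal is reached or the step budget runs out."""
    rng = random.Random(seed)
    for step in range(1, max_steps + 1):
        action = rng.choice(env.ACTIONS)
        if env.step(action):
            return step  # number of actions taken to beat the environment
    return None


if __name__ == "__main__":
    print(run_agent(ToyGridEnv()))
```

A real scoring harness would compare the agent's step count against a human baseline; a smarter agent would replace the random policy with a learned world model that improves within the episode.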
Pros & Cons
Pros
- Tests genuine reasoning and adaptation rather than memorization
- Transparent evaluation with replay functionality
- Clear design principles with meaningful feedback
- Challenges AI agents to learn from experience like humans
- Free and open access to the benchmark
Cons
- Requires substantial computational resources for complex agent development
- Limited to interactive reasoning tasks, not other AI domains
- Competition format may create pressure for rapid iteration
Details
- Pricing: Free
- Company: ARC Prize

