Sakana AI Launches Fugu Multi-Agent System: Outperforms GPT5.4 and Opus4.6 in Benchmarks

A Dynamic Multi-Agent Architecture: Preset-Free and Self-Organizing

Japanese AI research company Sakana AI has officially launched its flagship commercial product, the Sakana Fugu multi-agent orchestration system, and opened applications for early beta testing. This system is not a single model but an intelligent framework designed to dynamically coordinate multiple AI "Workers."

Product Line Positioning

The Fugu series offers two configurations: the Sakana Fugu Mini, optimized for low latency, and the Sakana Fugu Ultra, designed for demanding and complex tasks. Both are delivered via a single-model API compatible with the OpenAI format, greatly simplifying integration for developers.

Core Technology: Autonomous Learning and Dynamic Orchestration

The system's design philosophy stems from the team's Trinity and Conductor research papers presented at ICLR 2026. At its core is a lightweight language model, but it acts as a "conductor," not a "soloist."

Dynamic Task Allocation: The system discards the traditional multi-agent approach requiring manually pre-defined team roles or fixed pipelines. It automatically calls suitable models from a Worker pool and dynamically allocates tasks based on the perceived difficulty of the received job.
Adaptive Recursive Calling: Fugu possesses unique "test-time scaling" capability. The model can read its own previous outputs as context, autonomously identify flaws or deficiencies during runtime, and automatically initiate correction workflows.
Controllable Compute Axis: Users can treat "recursion depth" as a tunable parameter during inference. This allows developers to flexibly control the system's "thinking" intensity based on the need for result accuracy versus response speed.

Performance Benchmarks: Surpassing Top-Tier Single Models

According to official evaluation data released by Sakana AI, Fugu Ultra has achieved breakthrough scores on several challenging benchmarks, outperforming current leading flagship single models.

Benchmark Results

GPQAD (General Purpose Question Answering & Reasoning): Score 95.1
LCBv6 (Logic & Commonsense Benchmark v6): Score 93.2
SWEPro (Software Engineering Professional Test): Score 54.2

In the above tests, Sakana Fugu Ultra's overall performance surpassed that of models like GPT 5.4, Gemini 3.1, and Claude Opus 4.6. Its architectural advantages of multi-agent collaboration and dynamic error correction are particularly evident in hardcore reasoning and coding tasks.

The launch of the Fugu system marks a critical step in moving multi-agent collaboration from theoretical research to large-scale commercial application. Its paradigm of "dynamic orchestration and autonomous optimization" may bring new perspectives to developing complex AI applications.