Ernie 5.1 from Baidu inherits the pretraining foundation of Ernie 5.0, yet compresses total parameters to roughly one third of its predecessor's and active parameters to about one half. The headline claim is even sharper: comparable pretraining performance at only about 6 percent of the cost of similar models at its scale.
That makes Ernie 5.1 interesting. The AI industry has spent years equating progress with more parameters, more data, and more compute. Baidu is arguing for a different kind of progress: extracting a stronger model from an already trained family of models, then improving it through more targeted reinforcement learning and post training.
What Ernie 5.1 changes compared with Ernie 5.0
According to Baidu’s official release, Ernie 5.1 is derived from Ernie 5.0 rather than trained as a completely separate model from scratch. The key idea is a Once For All elastic training framework. Instead of pretraining one model size at a time, Baidu trained Ernie 5.0 in a way that created many possible submodels inside a shared structure.
Those submodels vary across three dimensions. Depth changes how many Transformer layers are active. Width changes how much expert capacity is used in mixture of experts layers. Sparsity changes how many experts are activated during routing. Baidu then selected an optimized subnetwork from that larger configuration space to create Ernie 5.1.
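Baidu has not published implementation details, but the selection step can be sketched. The Python sketch below is a minimal illustration under assumed names and values (`SubnetConfig`, `select_subnet`, and the depth, width, and sparsity grids are all hypothetical); it shows how one shared supernetwork defines a whole grid of extractable submodels, from which the best one under a cost budget is picked.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class SubnetConfig:
    num_layers: int        # depth: active Transformer layers
    expert_dim: int        # width: capacity used in each MoE layer
    active_experts: int    # sparsity: experts activated per token

# Hypothetical search grid over the three elastic dimensions.
DEPTHS = [48, 64, 80]
EXPERT_DIMS = [2048, 4096, 8192]
ACTIVE_EXPERTS = [2, 4, 8]

def candidate_subnets():
    """Enumerate every submodel the shared supernetwork contains."""
    for d, w, k in product(DEPTHS, EXPERT_DIMS, ACTIVE_EXPERTS):
        yield SubnetConfig(num_layers=d, expert_dim=w, active_experts=k)

def select_subnet(score_fn, budget_fn, max_cost):
    """Pick the best-scoring config whose cost fits the budget.

    score_fn and budget_fn stand in for the evaluation and cost
    models Baidu would use internally; neither is public.
    """
    feasible = [c for c in candidate_subnets() if budget_fn(c) <= max_cost]
    return max(feasible, key=score_fn)

# Toy usage: favor capacity, cap a made-up cost proxy.
cost = lambda c: c.num_layers * c.expert_dim * c.active_experts
print(select_subnet(score_fn=cost, budget_fn=cost, max_cost=48 * 4096 * 4))
```

The real search is far more expensive, since scoring each candidate means evaluating a large model, but the structure is the same: one pretraining run, many extractable models.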
In practical terms, this means the heavy investment made in Ernie 5.0 can be reused more efficiently. Baidu claims Ernie 5.1 keeps flagship level intelligence while reducing inference cost and pretraining compute. If the numbers hold up in broad real world use, this approach matters because it could make advanced AI systems cheaper to run without sacrificing quality.
Why the 6 percent pretraining cost claim matters
The most striking claim around Ernie 5.1 is that its pretraining compute cost is only about 6 percent of that of comparable models. This does not mean the entire research cost was tiny; the model benefits from Ernie 5.0's earlier training foundation. Still, the claim points to a broader trend in large language model development: efficiency is becoming as important as raw scale.
For developers and businesses, the cost of a model is not only about training. It also includes inference, latency, deployment complexity, and reliability on long tasks. A model that uses fewer active parameters per query can be cheaper to serve. A model that keeps strong performance at a smaller active scale can also support more use cases where response speed and budget matter.
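The serving argument reduces to simple arithmetic. A widely used rule of thumb puts a dense forward pass at roughly 2 FLOPs per active parameter per generated token, so halving active parameters roughly halves compute per token. The numbers below are illustrative only, not Baidu's published figures:

```python
def flops_per_token(active_params: float) -> float:
    # Rule of thumb: a forward pass costs ~2 FLOPs per active parameter.
    return 2 * active_params

# Illustrative counts only; Baidu has not published exact numbers.
old_active = 300e9   # hypothetical active parameters in the larger model
new_active = 150e9   # "about one half" per the release

ratio = flops_per_token(new_active) / flops_per_token(old_active)
print(f"Compute per generated token: {ratio:.0%} of the larger model")
# -> Compute per generated token: 50% of the larger model
```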
This is why Ernie 5.1 should be read less as a simple model upgrade and more as a statement about model engineering. Baidu is emphasizing parameter efficiency, routing efficiency, and training infrastructure rather than only model size.
Agent and reasoning performance are central to the release
Baidu’s release puts strong emphasis on agentic capabilities. Baidu says Ernie 5.1 performs strongly on τ³ bench and SpreadsheetBench Verified, surpassing DeepSeek V4 Pro on those agent evaluation tasks.
The company also highlights performance on the Arena Search leaderboard. Baidu says Ernie 5.1 scored 1,223 on May 9, placing fourth globally and first among Chinese models. Search performance is especially relevant because many real AI tasks are no longer closed book. A model must know when to retrieve information, how to compare sources, and how to synthesize answers without drifting from the task.
On reasoning, Baidu reports that Ernie 5.1 approaches leading closed source models. On AIME26 with tool use, a demanding math benchmark, Baidu says the model scored 99.6 and ranked just behind Gemini 3.1 Pro. The company also says Ernie 5.1 performs near top models on GPQA and MMLU Pro, which test advanced knowledge and reasoning across fields.
The reinforcement learning system behind Ernie 5.1
One of the most technical parts of the Ernie 5.1 release is Baidu’s reinforcement learning infrastructure. The company says it built a disaggregated fully asynchronous system on PaddlePaddle to handle long horizon agent training more efficiently.
In simpler terms, Baidu separated the major reinforcement learning components into independent systems. Training, inference, reward calculation, and the agent loop are coordinated by an RL controller, but each subsystem can scale separately. This matters because long agent tasks often create bottlenecks. If one part waits for another, expensive hardware sits idle.
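Baidu's code is not public, so the sketch below only shows the shape of a disaggregated design, using Python's asyncio queues as stand-ins for a real distributed transport. The point is that generation, reward scoring, and training each run as independent producers and consumers, so a slow stage backs up a queue instead of idling the whole pipeline.

```python
import asyncio

async def rollout_worker(prompts, rollout_q):
    """Inference subsystem: generates trajectories independently."""
    for p in prompts:
        traj = f"trajectory for {p}"           # stand-in for model.generate()
        await rollout_q.put(traj)
    await rollout_q.put(None)                  # signal completion

async def reward_worker(rollout_q, reward_q):
    """Reward subsystem: scores trajectories as they arrive."""
    while (traj := await rollout_q.get()) is not None:
        await reward_q.put((traj, len(traj)))  # stand-in for a reward model
    await reward_q.put(None)

async def trainer(reward_q):
    """Training subsystem: consumes scored data, never waits on generation."""
    while (item := await reward_q.get()) is not None:
        traj, reward = item
        print(f"update on {traj!r} with reward {reward}")

async def controller():
    """RL controller: wires the subsystems; each could scale on its own."""
    rollout_q, reward_q = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        rollout_worker(["p1", "p2", "p3"], rollout_q),
        reward_worker(rollout_q, reward_q),
        trainer(reward_q),
    )

asyncio.run(controller())
```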
Baidu also focuses on reducing the gap between training and inference. In reinforcement learning for large models, rollouts are generated by an inference engine whose numerics can drift from those of the training engine, so the probabilities used to compute updates no longer match the ones that produced the samples. That mismatch can destabilize learning. Baidu says it uses a unified FP8 low precision operator library and an optimized Rollout Router Replay technique for mixture of experts routing. The company claims this reduces K3 KL divergence by 50 percent while adding almost no extra latency.
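The release does not define K3 KL, but the name most likely refers to the k3 estimator from John Schulman's well known note on approximating KL divergence, which is unbiased and always nonnegative. A minimal sketch of measuring the training-inference gap with it, assuming per-token log probabilities from both engines over the same sampled tokens:

```python
import math

def k3_kl(logp_train, logp_infer):
    """k3 estimator of KL between two per-token log-prob sequences.

    With r = p_train / p_infer for tokens sampled from the inference
    engine, k3 = (r - 1) - log r is an unbiased, always-nonnegative
    estimate of KL(infer || train).
    """
    total = 0.0
    for lt, li in zip(logp_train, logp_infer):
        log_r = lt - li
        total += math.expm1(log_r) - log_r   # (r - 1) - log r, computed stably
    return total / len(logp_train)

# Toy example: small numeric gaps between training and inference engines.
train_lp = [-1.02, -0.48, -2.31]
infer_lp = [-1.00, -0.50, -2.30]
print(k3_kl(train_lp, infer_lp))  # small positive value near zero
```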
The other notable piece is heterogeneous elastic resource scheduling. Baidu says it can assign different compute resources to training, inference, and reward subsystems as needed. It also uses idle CPU resources for logic heavy tasks such as code sandboxes and verifiers. That is a practical detail, but an important one. Agent training often depends on checking whether code runs, whether a spreadsheet action is valid, or whether a multi step answer can be verified.
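The release does not show these verifiers, but the pattern it describes is common: a cheap CPU-only check that converts an agent's output into a reward. A hypothetical sketch of a code verifier that runs a candidate solution plus a test in a subprocess (a real sandbox would add resource limits and filesystem isolation):

```python
import subprocess
import sys

def verify_code(candidate: str, test: str, timeout: float = 5.0) -> float:
    """Run candidate code plus a test in a subprocess; return a 0/1 reward.

    This sketch only captures the CPU-bound verify-and-reward pattern,
    not a production-grade sandbox.
    """
    program = candidate + "\n" + test
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True, timeout=timeout,
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

# Toy usage: the assert acts as the verifier's test case.
reward = verify_code("def add(a, b):\n    return a + b",
                     "assert add(2, 3) == 5")
print(reward)  # 1.0
```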
How Baidu tries to avoid the seesaw effect
Baidu describes a common problem in model training: improving one capability can weaken another. A model tuned heavily for code may become less natural in conversation. A model optimized for concise reasoning may become less imaginative in creative writing. Baidu calls this the seesaw effect.
Ernie 5.1 uses a four stage post training pipeline centered on Multi Teacher On Policy Distillation. First, the model receives unified supervised fine tuning on high quality instruction data. This gives it a shared foundation for instruction following and tool use.
Second, Baidu trains multiple domain expert models in parallel. These experts can focus on areas such as code, reasoning, and agentic tasks, each with its own reward signals. This reduces interference because every expert can specialize before capabilities are merged.
Third, a student model learns from those experts through On Policy Distillation. The student samples from its own policy, then learns from multiple teacher models using token level reverse KL divergence. The aim is to consolidate specialized skills into one model without forcing every objective into the same training stage.
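Token level reverse KL has a precise meaning: at each position of a sequence sampled from the student, compute KL(student || teacher) over the vocabulary. A minimal PyTorch sketch, with illustrative names; how Baidu combines multiple teachers is not specified in the release:

```python
import torch
import torch.nn.functional as F

def reverse_kl_distill_loss(student_logits, teacher_logits):
    """Token-level reverse KL: KL(student || teacher) at each position.

    Both tensors have shape (batch, seq_len, vocab). Because the
    sequences were sampled from the student's own policy, this is
    on-policy distillation; reverse KL pushes the student to put mass
    only where the teacher does (mode seeking), rather than to cover
    every teacher mode.
    """
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    p_s = log_p_s.exp()
    # KL(s||t) = sum_v p_s * (log p_s - log p_t), per token, then averaged.
    kl = (p_s * (log_p_s - log_p_t)).sum(dim=-1)
    return kl.mean()

# Toy shapes: batch 2, sequence length 3, vocabulary 5.
s = torch.randn(2, 3, 5)
t = torch.randn(2, 3, 5)
print(reverse_kl_distill_loss(s, t))
```

For the multi-teacher setting, each sampled sequence could be scored against the relevant domain expert, or against a weighted mix of teachers; the release leaves that choice unstated.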
Fourth, Baidu applies general online reinforcement learning for open ended chat and creative tasks. This step is important because not every skill distills cleanly. Creative writing and conversation often have high entropy, meaning there are many valid outputs. If distillation is too rigid, the model may become overly smooth or predictable. Baidu says this final stage helps preserve generation diversity and alignment with user preferences.
Creative writing is a major target
Ernie 5.1 is not presented only as a reasoning model. Baidu also claims major gains in creative writing. The company says its internal evaluations show creative writing quality approaching Gemini 3.1 Pro. It highlights long form narrative control, emotional alignment, stylistic adaptability, and the ability to infer deeper user intent rather than only follow surface level prompts.
The model is also being rolled out across creative production platforms, including ISEKAI ZERO, Mulan AI, Diting Huanliu, and Storymaster. That choice is telling. Baidu appears to see creative agents as a proving ground for long context, role consistency, tone control, and interactive generation. These are difficult areas because the model must balance imagination with coherence over extended exchanges.