Baidu has unveiled Ernie 5.0, a new version of its multimodal base model that jointly models text, images, audio, and video for complete multimodal understanding and generation.
The foundation model was announced on Thursday at Baidu World 2025, the company's annual flagship event.
The new version aims to improve multimodal understanding, instruction execution, creative writing, factual reasoning, agentic planning and tool use.
A preview of the model is currently available to the public and enterprise users via Baidu AI Cloud's MaaS Qianfan platform.
During the event, Robin Li, co-founder and chief executive officer of Baidu, said that foundation models are evolving rapidly, as evidenced by continuous advances in the field such as increased model “thinking time”, native integration of multiple modalities, and the ability to self-learn and evolve.
"AI agents themselves are the most significant applications, and the speed of technological iteration is the only competitive moat," he said.
He added that Baidu will continue to invest in advancing frontier models.
During the event, the company also unveiled a suite of other AI-powered products.
These include a new generation of real-time digital characters, an improved version of its Miaoda no-code application builder, and its GenFlow 3.0 general AI agent.
Li also presented Famou, an evolving AI agent that, according to the firm, aims to simulate and even surpass a high-level algorithm expert.
Li claimed the AI agent is capable of quickly abstracting complex problems and automatically iterating as conditions change to provide a dynamic optimal solution. It can be applied in complex scenarios in the transport, energy, finance and logistics sectors.
Li emphasised that these releases come at a time when the internalisation of AI capabilities in modern workflows is becoming increasingly important.
"When you internalise AI, it becomes a native capability and transforms intelligence from a cost into a source of productivity," Li explained.
He added that the structure of the sector is moving towards a healthy "inverted pyramid", in which he believes applications create 100 times more value than basic models. Li said he has witnessed a significant evolution in model capabilities over the past year, one that goes far beyond the boundaries of AI-based chatbots.
"We have observed substantial advancements not only in areas like digital human technology and code agents but also in the autonomous evolution and global optimisation for general-purpose scenarios," Li said.








