Mistral has released Mistral Medium 3.5, a dense 128B parameter model that pulls instruction-following, reasoning, coding and vision into one set of weights. It replaces Mistral Medium 3.1 and Magistral inside Le Chat, and takes over from Devstral 2 in the Vibe coding agent. The headline trick is that you can dial reasoning effort up or down per request, so the same model handles a one-line chat reply or a long agentic coding session without swapping checkpoints.
The weights are published on Hugging Face under a modified MIT license, with API pricing set at $1.50 per million input tokens and $7.50 per million output tokens. Self-hosting is realistic on four GPUs, which puts it in reach of teams that previously relied on hosted-only frontier models.
What is actually new in Mistral Medium 3.5
Earlier Mistral releases split capabilities across separate models: Magistral handled reasoning, Devstral handled code, and the base Medium line handled general chat. Medium 3.5 merges them. Mistral calls it its first flagship merged model, and the practical effect is that you no longer route requests to different endpoints depending on whether the user wants a quick answer or a multi-step plan.
Key specs worth knowing:
- 128B dense parameters, not a mixture-of-experts setup.
- 256k token context window, large enough to fit substantial codebases or long research dossiers in one shot.
- Multimodal input with text and images, text output. The vision encoder was trained from scratch to handle variable image sizes and aspect ratios.
- Configurable reasoning effort, set per request from none for fast replies to high for complex prompts and agentic work (see the API sketch after this list).
- Multilingual coverage across English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic and more.
- Native function calling with reliable JSON output, aimed squarely at agent pipelines.
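To make the per-request knob concrete, here is roughly what a call could look like against the chat completions endpoint. The endpoint and the response_format option are existing Mistral API features; the reasoning_effort field name is an assumption made for illustration, so check the release docs for the real parameter:

```python
import os
import requests

# Sketch only: the chat completions endpoint and response_format are existing
# Mistral API features; "reasoning_effort" is an assumed field name.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-medium-3.5",
        "reasoning_effort": "high",  # assumed name; "none" would be the fast path
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "user", "content": "Plan this refactor as a JSON list of steps."}
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Dropping the effort field back to none would be the cheap, fast-reply path the spec list describes.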
Benchmark numbers and what they mean
On SWE-Bench Verified, a benchmark that measures real GitHub issue resolution, Medium 3.5 scores 77.6%. On τ³-Telecom, an agentic benchmark, it lands at 91.4%. These numbers put it ahead of Devstral 2 and make it competitive with much larger open-weight models like Qwen3.5 397B A17B on coding tasks specifically.
The more interesting claim is durability. Mistral built the model for long-horizon work: calling multiple tools in sequence, recovering from failed tool responses, and producing structured output that downstream code can actually parse. That reliability under tool use is what made the next piece, async cloud agents, shippable.
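Here is a minimal sketch of the kind of agent loop that claim is about, using Mistral's existing OpenAI-style function calling. The failure handling is the point: a tool error is fed back to the model as text rather than crashing the loop, so the model can retry, rephrase, or route around the broken tool. The get_weather tool and its simulated failure are invented for illustration:

```python
import json
import os
import requests

API = "https://api.mistral.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

def get_weather(city: str) -> str:
    # Invented tool that always fails, to exercise the recovery path.
    raise TimeoutError("upstream weather service did not respond")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
for _ in range(5):  # bounded agent loop
    msg = requests.post(API, headers=HEADERS, json={
        "model": "mistral-medium-3.5",
        "messages": messages,
        "tools": TOOLS,
    }, timeout=60).json()["choices"][0]["message"]
    messages.append(msg)
    if not msg.get("tool_calls"):
        print(msg["content"])  # final answer, no more tool use
        break
    for call in msg["tool_calls"]:
        try:
            args = json.loads(call["function"]["arguments"])
            result = get_weather(**args)
        except Exception as exc:
            # Return the failure as a tool result instead of raising:
            # the model sees the error text and can recover.
            result = f"TOOL ERROR: {exc}"
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": call["function"]["name"],
            "content": result,
        })
```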
Remote agents in Vibe move coding off the laptop
Until now, Mistral’s Vibe coding agent ran locally in your terminal. With this release, sessions can run in the cloud instead. Several can run in parallel, each in an isolated sandbox, and they notify you when they finish rather than blocking your terminal while they grind through a task.
A few details matter here:
- You can launch a cloud session from the Vibe CLI or directly from Le Chat in the browser.
- An ongoing local CLI session can be teleported to the cloud, carrying its history, task state and pending approvals across.
- When the work is done, the agent can open a pull request on GitHub. You review the diff instead of watching every keystroke.
- Vibe integrates with GitHub for code, Linear and Jira for issues, Sentry for incidents, and Slack or Teams for reporting.
The targeted use cases are the work that eats developer time without needing developer judgment on every line: module refactors, test generation, dependency bumps, CI investigations, straightforward bug fixes. You stay the reviewer rather than the keystroker.
Work mode in Le Chat
Le Chat gets a new Work mode, also powered by Medium 3.5. It is a longer-running agentic mode where connectors are on by default and the assistant can read and write across mailboxes, calendars, documents and other connected systems.
Practical examples Mistral highlights:
- Catch up across email, messages and calendar in a single run, then prepare meeting briefs with attendee context and recent news.
- Research a topic across the web and internal documents, then produce a structured report you can edit before sending.
- Triage an inbox, draft replies, create Jira issues from team discussions, and post summaries to Slack.
Every tool call and its reasoning are visible, and the agent asks for explicit approval before sensitive actions like sending messages or modifying data. That transparency is the part most enterprise teams will care about.
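Mistral has not published the approval mechanics, but the pattern itself is straightforward to sketch. The tool names and helper below are hypothetical, not Mistral's implementation:

```python
# Illustrative approval gate: sensitive tools run only after an explicit
# human yes; everything else executes freely. All names are invented.
SENSITIVE = {"send_email", "delete_event", "post_message"}

def run_tool(name: str, args: dict, execute) -> str:
    print(f"[agent] wants to call {name}({args})")  # every call stays visible
    if name in SENSITIVE:
        if input("approve? [y/N] ").strip().lower() != "y":
            return "DENIED: user rejected this action"  # fed back to the model
    return execute(name, args)
```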
Why the deployment story matters
The pricing sits in roughly the same neighborhood as DeepSeek-V4 Pro, but the real differentiator is sovereignty. Mistral is positioning Medium 3.5 for customers who want European data residency, low API costs, or the ability to self-host entirely. The four-GPU minimum for self-hosting is unusually low for a 128B dense model and reflects work on inference efficiency, including a separately released EAGLE speculative decoding model for vLLM and SGLang.
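Assuming the EAGLE model plugs in the way draft models usually do in recent vLLM builds, self-hosted speculative decoding would look something like the sketch below. The repo ids are placeholders and the speculative_config keys vary by vLLM version, so treat this as a shape, not a recipe:

```python
from vllm import LLM, SamplingParams

# Sketch only: repo ids are placeholders; speculative_config keys depend on
# your vLLM version, but recent releases accept a dict like this.
llm = LLM(
    model="mistralai/Mistral-Medium-3.5",  # placeholder HF repo id
    tensor_parallel_size=4,                # the four-GPU setup from the text
    speculative_config={
        "method": "eagle",
        "model": "mistralai/Mistral-Medium-3.5-EAGLE",  # placeholder draft model
        "num_speculative_tokens": 3,
    },
)
out = llm.generate(["Summarize this incident report:"], SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)
```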
To back this infrastructure long-term, Mistral has reportedly taken out an $830 million loan for a data center near Paris. That is a strong signal about where the company expects its margin to come from: not just API calls, but hosted enterprise deployments where compute and data stay on the continent.
How to actually use it
If you want to try it, the model is available through the Mistral API, in Le Chat on Pro, Team and Enterprise plans, and as open weights on Hugging Face. For local inference, vLLM is the recommended backend; SGLang shipped day-zero support, while llama.cpp and LM Studio support is listed as work in progress. NVIDIA NIM containers are also available for teams running on NVIDIA infrastructure.
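Since vLLM's server speaks the OpenAI-compatible chat API (started with vllm serve <model>), any OpenAI-style client can target a self-hosted deployment; only the base_url and key change between local and hosted. The model id below is a placeholder:

```python
from openai import OpenAI

# vLLM's server exposes the OpenAI chat API, so the same client works against
# either backend. The model id is a placeholder for the actual HF repo.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
reply = client.chat.completions.create(
    model="mistralai/Mistral-Medium-3.5",
    messages=[{"role": "user", "content": "Ping?"}],
)
print(reply.choices[0].message.content)
```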
For coding work specifically, install the Vibe CLI and point it at mistral-medium-3.5; whether it talks to the Mistral API or a local vLLM server is configured by editing ~/.vibe/config.toml.
The shift worth watching
The release itself is solid: one model, three former specializations, configurable reasoning, vision baked in. The bigger shift is what async cloud agents imply for how developers work. When a coding session can run for thirty minutes in a sandbox and come back with a draft pull request, the developer’s role moves further toward review, architecture and judgment. Medium 3.5 is the model that made Mistral comfortable shipping that workflow. Whether the agents are reliable enough to actually replace those keystrokes is the question every team will answer for itself over the next few months.