Gemini 3.5 Flash launched at Google I/O 2026

Gemini 3.5 Flash is Google’s latest Flash model, built for agentic workflows, coding tasks, long context reasoning, and multimodal understanding at high speed. In its launch post, Google frames the model as a way to combine frontier intelligence with action. It is designed to plan, use tools, coordinate subagents, and keep working across complex tasks under supervision.

The most important detail is the balance. Google says Gemini 3.5 Flash rivals larger flagship models across several dimensions while keeping the latency profile expected from the Flash family. Agentic AI often fails in practice because it becomes too slow or too expensive once it starts calling tools, checking work, reading files, and iterating.

What Gemini 3.5 Flash is designed to do

Gemini 3.5 Flash sits at the intersection of three use cases: agents, coding, and multimodal reasoning. Google describes it as its strongest agentic and coding model so far. The model is now generally available across Google Antigravity, the Gemini API in Google AI Studio, Android Studio, Gemini Enterprise Agent Platform, Gemini Enterprise, the Gemini app, and AI Mode in Google Search.

Google Antigravity is one of the main environments where Gemini 3.5 Flash is shown coordinating subagents for larger tasks.

Google’s examples are broad. Gemini 3.5 Flash can organize unstructured assets by renaming and categorizing them, synthesize a research paper and build a playable game, transform a legacy codebase into Next.js, generate city landscapes with subagents, and create interactive web interfaces from plain text descriptions.

Why speed changes the agent equation

Speed is not just a comfort feature for AI agents. It changes what is economically possible. A simple chat response might be acceptable if it takes several seconds. A multi step workflow can involve dozens or hundreds of intermediate actions. If every step is slow, the final experience becomes frustrating. If every step is expensive, the workflow becomes hard to scale.

Google says Gemini 3.5 Flash is four times faster than other frontier models when measured by output tokens per second. Artificial Analysis also reports strong speed, listing Gemini 3.5 Flash high at about 278 output tokens per second in its testing. Artificial Analysis also notes a higher time to first token for the high reasoning version than the median in its comparison group. That means the model may spend time thinking before it starts answering, especially when configured for harder reasoning. For chat style tasks, this can feel different from raw streaming speed. For agentic tasks, the tradeoff may be worthwhile if the model produces better plans, fewer failed tool calls, and more useful intermediate work.

Benchmarks that matter for agents and coding

Google’s launch data emphasizes benchmarks that reflect tool use, coding, and long horizon work rather than only short question answering. Gemini 3.5 Flash reportedly outperforms Gemini 3.1 Pro on several challenging evaluations, including Terminal Bench 2.1 at 76.2 percent, GDPval AA at 1656 Elo, MCP Atlas at 83.6 percent, and CharXiv Reasoning at 84.2 percent.

Terminal Bench focuses on agentic coding in terminal like environments. MCP Atlas evaluates multi step workflows using tool protocols. GDPval AA is aimed at economically valuable knowledge work. CharXiv Reasoning measures the ability to synthesize information from complex charts.

DeepMind’s Gemini 3.5 Flash page reinforces this focus with examples such as iterative coding loops, parallel creative concept development, interactive HTML components, brand asset generation, and multi agent orchestration.

Gemini 3.5 Flash for developers

For developers, Gemini 3.5 Flash is most interesting when paired with tools. Google’s developer documentation says the model supports a one million token context window, 65,000 maximum output tokens, thinking, and the same broad set of tools and platform features as Gemini 3 Flash. Computer Use is the notable exception and is not supported at this moment.

The one million token context window is important for codebases, document analysis, long conversation history, and retrieval augmented generation. It lets the model reason over much larger inputs without forcing aggressive summarization first. That does not mean every request should include everything. Large context can increase cost and latency. The practical advantage is selectivity. You can include enough relevant material for the model to keep its bearings across a complex task.

Google also highlights thought preservation. In Gemini 3.5 Flash, reasoning context can carry forward across turns when the conversation history is preserved properly. In the Interactions API this is automatic. In the GenerateContent API, developers should pass the full unmodified conversation history so thought signatures can be used. The result should be stronger performance on tasks such as iterative debugging and code refactoring, where the model needs to remember what it already tried.

How thinking levels affect cost and quality

Gemini 3.5 Flash uses thinking levels rather than the older numeric thinking budget approach. The default has moved from high in Gemini 3 Flash Preview to medium in Gemini 3.5 Flash. Google says medium provides strong results across many tasks while improving speed and cost efficiency.

The practical model selection inside a single model now looks like this:

Minimal fits quick answers, simple classification, and lightweight chat tasks.
Low works for lower latency code and agent tasks that do not need many steps.
Medium is the default for balanced reasoning, writing, analysis, and most tool workflows.
High is best reserved for difficult code, hard reasoning, math, and complex agentic tasks.

This matters because agentic systems can overuse tools. Higher thinking levels may encourage more exploration and verification. That can improve quality, but it can also create unnecessary tool calls. Google’s developer guidance recommends reducing the thinking level first if tool use becomes excessive, then adding explicit system instructions that constrain the action budget.

Migration notes for Gemini API users

If you are moving from Gemini 3 Flash Preview, the main model name change is from gemini 3 flash preview to gemini 3.5 flash. Google’s developer docs also recommend removing temperature, top p, and top k settings from Gemini 3.x requests because the reasoning behavior is optimized around defaults.

The migration checklist is straightforward:

Use thinking_level instead of thinking_budget.
Review quality because the default effort level is now medium rather than high.
Pass full conversation history when you rely on preserved reasoning context.
Make sure every function response matches the previous function call by id, name, and count.
Put images, audio, and other media inside multimodal function response parts rather than outside them.
Append extra instructions to function response text instead of sending them as separate instruction parts.
Continue using Gemini 3 Flash Preview for Computer Use workloads until support changes.

Enterprise use cases from Google’s launch examples

Google’s primary launch post includes several partner examples that show where Gemini 3.5 Flash is being tested or integrated. Shopify is using parallel subagents to analyze complex data over long horizons for merchant growth forecasting. Macquarie Bank is piloting the model for customer onboarding, where it must reason over documents longer than 100 pages, retrieve relevant information, and make recommendations with low latency.

Salesforce is integrating Gemini 3.5 Flash into Agentforce to automate enterprise tasks with multiple subagents that retain context and use tools across many turns. Ramp is applying the model’s multimodal understanding to improve OCR for complex invoices by combining visual interpretation with reasoning over historical patterns. Xero is using agents for multi week administrative workflows such as identifying suppliers and gathering information for 1099 tax forms. Databricks is using agentic workflows to monitor real time information, reason across large datasets, diagnose issues, and propose fixes for data scientists.

These examples share a useful lesson. Gemini 3.5 Flash is not only aimed at creative demos or chat productivity. The target workloads are messy, document heavy, tool dependent, and often spread across systems. That is where latency and token cost become decisive.

Gemini Spark and personal agents

Gemini 3.5 Flash is also the default model in the Gemini app and AI Mode in Search. Google says it powers Gemini Spark, a personal AI agent that can run continuously under your direction. Spark is starting with trusted testers and is planned for a beta rollout to Google AI Ultra subscribers in the United States.

The idea behind Spark is that an agent can watch for context, help manage digital tasks, and ask follow up questions when needed. Ars Technica reported examples such as monitoring emails, preparing daily digests, summarizing meetings, and gathering information from Google Drive or Gmail with user direction. Google stresses that high stakes actions should require approval.

This is where the usefulness and privacy questions become inseparable. A personal agent becomes more valuable when it has access to more context. The same access increases the need for transparency, permissions, auditability, and reliable refusal behavior. Gemini 3.5 Flash may make always available agents more technically feasible, but product design will decide whether people trust them.

Pricing and efficiency considerations

Gemini 3.5 Flash is positioned as a more efficient alternative to larger frontier models. Google says it can complete some long horizon agentic work at less than half the cost of other frontier models. Artificial Analysis lists API pricing for Gemini 3.5 Flash high at 1.50 dollars per million input tokens and 9.00 dollars per million output tokens, with cached and blended costs depending on usage patterns and provider details.

Output cost deserves attention. Reasoning models can be verbose, and Artificial Analysis notes that Gemini 3.5 Flash high produced more output tokens than average in its Intelligence Index run. If your workflow generates long reasoning traces, repeated summaries, or large tool outputs, the final bill may depend more on output discipline than model sticker price.

A practical cost strategy is to use high thinking only where the task justifies it, cache stable context where supported, limit unnecessary tool calls, and design agents to produce concise intermediate outputs. Speed lowers waiting time. Good orchestration lowers waste.

Safety and reliability

Google says Gemini 3.5 was developed according to its Frontier Safety Framework. The launch post specifically mentions strengthened cyber and CBRN safeguards, with the goal of reducing harmful content while also reducing mistaken refusals of safe queries. Google also refers to safety training, mitigations, and interpretability tools that help examine internal reasoning before a response is provided.

The practical takeaway

Gemini 3.5 Flash makes agentic AI more plausible because it improves the tradeoff between intelligence, speed, and cost. The model is not magic on its own. Its value appears when it is paired with clean tool design, sensible thinking levels, preserved context, and clear human oversight. The future of useful agents may depend less on a single smartest model and more on fast models that can keep acting correctly for long enough to finish real work.