For most of the past four years, generating an image with AI has felt like lighting a firework. You pack a prompt with as much detail as you can, hit generate, and hope something usable lands on screen. When it does not, you reroll. When it does, you try to edit it and watch the parts you loved drift out of shape. Reve 2.0 is built around a different premise. An image should be planned before it is rendered, and that plan should be something you can actually read, edit, and reuse.
Why prompt-only image generation hits a ceiling
Diffusion models produce beautiful pictures, but they are not particularly steerable. Large language models are intelligent and structured, but their native output is text, not pixels, and asking them to drive a separate image model through prompts adds latency and loses nuance at every step. Language is imprecise. Creativity is iterative. The two kinds of model do not mesh as cleanly as they appear to on a slide.
The result is a familiar loop. You write a long prompt. You get an image that is almost right. You rewrite the prompt to fix the badge color, and the model also changes the lighting, the pose, and the background. You try again. You upscale at the end and watch the letterforms quietly mutate. Every iteration introduces fresh diffusion and compression artifacts, which then get baked into the next reference image. Errors do not just accumulate, they compound.
Reve 2.0 attacks that problem at the architectural level by separating planning from rendering.
Images as code, not captions
The core idea behind Reve 2.0 is that an image should be represented internally as code: a structured, detailed, manipulable description of composition, relationships, style, typography, and color. The model plans the image as a layout first, then renders it.
Reve 1.0 already proved this thesis. It was trained on structured data that defined composition and relationships, not on increasingly long captions. Reve 2.0 keeps that planning layer and adds roughly three times the parameters, more training data, more compute, and a new rendering architecture on top. The intermediate representation is no longer just a research idea. It is the surface that the editor, the API, and the agent integrations all sit on.
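To make the "image as code" idea concrete, here is a minimal sketch of what a structured image description could look like. Reve's actual internal format is not described in this article, so every class name, field, and value below is a hypothetical stand-in. The point is only that an edit to one field leaves every other field untouched.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TextElement:
    """One piece of text in the layout (hypothetical fields)."""
    content: str
    x: float      # horizontal center, as a fraction of image width (0..1)
    y: float      # vertical center, as a fraction of image height (0..1)
    font: str
    color: str    # hex RGB


@dataclass(frozen=True)
class ImagePlan:
    """A toy stand-in for a structured image description."""
    width: int
    height: int
    style: str
    lighting: str
    elements: Tuple[TextElement, ...] = ()


def recolor(plan: ImagePlan, index: int, color: str) -> ImagePlan:
    """Return a new plan in which only one element's color differs."""
    elements = list(plan.elements)
    elements[index] = replace(elements[index], color=color)
    return replace(plan, elements=tuple(elements))


# Hypothetical plan: a product shot with a single badge.
plan = ImagePlan(
    width=4096,
    height=4096,
    style="editorial product photo",
    lighting="soft window light",
    elements=(TextElement("NEW", 0.5, 0.1, "Grotesk Bold", "#D62828"),),
)

edited = recolor(plan, 0, "#003049")
print(edited.elements[0].color)            # the one field that was edited
print(edited.lighting == plan.lighting)    # untouched fields carry over
```

Compare this with the prompt loop described earlier, where rewriting a sentence to fix the badge color can also shift the lighting. In a structured plan, the lighting is a separate field that the edit never touches.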
Native 4K as a first-class primitive
Resolution is the second major bet. Reve 2.0 renders at 4K by 4K, a true 16 megapixels, as a native generation rather than a small image followed by an upscale pass.
Anyone who has spent serious time with generative images knows what late-stage upscaling does. You finally lock in a composition you like, run the upscale, and watch small details shift. Upscaling becomes one final dice roll on top of a long line of dice rolls.
Generating at 4K from the start removes that step. What you see during iteration is what you get at the end. The same files are suitable for print workflows without a separate post-processing stage. For physical media, packaging, posters, and editorial layouts, that matters more than another half-point on a benchmark.
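For a rough sense of what native 4K means on paper, the arithmetic can be sketched as follows. This assumes "4K by 4K" means 4096 by 4096 pixels, which the article does not state exactly, and uses 300 DPI as a common rule of thumb for print viewed at close range. Both figures are assumptions, not Reve specifications.

```python
MM_PER_INCH = 25.4


def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count in millions."""
    return width_px * height_px / 1_000_000


def print_size_mm(pixels: int, dpi: int) -> float:
    """Physical length of one side when printed at the given density."""
    return pixels / dpi * MM_PER_INCH


side_px = 4096  # assumption: "4K" taken as 4096 pixels per side

print(f"{megapixels(side_px, side_px):.1f} MP")
for dpi in (300, 150):
    print(f"{dpi} DPI: {print_size_mm(side_px, dpi):.0f} mm per side")
```

Under those assumptions, one side comes out to roughly 347 mm at 300 DPI and about twice that at 150 DPI, a density often considered acceptable for posters viewed from a distance.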
Iteration without progressive degradation
Reve 2.0 also addresses the slow collapse that affects iterative image work. Two behaviors are worth calling out.
- Lower degradation with image references. The rendering architecture is designed to resist the compounding artifacts that appear when generated images get fed back in as references. Edits stay more stable across longer iterative sessions.
- Zero degradation without image references. Because images are represented as code, regenerating from the same structured description locks elements in place. Similar code produces similar output, the way it does in software. No new layer of artifacts is added.
That changes the economics of exploration. Iteration stops being something the model quietly punishes you for.
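If image descriptions are available as structured data, drift between iterations can be checked by diffing the descriptions instead of eyeballing pixels. The sketch below uses plain nested dictionaries as a hypothetical description format. The keys and values are invented for illustration and do not reflect Reve's real schema.

```python
import copy

_MISSING = object()  # sentinel for paths present on only one side


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into {dotted_path: leaf_value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def changed_paths(before, after):
    """List every leaf path whose value differs between two descriptions."""
    a, b = flatten(before), flatten(after)
    return sorted(
        path
        for path in a.keys() | b.keys()
        if a.get(path, _MISSING) != b.get(path, _MISSING)
    )


# Hypothetical description of a product shot.
v1 = {
    "lighting": "soft window light",
    "pose": "three-quarter view",
    "badge": {"text": "NEW", "color": "#D62828"},
}

# A later iteration that is supposed to change only the badge color.
v2 = copy.deepcopy(v1)
v2["badge"]["color"] = "#003049"

print(changed_paths(v1, v2))  # lists only the badge color path
```

A check like this makes "locked in place" testable: if an iteration reports any path other than the one you meant to edit, something drifted.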
Typography and graphic design that actually behave
Text rendering has been the most public weakness of image models. Reve 2.0 treats it as a layout problem rather than a generation lottery. Because composition, positioning, and spacing are defined in the structured representation before rendering, text can be placed exactly where it belongs. Layouts are coherent because they were planned, not because the model got lucky.
The same logic applies to environmental typography: handwriting, packaging, menus, street signs, license plates. These read as part of the scene rather than as smeared approximations of letters. For design-heavy work, this is the difference between a usable asset and a draft you have to fix in Photoshop.
An agent-native model with a real editor on top
Because the intermediate representation is code, agents can both see images and reason about them. That is a meaningful shift from opaque agentic image platforms where an LLM writes a prompt, the image model renders pixels, and nothing in between is inspectable.
The same representation drives the editor at reve.com. Every element in an image is addressable. You can manipulate it directly rather than describing your change in prose and hoping the model interprets you correctly. The model and the product were designed together, which is rare in this category. As Alan Kay put it, "People who are really serious about software should make their own hardware." Reve is applying the same principle to creative tooling.
Where Reve 2.0 fits against the rest of the market
Reve 2.0 is not pitching itself as the strongest model on every blind-vote leaderboard. Independent analysis from Trilogy AI puts gpt-image-2 ahead on broad first-pass quality and notes that Google retains an edge for grounded multimodal work tied to live information. Reve is betting on workflow.
Reve 2.0 publishes a base API price of roughly $0.04 per image for create and edit (or remix) operations, at native 4K. Comparable 4K output from competing premium models is more expensive.
For teams producing ads, product visuals, ecommerce variants, educational graphics, or campaign concepts, where the real unit of work is a long sequence of iterations rather than a single hero image, that price keeps long iteration runs affordable.
The compact API covers the three operations that design workflows lean on hardest: create from text, edit an existing image with instructions, and remix prompts with reference images. OpenAI still exposes deeper controls around masks, streaming, compression, and moderation for engineering-heavy production systems. Reve covers the main creative loop and pairs it with a visual editor where designers can manipulate the layout directly.
Cost per accepted asset
The right way to judge Reve 2.0 is not cost per generation. It is cost per accepted asset. A cheaper generation that needs ten retries is not cheap. A more expensive generation that lands first time can be a bargain. Reve 2.0 is built so that layout stays controllable, edits stay stable, resolution stays consistent, and iteration stays affordable. If those properties hold up in daily production work, the model does not need to win every generic benchmark to matter for the teams it is aimed at.
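The cost-per-accepted-asset idea can be written down as a one-line formula. The sketch below assumes each generation is accepted independently with a fixed probability, so the expected number of attempts per accepted asset is one divided by that probability. The prices and acceptance rates are hypothetical round numbers, not measurements of Reve or any competitor.

```python
def cost_per_accepted_asset(price_per_generation: float,
                            acceptance_rate: float) -> float:
    """Expected spend to get one usable image.

    Assumes independent attempts with a constant acceptance probability,
    so the expected number of attempts is 1 / acceptance_rate.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return price_per_generation / acceptance_rate


# Hypothetical model A: cheap per call, but only 1 in 10 results is usable.
model_a = cost_per_accepted_asset(price_per_generation=0.02,
                                  acceptance_rate=0.10)

# Hypothetical model B: five times the price per call, lands first time.
model_b = cost_per_accepted_asset(price_per_generation=0.10,
                                  acceptance_rate=1.0)

print(f"Model A: ${model_a:.2f} per accepted asset")
print(f"Model B: ${model_b:.2f} per accepted asset")
```

With these made-up numbers, the model that costs five times more per call ends up at half the cost per accepted asset. In practice the acceptance rate has to be measured on your own briefs, since it depends on how strict your approval bar is.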