AMI Labs is betting on world models

AMI Labs and JEPA sit at the center of one of the clearest alternatives to today’s language-first AI race. The lab, co-founded by Yann LeCun, is building AI systems that understand the real world, keep persistent memory, reason and plan, and remain controllable and safe. That ambition is closely tied to JEPA, short for Joint Embedding Predictive Architecture, and to the broader idea of world models.

AMI Labs frames the problem sharply: real-world data is continuous, high-dimensional, noisy, and often unpredictable. Cameras, wearable sensors, robots, industrial systems, and medical devices do not produce neat sentences. They produce streams of signals where some details matter deeply and many details are irrelevant. A model that tries to generate every missing pixel or every future signal can waste most of its capacity on noise. JEPA takes a different route. It predicts in representation space, not in raw data space.

Why AMI Labs starts from the real world

The most important line in AMI Labs’ public vision is the belief that real intelligence does not start in language. It starts in the world. That is a direct challenge to the assumption that scaling large language models will be enough to reach broadly capable AI.

Language models have become useful because text contains compressed knowledge about the world. But text is a second-order signal. It describes experience rather than being experience. A child does not learn that a cup can fall by reading millions of sentences about gravity. A child sees objects move, collide, disappear, reappear, resist, spill, break, and roll. Over time, the child builds expectations about what can happen next.

AMI Labs is built around the idea that AI needs something closer to that kind of learning. Its site describes world models that learn abstract representations of real-world sensor data, ignore unpredictable details, and make predictions in representation space. That last phrase is the key. Instead of predicting a perfect future image or reconstructing every pixel, the system tries to predict the underlying meaning of what will happen.

This matters because the real world is full of uncertainty. If a person walks behind a parked car, the exact pattern of pixels on their jacket is not the crucial fact. The relevant fact is that a person is now occluded and may reappear. If a robot reaches for a mug, it does not need to predict every reflection on the ceramic surface. It needs to predict whether the gripper will contact the mug, whether the mug will move, and whether the action is safe.

What JEPA actually does

Joint Embedding Predictive Architecture is a self-supervised learning framework associated with Yann LeCun’s vision for autonomous machine intelligence. Its central move is simple but powerful. Instead of predicting raw tokens, pixels, or waveforms, JEPA predicts abstract embeddings.

In a typical JEPA setup, one part of the input is used as context and another related part is used as the prediction target. Encoders convert both into latent representations. A predictor then learns to infer the target representation from the context representation. The system is trained by reducing the gap between predicted and actual representations.
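The setup described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not Meta’s or AMI Labs’ implementation: the encoders are random linear maps, the names (context_encoder, target_encoder, predictor) are made up for this example, and only the predictor is trained. Real JEPA variants also train the context encoder and need some mechanism to stop the embeddings from collapsing to a constant, which is left out here.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def random_matrix(rows, cols, rng, scale=0.5):
    """Random weights; a stand-in for a learned network (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
context = [rng.uniform(-1, 1) for _ in range(6)]  # visible part of the input
target = [rng.uniform(-1, 1) for _ in range(6)]   # related part to be predicted

context_encoder = random_matrix(3, 6, rng)
target_encoder = random_matrix(3, 6, rng)
predictor = random_matrix(3, 3, rng)

# Both parts are mapped into a 3-dimensional latent space.
z_context = linear(context_encoder, context)
z_target = linear(target_encoder, target)  # treated as a fixed training target

losses = []
learning_rate = 0.1
for _ in range(200):
    z_pred = linear(predictor, z_context)
    # The loss lives in embedding space; no pixels or tokens are reconstructed.
    losses.append(mse(z_pred, z_target))
    n = len(z_pred)
    # Gradient of the MSE with respect to predictor weight W[i][j] is
    # (2 / n) * (z_pred[i] - z_target[i]) * z_context[j].
    predictor = [
        [w - learning_rate * (2.0 / n) * (z_pred[i] - z_target[i]) * z_context[j]
         for j, w in enumerate(row)]
        for i, row in enumerate(predictor)
    ]

initial_loss, final_loss = losses[0], losses[-1]
print(f"embedding loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

The point of the sketch is where the error is measured: the predictor is never asked to reproduce the target input itself, only its latent summary.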

Imagine seeing the first few seconds of a video where a ball rolls toward the edge of a table. A generative model might try to produce the next frames pixel by pixel. JEPA instead tries to represent the situation in a more abstract way. The ball is moving. It is near an edge. If nothing stops it, it may fall. The exact texture of the wall behind the table can be ignored.

Prediction in representation space

Representation space is where a model stores compressed features of an input. These features might capture object identity, motion, depth, spatial layout, action, or causal structure. A good representation discards what is not useful and preserves what matters for understanding and planning.

This is why JEPA is often described as non-generative. It does not need to generate a full image or video frame to learn. It learns by predicting the semantic structure hidden inside data. That can make it more efficient and less distracted by noise, at least in principle.

Why ignoring detail can improve intelligence

Ignoring detail can sound like a weakness. For world models, it may be a strength. If every uncertain detail must be predicted, the model gets punished for failing to know things no intelligent system could know. The exact future position of every leaf in the wind is not predictable. The fact that the tree remains a tree is predictable and useful.
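A toy experiment makes the leaves-in-the-wind argument concrete. Everything here is an assumption chosen for illustration: a "frame" is one predictable level plus independent per-pixel noise, and the encoder is a hand-picked average rather than anything learned. The raw-space and representation-space errors are not measured in the same units, so the comparison is about the size of the irreducible error floor, not a like-for-like benchmark.

```python
import random
import statistics

rng = random.Random(0)

def make_frame(level, n_pixels=64, noise=1.0):
    """A toy frame: one predictable level plus unpredictable per-pixel noise."""
    return [level + rng.gauss(0.0, noise) for _ in range(n_pixels)]

def encode(frame):
    """Hypothetical encoder: keep the average, drop the per-pixel detail."""
    return statistics.fmean(frame)

raw_errors, rep_errors = [], []
for _ in range(500):
    level = rng.uniform(-1.0, 1.0)  # the part a model could know from context
    future = make_frame(level)
    # Best possible raw-space guess: every pixel equals the known level.
    # The remaining error is pure noise that no model could predict.
    raw_errors.append(statistics.fmean([(p - level) ** 2 for p in future]))
    # Representation-space guess: predict only the encoded summary.
    rep_errors.append((encode(future) - level) ** 2)

raw_mse = statistics.fmean(raw_errors)
rep_mse = statistics.fmean(rep_errors)
print(f"raw-space error floor:      {raw_mse:.3f}")
print(f"representation-space error: {rep_mse:.3f}")
```

Even a perfect predictor is stuck near the noise variance in raw space, while the summary that averages the noise away can be predicted far more accurately.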

AMI Labs’ focus on abstract representations follows this logic. In industrial process control, automation, robotics, wearables, and healthcare, the task is not to produce pretty outputs. The task is to infer what is happening, predict what could happen next, and choose actions under constraints. A useful AI system must know which uncertainties matter and which can be safely ignored.

From image JEPA to video world models

JEPA did not remain a concept on a slide. Meta AI research explored several variants, including image JEPA (I-JEPA), motion-content JEPA (MC-JEPA), and video JEPA (V-JEPA). These systems share the same broad principle of predicting in latent space, but they apply it to different kinds of data.

Image JEPA focuses on still images. Motion-content JEPA extends the idea toward video by learning motion and content features together. Video JEPA goes further by learning from temporal visual data, where the model must represent not only what objects are present but also how they change over time.

This matters for AMI Labs because a static image is only a thin slice of the physical world. Real intelligence needs time. It needs to understand before and after, cause and effect, delay, occlusion, and action. A robot, a medical device, or an industrial control system cannot reason from isolated snapshots alone.

What video JEPA 2 shows about physical reasoning

Meta’s video JEPA 2 (V-JEPA 2) work is especially relevant because it connects JEPA to physical reasoning and robot planning. Meta describes video JEPA 2 as a 1.2-billion-parameter world model trained primarily on video. Its architecture includes an encoder that turns raw video into embeddings and a predictor that forecasts embeddings from context.

The training process has two stages. First, the model is trained without actions on more than 1 million hours of video and 1 million images. This helps it learn patterns of motion, object interaction, and visual change. Second, it receives action-conditioned training with robot data, where the predictor learns to account for specific actions. In its technical report, Meta said that only 62 hours of robot data were used for this second phase.
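To put the two stages in proportion, here is the arithmetic on the figures quoted above. The variable names are illustrative. Because the first stage used "more than" 1 million hours, the share computed here is an upper bound.

```python
pretraining_video_hours = 1_000_000  # lower bound quoted for the first stage
robot_data_hours = 62                # quoted for the action-conditioned stage

robot_share = robot_data_hours / pretraining_video_hours
print(f"robot data is at most {robot_share:.4%} of the video hours")
```

On these numbers, the action-conditioned stage uses less than one hundredth of one percent of the hours seen in the first stage.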

Why AMI Labs is not just another AI startup

TechCrunch reported in March 2026 that AMI Labs raised $1.03 billion at a $3.5 billion pre-money valuation. That is a large sum even by frontier AI standards. But the more revealing detail is the time horizon. AMI Labs CEO Alexandre LeBrun told TechCrunch that the company is not a typical applied AI startup expected to ship a product in three months and show major revenue within a year. The work starts from fundamental research, and commercial applications may take years.

That fits the technical challenge. World models are not wrappers around existing chatbots. They require new training objectives, new evaluation methods, large-scale sensor data, and careful deployment in environments where mistakes have consequences.

AMI Labs says it will work with industry partners, product developers, and the academic research community through open publications and open source.

Where JEPA could matter most

AMI Labs names several application areas where reliability, controllability, and safety matter. JEPA is relevant to each for a different reason.

Industrial process control needs models that can monitor continuous sensor streams, detect meaningful changes, and predict consequences before failures occur.
Automation needs systems that can reason about actions, constraints, timing, and safety rather than simply classify inputs.
Wearable devices need efficient representations of noisy, continuous signals from the body and environment.
Robotics needs action-conditioned world models that can predict what a movement will cause before the robot commits to it.
Healthcare needs systems that are less prone to hallucination and better grounded in real observations, especially where errors can have serious consequences.

The common thread is that these are not purely linguistic domains. They involve sensors, time, uncertainty, and physical or biological processes. A language model can explain a procedure, summarize a medical note, or generate control code. But it does not automatically understand the physical situation behind the words. AMI Labs is betting that world models can fill that gap.

The safety argument behind controllable world models

AMI Labs explicitly includes controllability and safety in its mission. This is not just a public relations phrase. If an AI system acts in the real world, prediction alone is not enough. It needs constraints. It needs memory. It needs planning. It needs ways to evaluate possible actions before executing them.

In LeCun’s broader architecture for objective-driven AI, several modules work together. A perception module estimates the current state of the world. A world model predicts future states. A cost mechanism evaluates consequences. An actor proposes actions that minimize expected cost. Memory helps the system use recent history. A configurator can adapt the system to the task.
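The way these modules interact can be sketched as a tiny planning loop. This is a hedged toy, not LeCun’s or AMI Labs’ design: the state is a single number, the world model is a hand-written rule rather than a learned JEPA predictor, the function names are invented for this example, and the memory and configurator modules are left out. The actor simply tries every short action sequence, imagines the outcome with the world model, and keeps the cheapest one.

```python
import itertools

def perceive(observation):
    """Perception: turn a raw observation into a state estimate (trivial here)."""
    return float(observation)

def world_model(state, action):
    """World model: predict the next state. Toy dynamics, assumed known."""
    return state + action

def cost(state, goal, safe_limit):
    """Cost: distance to the goal plus a large penalty outside the safe range."""
    penalty = 100.0 if abs(state) > safe_limit else 0.0
    return abs(state - goal) + penalty

def plan(state, goal, safe_limit, horizon=4):
    """Actor: imagine every action sequence and return the cheapest one."""
    best_actions, best_cost = None, float("inf")
    for actions in itertools.product((-1.0, 0.0, 1.0), repeat=horizon):
        imagined, total = state, 0.0
        for action in actions:
            imagined = world_model(imagined, action)  # no real action is taken
            total += cost(imagined, goal, safe_limit)
        if total < best_cost:
            best_actions, best_cost = actions, total
    return best_actions, best_cost

state = perceive(0.0)
actions, total = plan(state, goal=3.0, safe_limit=3.5)
print("chosen actions:", actions, "imagined cost:", total)
```

The safety constraint lives in the cost function, so overshooting the goal is rejected during imagined rollouts rather than after a real mistake. A system acting in the world would typically execute only the first action and then replan from the new observation.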

JEPA fits naturally inside this architecture because it supplies the representation and prediction machinery. But JEPA alone does not solve safety. A world model can be wrong. A cost function can be incomplete. A planner can choose an unsafe shortcut. The safety promise depends on the whole stack, including guardrails, evaluation, uncertainty handling, and deployment discipline.

The hard parts AMI Labs still has to solve

The strongest argument for JEPA is also the source of its hardest problems. If intelligence depends on learning abstract representations of the world, then researchers must prove that those representations support robust reasoning, not just better benchmark performance.

Meta’s video JEPA 2 benchmarks show the gap. The company released tests for physical plausibility, minimal video pairs, and causal video question answering. Humans perform strongly on these tasks, while current models still struggle, especially with questions about what could have happened, what might happen next, and what action should occur to reach a goal.

That gap is useful because it prevents overclaiming. World models are a promising direction, not a solved route to human-level intelligence. The current systems can recognize patterns and support limited planning, but they do not yet have the flexible common sense of a child navigating a kitchen or playground.

Another challenge is multimodality. AMI Labs talks about cameras and other sensor modalities. Real-world intelligence may need vision, audio, touch, proprioception, medical signals, industrial telemetry, and memory across time. A future JEPA system may need hierarchical models that predict at multiple time scales, from milliseconds in motor control to hours in industrial monitoring.

What AMI Labs’ JEPA focus means for AI

AMI Labs’ choice to focus on JEPA is a bet on a different foundation for AI. Instead of treating language as the main substrate of intelligence, it treats language as one interface to a deeper problem. The deeper problem is learning how the world evolves, what actions change in it, and which details matter.