Imagine losing the ability to speak after a stroke and regaining communication without anyone drilling into your skull. That is the long-term goal behind Brain2Qwerty v2, the latest brain-to-text model from Meta’s AI research team. It reads typed sentences directly from non-invasive recordings of brain activity, reaching up to 78% word accuracy for the best participant. No implants, no surgery, no electrodes touching cortical tissue.
The system builds on Brain2Qwerty v1, which was recently accepted at Nature Neuroscience. Version one could predict individual keystrokes but needed the exact timing of every keypress, ruling out real-world use. Version two removes that constraint and generates sentences from continuous brain signals, marking a meaningful step toward usable non-invasive brain-computer interfaces.
Why non-invasive decoding matters
Modern neuroprostheses already restore communication for patients with anarthria, ALS or severe paralysis. Some intracortical systems now reach typing speeds of 90 characters per minute with character error rates under 6%. The catch: those results require open-brain surgery, with all the associated risks of hemorrhage, infection and long-term hardware failure.
Because of those risks, invasive interfaces remain limited to a small group of severely affected patients. Many people with locked-in syndrome, late-stage neurodegenerative disease or disorders of consciousness will never qualify for neurosurgery. A safe, scalable alternative would change the equation for diagnosing and supporting these patients.
That is where non-invasive recordings come in. Standard EEG is widely available but suffers from a poor signal-to-noise ratio, which forces users into slow, fatiguing tasks like staring at flickering grids or imagining hand movements for minutes on end. Even then, accuracy stays modest: a public EEG benchmark reaches only 43.3% accuracy on a four-class motor imagery task, where chance with balanced classes is 25%. Magnetoencephalography (MEG), which measures the magnetic fields produced by cortical neurons, offers far higher signal quality and forms the foundation for what Meta’s team set out to do.
Inside the Brain2Qwerty architecture
Brain2Qwerty is a deep neural network with three stacked modules, each handling a different level of language structure.
- A convolutional module processes 500-millisecond windows of MEG or EEG signal. It uses a spatial attention mechanism to encode sensor positions, a subject-specific layer to account for differences between participants, and eight convolutional blocks with skip connections and dropout.
- A transformer module operates at the sentence level, refining keystroke predictions by exploiting contextual information across the full sequence.
- A pretrained character-level language model, a 9-gram model built with KenLM on the Spanish Wikipedia corpus, corrects the transformer’s output using the statistical regularities of natural language.
The full model contains roughly 400 million parameters. Training one model takes about 18 hours on a single Quadro GV100 GPU. The convolutional and transformer modules are trained jointly with cross-entropy loss across all subjects, sharing parameters where possible and personalising where needed.
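To make the language model's role concrete, here is a minimal sketch of a character-level n-gram scorer. It is not KenLM and not the authors' code: it uses a trigram instead of a 9-gram, a three-sentence toy corpus instead of Spanish Wikipedia, and add-one smoothing, and every name in it (`train_char_ngram`, `log_prob`, `pick_best`, `ALPHABET`) is invented for illustration. The real system rescores the transformer's output rather than choosing between two fixed strings.

```python
import math
from collections import Counter

# Assumed character set for this toy example (uppercase Spanish letters + space).
ALPHABET = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZ "
VOCAB_SIZE = len(ALPHABET) + 1  # +1 for the end-of-sentence marker "$"


def train_char_ngram(corpus, n):
    """Count character n-grams and their (n-1)-character contexts."""
    ngrams, contexts = Counter(), Counter()
    for sentence in corpus:
        padded = "^" * (n - 1) + sentence + "$"
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts


def log_prob(sentence, ngrams, contexts, n):
    """Add-one smoothed log-probability of a sentence under the counts."""
    padded = "^" * (n - 1) + sentence + "$"
    total = 0.0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        total += math.log((ngrams[gram] + 1) / (contexts[gram[:-1]] + VOCAB_SIZE))
    return total


N = 3  # trigram for brevity; the article describes a 9-gram model
TOY_CORPUS = [  # stand-in for a large text corpus
    "EL BENEFICIO SUPERA LOS RIESGOS",
    "LOS RIESGOS SON BAJOS",
    "EL BENEFICIO ES ALTO",
]
ngrams, contexts = train_char_ngram(TOY_CORPUS, N)


def pick_best(candidates):
    """Return the candidate string the toy language model finds most likely."""
    return max(candidates, key=lambda s: log_prob(s, ngrams, contexts, N))


typed = "EK BENEFUCUI SYOERA KIS RUESGIS"
intended = "EL BENEFICIO SUPERA LOS RIESGOS"
print(pick_best([typed, intended]))
```

Because the typo-ridden string is full of character sequences the model has never seen, it scores far lower than the well-formed sentence. That is the statistical regularity the third module exploits.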
What the numbers say
The team recruited 35 healthy adult volunteers at the Basque Center on Cognition, Brain and Language in Spain. Participants typed briefly memorised Spanish sentences on a custom non-magnetic QWERTY keyboard while their brain activity was recorded. Twenty took part in EEG sessions, twenty in MEG sessions, with five completing both.
The results show a striking gap between recording technologies. With MEG, Brain2Qwerty reaches an average character error rate (CER) of 29%. With EEG, the same architecture reaches 65%. For the best MEG participant, CER drops to 18%, low enough to perfectly decode a range of sentences the model had never seen during training.
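For readers unfamiliar with the metric, character error rate is conventionally the edit distance between the decoded text and the reference, divided by the length of the reference. The sketch below uses that standard definition; the article does not spell out the team's exact scoring script, so treat the function names and details as assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One wrong character in a ten-character reference gives a CER of 10%.
print(character_error_rate("HOLA MUNDO", "HOLA MONDO"))
```

On this definition, a 29% CER means roughly 29 character edits are needed per 100 reference characters. The value can exceed 100% when the decoder outputs far more characters than the reference contains.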
Baseline comparisons confirm the architecture’s contribution. EEGNet, a well-known compact network for EEG-based interfaces, performs worse than Brain2Qwerty by a factor of 2.5 on MEG. Ablation studies show each module pulls its weight: the transformer improves CER beyond the convolutional module alone, and the language model adds a further significant gain. One striking example: a participant typed “EK BENEFUCUI SYOERA KIS RUESGIS” with multiple typos, and the language model still recovered the intended sentence “EL BENEFICIO SUPERA LOS RIESGOS” perfectly.
What the brain is actually telling the model
Several analyses confirm that Brain2Qwerty leans heavily on motor representations rather than abstract linguistic ones. When the model makes mistakes, the wrongly predicted character tends to sit physically close to the target key on the QWERTY layout. A Pearson correlation of 0.73 ties keyboard distance to confusion rate. When the researchers clustered the model’s internal embeddings without supervision, the algorithm cleanly separated left-hand from right-hand keys, then progressively recovered the spatial layout of the keyboard at higher cluster counts.
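A rough sketch of how such an analysis can be set up: assign each key a position on the keyboard, measure the distance between the target and predicted key, and correlate that distance with how often the pair is confused. The layout below is a simplified letters-only QWERTY grid with an assumed row stagger (the Ñ key is omitted), and the function names are illustrative, not taken from the study.

```python
import math

ROWS = ["QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]
ROW_OFFSETS = [0.0, 0.25, 0.75]  # assumed horizontal stagger, in key widths

# Map each key to an (x, y) position measured in key widths.
KEY_POS = {
    key: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, key in enumerate(keys)
}


def key_distance(a, b):
    """Euclidean distance between two keys on the simplified layout."""
    (xa, ya), (xb, yb) = KEY_POS[a], KEY_POS[b]
    return math.hypot(xa - xb, ya - yb)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Keys ordered by their distance from "F": neighbours first, far keys last.
for key in sorted("GDRVJP", key=lambda k: key_distance("F", k)):
    print(key, round(key_distance("F", key), 2))
```

Given a measured confusion rate for each (target, predicted) pair, calling `pearson` on the list of key distances and the matching list of rates yields the kind of statistic quoted above.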
Typing errors tell a similar story. Participants slowed down before mistakes, with the inter-key interval more than doubling from 49 ms to 111 ms on incorrect keystrokes. During those moments of motor hesitation, decoding accuracy drops sharply, suggesting that neural activity reflects a conflict between the intended and erroneous movement rather than a clean motor command.
Word and character frequency also shape performance. Frequent words are decoded better than rare ones, and rare letters like Z, K and W in Spanish stay close to chance level because they account for less than 0.1% of the training data. Out-of-vocabulary words can still be decoded, though with much higher error rates around 70%.
A scaling law for brain decoding
One of the most interesting findings is that performance scales reliably with training data. Across uniformly sampled subsets of the training set, CER decreases predictably, with a Pearson correlation of −0.93 between data volume and error rate. No plateau is visible yet. The Brain2Qwerty v2 update trained on roughly ten times more data per participant than the original version, and the gains followed the same trajectory.
If the curve holds, larger datasets should keep narrowing the gap with invasive systems. That is a genuinely useful property because it suggests progress is bottlenecked by data collection rather than fundamental signal limits.
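One way to reason about "if the curve holds" is to fit a simple trend to error rate versus data volume and extrapolate it. The sketch below fits a straight line to CER against the logarithm of recording time. The log-linear form is an assumption made here for illustration (the article reports only a correlation), and the data points are synthetic, generated from a known line purely to show that the fit recovers it. They are not results from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def extrapolate_cer(hours, cers, target_hours):
    """Fit CER against log10(hours) and evaluate the line at target_hours."""
    slope, intercept = fit_line([math.log10(h) for h in hours], cers)
    return slope * math.log10(target_hours) + intercept


# Synthetic points lying exactly on CER = 0.6 - 0.1 * log10(hours).
hours = [1, 10, 100]
cers = [0.6 - 0.1 * math.log10(h) for h in hours]

print(round(extrapolate_cer(hours, cers, 1000), 3))
```

Any such extrapolation is only as good as the assumed functional form. A real curve may flatten once signal quality, rather than data volume, becomes the limit.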
The obstacles that remain
Three hard problems stand between this proof-of-concept and clinical use.
The first is accuracy. A 29% character error rate is too high for daily communication, and the best participants still produce noticeable mistakes. The model also depends on knowing when each keystroke occurred to align brain segments correctly, so it is not yet a true real-time system. Future iterations need continuous decoding without sentence-level corrections or known keystroke timings, similar to streaming speech recognition.
The second is hardware. The MEG scanner used in this study is a room-sized machine that lives in a magnetically shielded suite. Most patients will never have practical access to one. The team points to optically pumped magnetometers as a credible path forward. These new MEG sensors operate at room temperature and can be worn on the head, opening the door to a future where MEG-quality recordings happen without the scanner.
A third issue, less often discussed, is generalisation from healthy typists to patients who can no longer move. The current protocol relies on overt finger movements. Locked-in patients would need to imagine typing, and decoding models trained on actual movement do not automatically transfer to motor imagery. Adapting the task and architecture for attempted or imagined movements is the next experimental frontier.
What this changes about the field
For years, the dominant assumption was that practical brain-to-text required implants. Brain2Qwerty v2 does not overturn that view entirely, but it shows the gap is smaller than expected when modern deep learning meets high-quality magnetic recordings. The more than two-fold advantage of MEG over EEG, combined with sentence-level transformers and a language model, produces results that were unthinkable from non-invasive signals just a few years ago.
The work also reframes what counts as the bottleneck. It is no longer purely about decoding algorithms or signal quality in isolation. It is about scaling data collection, miniaturising MEG sensors and designing tasks that generalise from healthy volunteers to patients. Those are engineering and clinical problems, not theoretical ones, which makes them solvable with sustained effort.
A quiet shift toward wearable neurotechnology
The most overlooked detail in this research is not the accuracy figure. It is the assumption baked into the discussion that wearable MEG is coming. Once optically pumped magnetometers mature into reliable headsets, the same architecture demonstrated here could run on portable hardware. At that point the question stops being whether non-invasive decoding works and starts being whether the rest of the ecosystem, including clinical trials, regulatory frameworks and patient training protocols, can keep up with the sensors.