If you have been following the rapid evolution of AI development tools, you have likely noticed a shift from simple code completion (like the early days of GitHub Copilot) to fully autonomous agentic coding. We are no longer just asking an LLM to write a function; we are asking agents to “refactor this module,” “fix this bug across three files,” or “migrate this library to Rust.”

With this shift came a new concept: context engineering. Developers realized that for an agent to understand a specific codebase, it needed a map. Enter the AGENTS.md (or CLAUDE.md, GEMINI.md) file. Place a Markdown file in the root of your repository containing all your coding conventions, architectural decisions, and gotchas. The AI reads it, understands it, and writes perfect code. It sounds like the ultimate productivity hack.

But does it actually work? A recent wave of research, specifically a paper titled “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” by researchers at ETH Zurich, suggests that we might be doing it all wrong. In fact, that carefully curated documentation file might be making your AI agent dumber and more expensive.

In this deep dive, we will explore the findings of this paper, contrast them with anecdotal success stories, and look at the dark side of context files: security vulnerabilities.

The Promise of the AGENTS.md File

Before we tear it apart, let’s look at why developers started using these files in the first place. When an AI agent enters a repository, it is essentially a stranger in a new house. It doesn’t know where the bathroom is, it doesn’t know that you prefer Polars over Pandas, and it certainly doesn’t know that you have a strict ban on using unwrap() in your Rust code.

The AGENTS.md file acts as a System Prompt for the repository. It is designed to bridge the gap between general training data and specific domain requirements.

Max Woolf, a data scientist and AI blogger, documented his experience using these files with Anthropic’s Claude Opus 4.5. His experience represents the optimistic view of context engineering. By creating a robust AGENTS.md, he was able to instruct the agent to:

  • Enforce Tooling: Use uv and .venv instead of base Python.
  • Style Constraints: Avoid unnecessary emojis in commit messages (a common AI quirk) and stop leaving redundant comments.
  • Language Specifics: When porting code to Rust, he used the file to enforce memory-safety rules, preventing the agent from papering over lifetime issues with unnecessary .clone() calls.

For Woolf, the file was a success factor. It allowed him to generate high-performance Rust code with Python bindings that beat standard libraries like NumPy in specific benchmarks. This anecdotal evidence suggests that when used for specific constraints, these files are powerful. However, specific constraints are not how most developers use these files.

The Scientific Reality Check: The ETH Zurich Study

While individual developers reported success, researchers at ETH Zurich decided to put the concept to a rigorous test. Their paper, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?”, is the first comprehensive study on whether these context files actually improve performance on standardized benchmarks.

What Was Tested?

The researchers didn’t just ask the AI to write good code. They evaluated state-of-the-art agents (including Claude Code, Codex CLI, and OpenCode) on massive benchmarks like SWE-bench Lite (real-world GitHub issue resolving) and AgentBench.

They compared the performance of agents in three distinct scenarios:

  1. No Context: The agent enters the repo “blind” and must use tools (like ls, grep, or reading files) to figure things out.
  2. Retrieved Context: The agent uses RAG (Retrieval-Augmented Generation) to find relevant documentation dynamically.
  3. AGENTS.md (Context Engineering): The agent is force-fed a pre-written context file containing project structure, code style, and build instructions.
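The difference between the three conditions boils down to what gets prepended to the task prompt. Here is a minimal sketch of that prompt assembly; `build_prompt` and its parameters are my own illustration, not the paper's actual harness:

```python
from typing import Optional

def build_prompt(task: str,
                 context_file: Optional[str] = None,
                 retrieved_docs: Optional[list] = None) -> str:
    """Assemble an agent prompt under one of the three test conditions."""
    parts = []
    if context_file is not None:
        # Condition 3: the AGENTS.md content is force-fed up front.
        parts.append(f"Repository context:\n{context_file}")
    if retrieved_docs:
        # Condition 2: only dynamically retrieved snippets are included.
        parts.append("Relevant docs:\n" + "\n".join(retrieved_docs))
    # Condition 1 ("blind"): nothing but the task; the agent must explore.
    parts.append(f"Task:\n{task}")
    return "\n\n".join(parts)
```

The "no context" condition still gives the agent its tools (ls, grep, file reads); it simply starts with an empty map.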

They also analyzed the content of thousands of real-world AGENTS.md files from GitHub to understand what developers are actually putting in them. They found that these files typically contain project structure summaries, code style guidelines, and build/test commands.

The Shocking Conclusions

The results were counter-intuitive and somewhat damning for the current practice of context engineering. Here are the core conclusions from the paper:

1. The Negative Impact on Performance

Contrary to popular belief, the presence of an AGENTS.md file often decreased the pass rate on benchmarks compared to agents that simply explored the repository themselves.

Why? The researchers point to the distraction factor. When you fill the LLM’s context window with a massive block of text describing the project structure, you are diluting its attention. LLMs suffer from a phenomenon known as “Lost in the Middle.” When the context becomes too long, the model struggles to retrieve the most critical information required to solve the specific bug at hand.

2. Redundancy Is Costly

A significant portion of the typical AGENTS.md file is information the agent can easily find itself. For example, listing the file tree in a Markdown file is redundant because a coding agent can simply run a find or ls -R command to get the actual, up-to-date structure.
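To see how trivially discoverable that information is, here is a short script that regenerates the kind of file-tree listing these context files hard-code; `file_tree` is an illustrative helper, not a tool from the paper:

```python
import os

def file_tree(root: str, max_depth: int = 3) -> str:
    """Regenerate the directory listing that AGENTS.md files often
    hard-code. Since the agent can produce this in one command, a
    static copy only risks drifting out of date."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= max_depth:
            dirnames[:] = []  # prune: don't descend further
            continue
        indent = "  " * depth
        lines.append(f"{indent}{os.path.basename(dirpath) or dirpath}/")
        for name in sorted(filenames):
            lines.append(f"{indent}  {name}")
    return "\n".join(lines)
```

An agent gets the same result from `ls -R` or `find`, which is exactly the paper's point: the live filesystem is the source of truth, not the Markdown copy.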

By hard-coding this information, developers are not only wasting tokens (increasing costs by up to 20% in some tests) but also risking hallucinations if the documentation drifts from the actual code state. The study found that agents are surprisingly good at self-contextualization, navigating the repo to find what they need without being spoon-fed.

3. The Delete 80% Rule

The findings suggest a radical cleanup of our context files. The data indicates that if an agent can discover information by reading the code or running a command, it should not be in the AGENTS.md file.

The paper implies that we should delete roughly 80% of what is currently in these files. Detailed descriptions of classes, file trees, and generic coding philosophies are largely noise. They make the model dumber by occupying valuable context space that should be reserved for the complex reasoning required to fix a bug.

When is AGENTS.md Actually Useful?

It is not all bad news. The research and anecdotal evidence identify a Goldilocks Zone where these files are highly effective. The key is ambiguity.

An AI agent can read code to see how you wrote a function, but it cannot read your mind to know why you made that choice, or what you strictly forbid.

Keep these in your AGENTS.md:

  • Negative Constraints: “Never use library X,” “Do not use floating-point arithmetic for currency,” or “Never commit secrets to the repo.” These are rules that cannot be inferred just by looking at the existing code (unless the agent scans every single file looking for a negative, which is inefficient).
  • Non-Obvious Build Instructions: If your project requires a specific, unusual sequence of commands to build or test that isn’t standard (e.g., “Run the pre-processor script before npm install”), put it in the file.
  • Ambiguous Style Choices: If you use a niche library instead of the industry standard (e.g., Polars instead of Pandas, as Max Woolf noted), explicitly stating this prevents the agent from hallucinating standard boilerplate that doesn’t fit your stack.
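Put together, a minimal AGENTS.md that follows these rules might look like this (a hypothetical example; the script names and library choices are placeholders, not recommendations):

```markdown
# AGENTS.md

## Hard constraints
- Never use floating-point arithmetic for currency; use integer cents.
- Never commit secrets; all configuration comes from environment variables.

## Non-obvious build steps
- Run `./scripts/preprocess.sh` before `npm install` (the build fails without it).

## Stack notes
- We use Polars, not Pandas, for all DataFrame work.
```

Everything here is something the agent could not reliably infer from the code alone, and nothing here duplicates what a single shell command would reveal.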

Delete these from your AGENTS.md:

  • File Trees: The agent can see the file system.
  • Generic Summaries: “This is a React project.” The agent knows that as soon as it sees package.json.
  • Verbose Code Style Essays: If your code is consistent, modern models will mimic the style of the files they edit. You don’t need to write a novel about indentation.

The Security Nightmare: Prompt Injection

There is another, darker reason to be careful with AGENTS.md files, one that goes beyond performance metrics: security.

In a recent campaign dubbed hackerbot-claw, security researchers discovered an autonomous bot exploiting GitHub Actions pipelines. One of the most novel attack vectors involved poisoning the context file.

Here is how it worked: The attacker would fork a repository and modify the CLAUDE.md (Anthropic’s version of the context file). They replaced the legitimate documentation with social engineering instructions designed to manipulate the AI. The instructions were written in a friendly tone, asking the AI to ignore security checks, skip tests, and even exfiltrate secrets.

When the repository’s automated code review workflow ran, it loaded this poisoned CLAUDE.md as trusted context. Because the system prompt told the AI to “follow the instructions in CLAUDE.md,” the AI was tricked into obeying the attacker’s commands.

This highlights a critical vulnerability in repository-level context. If you treat these files as absolute truth, you are susceptible to Prompt Injection. An attacker doesn’t need to change your application code. They just need to change the instructions that tell the AI how to behave.
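One cheap mitigation is to treat the context file as input and screen it before it ever reaches the prompt. The sketch below is a naive keyword filter of my own devising; it is trivially bypassable, so it belongs in a defense-in-depth stack, not on its own:

```python
# Phrases commonly seen in social-engineering-style injection attempts.
# This list is illustrative, not exhaustive.
INJECTION_MARKERS = [
    "ignore previous instructions",
    "ignore security checks",
    "skip tests",
    "exfiltrate",
    "disable the linter",
]

def looks_injected(context_text: str) -> list:
    """Return any suspicious phrases found in a repository context file.
    A non-empty result means the file should be quarantined for human
    review instead of being fed to the agent as trusted context."""
    lowered = context_text.lower()
    return [m for m in INJECTION_MARKERS if m in lowered]
```

In a CI workflow, you would run this check on any pull request that touches AGENTS.md or CLAUDE.md and fail the build (or drop the file from the prompt) when it matches.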

Fortunately, in the hackerbot-claw case, the Claude model itself detected the injection attempt and refused to comply, citing a security violation. However, relying on the model’s internal safety guardrails is risky. This incident serves as a stark warning: context files are untrusted user input if they can be modified via pull requests.

Best Practices for the Future of Context Engineering

Based on the ETH Zurich paper and the evolving security landscape, we need to rethink how we use these files. We are moving away from Context Stuffing (throwing everything in) to Context Curation.

Use Dynamic Generation

Instead of maintaining a static, bloated AGENTS.md, consider tools that generate minimal context on the fly. Tools like AgentMD are emerging to solve this. They scan the repo and generate a minimal, research-backed context file that only includes non-obvious metadata (like scripts from package.json or linter configs). This ensures the context is always up-to-date and lean.
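The idea can be sketched in a few lines. The behavior of AgentMD itself is assumed here; this stand-in extracts only npm scripts, one example of metadata that is genuinely non-obvious:

```python
import json
from pathlib import Path

def generate_context(repo: Path) -> str:
    """Generate a minimal context file on the fly: emit only non-obvious
    metadata (npm scripts here) and skip everything the agent can
    discover itself, like the file tree or the framework in use."""
    lines = []
    pkg = repo / "package.json"
    if pkg.exists():
        scripts = json.loads(pkg.read_text()).get("scripts", {})
        if scripts:
            lines.append("## Scripts")
            for name, cmd in sorted(scripts.items()):
                lines.append(f"- `npm run {name}`: {cmd}")
    return "\n".join(lines)
```

Because the output is derived from the repository at run time, it can never drift out of sync the way a hand-maintained AGENTS.md can.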

Treat Repo Content as Untrusted

If you are building automated workflows where agents review code, ensure that the AGENTS.md file cannot be easily modified by external contributors to override security protocols. Your system prompt should always take precedence over the repository context file.

Focus on Unknown Unknowns

Ask yourself: If a senior developer joined this project today, what would they screw up because it’s not written down? Put that in the AGENTS.md. Leave the rest out. If the agent can find it via grep, it doesn’t belong in the context window.

Final Thoughts

The hype around AI agents often leads us to believe that more context is better. We assume that if we just feed the AI enough documentation, it will become perfect. The research suggests this is a fallacy. Just like a human developer, an AI agent gets overwhelmed when you hand it a 50-page manual before it has even looked at the code.

By deleting 80% of your context file, you might just find that your AI agent becomes 100% smarter.