New ETH Zurich Study Proves Your AI Coding Agents are Failing Because Your AGENTS.md Files are too Detailed

In the high-stakes world of AI, ‘Context Engineering’ has emerged as the latest frontier for squeezing performance out of LLMs. Industry leaders have touted AGENTS.md (and its cousins like CLAUDE.md) as the ultimate configuration point for coding agents—a repository-level ‘North Star’ injected into every conversation to guide the AI through complex codebases.

But a recent study from researchers at ETH Zurich just dropped a massive reality check. The findings are quite clear: if you aren’t deliberate with your context files, you are likely sabotaging your agent’s performance while paying a 20% premium for the privilege.

https://arxiv.org/pdf/2602.11988

The Data: More Tokens, Less Success

The ETH Zurich research team analyzed coding agents like Sonnet-4.5, GPT-5.2, and Qwen3-30B across established benchmarks and a novel set of real-world tasks called AGENTBENCH. The results were surprisingly lopsided:

The Auto-Generated Tax: Automatically generated context files actually reduced success rates by roughly 3%.
The Cost of ‘Help’: These files increased inference costs by over 20% and necessitated more reasoning steps to solve the same tasks.
The Human Margin: Even human-written files only provided a marginal 4% performance gain.
The Intelligence Cap: Interestingly, using stronger models (like GPT-5.2) to generate these files did not yield better results. Stronger models often have enough ‘parametric knowledge’ of common libraries that the extra context becomes redundant noise.

Why ‘Good’ Context Fails

The research team highlights a behavioral trap: AI agents are too obedient. Coding agents tend to respect the instructions found in context files, but when those requirements are unnecessary, they make the task harder.

For instance, the researchers found that codebase overviews and directory listings—a staple of most AGENTS.md files—did not help agents navigate faster. Agents are surprisingly good at discovering file structures on their own; reading a manual listing just consumes reasoning tokens and adds ‘mental’ overhead. Furthermore, LLM-generated files are often redundant if you already have decent documentation elsewhere in the repo.

The New Rules of Context Engineering

To make context files actually helpful, you need to shift from ‘comprehensive documentation’ to ‘surgical intervention.’

1. What to Include (The ‘Vital Few’)

The Technical Stack & Intent: Explain the ‘What’ and the ‘Why.’ Help the agent understand the purpose of the project and its architecture (e.g., a monorepo structure).
Non-Obvious Tooling: This is where AGENTS.md shines. Specify how to build, test, and verify changes using specific tools like uv instead of pip or bun instead of npm.
The Multiplier Effect: The data shows that instructions are followed; tools mentioned in a context file are used significantly more often. For example, the tool uv was used 160x more frequently (1.6 times per instance vs. 0.01) when explicitly mentioned.
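To make the ‘Vital Few’ concrete, here is a minimal sketch that renders a lean context file containing only purpose, architecture, and tooling commands. The `ContextFile` class, the project description, and the directory names are hypothetical, chosen for illustration; only the use of uv as the named tool comes from the article.

```python
from dataclasses import dataclass, field


@dataclass
class ContextFile:
    """Hypothetical container for the few things worth telling an agent."""
    purpose: str
    architecture: str
    commands: dict[str, str] = field(default_factory=dict)  # label -> shell command

    def render(self) -> str:
        """Build the file text: purpose, architecture, then one line per command."""
        lines = [
            "# AGENTS.md", "",
            "## Purpose", self.purpose, "",
            "## Architecture", self.architecture, "",
            "## Commands",
        ]
        for label, command in self.commands.items():
            lines.append(f"- {label}: `{command}`")
        return "\n".join(lines) + "\n"


# Illustrative values only; substitute your own project details.
example = ContextFile(
    purpose="Billing service for the storefront (hypothetical project).",
    architecture="Monorepo: services live under services/, shared code under libs/.",
    commands={
        "Install": "uv sync",
        "Test": "uv run pytest",
        "Lint": "uv run ruff check .",
    },
)
agents_md = example.render()
print(agents_md)
```

The point of the sketch is the shape, not the generator: a dozen lines that name the non-obvious tools explicitly, and nothing the agent could discover on its own.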

2. What to Exclude (The ‘Noise’)

Detailed Directory Trees: Skip them. Agents can find the files they need without a map.
Style Guides: Don’t waste tokens telling an agent to “use camelCase.” Use deterministic linters and formatters instead—they are cheaper, faster, and more reliable.
Task-Specific Instructions: Avoid rules that only apply to a fraction of your issues.
Unvetted Auto-Content: Don’t let an agent write its own context file without human review. The study found that ‘stronger’ models don’t necessarily make better guides.
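One way to apply the exclusion list during human review is a small script that flags likely noise in a context file. The sketch below is an assumption-laden heuristic, not something from the study: the two patterns (tree-drawing characters and naming-convention keywords) and the `find_noise` helper are hypothetical and will miss many cases.

```python
import re

# Heuristic patterns; both are illustrative assumptions, not rules from the study.
TREE_LINE = re.compile(r"^\s*(?:[│|]\s*)*[├└]──")
STYLE_RULE = re.compile(r"\b(camelCase|snake_case|PascalCase)\b")


def find_noise(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that look like noise."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if TREE_LINE.match(line):
            findings.append((number, "directory tree"))
        elif STYLE_RULE.search(line):
            findings.append((number, "style rule better enforced by a linter"))
    return findings


sample = """# AGENTS.md
Run tests with `uv run pytest`.
Always use camelCase for variables.
├── src
│   └── app.py
"""
for number, reason in find_noise(sample):
    print(f"line {number}: {reason}")
```

A reviewer would still need to judge task-specific instructions by hand, since whether a rule applies to only a fraction of issues is not something a regular expression can tell.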

3. How to Structure It

Keep it Lean: A common rule of thumb for high-performance context files is to stay under 300 lines. Some teams keep theirs even tighter, at under 60 lines. Every line counts because every line is injected into every session.
Progressive Disclosure: Don’t put everything in the root file. Use the main file to point the agent to separate, task-specific documentation (e.g., agent_docs/testing.md) only when relevant.
Pointers Over Copies: Instead of embedding code snippets that will eventually go stale, use pointers (e.g., file:line) to show the agent where to find design patterns or specific interfaces.
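The structural rules above lend themselves to an automated check. The following sketch, under stated assumptions, enforces a line budget and verifies that `path:line` pointers still resolve. The `check_context_file` function, the pointer regex, and the throwaway demo repository are all hypothetical; the 300-line budget is the figure quoted above.

```python
import re
import tempfile
from pathlib import Path

MAX_LINES = 300  # budget quoted in the text; lower it to 60 for a stricter check
POINTER = re.compile(r"([\w./-]+\.\w+):(\d+)")  # assumed path:line convention


def check_context_file(context_path: Path, repo_root: Path) -> list[str]:
    """Return problems: an over-budget file or pointers that no longer resolve."""
    text = context_path.read_text(encoding="utf-8")
    problems = []
    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"{line_count} lines exceeds budget of {MAX_LINES}")
    for rel_path, line_no in POINTER.findall(text):
        target = repo_root / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
        elif int(line_no) > len(target.read_text(encoding="utf-8").splitlines()):
            problems.append(f"stale pointer: {rel_path}:{line_no}")
    return problems


# Demo against a throwaway repository layout (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "api.py").write_text(
        "def handler():\n    return 200\n", encoding="utf-8"
    )
    agents = root / "AGENTS.md"
    agents.write_text(
        "See src/api.py:1 for the handler pattern.\n"
        "Testing notes: agent_docs/testing.md\n"
        "Retry logic: src/api.py:40\n",
        encoding="utf-8",
    )
    demo_problems = check_context_file(agents, root)
    print(demo_problems)
```

Run in CI, a check like this could catch the main weakness of pointers, which is that line numbers drift as files change. It only confirms that the line exists, not that it still contains the pattern the pointer was meant to show.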

Key Takeaways

Negative Impact of Auto-Generation: LLM-generated context files tend to reduce task success rates by approximately 3% on average compared to providing no repository context at all.
Significant Cost Increases: Including context files increases inference costs by over 20% and leads to a higher number of steps required for agents to complete tasks.
Minimal Human Benefit: While human-written (developer-provided) context files perform better than auto-generated ones, they only offer a marginal improvement of about 4% over using no context files.
Redundancy and Navigation: Detailed codebase overviews in context files are largely redundant with existing documentation and do not help agents find relevant files any faster.
Strict Instruction Following: Agents generally respect the instructions in these files, but unnecessary or overly restrictive requirements often make solving real-world tasks harder for the model.

The post New ETH Zurich Study Proves Your AI Coding Agents are Failing Because Your AGENTS.md Files are too Detailed appeared first on MarkTechPost.
