Open problem 28
Capability without understanding: brute-force LLM PRs
The problem
A pattern I keep running into, and don’t yet have a clean answer to. Someone — often newer, or just new to this domain — opens a PR that brute-forces a solution with an LLM without understanding the system’s complexity, the domain’s real problems, the trade-offs, or why the obvious approach was avoided. They lean on the model to produce both the answer and the change, and they ship (or propose) something far beyond what their own understanding actually supports.
Before LLMs, attempting something past your depth was self-limiting in a useful way. You’d grind through the implementation, hit the walls the domain puts up, and learn — the struggle was the teacher, and the gap between ambition and ability was a teachable moment a reviewer or mentor could step into. Now the model smooths the walls away. The code appears, plausibly shaped, and the learning that used to happen in the friction simply doesn’t. The author ends up holding a change they can’t reason about, can’t defend in review, and can’t maintain — and frequently doesn’t realise it.
It looks like the Dunning–Kruger effect, but amplified: the tool grants capability without the competence that capability used to require, so the feedback that would normally calibrate someone’s confidence (“this is hard”, “I’m clearly struggling”, “I don’t get this yet”) never fires. Power tools, and no map of the terrain.
Why it’s hard (the tension)
This isn’t simply “stop bad PRs.” Two things I care about pull against each other:
- I want to stop un-understood code. A change whose author can’t reason about its edge cases, trade-offs, or failure modes is a liability — it can’t be reviewed substantively, can’t be maintained, and quietly erodes the team’s collective understanding of its own system.
- I want to keep teaching, and I don’t want to ban the tool. The person who reached too far is doing the thing we want — trying. LLMs are genuinely good, genuinely democratising, and I don’t want to be the luddite who slows everyone down or shames the attempt (ZFN-27).
The cruel part is that the mechanism that used to build understanding — hands-on struggle — is exactly what the tool removes, and the detachment from the code is only increasing. The teaching moment is disappearing right when it’s most needed.
What I’ve considered (none of these is the answer)
- The co-sign bar. ZFN-26 already says you must be able to defend every line as if you wrote it. Necessary, and it filters the worst cases, but it polices “can you defend the words?”, not “do you understand the domain?”, and a good model can help someone rehearse a defence they don’t really hold.
- Review for understanding, not just correctness — ask the author to walk through the trade-offs and why this approach. Useful, but it scales poorly, can land as a gotcha, and the same tool can coach someone past the quiz.
- Reject the change, not the person. Decline the un-understood PR while keeping the door open and the tone kind — but that still leaves the actual goal (they learn) unmet.
- Turn the LLM into the teacher, not just the producer — have it explain the domain, the trade-offs, the prior art, and quiz the author. Promising, but it depends on the author wanting to learn rather than to ship.
- Separate exploration from production. Brute-forcing a prototype is fine; brute-forcing a change to a system people depend on is not. A useful line, but a lot of harm lives in what gets proposed as production.
- Invest in written domain context (ZFN-1) so both the author and the model have the why. Helps the model and the newcomer — but doesn’t substitute for earned experience.
Where I’m stuck
- “Just make people understand” doesn’t scale, and the friction that used to force understanding is the very thing the tool removes. I don’t have a replacement mechanism for the learning that happened in the struggle.
- I can’t reliably distinguish, in review at scale, genuine understanding from LLM-laundered confidence.
- The dial between “gatekeep too hard” (lose the velocity, the accessibility, and the people who do learn with AI) and “gatekeep too little” (a codebase full of code no one on the team understands) is real, and I don’t know where to set it, or how to set it per person and per change.
- It is getting harder, not easier, as people detach further from the code.
What a good answer looks like
A norm or workflow that keeps the velocity and accessibility of these tools while ensuring — and growing — real understanding: the LLM as an amplifier of learning, not a substitute for it. Something that makes “understand before you ship” enforceable and teachable without gatekeeping people out or shaming the attempt. When I have that, this becomes a proper Field Note. Until then, it’s an honest open question — and if you’ve solved it, I’d genuinely like to hear how.
References
- ZFN-26 — the co-sign bar (defend every line) is part of the answer, but only part.
- ZFN-27 — reject the un-understood change without punishing the person for trying.
- ZFN-1 — written domain context helps the newcomer and the model, even if it can’t replace experience.
- Dunning–Kruger effect (en.wikipedia.org) — the bias this seems to be, amplified by capable tools.
Changelog
- 2026-06-12: Opened as an open problem.