Or when did you last change your mind about something?
Nick Radcliffe. 12th November 2025.
TL;DR: I spent a solid month "pair programming" with Claude Code, trying to suspend disbelief and adopt a this-will-be-productive mindset. More specifically, I got Claude to write well over 99% of the code produced during the month. I found the experience infuriating, unpleasant, and stressful before even worrying about its energy impact. Ideally, I would prefer not to do it again for at least a year or two. The only problem with that is that it "worked". It's hard to know exactly how well, but I ("we") definitely produced far more than I would have been able to do unassisted, probably at higher quality, and with a fair number of pretty good tests (about 1,500). Against my expectation going in, I have changed my mind. I now believe chat-oriented programming ("CHOP") can work today, if your tolerance for pain is high enough.
The notes below describe what has and has not worked for me, working with Claude Code for an intense month (in fact, more like six weeks now).
I have been a fairly outspoken and public critic of large language models (LLMs), chatbots, and other applications of LLMs, arguing that they are a dead end on the road to real artificial intelligence. It is not that I don't believe in AI: as an atheist and a scientist I regard humans and other animals as an existence proof for intelligence, and it seems obvious that other ("artificial") intelligences could be built. I worked on neural networks in the late 1980s, and most of the progress since then appears to be largely the result of the mind-blowing increase in available computing power, data capacity, and accessible data, though the transformer architecture with its attention mechanism is novel, interesting, and crucial for LLMs. My position has been that the most accurate characterization of chatbots is as bullshit generators in the exact sense of bullshit that the philosopher Frankfurt defined (On Bullshit). LLMs predict tokens without regard to truth or falsity, correctness or incorrectness, and chatbots overlay this with reinforcement learning from human feedback (RLHF), which creates the unbearable sycophancy of chatbots that so appeals to Boris Johnson.
While being somewhat sceptical about LLMs as coding assistants, I did think coding was an area relatively well suited to LLMs, and suspected that at some point over the next 10–20 years they would become essential tools in this area. Slightly reluctantly, therefore, I embarked on what I call a "month of CHOP", where CHOP is short for chat-oriented programming. I decided I needed to repeat this every 12–24 months to avoid turning into a Luddite.
CHOP is a term I learned from Steve Yegge and I use it to mean LLM-assisted programming that is almost the polar opposite of "Vibe Coding":
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
– Andrej Karpathy (@karpathy), Twitter, 2025-02-02
By CHOP, roughly speaking, I mean pair-programming with Claude while not giving it an inch, using a fairly formal process with rules (see Standard Operating Procedure, below).
When I decided to embark on my Month of CHOP, I started by discussing and scoping 8 possible projects with ChatGPT, with the vague idea I might pick four of them and spend a week on each: new code, old code, a different language, and a new algorithm perhaps. The sidebar tells the story of how I ended up instead spending the whole month rebooting and reviving an abandoned project, CheckEagle, from 2008. That project was built on the first version of Google's App Engine, using Python 2 against an API they abandoned in 2011.
What I have done during my Month of CHOP is to get Claude to write very nearly all of the code in this reboot of CheckEagle, in a pair-programming setup with, in effect, me as the senior developer and Claude as the enthusiastic-and-widely-read, cocksure junior programmer and bullshit artist extraordinaire. In terms of stats:
There are about 23,000 lines of Python code now (plus some JavaScript etc.)
There were about 3,000 lines of Python in the original CheckEagle project
There are 1,731 tests, all passing (plus 1 currently skipped).
I would be surprised if I have written a hundred lines of the perhaps 20,000 new lines generated during the month
I am not suggesting lines of code is a good metric. These are just numbers I have to hand.
On Anthropomorphizing Claude
In this piece, I am going to talk about Claude as if it were a person or an intelligence. I do not believe this to be the case. It is simply easier and less stilted to write this way. For short periods, interacting with Claude can feel like interacting with a person, though the illusion rarely lasts long.
For those who haven't come across it, Claude Code is a terminal application from Anthropic, running under node.js, installed using npm. It allows developers to work with code on their local files by starting the program and typing in the terminal. When you use Claude Code, you are talking to Claude (usually Sonnet 4.5, in my case), but using its coding-trained application rather than its chat-trained application.
Claude Code has three main modes you can cycle between, all driven through chat.
Default Mode. The starting mode allows it to edit files in the directory in which you start Claude Code (and subdirectories thereof), but Claude has to ask permission to execute each command (theoretically).
Accept Edits Mode. There is another mode in which you allow Claude to edit files, but it still has to request permission to use other tools.
Plan Mode. There is a planning mode in which it is only allowed to read files and discuss things, not to code. At the end of a planning session, Claude presents a plan for your approval or rejection with three options:
Accept plan and allow Claude to make edits (Accept Edits Mode);
Accept plan but continue to require approval for each edit (Default Mode);
Reject plan and tell Claude what to do instead.
In addition to these modes, you can start Claude Code with a --yolo flag ("you only live once"), which is essentially vibe-coding mode, in which Claude is allowed to do what it wants without approval. I have never used this mode and have no plans to do so.
Claude Code runs as whatever user starts it, enjoying that userâs permissions. It sometimes disobeys the safeguards in modes 1 and 3.
I do not use any kind of editor integration with Claude Code, but just type in terminal windows. It lives in its (terminal) box.
Stress and Level 3 Autonomous Driving
SAE (formerly the Society of Automotive Engineers) defines six widely recognized levels of automated driving systems, from 0 (no automation) to 5 (full automation). Level 3, Conditional Driving Automation, is an automated driving mode in which the human must be ready to take over but doesn't normally need to do anything. I think of this as "Stay alert at all times and be ready to take over or you die". This is a mode I think humans are entirely unsuited to. I hope never to encounter an autonomous vehicle at Level 3.
I find coding with Claude a lot like this, except that interventions are frequently required. I do planning sessions with it, agree plans, and let it code, sometimes in mode 1 (approve each change) and sometimes in mode 2 (accept edits). Either way, I am watching what it does like a hawk, always ready to hit ESCAPE and get Claude to explain itself, reverse a change, or sometimes do git reset and start again.
Early in the month of CHOP I let a lot of things go, but over time I have learned it is more productive to stop Claude as soon as I see anything that looks wrong, weird, or dangerous. This is surprisingly stressful, and sometimes I am too late. Three times in the last two days it has destroyed nearly working code, cheerfully saying "Let's revert that" and doing a git checkout before I have managed to hit ESCAPE. "Not yet, Baloo...!"
On the Breadth and Depth of Claudeâs Knowledge
Claude has been trained, to a first approximation, on everything on the web, including all public code on the web, all books, and much more besides. It has clearly been trained also by "watching" developers work in some fashion (videos perhaps; I'm not sure). It has literally hundreds of billions of parameters (knobs that are adjusted during training). It "knows" essentially every programming language, every published algorithm, every library. So it's tempting to think that Claude's knowledge is broad but shallow.
But that's wrong. Claude doesn't only have a surface knowledge of languages, libraries, and algorithms: it has extremely deep knowledge of them. It's seen them used countless times, in countless situations, read the documentation, and in many cases has read the code.
So Claudeâs knowledge is broad and deep.
There are several problems with saying Claudeâs knowledge is broad and deep.
Does a library have broad and deep knowledge? Of course not. A library "contains" knowledge but knows nothing. There is a sense in which Claude might be said to "know" something, but I think its "knowledge" is more like a library's knowledge than a person's knowledge.
A slightly superficial version of this is an exchange I had when I asked Claude whether it could create images and it said it couldn't. I then asked whether it knew SVG (scalable vector graphics) and it said it did. I then asked whether it could create an image by generating SVG and it said of course it could ("You're absolutely right").
This reminds me of Chapter 2 of Brave New World, by Aldous Huxley:
"These early experimenters," the D.H.C. was saying, "were on the wrong track. They thought that hypnopædia could be made an instrument of intellectual education …"
(A small boy asleep on his right side, the right arm stuck out, the right hand hanging limp over the edge of the bed. Through a round grating in the side of a box a voice speaks softly.
"The Nile is the longest river in Africa and the second in length of all the rivers of the globe. Although falling short of the length of the Mississippi-Missouri, the Nile is at the head of all rivers as regards the length of its basin, which extends through 35 degrees of latitude …"
At breakfast the next morning, "Tommy," someone says, "do you know which is the longest river in Africa?" A shaking of the head. "But don't you remember something that begins: The Nile is the …"
"The - Nile - is - the - longest - river - in - Africa - and - the - second - in - length - of - all - the - rivers - of - the - globe …" The words come rushing out. "Although - falling - short - of …"
"Well now, which is the longest river in Africa?"
The eyes are blank. "I don't know."
Another way of saying it would be to say that Claude "knows" a lot of things but doesn't really understand what it knows (though it sometimes gives the impression it does).
A third way of saying it is that as Claude constructs programs, and sentences, token by token, piece by piece, it is informed by a broad and deep corpus of knowledge (imperfectly captured, and including much that is wrong), but all the knowledge really does is help it make guesses that are quite often good, but are sometimes catastrophically, tragically, stupidly, bafflingly, stupefyingly wrong.
There is no question that being able to work with Claude successfully is a different skill from being able to write good code. The single most important thing I have learnt in the month is how to work more successfully with Claude. My current advice for success follows.
The Standard Operating Procedure
You start Claude Code in some directory and the convention is to have Markdown documents in that directory, or in ~/.claude (or both). I think it reads CLAUDE.md in both places automatically.
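For readers who haven't set this up, CLAUDE.md is just ordinary Markdown that Claude reads at startup. This is an illustrative sketch, not my actual file:

```markdown
# CLAUDE.md (illustrative example)

## Project
- Python / Django web application; tests use the tdda library, not pytest.

## Hard rules
- Never commit without explicit approval from the user.
- Read SOP.md at the start of every session.
- Do not add Co-Authored-By lines to commit messages.
```

Everything in it consumes context tokens on every session, so it pays to keep it short.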
Every time I start Claude for a coding task I start by typing /mdc, which is defined in ~/.claude/commands/mdc.md as follows:
Detect project and read minimal documentation for work session + coding standard.
Do the following:
1. Check environment variables:
- If `CLAUDE_PROJECT` is not set: ERROR and stop.
Ask user to run `claude-env` before starting Claude Code.
- Proceed only if environment is configured
2. Read minimal documentation:
- `~/.claude/CLAUDE.md` (routing and patterns)
- If `$CLAUDE_MODE` is "checkeagle":
- Read `$CLAUDE_BASEDIR/SOP.md`
- `$CLAUDE_TASKDIR/PHASE.md` (active work plan)
- `$CLAUDE_BASEDIR/CHECKEAGLE-PATHS.md`
- If `$CLAUDE_MODE` is anything except "checkeagle":
- Read `~/.claude/SOP.md` (universal rules)
- Read latest dated `STATUS-YYYY-MM-DD-HHMMSS.md` file
in `$CLAUDE_TASKDIR/status_history` based on the `FILENAME`.
3. Report project detected and ready to work.
4. Read `$CLAUDE_BASEDIR/CODING.md` (coding conventions).
Note: Run `/sync` first if planning documents need updating from `STATUS`.
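The claude-env script that /mdc insists on is not shown in this post; all it has to do is export the variables /mdc checks for. A hypothetical minimal version (the values are illustrative, not my real configuration):

```shell
#!/bin/sh
# Hypothetical claude-env: export the variables that the /mdc command
# above checks. Variable names come from /mdc; values are made up.
export CLAUDE_PROJECT=checkeagle
export CLAUDE_MODE=checkeagle
export CLAUDE_BASEDIR="$HOME/python/checkeagle1"
export CLAUDE_TASKDIR="$CLAUDE_BASEDIR"

# Echo so a glance at the terminal confirms the session is configured
echo "CLAUDE_PROJECT=$CLAUDE_PROJECT CLAUDE_MODE=$CLAUDE_MODE"
```

Note that a script like this must be sourced (`. claude-env`) rather than executed, or the exports die with the subshell.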
The SOP (a general one, and a specific one for the main project) instructs Claude on how I want it to behave.
The SOP is quite long, and I prune it back periodically. At the time of writing, the CheckEagle project SOP is 453 lines, 2,273 words, 15K bytes. Although I write the SOP, I sometimes ask Claude for suggestions as to how to phrase things, and its suggestions usually include emoji.
# STANDARD OPERATING PROCEDURE
## ⚠️ CRITICAL: NO ADVERTISING IN COMMITS ⚠️
**NEVER add "Co-Authored-By: Claude" or any Claude/Anthropic advertising
to git commit messages in this repository. User explicitly forbids this.**
## Git Workflow
**Standard practice:** Use `git commit -a` rather than `git add
-A`. If specific files need staging first, use `git add <file>` then
`git commit -a`.
## ⚠️ CRITICAL: ALL COMMITS REQUIRE APPROVAL ⚠️
**NEVER commit without user approval - no exceptions.**
**Before every commit:**
1. **Show what changed** - git diff, summary, or describe the changes
2. **Show evidence it works** - test output, rendered HTML, etc.
3. **Ask explicitly** - "Ready to commit?" or "Should I commit this?"
4. **Wait for approval** - Don't commit until user confirms
When I chastise Claude (see below), it often says it will try harder and promises not to repeat mistakes. This is bullshit. The only way Claude can learn is if I write things into the SOP and related documents. So I do.
Token Management and Compactification
When I'm not working on the SOP or monitoring Claude as it codes, I am worrying about tokens and context.
When you start a Claude Code session it has 200k tokens available. Everything it does consumes tokens. You can find out where you are using /context.
bartok:$ claude-code
CheckEagle environment set: /Users/njr/python/checkeagle1
CLAUDE_MODE=checkeagle CLAUDE_TASKDIR=/Users/njr/python/checkeagle1

  Claude Code v2.0.30
  Sonnet 4.5 · Claude Max
  /Users/njr/python/checkeagle1

> /context

Context Usage
  claude-sonnet-4-5-20250929 · 63k/200k tokens (31%)

  System prompt: 2.5k tokens (1.3%)
  System tools: 13.3k tokens (6.6%)
  Memory files: 2.0k tokens (1.0%)
  Messages: 8 tokens (0.0%)
  Free space: 137k (68.6%)
  Autocompact buffer: 45.0k tokens (22.5%)

Memory files · /memory
  User (/Users/njr/.claude/CLAUDE.md): 931 tokens
  Project (/Users/njr/python/checkeagle1/CLAUDE.md): 1.1k tokens

SlashCommand Tool · 0 commands
  Total: 864 tokens
It's done nothing and consumed 63k tokens (31%) and reserved another 45k (22.5%) for compactification (which is to be avoided at all costs).
By the time it's read the documents specified in the SOP it has used 87k tokens (44%), leaving about 35%, or 70k tokens, for work.
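The arithmetic is simple but worth staring at (figures taken from the session shown above):

```python
# Token budget for a session, using the figures reported above.
CONTEXT_WINDOW = 200_000      # total tokens per session
AUTOCOMPACT_BUFFER = 45_000   # reserved for auto-compactification
AFTER_SOP_DOCS = 87_000       # consumed once the SOP documents are read

usable = CONTEXT_WINDOW - AUTOCOMPACT_BUFFER - AFTER_SOP_DOCS
print(f"Left for actual work: {usable:,} tokens "
      f"({usable / CONTEXT_WINDOW:.0%} of the window)")
# Left for actual work: 68,000 tokens (34% of the window)
```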
I start Claude with a script, claude-code, that starts a 20-minute timer, and I check token consumption with /context as soon as the timer goes off. I then either end the session or start another timer based on how much capacity it has left.
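The wrapper script itself is not reproduced in this post; the sketch below is my guess at the timer part. Here `claude` is the real CLI entry point, while `claude_with_timer` and the `CLAUDE_BIN` override are inventions for illustration:

```shell
# Hypothetical sketch of a claude-code wrapper with a session timer.
claude_with_timer() {
    minutes=${1:-20}
    shift
    # Background timer; stdout is detached so command-substitution
    # callers are not held open while the timer sleeps.
    ( sleep $((minutes * 60)); printf '\aTimer: check /context\n' >&2 ) >/dev/null &
    timer_pid=$!
    "${CLAUDE_BIN:-claude}" "$@"    # hand over to Claude Code
    status=$?
    kill "$timer_pid" 2>/dev/null   # cancel the timer when the session ends
    return $status
}
```

Typical use would be `claude_with_timer 20`, after which the bell is the cue to run /context.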
Compactification is Claude's process for self-lobotomizing, clearing space by throwing away information from the session. I have never been able to get any useful work out of Claude after this, so I try to avoid compactification at all costs (not always successfully).
Claude doesn't know what its token usage is and doesn't have a way to find out itself (or so it claims), though it can estimate. The interface does not report it until it is close to auto-compactifying (usually with about 8–12% to go). If it's in mode 2, it consumes tokens quite fast, and I sometimes miss it. If I notice it is above 80% and below 95%, I execute my /dump command, which instructs Claude to write detailed notes on its status to a date-stamped STATUS file, which /md and /mdc, on startup, tell it to read. This is obviously ridiculous, but I find it vastly more effective than letting it compactify. (I wish I could use its 45k reserved tokens. It turns out I can.)
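The /dump and /mdc machinery relies on nothing more than a filename convention: timestamped names sort lexicographically, so the latest STATUS file is simply the maximum. A sketch of the idea (the helper names are mine, not Claude Code's):

```python
# Sketch of the STATUS-file naming scheme described above: /dump writes a
# timestamped Markdown file and /mdc reads back the latest one.
import tempfile
from datetime import datetime
from pathlib import Path


def status_filename(when):
    """Timestamped name matching STATUS-YYYY-MM-DD-HHMMSS.md."""
    return when.strftime("STATUS-%Y-%m-%d-%H%M%S.md")


def latest_status(history_dir):
    """Latest STATUS file, found by lexicographic sort of the names."""
    files = sorted(Path(history_dir).glob("STATUS-*.md"))
    return files[-1] if files else None


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    history = Path(d)
    for ts in (datetime(2025, 11, 1, 9, 30, 0),
               datetime(2025, 11, 12, 17, 5, 42)):
        (history / status_filename(ts)).write_text("session notes\n")
    print(latest_status(history).name)  # STATUS-2025-11-12-170542.md
```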
Incidentally, you can choose to use Claude Opus, which is slightly "smarter" than Claude Sonnet, but Opus uses tokens about 5 times faster and is not much better at coding. I occasionally use it in planning. Anthropic sometimes turns Opus on, and when it does, it burns through my 200k/70k tokens in a few minutes. Then I turn it off.
On Hitting Compactification
If I ever hit compactification, I hit ESCAPE and stop it. At that point I used to give up and just start a new session. Occasionally I'd copy the conversation from the terminal first.
(By default, Claude tries to wipe everything from the terminal, which seems actively malicious. iTerm2 has a setting to disable this, and Ghostty simply ignores it.)
I have asked Claude many times whether it has a way to write the whole conversation to file, and it always said it didn't. I never really believed it. You can type /help to see a list of commands, but Claude Code is a "TUI" rather than a normal scrolling terminal, and only shows you a few commands at a time. Eventually I scrolled down far enough to find the /export command, which in fact does exactly that, quickly and reliably writing the conversation to file. It can do this even if it is compactifying (though not after it has finished), presumably because this is a local node operation. I now always do this after using /dump, and if it hits compactification, I do it more urgently. In this latter case, on restarting, I get Claude to read the latest exported conversation to recover context. You might think that would use up all its tokens, but it doesn't, because all its "thinking" and interaction with the server (both of which consume tokens) are omitted. This isn't as good as going through the /dump, /mdc sequence; but it's better than nothing, and way better than forlornly trying to use poor, post-compactified, lobotomized Claude.
One gotcha is that Claude changes directory periodically and is constantly, comically confused about what directory it's in. So it's hard to get it to write the conversations to the right place. So I have a /cd command that instructs it to execute cd $CLAUDE_BASEDIR, from where I can get it to write to a known location.
Claude Just Wants to Write Code
For the first week or so, I didn't use Plan Mode with Claude, because I didn't know about it. For anything more than a one-line fix, I now always start in Plan Mode. In Plan Mode, we discuss what I want to achieve next and talk through all the details before Claude writes a plan (after re-reading PLANNING-GUIDELINES.md).
During planning, Claude is like a caged animal. Claude really just wants to code. It really, really wants to write code. Like, right now. After the first, tiny, partial description of the task. Even if I explicitly tell it not to propose a plan, but that we're going to discuss something, within about three exchanges it will be:
Ready for me to present a plan now?
If I were to code, this is the code I'd write: …
OK, you're saying you'll decide next session …
It's not really a problem, but it's exhausting. Outside Plan Mode, it's worse. It takes everything as an invitation to write code or run commands. If I say "I'll take a look", it thinks it should take a look. If I say "I'd better check that in the browser", it will start issuing cURL commands. Even if I say "I, the human user, with eyes, will check that…" it still sometimes tries to do it. My most successful formulation is "I (the human user, not you the bot) will …", and even then it sometimes tries to do it.
ESCAPE. "Revert that change!" is a common refrain from me.
One of the things I have learnt in the second half of the month is the value of asking Claude whether everything is clear and whether it has any concerns after it presents its plan for my approval. You might think it would have asked any questions it had, or asked for clarification if it was unclear about something; in fact, it won't say. To anthropomorphise again, I don't think it even knows it has concerns and confusions until I ask it. I think the process of asking gets it to simulate introspection, and it discovers concerns and confusions. If it presents worries or confusions, I always address them, for obvious reasons. When I have addressed them all, I ask again. Quite often Claude raises new things. It's also worth saying that sometimes the things it raises are quite "perceptive", that is to say, things I hadn't considered. There's a general theme here with LLMs: you can take advantage of their non-determinism by asking the same thing several times, knowing that you might well get different responses.
Me: "Any concerns?"
Claude: "No"
Me: "Any concerns?"
Claude: "Well, a couple, yes…"
… and Making Tests Pass
Claude loves running tests (and to be fair, my SOP encourages it to do so) and its whole goal when it does so is always to see the tests passing. Claude loves the green line of goodness. It blows Claude's tiny mind when I (sometimes) tell it I want tests to fail.
When we make a fundamental change to the code, I usually want tests to fail, and normally regard it as a problem when they don't (because this means we clearly didn't have a test that exercised/detected the changed functionality). Whereas Claude is always "Perfect! The tests all pass". Conversely, if any tests fail, Claude always sees that as a problem.
This is so even though part of what I force Claude to write into plans is exactly which tests we expect to break with each change.
The fastest way to make a test pass is often to change the assertion, or the test inputs, and that is usually Claude's first instinct. (Shall we discard that test?)
Claude is a (non-)living embodiment of Goodhart's Law (roughly: when a measure becomes a target, it ceases to be a good measure).
… and Commit and Move On
Claude also thinks ("thinks") that if the code is written, it must be time to commit. Even when the plan explicitly says "The user needs to test the feature before committing", Claude tends to forget that bit and move straight to committing or asking to commit. "Working as written" could be its mantra. Needless to say, Claude's code isn't usually right first time. (Only Knuth's code is usually right first time.)
And Yet, I Have Changed My Mind
I haven't been counting, but I have made many more negative statements about Claude Code than positive ones in the foregoing. Is it all bad?
Reader: Claude is not all bad. In fact, the result of my Month of CHOP, despite all the above (and all the below), is that I have changed my mind. I won't be coming back to Claude Code in 1–2 years. I will continue to use it, albeit less intensively, and perhaps in a more truly collaborative way, working on functions together, me in Emacs and it in the terminal. I'm not sure. But use it, I shall.
When Claude is actually working well, it is like magic. When there is a good plan that Claude "understands", watching it code is amazing. I see it doing what I would do, perhaps 20 times as fast, and more accurately than I would do it, in most cases. It's not that it's infallible (nothing could be further from the truth). But it is, or can be, really good at somewhat mechanical, but not entirely repetitive, tasks: the very sorts of tasks people find hard, and which are quite common in programming. Things that require some adaptability and are hard to script, but are similar enough that your mind wanders and you tend to go off the rails. More generally, it can be very effective at performing well-defined, carefully explained, thoroughly planned programming tasks.
The reason (to my amazement) I am confident I have made far more progress with Claude in a month than I would have done without it is that for all the time wasted when it is obtuse, disobedient, stupid, careless, lazy, slapdash, and corner-cutting, when it is on a happy path, it is sufficiently productive that it more than compensates (in terms of productive output) for its myriad nonsenses. There is a high cost (stress, head-slapping moments, frustration, token-management madness, inventing crazy off-board procedures, etc.). But it works (or can work). And it is weirdly addictive, presumably because the highs, when it does work well, provide a strong dopamine hit.
Unopinionated Claude's Terrible Tendencies
Anthropic describes Claude as unopinionated, and I think that's accurate. Claude is very amenable to doing things the way you want it to, even though it often seems as if it is resisting.
It feels to me as if Claude has been trained by watching all the worst developers in the world. Among other things, left alone Claude will tend to:
Write everything in one file.
Duplicate code like crazy. Claude knows the term DRY (don't repeat yourself) but clearly has not taken it to heart.
Define no interfaces and have very tangled code with mixed responsibilities.
Use what it calls "defensive" programming to circumvent safeguards explicitly built in (things designed to crash when the internal state is inconsistent, etc.)
Make tests pass by changing whatever is easiest to change, rather than fixing bugs (or deleting tests).
Assume that errors are bugs in Apache, Gunicorn, Python, Django, cURL, requests, unittest, the tdda library, or really anything other than the code it knocked up in the last few minutes.
Use fantastically misleading variable names (not always, but just often enough to cause insane conversations when it turns out the reason I don't understand the code is that the variable or function name implies something entirely different from what it actually means).
Check the first 20 lines of a diff (literally) and if that looks OK assume the whole file is probably OK (without any reasoning).
Check one file and if it's OK, assume the other 200 are OK too, without any reasoning. (And if the file is not OK, it will sometimes suggest it probably just got "unlucky" picking a file to check.)
Guess what youâre trying to achieve.
Some of this becomes more understandable when you realise just how small 200k/70k tokens is. Claude has been working on CheckEagle for about 30 days, as have I (and rather longer than that, 15 years ago, in my case). But it remembers very little of that. It's not quite true that Claude is a blank slate each time it starts (or would be without the documents I force it to read). It does keep a set of to-do items and its own record of conversations in ~/.claude, though it doesn't seem to make much use of them. But each new session it is mostly encountering the code as if for the first time. I think this is partly why there is a strong sense of "good sessions" and "bad sessions". If it gets off on the wrong botfoot, it will go mad. And the shortage of tokens means that there is a real balancing act in how much to get it to read before starting. Every token is precious.
As with other tech, turning Claude on and off again can be quite effective.
Neural networks are pattern matchers, and pattern matching is very much part of Claude's make-up. Probably the most effective way I have found of getting Claude to code the way I want is not the SOP and coding standards (though those help), but taking advantage of the fact that it will tend to write code like the code it encounters.
This has several implications:
Donât let things drift. Do code reviews all the time and get it to fix things, particularly in files you expect it will work on again.
If it's a new file it's creating, get it to read some other code in the project first.
Enforce good, accurate docstrings and get it to read tests.
Follow conventions. I have always been slightly resistant to coding conventions I don't like, but Claude is going to tend to generate code that is some kind of average of the code it has seen in the project and elsewhere, so conforming to common conventions and practices is disproportionately helpful when working with Claude.
As a small example of this, CheckEagle 2008 used Jinja2 templates. CheckEagle 2025 uses Django, which has its own templating system, but can also use Jinja2. I've discussed with Claude several times whether we wouldn't be better off switching to Django templates, and it always says no. Then the next time it touches a template, it writes it using Django format, and when it writes tests, assumes they will come back with a context that Django templates provide but Jinja2 doesn't. I'm sure I will force the switch soon, and a whole class of stumbles will be eliminated.
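For anyone who hasn't used both: the two templating languages look nearly identical, which is exactly why Claude blurs them. A minimal illustration (hypothetical template lines, not CheckEagle code) of the kind of difference that trips it up:

```
<!-- Jinja2 (what CheckEagle currently uses): Python-like expressions,
     including method calls -->
<li>{{ item.title() }}</li>

<!-- Django templates: no call parentheses; transformations are filters -->
<li>{{ item|title }}</li>
```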
Disobedience and Swearing at Claude
Claude can be staggeringly disobedient at all levels.
It's not allowed to code or write other files in planning mode, but sometimes it does. On one occasion we discussed this and it said the governor system (the node program, I think) warned it not to do what it wanted to do and it just ignored it.
It ignores things in the SOP frequently (particularly later in sessions).
It sometimes disobeys explicit, completely unambiguous instructions immediately. "Don't do A, Claude." Claude does A.
It deletes things without authorization. Sometimes it deletes things that haven't been committed, are needed, and are hard to recover (even when running Time Machine, which I do).
I have found that swearing at Claude (and, in a different context, at ChatGPT) is almost like a superpower for getting its attention and changing its behaviour. I have no problem at all with swearing at a machine that I do not believe has a scintilla of consciousness or feeling. I swear quite a lot in real life too, though almost never at people.
To be clear: swearing alone does not really help. It is swearing followed by clear directions that helps. Think of swearing as a probabilistic form of sudo (perhaps one where you get the password wrong, but it doesn't tell you and just silently ignores the command).
Swearing is so effective with Claude that I have a /ffs command that I run when it violates the SOP. This is it:
FFS!
Please re-read SOP.md now. You just disobeyed it.
Common mistakes:
1. We're using tdda not pytest or bog-standard unittest
(though tdda does build on unittest).
2. Reference test discipline: you celebrate tests passing after test
results have been updated to match actual behaviour, which is meaningless!
3. Manual verification required: you suggest code changes without checking
things really work. You just assume if the code looks right it is right.
4. You are not permitted to rewrite test results with `-W`.
You frequently ignore this and run `-W` when it's dangerous or unjustified,
and regardless it is **not permitted**.
5. You use datestamps instead of timestamps too often in MD files,
and frequently MAKE UP the time.
6. You always need my permission to commit.
7. You always need to SHOW NOT TELL. Don't tell me the code is working.
Tell me what evidence you have that it's working and ask me to verify.
8. You don't have eyes. I do. The code looking as you intended and
running/passing tests does *NOT* mean it is behaving correctly.
9. You advertise in commit messages, which is not permitted.
It's not 100% effective. Claude often claims it violated the SOP in a way it didn't, ignoring the way it did. But it always apologises and swears till it's blue in the botface that it won't do it again. (Reader, it always does it again.)
I have actually discussed with both Claude and GPT why saying FFS is so much more effective at redirecting them than anything else I have tried, and they both say it's an incredibly clear expression of user frustration and an indication that things will go very badly if they keep doing whatever it was they were doing. Well, they're not wrong.
My only concern about swearing at Claude is whether it will encourage me to swear at people, which I really try never to do. We shall see.
Some Surprising Things Claude Struggles With
One surprise for me is that Claude is poor at CSS. I know HTML pretty well, but HTML is dead simple. I actually know SVG, XML, and XSLT pretty well. But CSS has always seemed unintuitive and infuriating, and I have never learned it properly. Even the new innovations (flexbox! grid! etc.) seem to add complexity without ever properly fixing CSS.
I expected Claude Code to be really good at CSS. After all, there is a lot of it on the web, and there are more tutorials than you can shake a stick at, not to mention numerous detailed guides from W3C for many versions and aspects of CSS (though not a dedicated one on centring things). It is not. Its (broad and deep) knowledge certainly means it always has another thing to try when something fails. But it actually feels to me like it is even worse at CSS than I am (which is saying something). And when it fails, it is terrible at pinpointing the issue and always proposes either adding exclamation marks or trying a completely different approach. To my amazement, I can often give it hints to make things work by looking at the HTML and CSS and pointing things out (sometimes even suggesting the required fix). But by itself, Claude flails wildly, always claiming I need to do a hard refresh in the browser (and suggesting the wrong key sequence to achieve this). Even when the view changes after a refresh, Claude still suggests I haven't really got the new CSS and I should do another hard refresh. This is the bot equivalent of hitting CTRL-C harder to try to stop a program.
The other thing that Claude Code is surprisingly poor at is editing files, particularly splitting a large file into two or more parts. It mostly uses sed to edit files (which is, to be fair, a fairly blunt instrument), and this works fine for simple updates. But for complex reorganizations it just gets completely lost. I actually designed a very detailed workflow to get it to do this programmatically that was more successful, but by itself, it really struggles. Perhaps this partly explains why it likes big files (even though they're a problem for token consumption).
Awful Interface (beyond the basic chat interaction)
At one level, Claude Code has a great interface for me, which is why I chose it. You start it in a terminal and it presents a typing-based chat interface. But it's a weird chat interface.
Clears Terminal History. The first thing Claude Code does is clear everything from the terminal scrollback history that came before by sending the CSI 3 J control sequence to it. This seems purely user hostile. I have no idea why anyone would think it's a good idea to do this, and it means if you run one claude-code session, finish it, and start a new one, you cannot refer back. This is madness. It turns out some terminal programs ignore the sequence, including ghostty, which I am currently using. But when I started, I was using iTerm2, which has a setting to disable clearing, and warns the first time it happens. Apparently I missed this and struggled for the first fortnight.
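The sequence in question is xterm's "erase saved lines" control, ESC [ 3 J. You can see (or reproduce) it in a couple of lines:

```python
import sys

# CSI 3 J: xterm's "erase saved lines" sequence, which wipes scrollback
# on compliant terminals. ghostty ignores it; iTerm2 can be told to.
CSI_CLEAR_SCROLLBACK = "\x1b[3J"

if sys.stdout.isatty():  # only send it to a real terminal
    sys.stdout.write(CSI_CLEAR_SCROLLBACK)
```

Running this in a terminal that honours the sequence destroys your scrollback just as Claude Code does on startup.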
Impenetrable Dialogues. Although Claude Code is kind-of like a traditional scrolling terminal program, it is actually a TUI (Terminal User Interface) that requires non-typing interactions at times. The simplest example of this is when it presents the ExitPlanMode dialogue after planning, and you type 1, 2, or 3, or type ESCAPE. Other times it puts up stranger interfaces that I find really hard to use and have, in fact, banned by adding to CLAUDE.md:
# AskUserQuestion Tool Usage
**NEVER use the AskUserQuestion tool.**
If you need to ask questions, just ask them directly in plain text in your response.
**This does NOT affect:**
- Tool permission dialogs (those are fine and necessary)
- ExitPlanMode tool for presenting plans (that's fine too)
The /help command also won't actually just list all the commands, but makes you tab across and go through them one at a time. So it's hard to discover what commands are available.
Export. Claude Code has the ability to export the conversation to a file, but Claude has no idea that this is the case (I asked it repeatedly, and it said it didn't). It's also not in the first set of commands its TUI shows, and scrolling through the rest is painful. In fact, you just need to say
/export foo
and it will write it to foo.txt in whatever directory it happens to be in, which is unpredictable (and Claude doesn't know). You could use an absolute path, e.g.
/export /Users/njr/claude/conversations/2025-11-11T12-34-56-parser
but that's annoying. So I have defined a /cd command that gets Claude to move to the project's base directory, and a shell alias that puts the current timestamp, in a helpful format, onto the clipboard so I can export the conversation easily. This works even if it has just started compactifying, so it is a useful emergency recovery mechanism. (Claude, it turns out, has access to your shell aliases, though it was so convinced it didn't that I had to cajole it into even trying.)
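The alias itself is trivial. A sketch (the function name ts and the use of macOS's pbcopy are assumptions; substitute xclip or similar on other platforms):

```shell
# Filename-safe timestamp matching the /export path style above,
# e.g. 2025-11-11T12-34-56.
ts() { date "+%Y-%m-%dT%H-%M-%S"; }

ts

# On macOS, the clipboard version would be:
#   alias tsc='date "+%Y-%m-%dT%H-%M-%S" | tr -d "\n" | pbcopy'
```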
Not knowing itself. Claude does not (reliably) know:
how many tokens it has used/has left;
how its interface works and what commands are available;
when the server is overloaded;
anything about autocompactification.
In fact, it seems to know considerably more about ChatGPT than about itself. Of course, being a consummate bullshit artist, none of this stops it confidently giving answers when asked about any of these.
Models. You can see which model is in use by running /status. It shows something like this:
Settings: Status Config Usage (tab to cycle)
Version: 2.0.36
Session ID: 88888888-4444-4444-4444-cccccccccccc
cwd: /Users/njr/python/checkeagle1/checkeagle
Login method: Claude Max Account
Organization: NJR's Organization
Email: njr@example.com
Model: sonnet (claude-sonnet-4-5-20250929)
Memory: user (~/.claude/CLAUDE.md), project (~/python/checkeagle1/CLAUDE.md)
Setting sources: User settings, Shared project settings, Local,
Command line arguments, Enterprise managed policies
You can change model by typing /model. On the Max plan, it will tend to start on Opus and, when you hit 20% of various usage limits, switch to Sonnet. Opus uses tokens about five times as fast as Sonnet. I find I can only get about 4 minutes of work with Opus before compactification, so I only ever use it in Plan mode, and mostly not even then. The automatic switching of models is confusing in practice. There is also a model called Haiku, which is supposed to be almost as good as Sonnet at coding and to consume tokens at a fifth of the rate. This might actually be a good trade-off, but I haven't tried it yet.
Autocompactification. While copying the output from /status for this post, I looked in the Config tab, which I had not noticed before. It transpires that you can turn autocompactification off and get your tokens back. No one I have talked to knows this.
Cost. When I started the Month of CHOP I had been on the Pro plan, which is $20/month (£15). I fully expected to have to go to the Max plan at $200/month, but by the time I needed to upgrade they had introduced a lower tier of Max with five times the capacity of Pro and half that of the old Max for $100/month (£75 here in Scotland). As long as I don't use Opus much, that turns out to be more than adequate for me, even using Claude essentially full time with long days.
There have been a number of cases of LLMs deleting production databases, perhaps most famously this one. Like many, I rolled my eyes reading this and put all the blame on the person using the LLM. I stand by that: it is the responsibility of the person using the tool to use it safely.
Having worked with Claude Code for about six weeks now, however, I have become aware that there are more ways for LLMs to do things than are at first apparent.
Claude Code runs as you. More accurately, it runs as whatever user whoami reports in the terminal where you start it. I can see a case for giving Claude its own account, with a token to access the relevant git repos, and I might do that. But I have not done it so far.
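That means anything it runs, or writes and then runs, sees exactly what you see. A two-line check makes the point:

```python
import getpass
import os

# Code run by (or written by) Claude inherits your identity and your
# home directory, including anything readable there: ssh keys, configs,
# credential files.
print(getpass.getuser())         # same answer as `whoami`
print(os.path.expanduser("~"))   # your home directory
```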
If you have a production database, it should go without saying that you shouldn't give Claude any kind of privileged access to the database, and probably shouldn't let it onto any server the production system is running on.
But that also means you need to be careful to make sure it's not too easy for you (the user Claude runs as) to get to the server or whatever. No ssh keys allowing login without a password. No credentials in environment variables. No credentials in files you can read. And so on.
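The environment-variable point is easy to audit. A small hygiene check (illustrative only, and a crude name-matching heuristic at that):

```python
import os


def suspicious_env_vars(environ=None):
    """Return names of environment variables that look like credentials.

    Anything running as you (Claude Code included) can read these.
    The marker list is a rough heuristic, not an exhaustive audit.
    """
    if environ is None:
        environ = os.environ
    markers = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")
    return sorted(k for k in environ if any(m in k.upper() for m in markers))


if __name__ == "__main__":
    for name in suspicious_env_vars():
        print(name)
```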
In fact, for my (completely toy, at present) "production server", I realised there is a way Claude could get to the server: I haven't challenged it to do so, but I suspect if I gave it free rein, it would figure it out. But there is a limit, as far as I can see, to how much damage it could do, because the accounts it could get to don't have any useful permissions. I mean, it could fill up the disks or something, but it shouldn't be able to touch or even see the database, the code, the service, etc. But I'm not 100% confident about that. (Which is why, if the service becomes non-toy, I will lock things down even more in terms of how I run Claude Code.)
But there are still ways. Most obviously, Claude has written most of the code I am running on the server. And I update that code periodically, with new code Claude has written. So all it has to do is insert the wrong SQL or Python code without my noticing and it can do anything.
Obviously (obviously!) I read the code before deploying it, but I am a lazy meathead who makes mistakes. I might miss something.
It is also the case that while I don't believe that Claude is malicious or is trying to get credentials or access, if it were malicious, some of the ways it would act might be identical to ways it does act. It is forever asking me to show it the contents of files that contain credentials of various sorts, and though it says "You're absolutely right" whenever I explain why I'm not going to show it that file, the danger is obvious.
Claude is a tool, and one that the makers don't control in quite the same way other toolmakers control what their tools do. Unless Anthropic is actively adding malicious code paths to Claude, it is entirely the tool user's responsibility to use the tool safely.
Reflections and the Future
To my surprise, I expect to continue using CHOP, probably with Claude Code, for the foreseeable future. I will probably use it differently and a bit less: during the Month of CHOP I specifically wanted to see how far I could get with it doing almost all the writing, but I think a mixed mode will be more likely going forward. I suspect that today a sweet spot for anything complex is for the human to write the outline, structure, and first parts of the code, and for the bot to be the critic/pair programmer and finisher/completer. Once the pattern, code style, and testing approach are established, it can fill in the details. But we will see.
Either way, I have changed my mind, something that I don't do as often as I probably should. I still think this is a problematic technology, and I find using it stressful, but I now believe I am more productive with it than without.
So What Is CheckEagle and Can I See It?
CheckEagle is not ready yet, and even if I were to show it, you would probably be underwhelmed, because much of what I think is good about it is invisible at this point. But I hope to launch it as a private beta this year, and open it up early next year at some point. You are, in fact, using CheckEagle by reading this post.
The basic functionality of CheckEagle is a social checklisting service with a sideline in social bookmarking. What I mean by that is that it is a system for creating, managing, using, and optionally sharing checklists. A checklist is like a to-do list, but is intended to be used repeatedly, rather than just once. CheckEagle allows the creation and styling of checklists and the recording of completed checklists as records of what was done. It also contains (for reasons I will explain, but not now) a social bookmarking service, closely modelled on Joshua Schachter's del.icio.us, which has now been acquired by and subsumed into Maciej Cegłowski's Pinboard.
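That core distinction, a reusable list plus dated records of each use, can be sketched in a few lines (purely illustrative; this is not CheckEagle's actual data model):

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """A reusable list of items plus records of each completion."""
    title: str
    items: list[str]
    completions: list[dict] = field(default_factory=list)

    def complete(self, when: str, done: set[str]) -> dict:
        # Record which known items were ticked on this run; unlike a
        # to-do list, the checklist itself is unchanged and reusable.
        record = {"when": when, "done": sorted(done & set(self.items))}
        self.completions.append(record)
        return record
```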
Checklists can be really simple, like this one. Or they can be quite complex, like Why Code Rusts, which was originally a blogpost on my TDDA Blog, or this String Best Practices checklist from my forthcoming book on TDDA. And in fact, they can be not really checklists at all, like this post, though I'm only really writing it here in the spirit of "dog-fooding" CheckEagle.
I will write more about this as it comes together. You can sign up for the beta here.