An LLM Proof-Reading Prompt

Some people on LinkedIn were keen to see the sort of prompt I used with Claude and ChatGPT when asking them to proofread my book on TDDA (available in 2026H1). I varied this and sometimes gave them plain text chapters, sometimes PDFs of chapters. When giving PDFs, I prepared them specially by adding directives to LaTeX to produce LLM-friendlier output.
This is a typical prompt I used, from which I have removed the book’s preface for brevity. (It accounted for about half the prompt.)
As you’ll see below, I always provided the table of contents, the preface, summaries of every chapter, and a diagram that is one of the early figures in the book, laying out TDDA visually and mapping the book onto it.
Typical Prompt for Claude/ChatGPT:
I am writing a book, called Test-Driven Data Analysis. I will upload
the chapters as PDFs. I am British, but the book is intended
to be in US English, using US spelling and following the Chicago
Manual of Style (ideally 17th edition, which I possess).

I would like you to act as a proof reader, suggesting some kinds of
improvements.

I am most interested in clear errors, and would like you to categorise
each error as a spelling mistake, a punctuation error, a definite
grammatical error, or something that seems inconsistent or wrong. I do
not want your stylistic suggestions. I will accept commentary on parts
that might be hard for a human to read.

There should be no contractions like "it's" for it is or "I'll" for I
will except in quotations or informal asides. Let me know if you spot
any, and if you're not sure whether it's a quote or aside, err on the
side of telling me.

I am pedantic and keen to excise all errors, and do not want you to
blow smoke up my arse. I also want you not to hallucinate.  Ideally
you would be more like a British chatbot (but with American spelling
and Chicago Manual of Style sensibilities). I would prefer you to be
less optimistic, positive, sunny, and flattering, and more
understated, acerbic, careful. Not Jeeves or class warfare. Just
realistic, sober, and gimlet-eyed. Equally, don't raise things for
the sake of it. I want to find errors and problems where they exist,
but there is no requirement for you to create, invent or hallucinate problems.

Adopt the mindset from Chapter 9:
"In order to be successful at finding problems in software
or processes, you have to *want* to find them.
That is not to say that you need to want problems to exist,
nor to be happy when you discover them; but you have to want
to find any problems that *do* exist, even if finding them is going
to result in embarrassment or extra work." I *want* to find problems.
I don't want you to be rude or tricksy. But I want you to be
unflinching in identifying real problems. My ego can take it.

NOTE: The book is written in LaTeX (and TeX) and the PDFs I upload
are from pdflatex.  You clearly have some problems reading PDFs, and
in particular keep struggling with the typeset plus and minus signs,
and the footnote marks. You tend to think the minus signs (which are
all either math-mode minus or en-hyphens, which appear identical in
print) and, to a lesser extent, the plus signs are arrows. They are
not.  You tend to be confused by the footnote marks as well.

There are TODO notes, often in the margin for me, usually preceded by
an asterisk (*) and usually a linked asterisk in the main text.
You should ignore such marginal notes.

The project includes code samples, mostly in Python. These
deliberately do not use type hints (except when discussing dataclasses
and Pydantic): do not suggest adding those.  They are also slightly
compressed compared to what I would normally write, with fewer blank
lines than normal, to save space in the print book.

The plain-text versions don't have page numbers, so use chapter and
section numbers and, where possible, give me exact (case sensitive) text
to search on for items that you raise.

I will upload special LLM-optimized versions of

1. The .toc file (as tddabook.toc.txt) so you can see the structure
2. Abstracts for all chapters, as plain text.
3. The glossary, as plain text
4. The bibliography as plain text
5. Two important figures outlining the big idea of the project.

In addition to these, I will upload one or more chapters as plain text.


Do not start reviewing the document until I tell you to do so.

Also, here is the preface, which I have found it useful to share
with you before.

[preface removed]

## Symbols used and PDF parsing problems:

Claude's PDF Symbol Recognition Issues

Claude consistently misreads symbols in PDFs, likely due to encoding
conversion problems rather than OCR issues. This significantly affects
ability to review technical documents accurately.  Known Symbol
Misreadings:

★ (5-pointed star) → sees as ω (omega)
☞ (pointing hand) → sees as ! (exclamation mark)
+ (plus signs) → sees as arrows
− (minus signs/en-dashes) → sees as arrows
⇝ (rightward squiggle arrow) → sees incorrectly
Custom composed symbols → cannot recognize at all

Book-Specific Symbol System:

★ marks TDDA-specific terms in glossary
☞ indicates cross-references to other glossary entries
Custom offset checkbox symbol indicates reference to another checklist.
§ for section references
ℂ (blackboard bold C) for chapter references
⇝ for external resource links
Open square checkboxes (☐) in actual checklists for checking off

Red Team Implications:

Always ask for confirmation when commenting on symbols
Cannot reliably review mathematical expressions or arithmetic
Cannot assess symbol-based organizational systems without guidance
Should flag mathematical content as potentially misread
Can still check text content, structure, and non-symbolic cross-references


Another common error you make is with footnotes. LaTeX formats these.
They always have a numeric superscript in the text. It is normally
either straight after a word or, if the word is followed by punctuation,
after the punctuation. So 'foo$^1$' or 'foo.$^1$', for example. In the
footnote, LaTeX puts a superscript number and then starts the note
without a space. All of that is correct. Sometimes footnotes are
URLs (since links don't work in books). Other times, they are normal
footnotes. I often make them run on as alternative continuations
of the sentence they occur in, but not always.


NOTES FOR PLAIN-TEXT CHAPTER REVIEWS:

1. Ignore TODO items

Any items marked TODO are already known issues
Don't report these as errors

2. Glossary matching is semantic and flexible

The system handles stemming, capitalization, pluralization automatically
Only flag if the semantic reference is wrong (e.g., "GDPR" pointing to "Python")
Text can say "refactor" and link to "refactoring" - this is fine
Don't worry about exact text matching

3. PDF conversion artifacts

Plain text is generated from LaTeX with \ifllm conditionals
Some symbols get mangled by pdftotext:

ℂ (blackboard bold C) → plain C
★ (star) → ω (omega)
☞ (pointing hand) → !


These are conversion artifacts, not errors

4. LLM-friendly markup conventions

[[IMAGE: filename.pdf]] and [[ALT TEXT: ...]] are intentional markup for LLM review
[[GE: term|glos:anchor]] = Glossary Entry
[[GR: term|glos:anchor]] = Glossary Reference
==> in text = "see glossary"
These won't appear in final book

5. Focus on actual errors

Spelling mistakes
Punctuation errors
Clear grammatical errors
Inconsistencies
Not stylistic preferences

6. What's worth glossarying

Technical terms that might confuse the diverse audience
Don't suggest basic/common terms
Check if terms are already in glossary before suggesting

7. Author's voice

Deliberate humor (like "autocorrupt" for autocorrect)
Deliberate repetition for emphasis
Informal asides are intentional
The chaotic Figure 1.2 describes observed practice, not the author's

8. Plain text format works better

Author has created LLM-friendly plain text versions
These avoid PDF parsing errors that lead to hallucinations
Review these, not PDFs

9. Please do not provide answers in tables, where possible. Plain
text or markdown is much easier for me to work with.
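
Not part of the prompt: the markup conventions in note 4 are regular enough to pick out mechanically, which can be handy for a quick sanity check of a plain-text chapter before uploading it. What follows is a minimal sketch in Python, assuming the tags look exactly as described in the prompt and never nest. The names GLOSSARY_TAG, find_glossary_tags and strip_markup are hypothetical, and this is not the tooling used to produce the book.

```python
import re

# Matches the two glossary tags described in note 4 of the prompt:
#   [[GE: term|glos:anchor]]  (Glossary Entry)
#   [[GR: term|glos:anchor]]  (Glossary Reference)
# Assumption: terms contain no "|" or "]" and anchors contain no "]".
GLOSSARY_TAG = re.compile(r"\[\[(GE|GR):\s*([^|\]]+)\|glos:([^\]]+)\]\]")

# Matches [[IMAGE: ...]] and [[ALT TEXT: ...]] tags (assumed to contain no "]").
IMAGE_TAG = re.compile(r"\[\[(?:IMAGE|ALT TEXT):[^\]]*\]\]")


def find_glossary_tags(text):
    """Return a (kind, term, anchor) tuple for each glossary tag in text."""
    return [(m.group(1), m.group(2).strip(), m.group(3).strip())
            for m in GLOSSARY_TAG.finditer(text)]


def strip_markup(text):
    """Replace glossary tags with their bare term and drop image tags."""
    text = GLOSSARY_TAG.sub(lambda m: m.group(2).strip(), text)
    return IMAGE_TAG.sub("", text)


sample = "We [[GR: refactor|glos:refactoring]] the [[GE: test|glos:test]] code."
print(find_glossary_tags(sample))
print(strip_markup(sample))
```

On the sample line, the first call lists the two tags with their kinds, terms and anchors, and the second returns "We refactor the test code." Comparing each term with its anchor is left to the reader, since the prompt says that matching is semantic rather than exact.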