The context-window mistake most consultants make with Claude

Here's a pattern most solo consultants who use Claude daily will recognize. You open a fresh conversation Monday morning, work on a client deliverable, get great output. By Wednesday afternoon, in the same conversation, Claude is producing weaker drafts — repeating earlier mistakes, contradicting things you established on Monday, missing the voice rules you set up. By Friday, you've started a new conversation because the old one feels stuck.

The conversation didn't get dumber. You stuffed it past its useful capacity. Claude has a context window — the total amount of conversation, files, and instructions it can hold and reason over at once. Modern Claude has a generous one, but generous isn't infinite, and the way solo consultants typically work guarantees they'll hit the wall on long engagements.

This article is what the context window actually is in practice, the three failure modes that produce muddled output, and the discipline that keeps long workflows sharp.

What the context window actually is

The simplest mental model: every Claude conversation has a fixed-size working memory. Everything in the conversation — your messages, Claude's responses, any files you uploaded, any system instructions or skills loaded — counts against that memory. When the memory fills up, the oldest content gets dropped or compressed, and the conversation starts losing precision.

Modern Claude handles a substantial amount of context — hundreds of thousands of tokens in a single conversation. For a solo consultant, that's roughly equivalent to several hundred pages of mixed input/output. Sounds like a lot. It isn't, once you've been working a single engagement for six weeks in the same conversation, with uploaded transcripts and accumulated back-and-forth.

The failure mode isn't reaching the absolute cap. It's reaching the useful capacity, which is much lower than the cap. Claude can technically remember everything in a 100,000-token conversation, but it can't reason over all of it equally well. Important earlier instructions get blended with later context; nuance from early in the conversation degrades into approximation; the model starts confabulating connections between unrelated parts of the history.

The threshold where this starts is well before the absolute limit. The threshold where it gets bad is roughly when the conversation contains more accumulated context than the specific task in front of you requires.

The three failure modes

Failure mode 1: One long conversation for an entire engagement. The consultant starts a Claude conversation on day 1 of the engagement and keeps using it for everything related to that client — discovery debrief, scope work, weekly updates, deliverable drafts, meeting summaries. By week three, the conversation is a soup of unrelated tasks. Claude pulls phrasing from the discovery debrief into the deliverable draft. The weekly update accidentally references context from a meeting summary that's six weeks old. Outputs get fuzzier without obvious cause.

Failure mode 2: Re-pasting context that's already in the conversation. The consultant re-pastes the kickoff brief at the top of every new task because they don't trust the conversation to remember it. The kickoff brief now appears three or four times in the same conversation. Claude weights it more heavily than it should; the brief crowds out more recent context that matters more. Earlier paste also makes the conversation longer overall, accelerating the slide into muddled output.

Failure mode 3: Carrying conversations across days that should have been new. The consultant pauses a conversation Friday and resumes it Monday with "OK, where were we?" Claude reconstructs by re-reading the entire history, which costs context budget. The fresh Monday work then has to share working memory with everything from Friday — most of which doesn't matter for Monday's task. Outputs are slightly worse than they would be in a fresh conversation, even though Monday's task is independent.

All three failure modes share a root cause: treating the Claude conversation as a notebook (persistent, append-only, everything in one place) when it's actually closer to a working memory (sized for the immediate task, expensive to keep loaded).

The discipline that keeps it sharp

The fix is operational, not technical. Three habits separate consultants who get consistent Claude output from those who don't.

Habit 1: One conversation per task, not per engagement. A discovery debrief is its own conversation. The weekly client update is a fresh conversation each Friday. The deliverable draft is its own conversation, separate from the meeting summaries that informed it. The engagement itself lives in a Project (or a folder of saved outputs); each task gets a clean conversation that ends when the task ends.

This sounds wasteful. It isn't. Fresh conversations get better outputs because the working memory is sized for the immediate task. The cost of "starting over" is paying ~30 seconds of context-paste at the top of each new conversation. The benefit is consistently sharp output across weeks of work.

Habit 2: Persist context outside the conversation, not inside it. The engagement's accumulated context (the kickoff brief, the client's industry notes, the running list of open items) lives in a file or a Project. You reference it as needed by uploading or pasting the relevant chunk at the start of a new conversation. You don't carry it forward by leaving it in an open conversation.

This is the move most consultants don't make. They treat the Claude conversation itself as the place where the engagement lives. Wrong shape. The engagement lives in clients/<name>/ (a folder of Markdown files) or in a Claude Project (with uploaded files). The conversation is the working surface for one task.

Habit 3: When the conversation feels muddy, start a new one. This is the highest-value habit and the hardest to install. The instinct when output gets weak is to push back — "that's not quite what I meant, try again." The right move when output gets weak is to open a fresh conversation, paste the minimum context for the task, and try again with a clean working memory. The output usually improves immediately.

The signal that a conversation has gone muddy: Claude repeats mistakes you've corrected, contradicts instructions you set up earlier, or produces output that feels generic when the conversation started sharp. Any of those, start fresh. Don't try to fix the conversation; replace it.

What "muddy" actually looks like in practice

It helps to have concrete signals for when a conversation has gone past its useful capacity. Five of them, in roughly the order they appear:

The model contradicts an instruction you established earlier. You told it on Monday to keep outputs under 300 words. By Wednesday, it's producing 600-word drafts. The instruction is still in the conversation; it's just no longer weighted strongly enough to influence output. Start fresh.

The model invents details that weren't in the inputs. A meeting summary contains action items nobody assigned. A scope draft includes a deliverable the client never mentioned. This is confabulation, and it almost always means the conversation has accumulated enough context that the model is reaching for plausible-sounding completions instead of grounded ones.

The model becomes oddly formal or oddly casual. Voice drift in either direction is a context-window signal. Your skill or initial prompt established a voice; if Claude's outputs are drifting away from it, the voice rules have been crowded out by accumulated content.

Re-asking a question produces a different answer. You asked Claude on Tuesday for a take on the engagement's biggest risk; you ask again Thursday in the same conversation and get a different risk, with different reasoning. The model is sampling from a muddled state.

You find yourself adding "as I mentioned earlier" qualifiers to your prompts. This is the consultant's tell. If you're working around the model's memory by re-establishing things you already said, the conversation has reached the point where its working memory is less efficient than starting fresh.

When two or more of these signal at once, the cost of starting a new conversation is lower than the cost of continuing the current one.

When persistent state actually is the right call

There's one exception worth naming. Some workflows genuinely benefit from a single long-running conversation: workflows where the back-and-forth itself is the point, like working through a strategy problem with Claude as a sounding board, or iterating on a deliverable where each draft builds on the previous one's feedback.

For those, a long conversation is correct. The accumulated context is the work product. The signal is whether the earlier conversation directly informs the next move. If yes, keep going. If the next move is a different task that just happens to be on the same engagement, new conversation.

The picking question echoes the Claude Projects vs. skills vs. prompts decision: persistent state in one place, independent tasks in another. A six-week engagement is persistent. The weekly status update inside that engagement is independent. Mixing those is what creates the context-window failure modes above.

Five small mistakes that compound

Beyond the three failure modes, five smaller habits add up to the same outcome.

Pasting transcripts in full when you only need the relevant excerpt. A 45-minute meeting transcript is 6,000+ tokens. If you only need the part where the CEO mentioned the Q3 target, paste those five sentences. The rest is noise the model has to reason around.

Asking Claude to summarize the conversation so far. Sometimes useful. Most of the time, it just creates another summary that now also lives in the conversation, doubling the relevant tokens.

Leaving the system prompt or skill instructions unchecked. If the skill activated at the start of the conversation has 2,000 tokens of instructions and the conversation is now about a different task, those instructions are still loaded. Start a new conversation; the skill will only re-activate if its triggers match the new task.

Treating Claude like it has perfect recall of everything you said. It doesn't. If something earlier in the conversation is load-bearing for the current task, paste it again — but only that, not the whole earlier exchange.

Working in giant conversations because "tokens are cheap." Cheap on the API side. Not cheap on the output quality side. Quality degrades long before the absolute cap, and you only notice in retrospect.

Where this connects to the rest of the practice

The discipline above is one piece of the broader operating model — what "AI-native" actually means in practice: AI is part of the work, but the work itself is yours. The Solo Consultant's AI Playbook covers the broader frame — when to reach for AI, when to keep it out, how to structure the work so AI is leverage instead of friction. The context-window discipline is the version of that principle applied to one specific tool.

The practical move this week: pick the longest-running Claude conversation you have. Look at how many distinct tasks it covers. If the answer is more than three, that conversation is doing too many jobs. Start fresh conversations for the next round of tasks. Move accumulated engagement context into a clients/<name>/ folder or a Claude Project. Watch the next week's outputs sharpen.

The conversation is the working surface. The engagement lives somewhere else.

Back to blog

Country/region