Looping with AI
From Prompts to Loops: Build Your First Self-Correcting AI in 40 Lines
Most people are still playing the game on the easy setting. They write one prompt, hit enter, and accept whatever comes back. If it's wrong, they sigh, tweak a few words, and try again—by hand, one shot at a time.
That's prompting. It works until it doesn't.
The next level is looping: instead of asking once and hoping, you let the model act, check its own work, and try again—automatically—until the result actually meets your bar. The difference isn't subtle. A single prompt is a coin flip. A loop is a machine that keeps flipping until it lands on heads.
This guide gives you the mental model, the one rule that keeps loops from running away, and a complete first project you can run in about ten minutes. No framework. No agent library. Just Python, the Claude API, and a for loop.
What a loop actually is
Strip away the buzzwords and every loop is the same four-beat rhythm:
Act → Check → Decide → Repeat.
The model acts (writes something, picks something, computes something). Then you check the result against a goal. Then you decide: is it good enough, or do you go again? If "go again," you feed the feedback back in and repeat.
That's it. That loop is the seed of every "agent," every research pipeline, every self-healing workflow you've read about. The flashy systems are just this pattern with more steps, more tools, and better stopping logic bolted on. If you understand the four beats, you understand the whole genre.
The thing that makes it powerful: the model gets to see its own output and respond to what's actually wrong with it. A single prompt is the model writing blind. A loop is the model writing, looking at the result, and correcting it, which is exactly how you would do the task if you cared about getting it right.
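Here is one way the four beats could look as a single small function. Treat it as a minimal sketch rather than the project code: `run_loop`, `act`, and `check` are hypothetical names, and the toy `act` just drops words so the example runs offline with no model call.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    act: Callable[[Optional[str], Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_iterations: int = 5,
) -> Tuple[Optional[str], bool]:
    """Act -> Check -> Decide -> Repeat.

    act(previous, feedback) produces a new attempt.
    check(attempt) returns None on success, or a feedback string.
    Returns (last_attempt, succeeded).
    """
    attempt: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        attempt = act(attempt, feedback)   # Act
        feedback = check(attempt)          # Check
        if feedback is None:               # Decide: good enough, stop
            return attempt, True
        # Repeat: the feedback flows into the next act() call
    return attempt, False


# Toy stand-ins so the sketch runs offline (a real act() would call a model).
def toy_act(previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "one two three four five six"
    return " ".join(previous.split()[:-1])  # drop the last word


def toy_check(attempt: str) -> Optional[str]:
    words = len(attempt.split())
    return None if words <= 4 else f"{words - 4} word(s) too many"


result, ok = run_loop(toy_act, toy_check)
print(result, ok)
```

Swap `toy_act` for a function that calls a model and `toy_check` for any measurement you trust, and the shape stays the same.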
Why this is a different sport, not just better prompting
You can spend a week engineering the perfect prompt to get a model to write "a tweet that stays within 280 characters and keeps the core point." And it will still blow the limit maybe one time in four, because length is a slippery thing for a language model to feel its way to in a single pass.
No amount of prompt crafting fully fixes that. It's a blind-shot problem.
A loop fixes it in a few lines, because the loop can measure the result and react. Too long by 40 characters? Tell the model exactly that and ask again. The model isn't guessing anymore; it's correcting against a real number.
This is the whole insight: prompting optimizes the question; looping optimizes the outcome. Once you internalize that, you stop trying to write one god-tier prompt and start designing a small system that converges on what you want.
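To make "correcting against a real number" concrete, here is a small sketch of a check that turns a measurement into feedback. The name `length_feedback` and the wording of the message are illustrative assumptions, not part of any library.

```python
from typing import Optional


def length_feedback(draft: str, limit: int = 280) -> Optional[str]:
    """Return None if the draft fits, else a message stating the exact gap."""
    length = len(draft)
    if length <= limit:
        return None
    over = length - limit
    return (
        f"The draft is {length} characters; the limit is {limit}. "
        f"Cut at least {over} characters."
    )


print(length_feedback("x" * 320))
print(length_feedback("short enough"))
```

One caveat: Python's `len()` counts Unicode code points, which may not match how a given platform counts characters, so treat the number as a close proxy.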
The one rule: always have a stop condition
Here's the only way a loop bites you: it never stops. The model keeps "improving," the API keeps charging, and you've built a perpetual motion machine that runs on your credit card.
So the rule, every single time, no exceptions:
Every loop has a max_iterations ceiling — even when it also has a "good enough" exit.
Two ways out, and you wire up both:
Success exit—the result passed the check. Stop, you won.
Safety exit — you hit the max number of tries. Stop anyway and return the best you've got.
If you only build the success exit, you're one weird edge case away from an infinite loop. Build both. Always.
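A sketch of both exits wired together might look like the following. It assumes you can score how far a candidate is from passing (0 means it passed) so the safety exit can hand back the best attempt seen, not just the last one. `LoopResult`, `loop_with_exits`, `propose`, and `badness` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    best: str      # lowest-badness candidate seen
    badness: int   # 0 means the check passed
    tries: int     # how many candidates were checked
    reason: str    # "success" or "max_iterations"


def loop_with_exits(
    propose: Callable[[str, int], str],
    badness: Callable[[str], int],
    first: str,
    max_iterations: int = 5,
) -> LoopResult:
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    candidate = first
    best, best_bad = first, None  # type: str, Optional[int]
    for tries in range(1, max_iterations + 1):
        bad = badness(candidate)
        if best_bad is None or bad < best_bad:
            best, best_bad = candidate, bad
        if bad == 0:  # success exit
            return LoopResult(candidate, 0, tries, "success")
        if tries < max_iterations:  # don't propose something we'd never check
            candidate = propose(candidate, bad)
    return LoopResult(best, best_bad, max_iterations, "max_iterations")  # safety exit


# Toy check: how many characters over a 10-character limit?
def over_ten(text: str) -> int:
    return max(0, len(text) - 10)


# A stubborn proposer that gets closer, then worse, and never passes.
scripted = iter(["a" * 12, "a" * 15])


def stubborn_propose(candidate: str, bad: int) -> str:
    return next(scripted)


result = loop_with_exits(stubborn_propose, over_ten, first="a" * 20, max_iterations=3)
print(result.reason, len(result.best), result.badness)
```

Recording the exit reason is a design choice: the caller can tell "this passed" apart from "this is the closest we got" without re-running the check.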
The project: "The 280-Club"
We're going to build a loop that takes any rough idea and turns it into a tweet that fits in 280 characters while staying punchy and keeping the point. It's small, it's genuinely useful, and—crucially—the check is objective. We can measure character count in plain Python, so the loop is truly self-correcting with zero hand-waving. You'll literally watch it converge.
This is the perfect first loop because it shows the pattern at its cleanest:
Act: Claude writes a draft.
Check: Python counts the characters. (No ambiguity. No second opinion needed.)
Decide: Under the limit? Done. Over? Tell Claude by how much and loop.
Repeat: ...until it fits or we hit our ceiling.
Setup
You need Python and the Anthropic SDK:
bash
pip install anthropic
Then set your API key as an environment variable (grab one from the Claude Console):
bash
export ANTHROPIC_API_KEY="sk-ant-..."
The code
Python:
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment
MODEL = "claude-sonnet-4-6"     # great balance of quality and cost for a loop
LIMIT = 280                     # our hard constraint
MAX_TRIES = 5                   # the safety exit: NEVER skip this


def ask(prompt: str) -> str:
    """One call to Claude. Returns the text of the reply."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text.strip()


# The raw idea we want to compress into a tweet.
idea = """
I spent the weekend learning that the real skill in AI isn't writing
clever prompts; it's building small loops where the model checks its
own work and fixes it automatically. It changed how I think about
everything I build.
"""

# --- ACT: first draft ---
draft = ask(f"Turn this idea into one punchy tweet. Return only the tweet.\n\n{idea}")

# --- THE LOOP: check, decide, repeat ---
for attempt in range(1, MAX_TRIES + 1):
    length = len(draft)
    print(f"Attempt {attempt} — {length} chars:\n{draft}\n")

    # --- CHECK + DECIDE: did it pass? ---
    if length <= LIMIT:
        print(f"[ok] Fits in {length} chars after {attempt} attempt(s).")
        break

    if attempt == MAX_TRIES:
        continue  # out of tries: don't pay for a rewrite we'd never check

    # --- ACT again, with specific feedback ---
    over = length - LIMIT
    draft = ask(
        f"This tweet is {over} characters too long. It's {length} "
        f"characters; the limit is {LIMIT}. Rewrite it to be at or under "
        f"{LIMIT} characters while keeping the punch and the core point. "
        f"Return only the tweet.\n\nTweet:\n{draft}"
    )
else:
    # --- SAFETY EXIT: ran out of tries ---
    print(f"[!] Couldn't get under {LIMIT} in {MAX_TRIES} tries. "
          f"Best effort: {len(draft)} chars.")
That's the whole thing. About forty lines, and a good share of them are comments and the idea text.
What's happening, beat by beat
The ask() helper is just "say something to Claude and get text back"—the atom every loop is built from.
The for loop is the engine. Each pass: it measures the current draft (Check), decides whether to stop (if length <= LIMIT: break), and if not, hands Claude a specific, numeric complaint—"You're 40 characters over"—and asks again (Act).
That specificity is the magic. We're not re-running the same vague prompt and praying for variance. We're giving the model the exact gap it needs to close. That's why it converges instead of flailing.
And notice the `else` attached to the `for`—a lovely bit of Python. It runs only if the loop never hits break, i.e., only if we exhausted all five tries without success. That's your safety exit, built right in.
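If `for ... else` is new to you, this standalone sketch isolates the behavior. The function `first_fit` and the sample lengths are made up for illustration; no model is involved.

```python
from typing import Iterable, Optional


def first_fit(lengths: Iterable[int], limit: int = 280) -> Optional[int]:
    """Return the 1-based attempt number that fits, or None if none do."""
    for attempt, length in enumerate(lengths, start=1):
        if length <= limit:
            found = attempt
            break          # leaving via break skips the else block
    else:
        found = None       # runs only when the loop finished without a break
    return found


print(first_fit([291, 268]))       # second attempt fits
print(first_fit([300, 295, 290]))  # nothing fits, so the else branch runs
```

The `else` also runs when the iterable is empty, since the loop then finishes without ever reaching `break`.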
What you'll see
Run it, and you get something like:
Attempt 1 — 291 chars:
I spent the weekend realizing the real AI skill isn't clever prompting...
Attempt 2 — 268 chars:
This weekend taught me the real AI skill isn't clever prompts...
[ok] Fits in 268 chars after 2 attempt(s).
The first draft overshot. The loop caught it and told Claude precisely how much to cut, and the second draft landed inside the limit. You didn't touch a thing. That's a self-correcting system, and you just built one.
Level up: add a judge
The 280-Club checks one objective thing: length. But a lot of what you care about isn't a number — it's quality. "Is this actually good?" doesn't have a len().
The move there is the same loop with a second Claude call as the judge. This is the famous generate → evaluate → refine pattern, and it's the workhorse of serious AI systems.
Python:
import re


def score(tweet: str) -> int:
    """Ask Claude to grade the tweet 1-10. Returns just the number."""
    reply = ask(
        "Rate this tweet from 1 to 10 on punchiness and clarity. "
        "Reply with ONLY the number, nothing else.\n\n" + tweet
    )
    match = re.search(r"\d+", reply)  # tolerate replies like "8/10" or "Score: 8"
    if match is None:
        raise ValueError(f"No score found in judge reply: {reply!r}")
    return int(match.group())


# Loop until it BOTH fits AND scores well, or we hit the ceiling.
for attempt in range(1, MAX_TRIES + 1):
    fits = len(draft) <= LIMIT
    grade = score(draft)
    print(f"Attempt {attempt}: {len(draft)} chars, score {grade}/10")

    if fits and grade >= 8:
        print(f"[ok] Strong tweet ({grade}/10, {len(draft)} chars).")
        break

    if attempt == MAX_TRIES:
        continue  # out of tries: don't pay for a rewrite we'd never check

    draft = ask(
        f"Improve this tweet. It must stay at or under {LIMIT} characters "
        f"and be sharper and clearer (current score: {grade}/10). "
        f"Return only the tweet.\n\nTweet:\n{draft}"
    )
else:
    print(f"[!] Stopped at {MAX_TRIES} tries. Final score: {grade}/10.")
Now you've got two checks, one objective (length) and one subjective (a model's judgment), and the loop won't report success until both clear the bar. One model playing two roles: the writer and the editor. That separation, writer versus critic, is the same idea behind code that fixes its own failing tests, research agents that spot their own gaps, and pretty much every impressive AI workflow you've seen.
A quick honesty note: an LLM judge is useful but not infallible — it can be inconsistent or too generous. For anything high-stakes, you'd calibrate it (give it a rubric, show examples, or have it justify the score before giving it). But as a first taste of evaluate-and-refine, it's exactly right.
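One way to sketch that calibration: give the judge a rubric, ask for a short justification first, and parse a strictly formatted final line. The rubric bands, the `SCORE:` line format, and the names `build_judge_prompt` and `parse_score` are all assumptions made up for this example; the model call itself is left out so the parsing can be exercised offline.

```python
import re
from typing import Optional

# Hypothetical rubric; tune the bands to whatever "good" means for you.
RUBRIC = (
    "Score the tweet from 1 to 10.\n"
    "- 1-3: confusing or generic\n"
    "- 4-7: clear but forgettable\n"
    "- 8-10: clear, specific, and memorable\n"
)


def build_judge_prompt(tweet: str) -> str:
    """Ask for reasoning first, then a machine-readable final line."""
    return (
        f"{RUBRIC}\n"
        "First write one or two sentences of reasoning. "
        "Then end with a final line in exactly this form: SCORE: <number>\n\n"
        f"Tweet:\n{tweet}"
    )


_SCORE_LINE = re.compile(r"^SCORE:[ \t]*(\d{1,2})[ \t]*$", re.MULTILINE)


def parse_score(reply: str) -> Optional[int]:
    """Return the last well-formed score in the reply, or None if unusable."""
    matches = _SCORE_LINE.findall(reply)
    if not matches:
        return None
    value = int(matches[-1])
    return value if 1 <= value <= 10 else None


print(parse_score("Clear and specific, but the ending is flat.\nSCORE: 7"))
print(parse_score("I'd give it an 8 out of 10."))
```

Returning `None` for a malformed reply lets the loop decide what to do (ask the judge again, or count it as a failed check) instead of crashing on a bad parse.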
Three loops to build next
Once the pattern clicks, you'll see loop-shaped problems everywhere. A few great second projects:
The Fact-Checker. Generate an answer, then loop: "What claims here would a skeptic challenge?" → revise → repeat until no weak claims remain. Same act/check/decide: the check is "are there unsupported claims?"
The Constraint Solver. Write something that must satisfy several hard rules at once (a product name that's one word, available as a .com, and evokes speed). Single prompts juggle multiple constraints badly; a loop knocks them down one failed check at a time.
The Test-and-Fix Loop. The coder's classic: generate code → run the tests in Python → feed any failures back in → repeat until green. This is the literal foundation of agentic coding tools, and you now have everything you need to build a baby version of one.
Every one of these is the same four beats you just wrote. You're not learning three new things — you're applying one thing three times.
The mindset shift
Here's what changes once you've built a loop or two.
You stop asking "what's the perfect prompt?" and start asking "what's the check?" Because the check is where the real leverage lives. If you can measure whether an output is good — even crudely — you can build a loop that drives toward it relentlessly, far past what any single prompt could reach.
Prompting is asking an expert for an answer. Looping is giving that expert a goal, a way to see their own mistakes, and permission to keep going until it's right. One of those scales to systems. The other one is you, by hand, sighing and tweaking words.
Forty lines. Go build the loop.