The Ghost in the Machine is a Messy Roommate

Let’s be real for a second: the internet is a graveyard. We’re all just walking through digital ruins populated by bots screaming at other bots, mostly about "top 10 AI-generated content strategies for 2024" or some other flavor of "leverage" (God, I hate that word). But the real tragedy isn't just the public web dying—it’s the silent rot happening in our local repos.

If you’re like me, you’ve basically outsourced your frontal lobe to AI agents like Claude Code or GPT-4. They’re insanely good at writing code, sure. But they’re also like having a brilliant intern who happens to be a high-functioning alcoholic. They leave __pycache__ files in your living room, forget where they put the manifest, and if you aren’t looking, they’ll silently introduce enough technical debt to sink a mid-sized startup.

So yeah, I got tired of it. I realized that if I’m going to let an AI write my code, I need to treat it like a liability. I built a governance system—a digital babysitter—to keep my assistants from turning my projects into a spaghetti-code dumpster fire.

The Morning Breath Test (Session Audits)

Every time I fire up a Claude session, I don't just start typing. That’s how you get 14 folders named test-backup-old-v2. Instead, I have a Python script called audit.py that runs automatically.

It’s triggered by a hook in ~/.claude/settings.json. Before Claude even says "How can I help you?", the script is already crawling through my ~/projects folder looking for trouble. It checks for a .agent/manifest.yaml in every directory. If it’s missing? Violation. If there are weird "forbidden patterns" like *-old or test-* leftovers that the agent forgot to delete? Violation.
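
For reference, the hook wiring looks roughly like this. This is the general shape of a Claude Code SessionStart hook, not my exact file, and the script path is my own:

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "python3 ~/.agent/audit.py" }
        ]
      }
    ]
  }
}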

Honestly, the best part is the auto-cleanup. It just nukes .DS_Store, __pycache__, and .pytest_cache without asking. It’s like a Roomba for your file system.

# ~/.agent/rules.yaml
projects:
  required_files:
    - .agent/manifest.yaml
  forbidden_patterns:
    - "*-old"
    - "test-*"
cleanup:
  auto_delete:
    - ".DS_Store"
    - "__pycache__"
    - ".pytest_cache"
    - ".ruff_cache"

When I start a session, I get a nice little report. It’s a reality check. "Hey, you have 1 exited Docker container and 46 pending issues in the trading bot project." It sets the tone. It says, I'm watching you, Claude.

Don’t Let the AI Grade Its Own Homework

Here is where it gets meta. I don’t use Claude to review Claude’s work. That’s like asking a hallucinating person if they think they’re hallucinating. They’ll just agree with themselves until you’re both staring at a blank screen.

Instead, I use Gemini.

Why? Because Gemini (specifically the 1.5 Flash tier) has a massive context window and, more importantly, it’s a different "brain." It has different biases. While I’m sleeping, a cron job runs at 2:00 AM and triggers a nightly code review. It finds every file changed in the last 24 hours and feeds it to Gemini with a very specific prompt: Find the bugs. Find the security holes. Ignore the docstrings—I don't care about your polite suggestions, just find what's broken.
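
The nightly job itself is a short Python script. This is a compressed sketch, assuming the google-generativeai client and git for change detection; the prompt is paraphrased, and the output paths are placeholders (mine actually lands in the Obsidian vault, more on that below):

# nightly_review.py (sketch), cron: 0 2 * * *
import subprocess
from pathlib import Path

import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="...")  # use an env var in real life
model = genai.GenerativeModel("gemini-1.5-flash")

PROMPT = (
    "Review this file. Find the bugs. Find the security holes. "
    "Ignore docstrings and style. Respond with a JSON list of "
    "{line, severity, category, message, auto_fixable} objects."
)

def changed_files(repo: Path) -> list[Path]:
    # Every file touched in the last 24 hours, according to git.
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "--since=24 hours ago",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [repo / line for line in sorted(set(out.splitlines())) if line.strip()]

def review(repo: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in changed_files(repo):
        if not path.is_file():
            continue  # deleted or renamed since the commit
        resp = model.generate_content(f"{PROMPT}\n\n{path.read_text()}")
        (out_dir / f"{path.name}.review.json").write_text(resp.text)

if __name__ == "__main__":
    review(Path("~/projects/theta-grind").expanduser(),
           Path("~/obsidian/reviews").expanduser())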

The output is a structured JSON file. It’s cold. It’s clinical.

{
  "line": 45,
  "severity": "error",
  "category": "bug",
  "message": "Potential null pointer here because you didn't check the API response properly",
  "auto_fixable": true
}

And I have this 7-day cooldown thing too. If Gemini suggests a fix and Claude applies it, the system won't re-review that file for a week. Otherwise, you get this weird "AI ping-pong" where they just keep editing the same three lines of code back and forth forever. It's wild to watch, but it's a total waste of tokens.
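
The cooldown is just a timestamp ledger. A minimal sketch (the file name and location are my own invention):

# cooldown.py (sketch): skip files reviewed in the last 7 days
import json
import time
from pathlib import Path

STATE = Path("~/.agent/review-state.json").expanduser()
COOLDOWN_SECONDS = 7 * 24 * 3600

def _load() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {}

def should_review(path: Path) -> bool:
    # True if the file was never reviewed, or the cooldown has expired.
    return time.time() - _load().get(str(path), 0) > COOLDOWN_SECONDS

def mark_reviewed(path: Path) -> None:
    state = _load()
    state[str(path)] = time.time()
    STATE.write_text(json.dumps(state, indent=2))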

The "Fix Everything" Button

The coolest part is the hand-off. When I wake up and see that Gemini found 38 errors in my theta-grind project, I don’t manually fix them. I’m far too lazy for that.

I’ve taught Claude a "skill"—basically a markdown file in its command directory called fix-reviews.md. I just type fix code reviews and Claude spawns sub-agents. They read the JSON from the nightly scan, look at the code context, apply the fix using the Edit tool, and commit the change.
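
The skill file itself is just instructions. Mine boils down to something like this (the wording here is illustrative, not my exact file):

# ~/.claude/commands/fix-reviews.md
Read the most recent nightly review JSON for this project.
For every issue where auto_fixable is true:
1. Open the file and read the surrounding context.
2. Apply the smallest change that resolves the issue, using the Edit tool.
3. Run the tests. If they pass, commit with a message referencing the issue.
If an issue is architectural or not auto_fixable, do not touch the code.
Append it to skipped-issues.yaml instead.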

I’ve even automated the nightly fixes now. I use the Claude CLI with a --print flag and a very aggressive instruction: "DO NOT ASK FOR PERMISSION. FIX THESE ISSUES NOW."

But here's the thing... some stuff shouldn't be fixed by a bot. If Gemini says, "Your entire architecture is a mess, you should split this class into five pieces," that’s not a quick fix. My system catches those and dumps them into a skipped-issues.yaml file in my Obsidian vault.

It’s my "to-do" list for when I actually have to be a human developer again.
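
An entry ends up looking something like this (the schema is my own, and the path and date are made up for illustration):

# skipped-issues.yaml
- project: theta-grind
  file: engine/positions.py
  severity: warning
  category: architecture
  message: "This class does too much. Split it into smaller pieces."
  reason_skipped: "architectural change, needs a human"
  found: 2024-11-02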

The Obsidian Vault: My Agent’s Memory

Everything gets logged to an Obsidian vault. Audit trails, code reviews, fixes, skipped issues... all of it.

I use a "second-brain" MCP server so Claude can actually search my vault. If I’m working on a security patch, I can ask, "Hey, did we have any similar security issues in the investment-pool project last month?" and it’ll actually find the nightly review from three weeks ago.

It’s basically giving my AI a long-term memory that isn't just a bloated context window.

Is This the Future or Just a Faster Way to Fail?

I’ve been running this for a while now, and it’s... interesting?

On one hand, my projects have never been cleaner. No more dangling Docker volumes, no more unused imports. The code is technically "better."

But on the other hand, I’m basically managing a small army of digital ghosts. I spend more time tuning my governance scripts than I do writing actual logic. We’re building these insane layers of abstraction—AI writing code, AI reviewing code, AI fixing code—and honestly, I’m not sure anyone actually knows how the underlying system works anymore.

Is the internet dead? Probably. But if we’re going to live in the ruins, we might as well have a script that cleans up the cache files.

What happens when the governance scripts start having bugs? Do I need a third AI to watch the script that watches the AI? It’s turtles all the way down, man.

Anyway, I’m going to go see if Claude accidentally nuked my production database while I was writing this. It’s 50/50.

So yeah... what are you doing to keep your agents in check? Or are you just letting them run wild in your /src folder like a pack of digital wolves? Honestly, I'd love to know if I'm the only one this paranoid.