Slop is a standards problem

AI slop is real. The diagnosis is wrong. Slop is what AI does when no one sets the standard — and the same technology can elevate the bar instead, if you choose to use it that way.

[Image: me being hit with slop in a sewer-type setting.]

Holy shit, it’s everywhere. My LinkedIn feed is full of it. Most of my conversations at work end up on it. Even chats at social events end up on it. Bloody AI.

I see huge potential in it. It reminds me of the web first gaining momentum, when we were working it all out — experimenting, getting personal sites up, making sites for friends just to get the practice. Back then there wasn’t this overwhelming noise about how it was going to change the world.

I’ve been having a lot of conversations recently about how the SDLC is going to change, and what that looks like. I find it exciting — I enjoy the ambiguous areas where the processes don’t exist yet and you have to work it out as you go.

In those conversations there’s also an undeniable worry: a tidal wave of slop. Unmaintainable code. Crushing tech debt. I feel that concern myself. If anyone can write a prompt, let Claude run for a few minutes, and get a website out of it, I worry about the expectation that the code can just be taken as-is and popped out there.

I’ll confess: I’ve been responsible for some of that slop myself. A few Medium posts about small apps and working with AI in particular ways. Looking back, some of it has a sloppy smell. At the time I thought, yeah, this is great, I’m moving so fast.

But what if the technology we worry will firehose slop could also be the technology that elevates us? Lets us deliver to a higher standard than we could before?

Maybe slop always existed. Maybe we just cut corners and shipped something suboptimal. We dropped the security requirement. We didn’t plan for accessibility up front. We wrote code without tests first. When I was an individual contributor, I’d throw important things out under delivery pressure all the time.

AI didn’t introduce slop. AI just accelerates whatever path you’re already on.

The signal I now hear, through all the noise, is this: AI shouldn’t only amplify what I do. It should elevate the standard of what I can do.

Setting the standard

AI on its own doesn’t drive for the standard — it drives for done. You ask for a thing and it produces a thing that resembles the answer. Whether the path there was rigorous is a separate question, and rigour isn’t what it reaches for by default. So it takes the shortest credible path: code that compiles, tests that pass, CI that goes green. That’s not the same as what I’d ship after a careful review.

If I want AI to elevate me, I have to give it a different target. And vibes don’t survive a long task — the target has to be written down, specific, in a doc someone can point at. I’ve found it comes in four shapes:

Hard rules. “No third-party scripts on the page.” “WCAG 2.2 AA on every page.” Yes-or-no things, where if the answer isn’t yes, it’s a no.

Measurable thresholds. Lighthouse 95+ across Performance, Accessibility, Best Practices, and SEO. Page weight under 150 KB. Numbers, not adjectives.

Rubrics for the subjective stuff. Comments explain why, not what. Don’t extract an abstraction until there are three call sites. Still rules — just written as a checklist someone can apply.

A definition of done. A short list of what has to be true before I call something finished.
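To make that less abstract, here’s the shape a standards doc can take. This is an illustrative sketch, not my actual file; every line in it is lifted from the examples above rather than prescribed:

```markdown
# STANDARDS.md (illustrative sketch)

## Hard rules (yes or no)
- No third-party scripts on any page.
- Every page meets WCAG 2.2 AA.

## Thresholds (numbers, not adjectives)
- Lighthouse >= 95 on Performance, Accessibility, Best Practices, SEO.
- Page weight under 150 KB.

## Rubric (for the subjective stuff)
- Comments explain why, not what.
- No abstraction until there are three call sites.

## Definition of done
- All machine checks pass locally and in CI.
- Rubric review has no unresolved findings.
```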

Once those exist, the AI has something to reach for that isn’t the median of its training data. The standard pushes back on me, too — it stops me cutting the same corners I used to under delivery pressure.

Two validation layers, and what each does

I’ve learnt that you can’t enforce all of this with AI review. AI is the wrong tool for half the job.

Layer 1 is deterministic. Anything a machine can check, let a machine check. Format, lint, typecheck, test, build, audit dependencies, run Lighthouse, scan the built output for third-party origins. Facts, not opinions. They block the merge.
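Concretely, Layer 1 can be one chained script. A sketch, assuming Prettier, ESLint, Vitest and tsx, and assuming the project already defines a build script; the tool and script names are my assumptions, not prescriptions, and the gate script itself is sketched further down:

```json
{
  "scripts": {
    "format:check": "prettier --check .",
    "lint": "eslint .",
    "typecheck": "tsc --noEmit",
    "test": "vitest run",
    "gate:origins": "tsx scripts/no-third-party.ts",
    "check": "npm run format:check && npm run lint && npm run typecheck && npm run test && npm audit && npm run build && npm run gate:origins"
  }
}
```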

Layer 2 is advisory AI review. Anything that needs human-language judgement gets handed to AI with a written rubric. Does this abstraction cost more than it saves? Is this function doing two jobs? Would a future reader figure out why this fallback exists? AI is good at this with a rubric to cite, and pretty bad at it without one. The reviewer flags. I still decide.

Machine checks should never be optional. AI review should never be blocking.
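In CI that split is mechanical. A GitHub Actions sketch, assuming that’s your CI; the ai-review script is hypothetical, and which jobs actually block comes down to which checks you mark as required on the branch:

```yaml
name: checks
on: pull_request
jobs:
  machine-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run check      # blocking: mark this check as required

  ai-review:
    runs-on: ubuntu-latest
    continue-on-error: true     # advisory: a failure here never blocks the merge
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run ai-review  # hypothetical script that applies the written rubric
```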

What I’m trying right now

This is a hypothesis still in flight — I’m a few weeks into trying it on my own personal site: something I’ve reworked many times over the decades, often cut corners on, and often not been 100% happy with the result. It tends to be something I only get round to when I’m between jobs. (Checks when the last redesign was… yep, 2020, when I was made redundant during COVID.) The rebuild I’m working on now is the first time I’ve tried it with a real standards apparatus underneath.

The plumbing:

  • Standards docs at the repo root, each owning one dimension.
  • One npm run check that runs format, lint, typecheck, test, dependency audit, build, and a custom gate that fails if any third-party origin appears in the shipped output (sketched just after this list).
  • The same chain in CI, plus a Lighthouse run that fails if any of the four scores drops below 95.
  • A rubric describing what a reviewer should look for, written so any LLM or human can apply it.
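The third-party gate is the only bespoke piece of that chain. A minimal sketch of the idea, not my actual script; the dist folder, the allowlist, and the simple URL scan are all assumptions:

```typescript
// scripts/no-third-party.ts (hypothetical): scan built HTML/CSS/JS for
// absolute URLs and fail if any origin isn't on the allowlist.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

const DIST = "dist"; // assumption: build output lands here
const ALLOWED = new Set(["www.example.com"]); // assumption: your own origins

// Recursively collect every file in the build output.
function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const path = join(dir, name);
    return statSync(path).isDirectory() ? walk(path) : [path];
  });
}

// Crude but effective: any absolute URL in shipped output gets checked.
const urlPattern = /https?:\/\/([^\/"'\s)]+)/g;
const offenders = new Set<string>();

for (const file of walk(DIST)) {
  if (!/\.(html|css|js)$/.test(file)) continue;
  const text = readFileSync(file, "utf8");
  for (const match of text.matchAll(urlPattern)) {
    const host = match[1].toLowerCase();
    if (!ALLOWED.has(host)) offenders.add(`${host} in ${file}`);
  }
}

if (offenders.size > 0) {
  console.error("Third-party origins found in shipped output:");
  for (const o of offenders) console.error(`  - ${o}`);
  process.exit(1); // blocks the merge, as a Layer 1 check should
}
console.log("No third-party origins in shipped output.");
```

Run it after the build and “no third-party scripts” stops being a sentence in a doc and becomes a red X on a pull request.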

Most of the work has been writing the docs. So far, before I’ve written a single post: every page passes WCAG 2.2 AA, scores 95+ in every Lighthouse category, and ships zero third-party origins. That’s a higher bar than I’ve ever set for my personal site — and I haven’t done anything heroic to clear it.
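For the Lighthouse numbers, one way to make the 95 threshold a hard gate is Lighthouse CI (@lhci/cli), which takes assertions in a lighthouserc.json. A sketch, assuming that runner; other runners will want a different shape:

```json
{
  "ci": {
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.95 }],
        "categories:accessibility": ["error", { "minScore": 0.95 }],
        "categories:best-practices": ["error", { "minScore": 0.95 }],
        "categories:seo": ["error", { "minScore": 0.95 }]
      }
    }
  }
}
```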

Whether this scales or holds up over time is the bigger question. I’m a few weeks in. Early signs are good, but ask me in six months.

A few honest caveats

Standards-first feels slow. Writing the rules before any code feels backwards. It isn’t, but you don’t get the dopamine of “running a thing” until later.

Subjective dimensions still need humans. Taste, judgement, when to abstract — the apparatus doesn’t solve those. It makes them tractable. The reviewer reduces “is this any good?” to specific, citable findings. I still decide.

AI defaults to the median if you let it. Last week I asked Claude to write a build script and it produced plain JavaScript — because most build scripts in the wild are JavaScript — even though my own standards mandate TypeScript strict everywhere. Lint passed, no flag. AI reaches for the median unless you make it actively audit against the standard.

What changes

The fear of AI slop is right about the slop. It’s wrong about the cause. Slop is what you get when the standard isn’t stated. Done wins by default, and done isn’t the same as what you’d ship if your full standards were in play.

The fix isn’t less AI. The fix is configuration. You write the standard once and reuse it.

Set the standard. The AI will help you reach it.