Advanced AI Code Automation

Michael Aguilar

September 01, 2025

Page content

So, you’ve got a great process down for developing your code. You’ve got great documentation, 100% test coverage, good git branching and tagging, and all the other detailed steps for generating code.

It sure is a lot of typing, though, isn’t it? Very repititious. Obviously, you should automate those steps! You’re only a little bit away from typing in “Write me an app that does X,Y, and Z!” and then taking a nap while it does the work!

Not so fast…

Automating an AI-driven process sounds great on paper. You feed commands to the AI, it spits out code, another script runs the tests, and round and round it goes. It’s the dream of frictionless development.

The problem is, when automation goes wrong, it can go wrong in a big, expensive, and baffling way.

When it goes wrong, it goes really wrong

An automated loop that includes an AI is like giving a junior developer who has never seen your project the keys to the car, and then telling them to just keep driving. What could possibly go wrong?

Files will be created all over the place

One of the first things you’ll notice is a complete mess in your project structure. The AI doesn’t have a “map” of your project in its head. It only works with the context you give it. If you tell it to “create a new service for processing widgets,” it might create src/services/widget_processing_service.py one time, and src/widget_processor.py the next.

It will then build upon the mess it just made. Your automation script, unaware of the chaos, will feed this new, misplaced file back into the context for the next step. The AI sees the new file and assumes that’s where things are supposed to go now.

This is where the term “AI slop” comes from. It’s just crap piled on top of crap, all at machine speed. You can end up with a tangled, un-maintainable directory structure in minutes.

Tests will fail, over and over again

With all these new, misplaced files and hastily written code, your test suite will start to light up like a Christmas tree. Your automation script, trying to be helpful, might try to fix them.

“Fix the failing test,” it commands the AI. The AI, again with limited context, might “fix” it by changing an assertion from True to False, or by commenting out the entire test. Congratulations, the test passes now.

Debugging real failures becomes a nightmare. An automated loop can get stuck on the same subtle error, trying slightly different - but equally wrong - solutions over and over. Worse, you can get into a “whack-a-mole” situation. A fix for one test breaks another. The automated fix for that test then breaks the first one it fixed.

Uh, Does the app even work?

Let’s say you manage to get past the file mess and the looping test failures. You have an app that the automation says is “done.” Does it actually do what you wanted?

Maybe. But what about the details you didn’t think to specify?

Your design may have left out critical parts of a real-world application. Things like:

What happens the very first time it runs, when there’s no data or config file?
How does it handle a sudden crash? Does it clean up after itself?
Is there any logging or observation instrumentation to see what it’s doing?

The automation built exactly what you asked for, but what you asked for wasn’t complete. It may do what you designed, but the design may not be what you were thinking. This is why user validation - with you as the first user - is so important. You have to run the thing and see if it’s even close to what you intended.

Not only is time wasted, but so is money

This whole automated misadventure isn’t just wasting your time; it’s costing you money.

AI APIs aren’t free. Every time your script gets caught in a loop trying to fix a test, it’s making another API call. It’s entirely possible to burn through your budget for the month in a single afternoon while your automated script dutifully tries to solve a problem it doesn’t have the context to understand.

Mitigation

This doesn’t mean you shouldn’t automate. It just means you can’t go from zero to a fully autonomous, napping-while-it-works system in one step. You need to be smart about it.

Automate incrementally. Don’t try to automate the entire development lifecycle at once. Start with one small, repetitive task. Maybe it’s just generating a boilerplate file based on a spec. Get that working reliably, then move to the next small piece.
Build breaks into the workflow. Your automation should not be a continuous loop. It should run a step, then stop. It should present the results to you for verification. Did it create the files in the right place? Does the code look sane? Once you approve, you can trigger the next step.
Constantly supervise. You have to shift your role from typist to supervisor. You are the quality gate. Your job is to review the AI’s output at every step, catching the small mistakes before they get baked into the foundation and amplified by the next automated step.