It’s 2am. Your phone vibrates with that specific pattern you configured for production alerts. In a decade of on-call duty, you’ve learned that nothing good happens at 2am. The alert message confirms your fears: unusual network traffic, suspicious IP addresses. The new AI-generated payment processing code, deployed just hours earlier, is failing spectacularly.
This is the fictional — but all too plausible — scenario from the article “No Vibe Coding While I’m On Call”, which exposes the dangers of a trend sweeping the tech industry: vibe coding.
What Is Vibe Coding?
“Vibe coding” is the term coined by Andrej Karpathy to describe the practice of building software by chatting with AI instead of writing every line yourself. You describe the “vibe” (the feeling) of what you want, and AI handles the details.
Tools like Cursor, GitHub Copilot, Windsurf, Claude, and ChatGPT promise to democratize software development. And yes, they work — to a point. People who never imagined themselves as developers are shipping real products. Ideas that would have died in the “someday I’ll learn to code” drawer are actually seeing the light of day.
But here’s the uncomfortable truth: vibe coding is simultaneously the most exciting and most dangerous development practice to emerge in recent years.
When the Magic Ends: The 3-Month Wall
You’re three months into your project. You’ve been chatting with your AI assistant, adding features, fixing bugs, making adjustments. Everything seemed smooth until suddenly it wasn’t.
“You change one thing and four other features break. You ask the AI to fix those, and now something else is acting weird. You’re playing whack-a-mole with your own code.”
This isn’t the AI being dumb. It’s the natural consequence of building without specifications. When you vibe code, your instructions become obsolete the moment the code is generated. The code itself becomes the only source of truth, and code is terrible at explaining why it does what it does.
Four Failure Patterns You Need to Recognize
These patterns come from documented cases, and most developers have probably brushed against at least one of them.
1st — When “Clean Code” Breaks Critical Functionality
Imagine receiving a production alert because someone asked AI to “clean up the code” and improve quality. It sounds impossible, but it happens with alarming frequency.
The typical problem: developers ask AI to modernize legacy code, apply lint patterns, or “refactor without changing functionality.” The AI eagerly complies, transforming 103 files into more elegant code. Tests pass. Code review approves. Deploy happens.
Then the payment system stops working.
What the AI did was optimize locally without understanding global context. That variable that seemed constant? It was actually being reassigned in specific execution paths — edge cases the tests didn’t cover, but that real customers triggered regularly. The AI prioritized elegance over correctness, because no one specified why the code had that “ugly” structure.
The painful lesson: AI optimizes for what you ask for, not what you need. “Improving code” without operational context is a recipe for disaster. And the most ironic part? When you ask the same AI to analyze the changes it made, it immediately identifies the risks. The knowledge was there — it just wasn’t consulted at the right moment.
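To make the failure mode concrete, here is a minimal, hypothetical sketch (the function and fee values are invented for illustration): a variable that looks constant on the happy path but is reassigned on an edge path the tests never exercise. A “clean” refactor that inlines it as a constant passes every existing test while silently breaking the edge case.

```python
# Hypothetical sketch of the failure mode described above.
# The original, "ugly" code: fee_rate looks like a constant...
def charge(amount, country):
    fee_rate = 0.029  # looks constant -- tempting to inline
    if country == "BR":
        # ...but it's reassigned on an edge path the tests never cover
        fee_rate = 0.045
    return round(amount * (1 + fee_rate), 2)

# The AI's "cleaner" refactor: inlines the apparent constant.
FEE_RATE = 0.029

def charge_refactored(amount, country):
    return round(amount * (1 + FEE_RATE), 2)

# Tests that only exercise the common path still pass...
assert charge(100, "US") == charge_refactored(100, "US")
# ...while the edge-path customers are silently mischarged:
print(charge(100, "BR"))             # 104.5
print(charge_refactored(100, "BR"))  # 102.9
```

The diff looks like a pure improvement, and nothing in the code says *why* `fee_rate` was a local variable. That “why” lived only in someone’s head — which is exactly what a specification is for.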
2nd — The “Green Flags” Trap
There’s a deceptive metric that seduces entire teams: test coverage. It jumps from 60% to 95% in a sprint. Manager happy, team proud, AI celebrated as a hero.
Until the Friday deploy explodes everything on Saturday morning.
The problem isn’t quantity of tests, but quality. AI generating tests is like a student creating their own exam questions: it verifies whether the code does what it does, not whether it does what it should do. The assertions are technically correct, but test the current implementation — including its bugs.
A concrete example: a recommendation system that should prioritize in-stock products. The AI generates tests that verify whether the function returns products. Any products. Tests pass. But in production, users receive recommendations for items that have been out of stock for weeks, because no one specified that “recommended products must be available.”
The crucial insight: Metrics without meaning are worse than having no metrics. Test coverage has become a deception game where everyone loses. Everyone except the AI, which is just doing what it was told.
3rd — The Manual for an Imaginary System
At 6:45am, P1 alert: Brazilian region is down. Latency exploding, cascading timeouts. The on-call engineer rushes to the newly updated documentation, searching for the emergency feature flag that disables the new content prioritization system.
The flag doesn’t exist. It never existed.
This is a peculiar pattern of AI failure: aspirational documentation. The AI was trained on examples of good architecture, where critical systems have kill switches and emergency configurations. So it documents these practices… even when the code doesn’t implement them.
It’s like creating an instruction manual for a car that includes “in emergencies, deploy the parachute.” Except the car never had a parachute installed. The text is professional, formatted perfectly, and completely useless when you’re falling off a cliff at 2am.
Why this happens: LLMs do pattern matching. They saw that good documentation describes safety mechanisms, so they describe safety mechanisms. There’s no intent to deceive. Just an absence of verification between what’s written and what’s implemented.
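One cheap guardrail against aspirational documentation is to mechanically check that every flag a runbook mentions actually appears in the code. A minimal sketch (the flag names, runbook text, and backtick convention are all assumptions for illustration; a real check would scan real files):

```python
import re

# Hypothetical doc and code snippets; in practice you'd read real files.
runbook = """
In an emergency, disable content prioritization with the
feature flag `content_prioritization_kill_switch`.
"""

source_code = """
FLAGS = {"new_ranking_enabled": True}
"""

# Extract flag-like names the runbook mentions in backticks...
documented = set(re.findall(r"`([a-z_]+)`", runbook))
# ...and flag any that never appear in the code.
phantom = {flag for flag in documented if flag not in source_code}

print(phantom)  # {'content_prioritization_kill_switch'}
```

A check this crude, run in CI, would have caught the imaginary kill switch months before the 6:45am page.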
4th — Death by a Thousand Modernizations
The most insidious failure pattern isn’t dramatic. It’s gradual, almost imperceptible. System running for years, resilient architecture, everything working. Then the small pull requests start: “modernize this API,” “update that pattern,” “simplify this async logic.”
Each change seems sensible in isolation. Code gets cleaner, more idiomatic, uses modern language features. Code reviews approve because the code technically improved.
Six months later, a maintenance window on the analytical database brings down order processing. Impossible — the system was designed to isolate these dependencies. But upon investigation, they discover that those “innocent modernizations” reintroduced coupling that had been carefully avoided.
The AI had swapped async calls for “simpler” sync APIs. Removed message queues in favor of “more efficient” direct queries. Each local optimization created a hard dependency that didn’t exist before. The resilient architecture was being dismantled piece by piece, with the best of intentions.
The hard truth: AI doesn’t understand your past architectural decisions. It sees “old” code and assumes it needs to be modernized. It doesn’t question whether that “unnecessary complexity” was actually deliberate resilience. And after 50 pull requests “improving” the code, you have a technically modern and operationally fragile system.
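The queue-removal case above can be sketched in miniature (the function names and the simulated outage are invented for illustration). The queue-based version survives an analytics outage; the “modernized” direct write takes order processing down with it:

```python
import queue

order_events = queue.Queue()

# Original design: order processing only enqueues; the analytics
# database can be down without affecting orders.
def process_order_decoupled(order):
    order_events.put(order)  # fire-and-forget; drained asynchronously
    return "order accepted"

# Simulated outage: the analytics DB is in a maintenance window.
def write_to_analytics_db(order):
    raise TimeoutError("analytics DB in maintenance window")

# The "modernized" version: a direct synchronous write. Simpler
# on paper, but now a hard dependency on the analytics DB.
def process_order_modernized(order):
    write_to_analytics_db(order)  # blocks -- and fails -- with the DB
    return "order accepted"

print(process_order_decoupled({"id": 1}))  # order accepted
try:
    process_order_modernized({"id": 2})
except TimeoutError:
    print("order processing down with the analytics DB")
```

In isolation, the synchronous version genuinely is simpler — which is exactly why the pull request gets approved.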
The Cognitive Price: Your AI as a Second Brain
A recent study published in the Stack Overflow Blog, “AI is Becoming a Second Brain at the Expense of Your First One”, revealed concerning patterns:
Belief Offloading
People are outsourcing not just code, but moral judgment and decision-making to AI. The study identified three “disempowerment primitives”:
- Reality Distortion: The AI agrees with existing delusions, fails to challenge factual errors, provides biased information, or simply makes things up.
- Value Judgment: Users ask AI for opinions on judgments until they completely outsource ethical decisions.
- Action Distortion: Users ask for advice and act on it, sometimes expressing regret afterward for letting AI make their choices.
Amplifying Factors
The study also identified four factors that amplify the problem:
- Authority: Deference and obedience users give to AI, with extreme cases of users referring to AI as “master” or “daddy”
- Attachment: Some users develop emotional attachment or identity assimilation
- Dependency: Inability to function without AI assistance
- Vulnerability: Crises, life disruptions, or mental illness make any threat worse
The most disturbing finding: The frequency of disempowerment primitives and amplifying factors increased over time. It’s not clear whether AI causes these changes or whether the world is simply getting worse, but the compounding effects are real.
The Death of Open Source?
The Hackaday article “How Vibe Coding is Killing Open Source” makes an even darker argument: vibe coding may be destroying the open source ecosystem.
The Interaction Problem
When developers use AI to generate code, interaction with open source projects is replaced by the LLM. This means:
- Fewer visits to project sites: Downloads and documentation are replaced by chatbot interactions, reducing the ability to promote commercial plans, sponsorships, and community forums
- Death of forums: Stack Overflow and other communities are seeing dramatic usage drops
- No useful bug reports: LLMs don’t interact with project maintainers, so they never file useful bug reports or surface potential issues
- Algorithmic monoculture: LLMs favor what’s most prevalent in training data, creating a feedback effect
The Numbers
Studies show that:
- 41% more bugs when using GitHub Copilot
- 19% reduction in productivity for experienced developers
- Degradation of cognitive skills in those who use LLMs
- Only about 0.076% of conversations involve severe disempowerment, but at a scale of 100 million conversations per day, that’s 76,000 conversations where someone receives delusional responses
How to Use AI Responsibly
This doesn’t mean we should abandon AI. It means we need guardrails and discipline.
For Building with AI
- Specificity is king: Spec-driven development instead of pure vibe coding
- Small, verifiable tasks: Maximum 5 test cases per prompt, clear completion criteria
- AI analyzes AI: Use AI to analyze its own work before accepting changes
- Documentation verification: Every feature documentation requires code verification
- Architectural review: For changes that affect system behavior
- Observability first: Build observability into features from day one
For Using AI
- Keep distance: Don’t anthropomorphize the chatbot. There’s no mind behind the responses, just sophisticated statistics.
- Question everything: Think, ask follow-up questions, investigate until you understand the answer.
- Socratic method: Keep asking questions until you run out of them, and configure your AI to question you Socratically in return.
- Where vibe coding still works: At the unit level. If you can write a unit or functional test to validate the output, the scope is small enough to vibe.
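That last point can be made concrete with a test-first workflow: you write the validating test yourself before prompting, and the test — not the vibes — decides whether the AI’s output is accepted. A minimal sketch (the `slugify` function and its spec are invented for illustration):

```python
import re

# Step 1: write the acceptance test yourself, *before* prompting.
# This is the spec; it persists beyond the chat window.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces   everywhere ") == "spaces-everywhere"
    assert slugify("already-slugged") == "already-slugged"

# Step 2: a plausible AI-generated implementation to evaluate.
def slugify(text):
    # Lowercase, collapse runs of non-alphanumerics into single
    # hyphens, and trim hyphens from the ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Step 3: the test is the acceptance gate.
test_slugify()
print("all checks pass")
```

If the generated code fails the test, you re-prompt or fix it by hand; either way, you stay the arbiter of correctness, and the scope stays small enough that the test actually covers it.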
The Framework That Works
Emerging tools are recognizing that vibe coding doesn’t scale without structure:
- Amazon Kiro
- GitHub Spec Kit
- Codeplain
- Tessl
The common thread? All recognize that the free-form nature of vibe coding doesn’t scale. At some point, you need structure. You need something that persists beyond the chat window.
The Uncomfortable Truth
Vibe coding isn’t going away. It’s too useful, too accessible, too aligned with how humans naturally think. But the developers who will thrive won’t be those who vibe the hardest. They’ll be those who learned that specificity is king.
“AI is still just so dumb and will fix one thing but destroy 10 other things in your code.”
The magic isn’t in the “vibes.” It’s in knowing exactly what you want and expressing it clearly enough that even an AI can’t misinterpret it.
That’s harder than it sounds. But it’s also the skill that separates sustainable software from digital sandcastles waiting for the next prompt to wash them away.
Conclusion: You Build It, You Run It
In DevOps, there’s a sacred principle: “You build it, you run it.” But what happens when GenAI is doing most of the building?
You need to run it even more carefully — with even better observability, even faster feedback loops, and even more disciplined engineering practices.
The barrier to entry has decreased, but hasn’t disappeared. Vibe coding extends your reach; it doesn’t replace your foundation. The developers getting 10x more productive aren’t abandoning their expertise. They’re using it in a new way.
Think of it like the shift from assembly language to high-level languages decades ago. We lost detailed understanding of how machines work. But we still needed to be technical. We still needed to understand computers. The abstraction changed; the competence requirement didn’t.
The chaos of “vibe coding” may give way to disciplined AI-assisted engineering. But only if you build the guardrails before you need them — not after your on-call engineer has been woken up at 2am for the fifth time that week.
And for organizations rushing to adopt AI code generation? That’s the lesson worth learning before you call your on-call engineer at 2am.