There is a term gaining traction in the software industry: vibe coding. The idea is simple. Describe what you want to an AI tool, watch it generate an entire application, ship it. One of our own team members built a feature-rich application “without writing any code” using these tools. It was impressive. It was also, on closer inspection, not production-ready.
This post explains why AI-generated code has real, measurable problems — and how we built a system that captures the speed of AI while enforcing the discipline that keeps software safe, maintainable, and fit for production.
The Vibe Coding Explosion
Tools like Cursor, Copilot, Windsurf, and Lovable.dev have made it possible for anyone to generate working software from a text prompt. Entire SaaS applications materialise in an afternoon. Prototypes that once took weeks appear in hours.
The appeal is obvious. Clients see it too. If AI can build an app in a day, why does a proper project take months?
The answer is the same reason a building that goes up in a week without an architect tends to fall down in a year. Speed without structure produces something that looks right but fails under pressure.
What AI-Generated Code Actually Produces
We have used every major AI coding tool extensively. Cursor, Copilot, Windsurf, Claude Code, Lovable.dev. We are not speculating about their output. We have reviewed thousands of lines of AI-generated code across client projects and internal experiments.
Here is what we consistently find when AI writes code without engineering process around it:
Security vulnerabilities are the default, not the exception. AI models optimise for “working code,” not “secure code.” They generate SQL queries without parameterisation, authentication flows with token handling flaws, and API endpoints with no rate limiting. The code runs. It also exposes your users’ data.
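The SQL injection pattern is the clearest illustration of this. Here is a minimal sketch of the difference, using Python's built-in sqlite3 module; the table, data, and function names are illustrative, not drawn from any reviewed project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

def find_user_unsafe(email: str):
    # The shape AI tools tend to generate: user input interpolated
    # straight into the SQL string. An input like "' OR '1'='1" turns
    # the WHERE clause into a tautology and returns every row.
    return conn.execute(
        f"SELECT id, email FROM users WHERE email = '{email}'"
    ).fetchall()

def find_user_safe(email: str):
    # Parameterised query: the driver treats the input strictly as data,
    # never as SQL, so the injection attempt matches nothing.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

malicious = "' OR '1'='1"
print(len(find_user_unsafe(malicious)))  # leaks every row in the table
print(len(find_user_safe(malicious)))    # 0: no user has that literal email
```

Both functions "work" in a demo. Only one of them survives contact with a hostile input, which is exactly the distinction a model optimising for working code does not make.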
Tests are an afterthought or entirely absent. Ask an AI to build a feature and it builds the feature. It does not build the test suite that proves the feature works correctly, handles edge cases, and does not break existing functionality. When tests do appear, they often test the happy path only — the software equivalent of checking that a bridge holds weight by walking across it once on a sunny day.
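The gap between a happy-path test and real coverage is easy to show concretely. A minimal sketch, where parse_age is a hypothetical input-validation function of the kind any form handler needs:

```python
def parse_age(value: str) -> int:
    """Parse a user-supplied age field, rejecting junk input."""
    age = int(value.strip())  # raises ValueError on non-numeric input
    if not 0 <= age <= 130:
        raise ValueError(f"age out of range: {age}")
    return age

# The one test AI tools tend to generate: the sunny-day walk across the bridge.
assert parse_age("42") == 42

# The edge cases that actually break in production:
assert parse_age(" 7 ") == 7            # stray whitespace from a form field
for bad in ["", "abc", "-1", "999"]:    # empty, non-numeric, out of range
    try:
        parse_age(bad)
        raise AssertionError(f"expected ValueError for {bad!r}")
    except ValueError:
        pass  # correctly rejected
```

The happy-path assertion passes against almost any implementation, including broken ones. The edge-case block is what proves the function handles the inputs real users will send it.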
Architecture degrades with every prompt. The first feature looks clean. By the tenth, you have duplicated logic, circular dependencies, inconsistent patterns, and a codebase that fights you on every change. AI does not maintain a mental model of your system architecture. Each prompt is a fresh start bolted onto accumulated technical debt.
Dependency choices are plausible but not vetted. AI reaches for popular packages regardless of whether they are maintained, licensed appropriately, or necessary at all. We have seen AI-generated projects pull in 400+ npm dependencies for functionality that needed 30.
None of these problems are theoretical. They are what we find during security assessments with Sotaria, our penetration testing system. Applications built with “vibe coding” practices consistently produce more critical findings than traditionally developed software.
The Problem Is Not AI. The Problem Is the Absence of Process.
Here is the contrarian position: we use AI to write code on every project. Aggressively. It is a genuine force multiplier. Our team ships faster because of it.
But we do not hand AI a prompt and ship whatever comes back. That would be like hiring a fast typist and calling them a novelist.
AI is exceptional at generating code quickly. It is not capable of deciding what code should be written, evaluating whether that code is secure, verifying it integrates correctly with the existing system, or judging whether it solves the actual business problem. Those are engineering decisions. They require process.
The firms producing AI slop — and the freelancers shipping Cursor output directly to production — are not failing because AI is bad. They are failing because they removed the engineering process and kept only the typing.
How We Solved It: Five Phases, No Shortcuts
We built Janus, a development orchestration system with a team of specialist AI agents covering architecture, security, QA, development, and operations. It enforces a five-phase workflow on every piece of work, with no mechanism to skip steps.
Phase 1: Setup. Before a single line of code is written, Janus runs architecture planning. What is the approach? How does it fit the existing system? What are the security implications? This is where most vibe coding projects fail — they skip straight to implementation and pay for it later.
Phase 2: Implementation. AI writes the code, but within the architectural constraints established in Phase 1. Not a blank canvas prompt. A directed implementation against a reviewed plan.
Phase 3: Review. This is where we diverge most sharply from the industry. Janus coordinates five specialist AI agents in parallel: a software architect checks the design and patterns, a security engineer analyses for vulnerabilities, a QA specialist verifies test coverage and edge cases, a UX reviewer checks the interface, and an SEO engineer validates search optimisation. Five expert perspectives simultaneously, covering ground that would take a traditional team a week. Then — and this is critical — a human engineer reviews on top of that. The AI review surfaces issues. The human makes the judgement call.
Phase 4: QA Verification. Automated testing runs with Playwright-based verification, producing video evidence of functionality. Not a developer clicking through the app and saying “looks good.” Recorded proof that the software does what it is supposed to do.
Phase 5: Merge. Only after architecture review, five-agent AI review, human code review, and QA verification does code reach the main branch. There is no override. There is no “we’ll fix it later.” The process is the product.
We run this process on every engagement. Talk to us about your project →
The Result: AI Speed With Engineering Quality
The outcome is measurable. We ship faster than a traditional team because AI accelerates the writing. We ship with higher quality because the review process catches what AI misses — which is a lot.
Our security assessments on our own code consistently return fewer findings than assessments on codebases built without this process. Our clients do not get emergency calls about vulnerabilities discovered in production. The architecture stays clean over months and years because every change is reviewed against the whole system, not just the immediate prompt.
This is not about being anti-AI. We built four AI systems. We are possibly the most AI-dependent consultancy in Australia. But we are pro-discipline, and we believe the firms that will win in the long run are the ones that use AI as a tool within a rigorous process, not as a replacement for having one.
Book a free 30-minute call → — we will show you how our process applies to your project.
Frequently Asked Questions
Does Two Red Kites use AI to write code? Yes, extensively — but every piece of AI-generated code passes through architecture review, security analysis, human code review, and QA before it ships.
What’s wrong with using AI coding tools like Cursor or Copilot? Nothing inherently. The risk is using them without engineering process — no architecture planning, no security review, no human oversight.
What is Janus? Janus is our development orchestration system that enforces a five-phase workflow with mandatory review gates and human sign-off before code reaches production.