227 points by kiwieater 13 hours ago | 304 comments on HN
I work as a DevOps/SRE and have been doing it in FinTech (banks, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.
My thoughts on vibe coding vs production code:
- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs
- This is partly b/c it is good at things I'm not good at (e.g. front end design)
- But then I need to go in and double check performance, correctness, information flow, security etc
- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get set up correctly etc)
- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs
- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
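A minimal sketch of the kind of deterministic output check mentioned in the list above, assuming the pipeline output can be loaded as a list of dicts. The function names and the row format are illustrative assumptions, not something from the comment:

```python
import hashlib
import json


def canonical_digest(rows):
    """Hash rows in a stable form so row order and key order do not matter."""
    normalized = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def check_output(actual_rows, expected_rows):
    """Return (ok, message); the same inputs always give the same verdict."""
    if len(actual_rows) != len(expected_rows):
        return False, f"row count {len(actual_rows)} != expected {len(expected_rows)}"
    if canonical_digest(actual_rows) != canonical_digest(expected_rows):
        return False, "content digest mismatch"
    return True, "ok"
```

Because the verdict depends only on the data, either a human or an agent can rerun it after each change without arguing about whether the output "looks right".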
So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and an LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
I think there's a lot to pick apart here, but the core premise is full of truth. This gap is real, contrary to what you might see influencers saying. I think it comes from a lot of places, but the biggest one is that writing code is very different from architecting a product.
I've always said, the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately, which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system and then prompt an LLM to build out various components of the system and create a sustainable codebase. This takes time and attention in a world of vibe coders that are less and less inclined to give their vibe coded products the attention they deserve.
I’ve had a similar experience. I’ve been vibecoding a personal kanban app for myself. Claude practically one-shotted 90% of the core functionality (create boards, lanes, cards, etc.) in a single session. But after that I’ve now spent close to 30 hours planning and iterating on the remaining features and UI/UX tweaks to make the app actually work for me, and still, it doesn’t feel "ready" yet. That’s not to say it hasn’t sped up the process considerably; it would’ve taken me hours to achieve what Claude did in the first 10 minutes.
I started working on one of my apps around a year ago. There was no ai CLI back then. My first prototype was done in Gemini chat. It took a week copy and pasting text between windows. But I was obsessed.
The result worked but that's just a hacked together prototype. I showed it to a few people back then and they said I should turn it into a real app.
To turn it into a full multi user scalable product... I'm still at it a year later. Turns out it's really hard!
I look at the comments about weekend apps. And I have some of those too, but creating a real, valuable, bug free MVP takes work no matter what you do.
Sure, I can build apps way faster now. I spent months learning how to use ai. I did a refactor back in May that was a disaster. The models back then were markedly worse and it rewrote my app, effectively destroying it. I sat at my desk for 12 hours a day for 2 weeks trying to unpick that mess.
Since December things have definitely gotten better. I can run an agent up to 8 hours unattended, testing every little thing and produce working code quite often.
But there is still a long way to go to produce quality.
Most of the reason it's taking this long is that the agent can't solve the design and infra problems on its own. I end up going down one path, realising there is another way and backtracking. If I accepted everything the ai wanted, then finishing would be impossible.
> Late in the night most problems were fixed and I wrote a script that found everyone whose payment got stuck. I sent them money back (+ extra $1 as a ‘thank you for your patience’ note), and let them know via DMs.
(emphasis added)
Not sure whether it was actually written by hand or the AI involvement was glossed over, but as soon as giving away money was on the table, the author seems to have ditched AI.
The more I evaluate Claude Code, the more it feels like the world's most inconsistent golfer. It can get within a few paces of the hole in often a single strike, and then it'll spend hours, days, weeks trying to nail the putt.
There's some 80-20:ness to all programming, but with current state of the art coding models, the distribution is the most extreme it's ever been.
The gap is definitely real. But I think most of this thread is misdiagnosing why it exists. It's not that AI cannot produce production quality code, it's that the very mental model most people have of AI is leading them to use the wrong interaction model for closing that last 20% of complexity in production code bases.
The author accidentally proved it: the moment they stopped prompting and opened Figma to actually design what they wanted, Claude nailed the implementation. The bottleneck was NEVER the code generation, it was the thinking that had to happen BEFORE ever generating that code. It sounds like most of you offload the thinking to AFTER the complexity has arisen when the real pattern is frontloading the architectural thinking BEFORE a single line of code is generated.
Most of the 100-hour gap is architecture and design work that was always going to take time. AI is never going to eliminate that work if you want production grade software. But when harnessed correctly it can make you dramatically faster at the thinking itself, you just have to actually use it as a thinking partner and not just a code monkey.
When we start selling the software, and asking people to pay for/depend upon our product, the rules change substantially.
Whenever we take a class or see a demo, they always use carefully curated examples, to make whatever they are teaching, seem absurdly simple. That's what you are seeing, when folks demonstrate how "easy" some new tech is.
A couple of days ago, I visited a friend's office. He runs an Internet Tech company, that builds sites, does SEO, does hosting, provides miscellaneous tech services, etc.
He was going absolutely nuts with OpenClaw. He was demonstrating basically rewiring his entire company, with it. He was really excited.
On my way out, I quietly dropped by the desk of his #2; a competent, sober young lady that I respect a lot, and whispered "Make sure you back things up."
I'm having somewhat good experiences with AI but I think that's because I'm only half-adopting it: instead of the full agentic / Ralphing / the-AI-can-do-anything way, I still do work in very small increments and review each commit. I'm not as fast as others, but I can catch issues earlier. I also can see when code is becoming a mess and stop to fix things. I mean, I don't fix them manually, I point Claude at the messy code and ask it to refactor it appropriately, but I do keep an eye to make sure Claude doesn't stray off course.
Honestly, seeing all the dumb code that it produces, calling this thing "intelligent" is rather generous...
The interesting part about vibe coding is the spectrum of experiences and attitudes. I have been playing with it for 2-3hrs a day for the last 4 months now. None of my friends who are using it are using it in the same way. Some people vibe and then refactor, some spec-everything and micro-prompt the solutions. Nobody is feeling like this thing can go unsupervised.
And then there is one guy, a friend of mine, who is planning to release a "submit a bug report, we will fix it immediately" feature (so, collect an error report from a user, possibly interview them, then assess if it's a bug or not with a "product owner LLM", and then autonomously fix it, and if it passes the tests, merge and push to prod, all in under one hour). That's for a mid cap company, for their client-facing product. F*** hell! I have a full bag of bug reports ready for when this hits prod :->
Everyone keeps saying 80/20 but that undersells what's going on. The last 20% isn't just hard. It's hard because of what happened during the first 80%.
When an agent takes a shortcut early on, the next step doesn't know it was a shortcut. It just builds on whatever it was handed. And then the step after that does the same thing. So by hour 80 you're sitting there trying to fix what looks like a UI bug and you realize the actual problem is three layers back. You're not doing the "hard 20%." You're paying interest on shortcuts you didn't even know were taken. (As I type this I'm having flashbacks to helping my kid build lego sets.)
The author figured this out by accident. He stopped prompting and opened Figma to design what he actually wanted. That's the move. He broke the chain before the next stage could build on it. The 100 hours is what it costs when you don't do that.
My non-technical client has totally vibe coded a SaaS prototype with lots of features, a way bigger product than OP's, and it sort of works. They spent like 200 hours on it. I wonder how much time it would take to clean it up and prove it is secure. I declined to work on it, as I was not sure if it's even possible or if it would be better to rewrite the entire thing from scratch with better prompts. I was not that sure about it given the cost and the fact that they had a product that sort of worked, so I let them go to find someone to clean it up. My reasoning is that if the client took 200h to develop this without stopping to check the code, it would take me 2-3x that to rewrite it with AI, but the right way, while the cleanup may be so painful that rewriting it from scratch would be way better value for money.
My experience is that Claude Code, when used appropriately, can produce work better than most programmers.
"when used appropriately" means:
- Setting up guardrails: use a statically typed language, linters, CLAUDE.md/skills for best practices.
- Told to do research when making technical decisions, e.g. "look online for prior art" or "do research and compare libraries for X"
- Told to prioritize quality and maintainability over speed. Saying we have no deadline, no budget, etc.
- Given extensive documentation for any libraries/APIs it is using. Usually I will do this as a pre-processing step, e.g. "look at 50 pages of docs for Y and distill it into a skill"
- Given feedback loops to check its work
- Has external systems constraining it from making shortcuts, e.g. "ratchet" checks to make sure it can't add lint suppressions, `unsafe` blocks, etc.
And, the most important things:
- An operator who knows how to write good code. You aren't going to get a good UI/app unless you can tell it what that means. E.g. telling it to prioritize native HTML/CSS over JS, avoiding complexity like Redux, adding animations but focus on usability, make sure the UI is accessible, etc.
- An operator who is steering it to produce a good plan. Not only to make sure that you are building the right thing, but also you are explaining how to test it, other properties it should have (monitoring/observability, latency, availability, etc.)
A lot of this comes down to "put the right things in the context/plan". If you aren't doing that, then of course you're going to get bad output from an LLM. Just like you would get bad output from a dev if you said "build me X" without further elaboration.
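One way to picture the "ratchet" check from the list above is a count that may fall but never rise. This is a rough sketch under stated assumptions: the marker patterns, function names, and the idea of passing file contents in as a dict are all hypothetical, chosen only to keep the example self-contained:

```python
import re

# Hypothetical markers; adjust to the linters and language actually in use.
SUPPRESSION_PATTERNS = [
    re.compile(r"#\s*noqa\b"),
    re.compile(r"#\s*type:\s*ignore\b"),
    re.compile(r"\bunsafe\s*\{"),
]


def count_suppressions(sources):
    """Count suppression markers across a mapping of path -> file contents."""
    return sum(
        len(pattern.findall(text))
        for text in sources.values()
        for pattern in SUPPRESSION_PATTERNS
    )


def ratchet(sources, baseline):
    """Return (passed, new_baseline). The count may fall but never rise."""
    current = count_suppressions(sources)
    if current > baseline:
        return False, baseline
    return True, current
```

Run in CI or a pre-commit hook with the baseline stored in the repo, a check like this would fail any change that adds a suppression and would tighten the baseline whenever one is removed.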
I built my latest side project (a Wasm to Go "transpiler") precisely as a way to push the limits of what I could do with an LLM/agent.
It sped me up (and genuinely helped with some ideas) but not 10x.
The bits I didn't design myself I definitely needed to inspect and improve before the ever eager busy beaver drove them to the ground.
That said, I'm definitely impressed by how a frontier model can "reason" about Go code that's building an AST to generate other Go code, and clearly separate what's available at generation time vs. at runtime. There's some sophistication there, and I found myself telling them often "this is the kind of code I want to generate, build the AST."
I also appreciated how faster models are good enough at slightly fuzzy find and replace. Like I need to do this refactor, I did two samples of it here, can you do these other 400? I have these test cases in language X, converted 2, can you do the other 100? Even these simple things saved me a lot of time.
In return I got something that can translate SQLite compiled to Wasm into 500k lines of Go in about a month of my spare time.
This is the article I want to send to every non-technical stakeholder who's watched a demo and said "so can we ship this next week?". The prototype hides all the error handling, edge cases, auth flows, deployment config... the stuff that is actually the product.
The 100 hours aren't a vibecoding tax. They're an engineering knowledge tax.
I built 4 AI products to hundreds of thousands of users, working with AI agents as collaborators, not autopilots. The difference isn't the tool. It's whether you can tell the AI is wrong and stop it before it wastes 10 hours going down the wrong path.
The author watched Claude create new S3 buckets for several rounds before catching it. An experienced engineer catches that on the first diff. Most of those 100 hours were spent not knowing you're lost.
"Vibecoding" as a concept is the problem. It implies you can vibe your way through engineering. You can't. AI is a force multiplier, not a replacement for knowing what good looks like.
I built a jira with attachments and all sorts of bells and whistles. Purrs like a kitten. Saas are going extinct. At least the jobs that charged $1000 a day to write jira plugins.
I don't want that though, I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and not worry about.
This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.
Author admittedly didn’t know how to scale his app for thousands or hundreds of thousands of users. He jokes about it working great on localhost or “my machine”.
Not knocking the premise of the post. It probably works well for one single user if it’s an iPhone or Android app. But his 100 power hours are probably just right for what he ended up launching as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.
Related anecdote: My 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupted him with ads. Instead of googling a better alternative we sat down with claude code and put together the version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with less than 10 prompts, I only helped a bit putting it online with github pages so he can use it from anywhere.
I'm building a Java HFT engine and the amount of things AI gets wrong is eye opening. If I didn't benchmark everything I'd end up with a much less optimized solution.
Examples: AI really wants to use Project Panama (FFM) and while that can be significantly faster than traditional OO approaches it is almost never the best. And I'm not talking about using deprecated Unsafe calls, I'm talking about primitive arrays being better for Vector/SIMD operations on large sets of data. NIO being better than FFM + mmap for file reading.
You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.
Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.
The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.
I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.
I've got a few projects I've generated, along with a wholly handwritten project started in Dec.
The difference I've noticed is that the act of actually typing out code made me backtrack a few times refining the possible solutions before even starting the integration tests, sometimes before even doing a compile.
When generating, the LLM never backtracked, even in the face of broken tests. It would proceed to continue band-aiding until everything passed. It would add special exceptions to general code instead of determining that the general rule should be refined or changed.
The reason that some devs are reporting 10x productivity is because a bunch of duct-taped, band-aided, instant-legacy code is acceptable. Others who don't see that level of productivity increase are spending time fixing the code to be something they can read.
Not sure yet if accepting the spaghetti is the right course. If future LLMs can understand this spaghetti then there's no point in good code. If we still need human coders, then the productivity increase is very small.
I don't know how other people work, but writing the code for me has been essential in even understanding the problem space. The architecture and design work in a lot of cases is harder without going through that process.
YES YES YES!! I so wish that we could go back in time and never, ever have even suggested anything other than what you say here. AI doesn't do it for you. It does it with you.
You have to figure out what you want before the AI codes. The thinking BEFORE is the entire game.
Though I will also say that I use Claude for working out designs a lot. Literally hours sometimes with long periods of me thinking it through.
And I still get a ton more done and often use tech that I would never have approached before these glory days.
An advantage I have enjoyed is that I am insanely careful about my fundamental architecture and I have a project scaffold that works correctly.
It has examples of all the parts of a web app written, over many years, to be my own ideal structure. When the LLM era arrived, I added a ton of comments explaining what, why and how.
It turns out to serve as a sort of seed crystal for decent code. Though, if I do not remind it to mimic that architecture, it sometimes doesn't and that's very weird.
Still, that's a tip I suggest. Give it examples of good code that are commented to explain why it's good.
Products where the only value was the code are definitely under pressure. But, how many products are really like that? I suggest everyone look up HALO that’s so popular in investing right now, and start looking at companies with the assumption that the value of the code is zero so what other value is there. There’s often a lot more there than people realize.
This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, it's often a distraction and a negative multiplier.
I can't say how many times the LLM-proposed solution to a jittery behavior is adding retries. At this point we have to be even more careful with controlling the implementation of things in the hot path.
I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!
Additionally, the author seems to build an app just for the sake of building an app / learning, not to solve any real serious business problem. Another "big" claim on LLM capabilities based on a solo toy project.
> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
Actually I had some terrible experiences when asking the agent to do something simple in our codebase (like, rename these files and fix build scripts and dependencies) but it took much longer than a human would, because it kept running the full CI pipelines to check for problems after every attempted change.
A human would, for example, rely on the linter to detect basic issues, run a partial build on affected targets, etc. to save the time. But the agent probably doesn't have a sense of time elapsed.
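The cheapest-first ordering described above could be handed to an agent as an explicit plan rather than left to its judgment. A hedged sketch follows: the directory-to-target mapping, the step names, and the target label syntax are all hypothetical and would need to match the real build system:

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level directory to build target.
TARGETS = {"services": "//services/...", "libs": "//libs/...", "tools": "//tools/..."}


def plan_checks(changed_files):
    """Return an ordered list of checks, cheapest first, for the changed files."""
    steps = []
    lintable = sorted(f for f in changed_files if f.endswith((".py", ".sh")))
    if lintable:
        steps.append(("lint", lintable))
    targets = sorted({
        TARGETS[PurePosixPath(f).parts[0]]
        for f in changed_files
        if PurePosixPath(f).parts and PurePosixPath(f).parts[0] in TARGETS
    })
    if targets:
        steps.append(("partial-build", targets))
    # Full CI only as the final confirmation, not after every attempt.
    steps.append(("full-ci", []))
    return steps
```

The point of the design is only the ordering: fast checks gate the slow ones, so a failed lint never costs a full pipeline run.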
I had less luck with "What does sharp tails mean in «HFT. You want low deterministic latency with sharp tails»". But I suspect the source sentence is the problem.