The 100-hour gap between a vibecoded prototype and a working product (kanfa.macbudkowski.com)
231 points by kiwieater 14 hours ago | 308 comments on HN | Neutral · High agreement (3 models) · Mixed · v3.7 · 2026-03-15 22:28:40
Summary: Insufficient Content (Neutral)
The requested URL returns only CSS stylesheets and no evaluable content. No text, editorial material, or structural indicators of human rights engagement are present. HRCB scoring cannot proceed in the absence of observable content.
Article Heatmap
Preamble and Articles 1-30 (Freedom, Equality, Brotherhood through No Destruction of Rights): all ND (No Data).
Aggregates
E: 0.00 · S: ND
Weighted Mean 0.00 · Unweighted Mean 0.00
Max 0.00 (N/A) · Min 0.00 (N/A)
Signal 0 · No Data 31
Volatility 0.00 (Low)
Negative 0 · Channels E: 0.6 · S: 0.4
SETL: ND
Agreement: High (3 models · spread ±0.000)
Evidence: 0% coverage (31 ND)
Theme Radar
All themes (Foundation, Security, Legal, Privacy & Movement, Personal, Expression, Economic & Social, Cultural, Order & Duties): 0.00 (0 articles)
HN Discussion 20 top-level · 29 replies
niemandhier 2026-03-15 13:26 UTC link
With sufficiently advanced vibe coding, the need for certain types of products just vanishes.

I needed it, so I quickly built it myself, for myself, and for myself only.

alexpotato 2026-03-15 13:56 UTC link
I work as a DevOps/SRE and have been doing it in FinTech (banks, hedge funds, startups) and Crypto (an L1 chain) for almost 20 years.

My thoughts on vibe coding vs production code:

- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs

- This is partly b/c it is good at things I'm not good at (e.g. front end design)

- But then I need to go in and double check performance, correctness, information flow, security etc

- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)

- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs

- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and a LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
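A minimal sketch of the kind of deterministic output check described above, assuming a pipeline that emits CSV and a hand-curated golden file (the file names and CSV shape are illustrative, not from the comment):

```python
#!/usr/bin/env python3
"""Compare a pipeline's CSV output against a golden file, field by field,
so LLM-generated changes can be verified without another prompting round."""
import csv
import sys

def check_output(actual_path: str, golden_path: str) -> list[str]:
    """Return a list of human-readable mismatches; empty means pass."""
    errors = []
    with open(actual_path) as a, open(golden_path) as g:
        actual = list(csv.DictReader(a))
        golden = list(csv.DictReader(g))
    if len(actual) != len(golden):
        errors.append(f"row count {len(actual)} != expected {len(golden)}")
    for i, (row_a, row_g) in enumerate(zip(actual, golden)):
        for key, want in row_g.items():
            got = row_a.get(key)
            if got != want:
                errors.append(f"row {i}, column {key!r}: {got!r} != {want!r}")
    return errors

if __name__ == "__main__":
    problems = check_output(sys.argv[1], sys.argv[2])
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```

Wired into the agent loop, a non-zero exit hands the LLM a concrete failure list instead of another round of "please double-check".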

skyberrys 2026-03-15 14:10 UTC link
If you ask for something complicated, this headline is more than true. But why complicate things? Keep it simple and keep it fast.

Also, this article uses 'pfp' like it's a word; I can't figure out what it means.

I'm able to vibe code a simple app in 30 minutes and polish it in four hours, and I've now been enjoying one for 2 months.

rhoopr 2026-03-15 14:12 UTC link
This seems more like he is bad at describing what he wants and is prompting for “a UI” and then iterating “no, not like that” for 99 hours.
carterparks 2026-03-15 14:44 UTC link
I think there's a lot to pick apart here, but the core premise is full of truth. This gap is real, contrary to what you might see influencers saying, and I think it comes from a lot of places, but the biggest one is that writing code is very different from architecting a product.

I've always said the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately, which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system, then prompt an LLM to build out its various components and create a sustainable codebase. This takes time and attention in a world of vibe coders who are less and less inclined to give their vibe coded products the attention they deserve.

raincole 2026-03-15 14:49 UTC link
They're... launching an NFT product in 2026...

I know it's not the point of this article, but really?

hebrides 2026-03-15 14:58 UTC link
I’ve had a similar experience. I’ve been vibecoding a personal kanban app for myself. Claude practically one-shotted 90% of the core functionality (create boards, lanes, cards, etc.) in a single session. But after that I’ve now spent close to 30 hours planning and iterating on the remaining features and UI/UX tweaks to make the app actually work for me, and still, it doesn’t feel "ready" yet. That’s not to say it hasn’t sped up the process considerably; it would’ve taken me hours to achieve what Claude did in the first 10 minutes.
tim-projects 2026-03-15 15:02 UTC link
I started working on one of my apps around a year ago. There was no AI CLI back then. My first prototype was done in Gemini chat. It took a week of copying and pasting text between windows. But I was obsessed.

The result worked but that's just a hacked together prototype. I showed it to a few people back then and they said I should turn it into a real app.

To turn it into a full multi-user, scalable product... I'm still at it a year later. Turns out it's really hard!

I look at the comments about weekend apps, and I have some of those too, but creating a real, actually valuable, bug-free MVP takes work no matter what you do.

Sure, I can build apps way faster now. I spent months learning how to use AI. I did a refactor back in May that was a disaster. The models back then were markedly worse, and it rewrote my app, effectively destroying it. I sat at my desk for 12 hours a day for 2 weeks trying to unpick that mess.

Since December things have definitely gotten better. I can run an agent for up to 8 hours unattended, testing every little thing, and it quite often produces working code.

But there is still a long way to go to produce quality.

Most of the reason it's taking this long is that the agent can't solve the design and infra problems on its own. I end up going down one path, realising there is another way, and backtracking. If I accepted everything the AI wanted, finishing would be impossible.

dehrmann 2026-03-15 15:12 UTC link
> Late in the night most problems were fixed and I wrote a script that found everyone whose payment got stuck. I sent them money back (+ extra $1 as a ‘thank you for your patience’ note), and let them know via DMs.

(emphasis added)

Not sure whether the script was actually written by hand or the AI's involvement was just glossed over, but as soon as giving away money was on the table, the author seems to have ditched AI.

marginalia_nu 2026-03-15 15:13 UTC link
The more I evaluate Claude Code, the more it feels like the world's most inconsistent golfer. It can get within a few paces of the hole in often a single strike, and then it'll spend hours, days, weeks trying to nail the putt.

There's some 80/20-ness to all programming, but with the current state-of-the-art coding models, the distribution is the most extreme it's ever been.

phillipclapham 2026-03-15 15:17 UTC link
The gap is definitely real. But I think most of this thread is misdiagnosing why it exists. It's not that AI cannot produce production quality code, it's that the very mental model most people have of AI is leading them to use the wrong interaction model for closing that last 20% of complexity in production code bases.

The author accidentally proved it: the moment they stopped prompting and opened Figma to actually design what they wanted, Claude nailed the implementation. The bottleneck was NEVER the code generation, it was the thinking that had to happen BEFORE ever generating that code. It sounds like most of you offload the thinking to AFTER the complexity has arisen when the real pattern is frontloading the architectural thinking BEFORE a single line of code is generated.

Most of the 100-hour gap is architecture and design work that was always going to take time. AI is never going to eliminate that work if you want production grade software. But when harnessed correctly it can make you dramatically faster at the thinking itself, you just have to actually use it as a thinking partner and not just a code monkey.

ChrisMarshallNY 2026-03-15 15:36 UTC link
"working" != "shipping."

When we start selling the software, and asking people to pay for and depend upon our product, the rules change, substantially.

Whenever we take a class or see a demo, they always use carefully curated examples to make whatever they are teaching seem absurdly simple. That's what you are seeing when folks demonstrate how "easy" some new tech is.

A couple of days ago, I visited a friend's office. He runs an Internet Tech company, that builds sites, does SEO, does hosting, provides miscellaneous tech services, etc.

He was going absolutely nuts with OpenClaw. He was demonstrating basically rewiring his entire company, with it. He was really excited.

On my way out, I quietly dropped by the desk of his #2, a competent, sober young lady whom I respect a lot, and whispered, "Make sure you back things up."

youknownothing 2026-03-15 16:08 UTC link
I'm having somewhat good experiences with AI but I think that's because I'm only half-adopting it: instead of the full agentic / Ralphing / the-AI-can-do-anything way, I still do work in very small increments and review each commit. I'm not as fast as others, but I can catch issues earlier. I also can see when code is becoming a mess and stop to fix things. I mean, I don't fix them manually, I point Claude at the messy code and ask it to refactor it appropriately, but I do keep an eye to make sure Claude doesn't stray off course.

Honestly, seeing all the dumb code that it produces, calling this thing "intelligent" is rather generous...

aenis 2026-03-15 16:29 UTC link
The interesting part about vibe coding is the spectrum of experiences and attitudes. I have been playing with it for 2-3hrs a day for the last 4 months now. None of my friends who are using it are using it in the same way. Some people vibe and then refactor, some spec-everything and micro-prompt the solutions. Nobody is feeling like this thing can go unsupervised.

And then there is one guy, a friend of mine, who is planning to release a "submit a bug report, we will fix it immediately" feature (so: collect an error report from a user, possibly interview them, assess whether it's a bug with a "product owner LLM", then fix it autonomously, and if it passes the tests, merge and push to prod, all under one hour). That's for a mid-cap company, for their client-facing product. F*** hell! I have a full bag of bug reports ready for when this hits prod :->

mrothroc 2026-03-15 18:03 UTC link
Everyone keeps saying 80/20 but that undersells what's going on. The last 20% isn't just hard. It's hard because of what happened during the first 80%.

When an agent takes a shortcut early on, the next step doesn't know it was a shortcut. It just builds on whatever it was handed. And then the step after that does the same thing. So by hour 80 you're sitting there trying to fix what looks like a UI bug and you realize the actual problem is three layers back. You're not doing the "hard 20%." You're paying interest on shortcuts you didn't even know were taken. (As I type this I'm having flashbacks to helping my kid build lego sets.)

The author figured this out by accident. He stopped prompting and opened Figma to design what he actually wanted. That's the move. He broke the chain before the next stage could build on it. The 100 hours is what it costs when you don't do that.

devld 2026-03-15 18:14 UTC link
My non-technical client has totally vibe coded a SaaS prototype with lots of features, a way bigger product than OP's, and it sort of works. They spent something like 200 hours on it. I wonder how much time it would have taken to clean it up and prove it is secure. I declined to work on it, as I was not sure whether that was even possible, or whether it would be better to rewrite the entire thing from scratch with better prompts. Given the cost, and the fact that they had a product that sort of worked, I let them go find someone to clean it up. My reasoning: if the client took 200 hours to develop this without ever stopping to check the code, it would take me 2-3x that to rewrite it with AI, but the right way, while the cleanup might be so painful that a from-scratch rewrite would be way better value for money.
shepherdjerred 2026-03-15 20:20 UTC link
My experience is that Claude Code, when used appropriately, can produce work better than most programmers.

"when used appropriately" means:

- Setting up guardrails: use a statically typed language, linters, CLAUDE.md/skills for best practices.

- Told to do research when making technical decisions, e.g. "look online for prior art" or "do research and compare libraries for X"

- Told to prioritize quality and maintainability over speed. Saying we have no deadline, no budget, etc.

- Given extensive documentation for any libraries/APIs it is using. Usually I will do this as a pre-processing step, e.g. "look at 50 pages of docs for Y and distill it into a skill"

- Given feedback loops to check its work

- Has external systems constraining it from making shortcuts, e.g. "ratchet" checks to make sure it can't add lint suppressions, `unsafe` blocks, etc.

And, the most important things:

- An operator who knows how to write good code. You aren't going to get a good UI/app unless you can tell it what that means, e.g. telling it to prioritize native HTML/CSS over JS, avoiding complexity like Redux, adding animations while focusing on usability, making sure the UI is accessible, etc.

- An operator who is steering it to produce a good plan. Not only to make sure that you are building the right thing, but also to explain how to test it and what other properties it should have (monitoring/observability, latency, availability, etc.)

A lot of this comes down to "put the right things in the context/plan". If you aren't doing that, then of course you're going to get bad output from an LLM. Just like you would get bad output from a dev if you said "build me X" without further elaboration.
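As a hedged illustration of the "ratchet" checks mentioned above, here is one minimal way to implement them in Python. The marker patterns, file layout, and function names are assumptions for the sketch, not anything from the original comment:

```python
import pathlib
import re

# Suppression markers to count; extend for your toolchain (illustrative set).
MARKERS = re.compile(r"#\s*noqa|#\s*type:\s*ignore|eslint-disable")

def count_suppressions(root: str = ".") -> int:
    """Count suppression markers across the Python files under `root`."""
    total = 0
    for path in pathlib.Path(root).rglob("*.py"):
        total += len(MARKERS.findall(path.read_text(errors="ignore")))
    return total

def ratchet(current: int, baseline: int) -> tuple[bool, int]:
    """Pass only if `current` <= `baseline`; tighten the baseline whenever
    the count drops, so the number of suppressions can only ever decrease."""
    if current > baseline:
        return False, baseline   # regression: fail CI, keep the old baseline
    return True, current         # ok: ratchet the baseline down
```

In CI you would load the stored baseline, call `ratchet(count_suppressions(), baseline)`, persist the (possibly lower) new baseline, and fail the build whenever the first element is `False`.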

ncruces 2026-03-15 20:33 UTC link
I built my latest side project (a Wasm to Go "transpiler") precisely as a way to push the limits of what I could do with an LLM/agent.

It sped me up (and genuinely helped with some ideas) but not 10x.

The bits I didn't design myself I definitely needed to inspect and improve before the ever-eager busy beaver ran them into the ground.

That said, I'm definitely impressed by how a frontier model can "reason" about Go code that's building an AST to generate other Go code, and clearly separate what's available at generation time vs. at runtime. There's some sophistication there, and I found myself telling them often "this is the kind of code I want to generate, build the AST."

I also appreciated how the faster models are good enough at slightly fuzzy find-and-replace. Like: I need to do this refactor, I did two samples of it here, can you do the other 400? I have these test cases in language X, I converted 2, can you do the other 100? Even these simple things saved me a lot of time.

In return I got something that can translate SQLite compiled to Wasm into 500k lines of Go in about a month of my spare time.

https://github.com/ncruces/wasm2go

anesxvito 2026-03-15 22:15 UTC link
This is the article I want to send to every non-technical stakeholder who's watched a demo and said "so can we ship this next week?". The prototype hides all the error handling, edge cases, auth flows, deployment config... the stuff that is actually the product.
eongchen 2026-03-15 22:29 UTC link
The 100 hours aren't a vibecoding tax. They're an engineering knowledge tax.

I built 4 AI products to hundreds of thousands of users, working with AI agents as collaborators, not autopilots. The difference isn't the tool. It's whether you can tell the AI is wrong and stop it before it wastes 10 hours going down the wrong path.

The author watched Claude create new S3 buckets for several rounds before catching it. An experienced engineer catches that on the first diff. Most of those 100 hours were spent not knowing you're lost.

"Vibecoding" as a concept is the problem. It implies you can vibe your way through engineering. You can't. AI is a force multiplier, not a replacement for knowing what good looks like.

keyle 2026-03-15 13:42 UTC link
I built a Jira clone with attachments and all sorts of bells and whistles. Purrs like a kitten. SaaS is going extinct; at least, the jobs that charged $1000 a day to write Jira plugins are.
lacedeconstruct 2026-03-15 13:54 UTC link
I don't want that though. I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and not worry about.
etothet 2026-03-15 14:11 UTC link
I noticed this as well. I had to look it up. Apparently ‘pfp’ means ‘profile picture’.
IAmGraydon 2026-03-15 14:13 UTC link
This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.
stavros 2026-03-15 14:15 UTC link
Apparently it means profile photo.
firesteelrain 2026-03-15 14:20 UTC link
The author admittedly didn't know how to scale his app for thousands or hundreds of thousands of users. He jokes about it working great on localhost or "my machine".

Not knocking the premise of the post. It probably works well for one single user if it's an iPhone or Android app. But his 100 power hours were probably just right for what he ended up launching, as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.

sieste 2026-03-15 14:45 UTC link
Related anecdote: my 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupting him with ads. Instead of googling a better alternative, we sat down with Claude Code and put together a version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with fewer than 10 prompts; I only helped a bit with putting it online via GitHub Pages so he can use it from anywhere.
matt_heimer 2026-03-15 14:47 UTC link
I'm building a Java HFT engine, and the amount of things AI gets wrong is eye-opening. If I didn't benchmark everything I'd end up with a much less optimized solution.

Examples: AI really wants to use Project Panama (FFM), and while that can be significantly faster than traditional OO approaches, it is almost never the best. And I'm not talking about using deprecated Unsafe calls; I'm talking about primitive arrays being better for Vector/SIMD operations on large sets of data, or NIO being better than FFM + mmap for file reading.

You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.

Aperocky 2026-03-15 14:48 UTC link
The magic is testing. Having locally available, high-throughput testing with a large number of test cases unlocks more speed.

The test cases themselves become the focus; the LLM usually can't get them right.
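One way to read "the test cases themselves become the focus" is a table-driven harness where the human curates the case table and the agent's implementation is checked against every row. Everything below, including the `slugify` function, is a hypothetical illustration:

```python
def slugify(title: str) -> str:
    """Stand-in for the function under test (hypothetical)."""
    return "-".join(title.lower().split())

# The curated table is the real asset; add rows faster than you add code.
CASES = [
    ("Hello World", "hello-world"),
    ("  spaces   everywhere ", "spaces-everywhere"),
    ("already-slugged", "already-slugged"),
    ("MiXeD CaSe", "mixed-case"),
]

def run_cases() -> list[str]:
    """Run every curated case and report mismatches; empty list means pass."""
    failures = []
    for given, expected in CASES:
        got = slugify(given)
        if got != expected:
            failures.append(f"slugify({given!r}) = {got!r}, want {expected!r}")
    return failures
```

The agent regenerates `slugify` freely; the human only reviews and extends `CASES`, which keeps the cheap-to-run, deterministic part of the loop under human control.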

s1mon 2026-03-15 14:52 UTC link
Yep. As much as the rest of it resonated with LLM coding experiences I'm having, the NFT thing is unfortunate.
PunchTornado 2026-03-15 15:15 UTC link
Yeah, but if you have to describe it in great detail in English, you're better off just writing it with autocomplete.

I find that vibe coding is useful when something can be built from few details and the model makes the right assumptions.

Aurornis 2026-03-15 15:24 UTC link
There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.

Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.

The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.

I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.

lelanthran 2026-03-15 15:30 UTC link
I've got a few projects I've generated, along with a wholly handwritten project started in Dec.

The difference I've noticed is that the act of actually typing out code made me backtrack a few times refining the possible solutions before even starting the integration tests, sometimes before even doing a compile.

When generating, the LLM never backtracked, even in the face of broken tests. It would proceed to continue band-aiding until everything passed. It would add special exceptions to general code instead of determining that the general rule should be refined or changed.

The reason some devs are reporting 10x productivity is that a bunch of duct-taped, band-aided, instant-legacy code is acceptable to them. Others who don't see that level of productivity increase are spending time fixing the code to be something they can read.

Not sure yet if accepting the spaghetti is the right course. If future LLMs can understand this spaghetti, then there's no point in good code. If we still need human coders, then the productivity increase is very small.

jopsen 2026-03-15 15:37 UTC link
Yeah, communicating what you want can be hard.

I'm doing a simple single-line text editor and designing some frame options, which have start and end markers.

This was really hard to get the LLM to do right... until I just took a pen and paper, drew what I wanted, took a photo, and gave it to the LLM.

AstroBen 2026-03-15 15:47 UTC link
I don't know how other people work, but writing the code for me has been essential in even understanding the problem space. The architecture and design work in a lot of cases is harder without going through that process.
tqwhite 2026-03-15 16:22 UTC link
YES YES YES!! I so wish we could go back in time and never, ever have suggested anything other than what you say here. AI doesn't do it for you. It does it with you.

You have to figure out what you want before the AI codes. The thinking BEFORE is the entire game.

Though I will also say that I use Claude for working out designs a lot. Literally hours sometimes with long periods of me thinking it through.

And I still get a ton more done and often use tech that I would never have approached before these glory days.

tqwhite 2026-03-15 16:26 UTC link
I would love it if someone explained what their ten agents Ralphing away were actually told to do.

I suppose it works if you are doing something that truly can be decided based on a test, but I just don't see it, at least for anything I do.

tqwhite 2026-03-15 16:39 UTC link
An advantage I have enjoyed is that I am insanely careful about my fundamental architecture and I have a project scaffold that works correctly.

It has examples of all the parts of a web app written, over many years, to be my own ideal structure. When the LLM era arrived, I added a ton of comments explaining what, why and how.

It turns out to serve as a sort of seed crystal for decent code. Though, if I do not remind it to mimic that architecture, it sometimes doesn't, and that's very weird.

Still, that's a tip I suggest: give it examples of good code, commented to explain why it's good.

Gud 2026-03-15 16:56 UTC link
Absolutely. You need to treat it like a real program from the very beginning.
tqwhite 2026-03-15 17:11 UTC link
Back then, also around May, I had Claude 3.old destroy a working app. Those were sad old days.

Hasn't happened in a long time. Opus 4.6 is a miracle improvement.

matwood 2026-03-15 17:38 UTC link
Products where the only value was the code are definitely under pressure. But how many products are really like that? I suggest everyone look up HALO, which is so popular in investing right now, and start looking at companies with the assumption that the value of the code is zero: what other value is there? There's often a lot more there than people realize.
chamomeal 2026-03-15 17:53 UTC link
I have friends (well, friends of friends) who still play the NFT lottery. People love gambling lol
bittermandel 2026-03-15 18:18 UTC link
This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, they are often a distraction and a negative multiplier.

I can't say how many times the LLM-proposed solution to jittery behavior has been adding retries. At this point we have to be even more careful about controlling the implementation of things in the hot path.

I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!

itomato 2026-03-15 19:10 UTC link
And the viewpoint is from the development of such a "product", with "manufactured virality".

It's bunk.

cyk21 2026-03-15 19:23 UTC link
This.

Additionally, the author seems to have built an app just for the sake of building an app / learning, not to solve any real, serious business problem. Another "big" claim about LLM capabilities based on a solo toy project.

pindab0ter 2026-03-15 20:44 UTC link
Where do you keep the skills it generates from docs? Does this not become a mess?
cvhc 2026-03-15 20:53 UTC link
> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Actually, I've had some terrible experiences when asking the agent to do something simple in our codebase (like renaming some files and fixing build scripts and dependencies): it spent much longer than a human would, because it kept running the full CI pipelines to check for problems after every attempted change.

A human would, for example, rely on the linter to detect basic issues, run a partial build on affected targets, etc. to save time. But the agent probably doesn't have a sense of elapsed time.

robocat 2026-03-15 21:20 UTC link
AI is usually better than traditional search for working out acronyms and jargon.

My prompt "What does PFP mean on this page: https://kanfa.macbudkowski.com/vibecoding-cryptosaurus" gave a good answer and it described extra relevant context within crypto.

I had less luck with "What does sharp tails mean in «HFT. You want low deterministic latency with sharp tails»". But I suspect the source sentence is the problem.

mattmanser 2026-03-15 22:28 UTC link
Was a kanban board ever that hard?

Trello was written by interns as a summer project, when SPAs were just becoming a thing and React didn't even exist.

With 30 hours I bet I could get a pretty good one up without vibe coding it.

In a single afternoon I could get boards, cards, lanes, etc. done. React, MaterialUI using Grid + Card, and you're almost done.

Editorial Channel
What the content says
ND — Preamble: No text content available to assess preamble-related statements about human dignity, freedom, or justice.

ND — Articles 1-30: No substantive text content available for any article.

Structural Channel
What the site does
Legal & Terms
- Privacy: No privacy policy or data handling statements detected on-domain.
- Terms of Service: No terms of service detected on-domain.
Identity & Mission
- Mission: No mission statement or organizational values visible.
- Editorial Code: No editorial standards or code of conduct detected.
- Ownership: Domain owner macbudkowski.com; no organizational context available.
Access & Distribution
- Access Model: No access restrictions or paywall mechanisms evident.
- Ad/Tracking: No advertising or tracking scripts visible in provided content.
- Accessibility: Page renders with the Tailwind CSS framework; semantic structure unclear from CSS-only content dump.
Preamble: ND. Page consists entirely of Tailwind CSS reset and utility configuration; no structural implementation of human rights principles observable.
Article 1 (Freedom, Equality, Brotherhood): ND. CSS-only dump provides no evidence of site structure or functionality.
Article 2 (Non-Discrimination): ND. No non-discrimination policy or mechanism observable.
Article 3 (Life, Liberty, Security): ND. No observable safety or security mechanisms.
Article 4 (No Slavery): ND. No observable anti-slavery or forced-labor provisions.
Article 5 (No Torture): ND. No observable abuse-prevention mechanisms.
Article 6 (Legal Personhood): ND. No observable mechanisms for recognition before the law.
Article 7 (Equality Before Law): ND. No observable equal-protection mechanisms.
Article 8 (Right to Remedy): ND. No observable legal remedy structures.
Article 9 (No Arbitrary Detention): ND. No observable due-process protections.
Article 10 (Fair Hearing): ND. No observable fair-hearing mechanisms.
Article 11 (Presumption of Innocence): ND. No observable presumption-of-innocence mechanisms.
Article 12 (Privacy): ND. No privacy policy or data handling disclosures detected (confirmed by DCP).
Article 13 (Freedom of Movement): ND. No observable freedom-of-movement or residence mechanisms.
Article 14 (Asylum): ND. No observable asylum or refuge mechanisms.
Article 15 (Nationality): ND. No observable nationality or citizenship mechanisms.
Article 16 (Marriage & Family): ND. No observable family-protection mechanisms.
Article 17 (Property): ND. No observable property-rights mechanisms.
Article 18 (Freedom of Thought): ND. No observable freedom of thought, conscience, or religion mechanisms.
Article 19 (Freedom of Expression): ND. No observable free-expression or information mechanisms.
Article 20 (Assembly & Association): ND. No observable freedom of assembly or association mechanisms.
Article 21 (Political Participation): ND. No observable democratic participation mechanisms.
Article 22 (Social Security): ND. No observable social security mechanisms.
Article 23 (Work & Equal Pay): ND. No observable labor rights or employment protections.
Article 24 (Rest & Leisure): ND. No observable rest or leisure provisions.
Article 25 (Standard of Living): ND. No observable healthcare or standard-of-living mechanisms.
Article 26 (Education): ND. Page lacks semantic HTML structure; accessibility framework undefined (confirmed by DCP).
Article 27 (Cultural Participation): ND. No observable cultural or scientific participation mechanisms.
Article 28 (Social & International Order): ND. No observable social and international order mechanisms.
Article 29 (Duties to Community): ND. No observable community-responsibility frameworks.
Article 30 (No Destruction of Rights): ND. No observable safeguards against the destruction of rights.

Supplementary Signals
How this content communicates, beyond directional lean.
- Epistemic Quality (how well-sourced and evidence-based is this content?): 0.00, low claims. Sources 0.0; Evidence 0.0; Uncertainty 0.0; Purpose 0.0.
- Propaganda Flags: No manipulative rhetoric detected (0 techniques).
- Emotional Tone (positive/negative valence, intensity, authority): detached. Valence 0.0; Arousal 0.0; Dominance 0.0.
- Transparency (does the content identify its author and disclose interests?): 0.00. Author ✗ (not identified).
More signals: context, framing & audience
- Solution Orientation (does this content offer solutions or only describe problems?): 0.00, problem only. Reader Agency 0.0.
- Stakeholder Voice (whose perspectives are represented in this content?): 0.00, 0 perspectives.
- Temporal Framing (is this content looking backward, at the present, or forward?): unspecified.
- Geographic Scope (what geographic area does this content cover?): unspecified.
- Complexity (how accessible is this content to a general audience?): accessible, low jargon.
Longitudinal 423 HN snapshots · 30 evals
Audit Trail 50 entries
2026-03-16 00:40 eval_success PSQ evaluated: g-PSQ=-0.040 (3 dims)
2026-03-16 00:40 eval Evaluated by llama-4-scout-wai-psq: -0.04 (Neutral) +0.17
2026-03-16 00:34 eval_success Lite evaluated: Neutral (0.00)
2026-03-16 00:34 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-16 00:34 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-16 00:14 eval_success PSQ evaluated: g-PSQ=0.000 (3 dims)
2026-03-16 00:14 eval Evaluated by llama-3.3-70b-wai-psq: 0.00 (Neutral)
2026-03-16 00:11 eval_success Lite evaluated: Neutral (0.00)
2026-03-16 00:11 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral); reasoning: Technical CSS content
2026-03-16 00:11 rater_validation_warn Lite validation warnings for model llama-3.3-70b-wai: 1W 0R
2026-03-15 22:30 eval_success Evaluated: Neutral (0.00)
2026-03-15 22:30 eval Evaluated by claude-haiku-4-5-20251001: 0.00 (Neutral) 16,557 tokens 0.00
2026-03-15 22:28 eval_success Evaluated: Neutral (0.00)
2026-03-15 22:28 eval Evaluated by claude-haiku-4-5-20251001: 0.00 (Neutral) 16,579 tokens
2026-03-15 22:28 rater_validation_warn Validation warnings for model claude-haiku-4-5-20251001: 31W 31R
2026-03-15 21:46 eval_success PSQ evaluated: g-PSQ=-0.209 (3 dims)
2026-03-15 21:46 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 21:38 eval_success Lite evaluated: Neutral (0.00)
2026-03-15 21:38 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 21:38 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-15 21:06 eval_success PSQ evaluated: g-PSQ=-0.209 (3 dims)
2026-03-15 21:06 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 20:58 eval_success Lite evaluated: Neutral (0.00)
2026-03-15 20:58 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 20:58 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-15 20:31 eval_success PSQ evaluated: g-PSQ=-0.209 (3 dims)
2026-03-15 20:31 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) -0.17
2026-03-15 20:22 eval_success Lite evaluated: Neutral (0.00)
2026-03-15 20:22 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-15 20:22 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 19:54 eval_success PSQ evaluated: g-PSQ=-0.040 (3 dims)
2026-03-15 19:54 eval Evaluated by llama-4-scout-wai-psq: -0.04 (Neutral) 0.00
2026-03-15 19:47 eval_success Lite evaluated: Neutral (0.00)
2026-03-15 19:47 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 19:18 eval Evaluated by llama-4-scout-wai-psq: -0.04 (Neutral) +0.17
2026-03-15 19:11 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 18:33 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) -0.17
2026-03-15 18:28 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 17:23 eval Evaluated by llama-4-scout-wai-psq: -0.04 (Neutral) +0.17
2026-03-15 17:18 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 16:08 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 16:07 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 15:32 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 15:31 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 14:55 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 14:55 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 14:20 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative) 0.00
2026-03-15 14:19 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00; reasoning: Technical content with no human rights discussion
2026-03-15 13:43 eval Evaluated by llama-4-scout-wai-psq: -0.21 (Mild negative)
2026-03-15 13:42 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral); reasoning: Technical content with no human rights discussion