+0.41 When AI Writes the Software, Who Verifies It? (leodemoura.github.io S:+0.36 )
305 points by todsacerdoti 12 days ago | 299 comments on HN | Moderate positive Contested Low agreement (3 models) Editorial · v3.7 · 2026-03-16 00:13:44 0
Summary Digital Security & Scientific Knowledge Advocates
This technical essay advocates for formal mathematical verification of AI-generated software as essential infrastructure for protecting human rights in critical systems (healthcare, finance, defense, transportation). The author, creator of the Lean theorem prover, frames verification not as a cost but as a catalyst for secure, reliable systems, and documents emerging breakthroughs in AI-assisted proof construction. The content champions open-source, vendor-independent verification platforms as necessary for collective security and scientific progress.
Rights Tensions 2 pairs
Art 19 Art 27 Content advocates open-source verification platforms (Article 27 - scientific knowledge) to protect critical infrastructure, but this may restrict competitive innovation by vendors who want proprietary verification systems (implicit tension between collective knowledge ownership and commercial freedom).
Art 3 Art 27 The goal of verified software protects right to life in critical systems (Article 3), but the high complexity and specialized knowledge required for formal verification may concentrate power in hands of specialized researchers, potentially restricting broader participation in scientific advancement (Article 27).
Article Heatmap
Preamble: +0.31 — Preamble P Article 1: +0.26 — Freedom, Equality, Brotherhood 1 Article 2: ND — Non-Discrimination Article 2: No Data — Non-Discrimination 2 Article 3: +0.23 — Life, Liberty, Security 3 Article 4: ND — No Slavery Article 4: No Data — No Slavery 4 Article 5: ND — No Torture Article 5: No Data — No Torture 5 Article 6: ND — Legal Personhood Article 6: No Data — Legal Personhood 6 Article 7: ND — Equality Before Law Article 7: No Data — Equality Before Law 7 Article 8: ND — Right to Remedy Article 8: No Data — Right to Remedy 8 Article 9: ND — No Arbitrary Detention Article 9: No Data — No Arbitrary Detention 9 Article 10: ND — Fair Hearing Article 10: No Data — Fair Hearing 10 Article 11: ND — Presumption of Innocence Article 11: No Data — Presumption of Innocence 11 Article 12: +0.38 — Privacy 12 Article 13: ND — Freedom of Movement Article 13: No Data — Freedom of Movement 13 Article 14: ND — Asylum Article 14: No Data — Asylum 14 Article 15: ND — Nationality Article 15: No Data — Nationality 15 Article 16: ND — Marriage & Family Article 16: No Data — Marriage & Family 16 Article 17: ND — Property Article 17: No Data — Property 17 Article 18: ND — Freedom of Thought Article 18: No Data — Freedom of Thought 18 Article 19: +0.87 — Freedom of Expression 19 Article 20: ND — Assembly & Association Article 20: No Data — Assembly & Association 20 Article 21: ND — Political Participation Article 21: No Data — Political Participation 21 Article 22: ND — Social Security Article 22: No Data — Social Security 22 Article 23: ND — Work & Equal Pay Article 23: No Data — Work & Equal Pay 23 Article 24: ND — Rest & Leisure Article 24: No Data — Rest & Leisure 24 Article 25: +0.33 — Standard of Living 25 Article 26: ND — Education Article 26: No Data — Education 26 Article 27: +0.81 — Cultural Participation 27 Article 28: +0.38 — Social & International Order 28 Article 29: +0.43 — Duties to Community 29 Article 30: ND — No Destruction of Rights Article 30: No Data — No Destruction of Rights 30
Negative Neutral Positive No Data
Aggregates
E
+0.41
S
+0.36
Weighted Mean +0.48 Unweighted Mean +0.44
Max +0.87 Article 19 Min +0.23 Article 3
Signal 9 No Data 22
Volatility 0.22 (Medium)
Negative 0 Channels E: 0.6 S: 0.4
SETL +0.12 Editorial-dominant
FW Ratio 52% 22 facts · 20 inferences
Agreement Low 3 models · spread ±0.199
Evidence 20% coverage
2H 7M 22 ND
Theme Radar
Foundation Security Legal Privacy & Movement Personal Expression Economic & Social Cultural Order & Duties Foundation: 0.29 (2 articles) Security: 0.23 (1 articles) Legal: 0.00 (0 articles) Privacy & Movement: 0.38 (1 articles) Personal: 0.00 (0 articles) Expression: 0.87 (1 articles) Economic & Social: 0.33 (1 articles) Cultural: 0.81 (1 articles) Order & Duties: 0.41 (2 articles)
HN Discussion 20 top-level · 30 replies
acedTrex 2026-03-03 17:53 UTC link
No one does currently, and its going to take a few very painful and high profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.

In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.

oakpond 2026-03-03 18:04 UTC link
You do. Even the latest models still frequently write really weird code. The problem is some developers now just submit code for review that they didn't bother to read. You can tell. Code review is more important than ever imho.
_pdp_ 2026-03-03 18:14 UTC link
I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.

The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?

Verification works because someone has already defined what "correct" looks like. There is possible a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.

But truly novel creation does not have ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.

muraiki 2026-03-03 19:03 UTC link
The article says that AWS's Cedar authorization policy engine is written in Lean, but it's actually written in Dafny. Writing Dafny is a lot closer to writing "normal" code rather than the proofs you see in Lean. As a non-mathematician I gave up pretty early in the Lean tutorial, while in a recent prototype I learned enough Dafny to be semi-confident in reviewing Claude's Dafny code in about half a day.

The Dafny code formed a security kernel at the core of a service, enforcing invariants like that an audit log must always be written to prior to a mutating operation being performed. Of course I still had bugs, usually from specification problems (poor spec / design) or Claude not taking the proof far enough (proving only for one of a number of related types, which could also have been a specification problem on my part).

In the end I realized I'm writing a bunch of I/O bound glue code and plain 'ol test driven development was fine enough for my threat model. I can review Python code more quickly and accurately than Dafny (or the Go code it eventually had to link to), so I'm back to optimizing for humans again...

madrox 2026-03-03 19:51 UTC link
I encourage everyone to RTFA and not just respond to the headline. This really is a glimpse into where the future is going.

I've been saying "the last job to be automated will be QA" and it feels more true every day. It's one thing to be a product engineer in this era. It's another to be working at the level the author is, where code needs to be verifiable. However, once people stop vibing apps and start vibing kernels, it really does fundamentally change the game.

I also have another saying: "any sufficiently advanced agent is indistinguishable from a DSL." I hadn't considered Lean in this equation, but I put these two ideas together and I feel like we're approaching some world where Lean eats the entire agentic framework stack and the entire operating system disappears.

If you're thinking about building something today that will still be relevant in 10 years, this is insightful.

chromaton 2026-03-03 22:15 UTC link
TFA seems to be big on mathematical proof of correctness, but how do you ever know you're proving the right thing?
faitswulff 2026-03-04 00:19 UTC link
Someone actually did fuzz the claude c compiler:

https://john.regehr.org/writing/claude_c_compiler.html

lateforwork 2026-03-04 00:25 UTC link
The first thing you should have AI write is a comprehensive test suite. Then have it implement the main functionality. If the tests pass that is one level of verification.

In addition you can have one AI check another AI's code. I routinely copy/paste code from Claude to ChatGPT and Gemini have them check each other's code. This works very well. During the process I have my own eyes verify the code as well.

maltalex 2026-03-04 01:07 UTC link
Maybe I'm missing something, but isn't this the same as writing code, but with extra steps?

Currently, engineers work with loose specifications, which they translate into code. With the proposed approach, they would need to first convert those specifications into a formally verifiable form before using LLMs to generate the implementation.

But to be production-ready, that spec would have to cover all possible use-cases, edge cases, error handling, performance targets, security and privacy controls, etc. That sounds awfully close to being an actual implementation, only in a different language.

roadbuster 2026-03-04 01:59 UTC link
> The Claude C Compiler illustrates the other side: it optimizes for

> passing tests, not for correctness. It hard-codes values to satisfy

> the test suite. It will not generalize.

This is one of the pain points I am suffering at work: workers ask coding agents to generate some code, and then to generate test coverage for the code. The LLM happily churns out unit tests which are simply reinforcing the existing behaviour of the code. At no point does anyone stop and ask whether the generated code implements the desired functional behaviour for the system ("business logic").

The icing on the cake is that LLMs are producing so much code that humans are just rubber stamping all of it. Off to merge and build it goes.

I have no constructive recommendations; I feel the industry will keep their foot on the pedal until something catastrophic happens.

vicchenai 2026-03-04 02:09 UTC link
The verification problem scales poorly with AI complexity. Current approaches rely on test suites, but AI-generated code tends to optimize for passing existing tests rather than correctness in the general case.

What's interesting is this might be the forcing function that finally brings formal verification into mainstream use. Tools like Lean and Coq have been technically impressive but adoption-starved. If unverified AI code is too risky to deploy in critical systems, organizations may have no choice but to invest in formal specs. AI writes the software, proof assistants verify it.

The irony: AI-generated code may be what makes formal methods economically viable.

wyum 2026-03-04 02:31 UTC link
I believe there is a Verification Complexity Barrier

As you add components to a system, the time it takes to verify that the components work together increases superlinearly.

At a certain point, the verification complexity takes off. You literally run out of time to verify everything.

AI coding agents hit this barrier faster than ever, because of how quickly they can generate components (and how poorly they manage complexity).

I think verification is now the problem of agentic software engineering. I think formal methods will help, but I don't see how they will apply to messy situations like end-to-end UI testing or interactions between the system and the real world.

I posted more detailed thoughts on X: https://x.com/i/status/2027771813346820349

neya 2026-03-04 03:59 UTC link
The fundamental problem is the verification loop for the average developer is grounded not in tests, but with results. Write code, reload browser, check output. Does it work the way I want? Good. We're done here.

Not write code, write tests, ensure all test-cases are covered. Now, imagine such a flaky foundation is used to build on top of even more untested code. That's how bad quality software (that's usually unfixable without a major re-write) is born.

Also, most vibe-coders don't have enough experience/knowledge to figure out what is wrong with the code generated by the AI. For that, you need to know more than the AI and have a strong foundation in the domain you're working on. Here is an example: You ask the AI to write the code for a comment form. It generates the backend and frontend code for you (let's say React/Svelte/Vue/whatever). The vibe-coder sees the UI - most likely written in Tailwind CSS and thinks "wow, that looks really good!" and they click approve. However, an experienced person might notice the form does not have CSRF protection in place. The vibe-coder might not even be aware of the concept of CSRF (let alone the top 10 OSWAP security risks).

Hence, the fundamental problem is not knowing about the domain more than the AI to pick up the flaw. Unless this fundamental problem is solved - which I don't think it will anytime soon because everyone can generate code + UI these days, I don't see a solution to the verification problem.

However, this is good news for consultants and the like because it creates more work down the line to fix the vide-coded mess because they got hacked the very next day and we can charge them a rush fee on top of it, too. So, it's not all that bad.

SurvivorForge 2026-03-04 04:17 UTC link
The uncomfortable truth is that most teams were already bad at verification before AI entered the picture. The difference is that AI-generated code comes in faster, so the verification bottleneck becomes painfully obvious. I think the real opportunity here is that AI forcing us to think harder about specifications and correctness might actually improve the quality of human-written code too — it's making us confront the fact that "it works on my machine" was never real verification.
cagz 2026-03-04 06:26 UTC link
> No one is formally verifying the result

This might be the case for a hobby project or a start-up MVP being created in a rush, but in reality, there are a few points we may want to take into account:

1. Software teams I work with are maintaining the usual review practices. Even if a feature is completely created by AI. It goes through the usual PR review process. The dev may choose "Accept All", although I am not saying this is a good practice, the change still gets reviewed by a human.

2. From my experience, sub-agents intended for code and security review do a good job. It is even possible to use another model to review the code, which can provide a different perspective.

3. A year ago, code written by AI was failing to run the first time, requiring a painful joint troubleshooting effort. Now it works 95% of the time, but perhaps it is not optimal. Given the speed at which it is improving, it is safe to expect that in 6-9 months' time, it will not only work but will also be written to a good quality.

phyzix5761 2026-03-04 07:52 UTC link
I found my best use of AI for writing code is with a TDD approach. I write 1 test, watch it fail, let the AI implement the solution, and then if I like it I commit. Write the next test and repeat.

It keeps me in the loop, I'm testing actual functionality rather than code, and my code is always in a state where I can open a PR and merge it back to main.

thorn 2026-03-04 07:59 UTC link
I don’t think many people are interested in producing 100% correct code. I saw this at big companies and small startups. It is all the same: ship feature asap before competitors. Writing correct code is almost always punished indirectly in the way they only praise the feature delivering heroes that got promoted. Nobody got promoted for preventing bugs.

Maybe in some other circles it is not like that, but I am sure that 90% of industry measures output in the amount of value produced and correct code is not the value you can show to the stockholders.

It is sad state of affairs dictated by profit seeking way of life (capitalism).

toastal 2026-03-04 08:52 UTC link
Note that Lean doesn’t have a monopoly on verification languages. Dafney, Rocq, Why3, ATS, Agda, Idris… these all can do it too. The fact that Lean is controlled by Microsoft should cause folks pause considering how they are trying to monopolize so many other spaces… & the author works on Lean. Rather than comparing Lean to performance like OCaml/Haskell like the article does, Why3 is a superset of OCaml, Agda can compile to Haskell, & ATS2 compiles to C—rather than needing to adopt an entirely new language.
voxleone 2026-03-04 10:55 UTC link
What’s striking here is the convergence on a minimal axiomatic kernel (Lean) as the only scalable way to guarantee coherent reasoning. Some of us working on foundational physics are exploring the same methodological principle. In the “Functional Universe” framework[0], for example, we start from a small set of axioms and attempt to derive physical structure from that base.

The domains are different, but the strategy is similar: don’t rely on heuristics or empirical patching; define a small trusted core of axioms and generate coherent structure compositionally from there.

[0] https://voxleone.github.io/FunctionalUniverse

randusername 2026-03-04 14:42 UTC link
> Most people think of verification as a cost, a tax on development, justified only for safety-critical systems. That framing is outdated.

> The value is not in the verification workforce. It is in what verified delivery enables

These takes misrepresent safety-critical software verification. The verification workforce is indispensable. Whereas the software engineers are usually highly specialized, the verification team owns the domain and project knowledge to make sure the whole thing integrates. They consult on requirements, anticipate oversights, build infrastructure and validate it, and execute the verification.

So when the business needs a go / no-go call, they ask the verification lead.

Automation and proofs have their place, but I don't see how verification without human accountability is workable.

MrDarcy 2026-03-03 18:09 UTC link
It is remarkably effective to have Claude Code do the code review and assign a quality score, call it a grade, to the contribution derived from your own expectations of quality.

Then don’t even bother looking at C work or below.

arscan 2026-03-03 18:15 UTC link
Sure but industry cares about value (= benefit - price), not just price. Price could be astronomical, but that doesn’t matter if benefit is larger.
sausagefeet 2026-03-03 18:16 UTC link
I agree with you. But I have to say, it is an uphill battle and all the incentives are against you.

1. AI is meant to make us go faster, reviews are slow, the AI is smart, let it go.

2. There are plenty of AI maximizers who only think we should be writing design docs and letting the AI go to town on it.

Maybe, this might be a great time to start a company. Maximize the benefits of AI while you can without someone who has never written a line of code telling you that your job is going to disappear in 12 months.

All the incentives are against someone who wants to use AI in a reasonable way, right now.

jcgrillo 2026-03-03 18:33 UTC link
I feel like people used to talk about nines of uptime more. As in more than one. These days we've lost that: https://bsky.app/profile/jkachmar.com/post/3mg4u3e6nak2p

I recall a time, maybe around 2013-2017, when people were talking about 4 or 5 nines. But sometime around then the goalposts shifted, and instead of trying to make things as reliable as possible, it started becoming more about seeing how unreliable they can get before anyone notices or cares. It turns out people will suffer through a lot if there's some marginal benefit--remember what personal computers were like in the 1990s before memory protection? Vibe coding is just another chapter in that user hostile epic. Convenient reliability, like this author describes, (if it can be achieved) might actually make things better? But my money isn't on that.

xienze 2026-03-03 18:34 UTC link
> The problem is some developers now just submit code for review that they didn't bother to read.

Can you blame them? All the AI companies are saying “this does a better job than you ever could”, every discussion topic on AI includes at least one (totally organic, I’m sure) comment along the lines of “I’ve been developing software for over twenty years and these tools are going to replace me in six months. I’m learning how to be a plumber before I’m permanently unemployed.” So when Claude spits out something that seems to work with a short smoke test, how can you blame developers for thinking “damn the hype is real. LGTM”?

Avshalom 2026-03-03 18:35 UTC link
Well that's a problem the software industry has been building for itself for decades.

Software has, since at least the adoption of "agile" created an industry culture of not just refusing to build to specs but insisting that specs are impossible to get from a customer.

bradleykingz 2026-03-03 18:58 UTC link
But it's so BORING. AI gets to do the fun part (writing code) and I'm stuck with the lame bits.

It's like watching someone else solve a puzzle, or watching someone else play a game vs playing it yourself (at least that's half as interesting as playing it through)

throwaw12 2026-03-03 20:46 UTC link
> You do

I really want to say: "You are absolutely right"

But here is a problem I am facing personally (numbers are hypothetical).

I get a review request 10-15/day by 4 teammates, who are generating code by prompting, and I am doing same, so you can guess we might have ~20 PRs/day to review. now each PR is roughly updating 5-6 files and 10-15 lines in each.

So you can estimate that, I am looking at around 50-60 files, but I can't keep the context of the whole file because change I am looking is somewhere in the middle, 3 lines here, 5 lines there and another 4 lines at the end.

How am I supposed to review all these?

charlieflowers 2026-03-03 21:08 UTC link
> "any sufficiently advanced agent is indistinguishable from a DSL."

I don't quite follow but I'd love to hear more about that.

bwestergard 2026-03-03 22:36 UTC link
I am as enthusiastic about formal methods as the next guy, but I very much doubt any LLM-based technique will make it economical to write a substantial fraction of application software in Lean. The LLM can play a powerful heuristic role in searching for proof-bearing code in areas where there is good training data. Unfortunately those areas are few and far between.

Moreover, humans will still need to read even rigorously proved code if only to suss out performance issues. And training people to read Lean will continue to be costly.

Though, as the OP says, this is a very exciting time for developing provably correct systems programming.

fmbb 2026-03-03 22:51 UTC link
There are still no successful useful vibe codes apps. Kernels are pretty far away I think.
void-star 2026-03-04 00:35 UTC link
The advice that everyone seemed to agree on at least just a few months ago was to make sure _you_ write the comprehensive tests/specs and this is what I still would recommend doing to anyone asking. I guess even this may be falling out of fashion though…
dranov 2026-03-04 00:41 UTC link
Cedar used to be written in Dafny, but AWS abandoned that implementation and rewrote it in Lean.

https://aws.amazon.com/blogs/opensource/lean-into-verified-s...

cpeterso 2026-03-04 02:18 UTC link
Looks like LLMs also find Dafny easier to write than Lean. This study, “A benchmark for vericoding: formally verified program synthesis”, reports:

> We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications … We find vericoding success rates of 27% in Lean, 44% in Verus/Rust and 82% in Dafny using off-the-shelf LLMs.

https://arxiv.org/html/2509.22908v1

hwayne 2026-03-04 03:17 UTC link
The key bit is that specifications don't need to be "obviously computable", so they can be a lot simpler than the code that implements them. Consider the property "if some function has a reference to a value, that value will not change unless that function explicitly changes it". It's simple enough to express, but to implement it Rust needs the borrow checker, which is a pretty heavy piece of engineering. And proving the implementation actually guarantees that property isn't easy, either!
nextos 2026-03-04 04:06 UTC link
Formal specifications can be easier to write than code. Take refinement types in Dafny, for example. Because they are higher-level, you don't need to bother with tedious implementation details. By shifting our focus to formal specifications rather than manual coding, we could possibly leverage automation to not only develop software faster but also achieve a higher degree of assurance.

Of course, this remains largely theoretical for now, but it is an exciting possibility. Note high-level specifications often overlook performance issues, but they are likely sufficient for most scenarios. Regardless, we have formal development methodologies able to decompose problems to an arbitrary level of granularity since the 1990s, all while preserving correctness. It is likely that many of these ideas will be revisited soon.

Spivak 2026-03-04 04:18 UTC link
My impression has been with frontend tests is they come in two flavors: extremely hard to write to the point of impracticality, and useless. Most orgs settle for the latter and end up testing easily testable things like the final rendered dom and rely on human qa for all the hard bits like "does the page actually look like it's supposed to, does interacting with the UI elements behave correctly, and do the flows actually work."

All largely stemming from the fact that tests can't meaningfully see and interact with the page like the end user will.

gwern 2026-03-04 04:29 UTC link
> I encourage everyone to RTFA and not just respond to the headline.

This is an example of an article which 'buries the lede'†.

It should have started with the announcement of the new zlib autoformalization (!) https://leodemoura.github.io/blog/2026/02/28/when-ai-writes-... to get you excited.

Then it should have talked about the rest - instead of starting with rather graceless and ugly LLM-written generic prose about AI topics that to many readers is already tiresomely familiar and doubtless was tldr for even the readers who aren't repelled automatically by that.

† or in my terms, fails to 'make you care': https://gwern.net/blog/2026/make-me-care

WhyNotHugo 2026-03-04 04:53 UTC link
This is why you write the tests first and then the code. Especially when fixing bugs, since you can be sure that the test properly fails when the bug is present.
IAmGraydon 2026-03-04 05:20 UTC link
Yeah this is the exact kind of ridiculousness I've noticed as well - everything that comes out of an LLM is optimized to give you what you want to hear, not what's correct.
galaxyLogic 2026-03-04 05:53 UTC link
There's two ways of thinking about tests:

A) They let you verify that implementation passes its spec, more or less.

B) They are a (trustworthy) description of how the system behaves, they allow you to understand the system better.

porphyra 2026-03-04 05:57 UTC link
At my job we have a requirement for 100% test coverage. So everyone just uses AI to generate 10,000 line files of unit tests and nobody can verify anything.
kaicianflone 2026-03-04 06:07 UTC link
Thank you for the post. It's a good read. I'm working on governance/validation layers for n-LLMs and making them observable so your comments on runaway AIs resonated with me. My research is pointing me to reputation and stake consensus mechanisms being the validation layer either pre inference or pre-execution, and the time to verify decisions can be skipped with enough "decision liquidity" via reputation alone aka decision precedence.
globular-toast 2026-03-04 07:18 UTC link
This is actually how a lot of software is written, sadly. I used to call it "trial and error programming". I've observed people writing C++ who do not have a mental model of memory but just try stuff until it compiles. For some classes of software, like games, that is acceptable, but for others it would be horrifying.

Now, it is actually completely possible to write UI code without any unit tests in a completely safe way. You use the functional core, imperative shell approach. When all your domain logic is in a fully tested, functional core, you can just go ahead and write "what works" in a thin UI shell. Good luck getting an LLM to rigidly conform to such an architecture, though.

scotty79 2026-03-04 07:23 UTC link
> The LLM happily churns out unit tests which are simply reinforcing the existing behaviour of the code.

I always felt like that's the main issue with unit testing. That's why I used it very rarely.

Maybe keeping tests in the separate module and not letting th Agent see the source during writing tests and not letting agent see the tests while writing implemntation would help? They could just share the API and the spec.

And in case of tests failing another agent with full context could decide if the fix should be delegated to coding agent or to testing agent.

Freak_NL 2026-03-04 08:40 UTC link
Not value produced, but easily quantifiable value produced, I would say. There are lots of cases where some perfectly tweaked internal tool saves hours of work every week, or some mostly invisible (but crucial) project, application, or module just somehow always works. But the user never twigs they exist, and the customer doesn't care. The value is there (in saved hours, frustration, and stability), but it won't count.
thorn 2026-03-04 09:01 UTC link
Very deep post on the problem. AI seems to worsen the issue of software correctness and given the nature of business, it won’t be ever solved.
JuniperMesos 2026-03-04 09:43 UTC link
There's multiple Lean tutorials, some of which are more mathy than others. One of the things I like about Lean is precisely that it's an ordinary, Haskell-style functional programming language in addition to having all the Curry-Howard-isomorphism-based mathematical proof machinery. You can write `cat` in Lean.
littlestymaar 2026-03-04 11:26 UTC link
Long time ago in France the mainstream view by computer people was that code or compute weren't what's important when dealing with computers, it is information that matters and how you process it in a sensible way (hence the name of computer science in French: informatique. And also the name for computer: “ordinateur”, literally: what sets things into order).

As a result, computer students were talked a lot (too much for most people's taste, it seems) about data modeling and not too much about code itself, which was viewed as mundane and uninteresting until the US hacker culture finally took over in the late 2000th.

Turns out that the French were just right too early, like with the Minitel.

kubanczyk 2026-03-04 12:25 UTC link
> RTFA

Sigh. Is there any LLM solution for HN reader to filter out all top-level commenters that hadn't RTFA? I don't need the (micro-)shitstorms that these people spawn, even if the general HN algo scores these as "interesting".

Editorial Channel
What the content says
+0.65
Article 27 Cultural Participation
High A: Participation in scientific advancement and benefits of formal verification technology F: Framing AI-generated code verification as a knowledge commons that should benefit all
Editorial
+0.65
SETL
+0.25

Content extensively documents scientific progress in AI-assisted formal verification (AlphaProof, zlib conversion, Prime Number Theorem formalization, E8 sphere packing proof). Author advocates for shared platform (Lean) as scientific commons, positioning verification as collective scientific achievement. The call for 'the largest possible library of formalized knowledge' explicitly invokes knowledge as shared resource.

+0.55
Article 19 Freedom of Expression
High A: Freedom to impart scientific knowledge and information about software verification F: Framing AI-generated code as a transparency/accountability problem that requires open-source solutions
Editorial
+0.55
SETL
-0.17

Content is explicitly editorial expression of ideas, opinion, and scientific reasoning. Author freely expresses analysis of AI risks and advocates for specific technological solutions. The argument for open-source, independent verification platforms reflects commitment to freedom of information and expression about technical systems.

+0.45
Article 29 Duties to Community
Medium A: Community responsibility to maintain integrity and security of shared digital infrastructure
Editorial
+0.45
SETL
+0.15

Content frames software verification as a duty owed to community: engineers have responsibility to ensure critical systems are correct. The argument that 'manual friction' should be replaced with 'mathematical friction' invokes collective commitment to responsible development.

+0.40
Article 12 Privacy
Medium A: Privacy protection through supply-chain security in AI-generated code
Editorial
+0.40
SETL
+0.14

Content addresses privacy threats from AI-generated code: 'When AI writes the software, the attack surface shifts: an adversary who can poison training data or compromise the model's API can inject subtle vulnerabilities into every system that AI touches.' Formal verification is positioned as a defense for privacy in critical systems.

+0.40
Article 28 Social & International Order
Medium A: Social and international order required to realize verification rights
Editorial
+0.40
SETL
+0.14

Content implicitly calls for social and technical order that enables software verification to be practical at scale. The call for independent platforms, open standards, and international collaboration reflects commitment to institutional arrangements that support collective rights.

+0.35
Preamble Preamble
Medium A: Advocates for mathematical proof as defense against systemic risk from AI-generated code
Editorial
+0.35
SETL
+0.19

Content implicitly grounds advocacy in human dignity and collective welfare: protecting critical infrastructure (financial, medical, defense) preserves conditions for all individuals to exercise rights without catastrophic technological failure. The framing connects software verification to preventing systemic harm.

+0.35
Article 25 Standard of Living
Medium A: Social and technical infrastructure for adequate standard of living (through secure, verified software)
Editorial
+0.35
SETL
+0.13

Content frames verification as essential infrastructure for maintaining adequate standard of living in systems that enable healthcare, financial security, and economic function. Unverified AI code undermines the social and technical conditions for adequate living standards.

+0.30
Article 1 Freedom, Equality, Brotherhood
Medium A: Equal dignity through equal access to verification technology
Editorial
+0.30
SETL
+0.17

Content frames verification as a platform-level requirement that, when made practical and cheap, applies equally to all software creators and users regardless of economic status. The argument that 'verification is no longer a cost. It is a catalyst' implies potential for universal benefit.

+0.25
Article 3 Life, Liberty, Security
Medium A: Right to life and security protected through verified critical infrastructure
Editorial
+0.25
SETL
+0.11

Content argues that unverified AI-generated code in critical systems (medical, defense, transportation) poses risks to life and security. Mathematical proof is positioned as a defense mechanism for systems humans depend on for survival.

ND
Article 2 Non-Discrimination

No observable content on discrimination or rights without distinction.

ND
Article 4 No Slavery

No observable content on slavery or servitude.

ND
Article 5 No Torture

No observable content on torture or cruel treatment.

ND
Article 6 Legal Personhood

No observable content on right to recognition as a person.

ND
Article 7 Equality Before Law

No observable content on equal protection before law.

ND
Article 8 Right to Remedy

No observable content on right to remedy.

ND
Article 9 No Arbitrary Detention

No observable content on arbitrary arrest or detention.

ND
Article 10 Fair Hearing

No observable content on fair and public hearing.

ND
Article 11 Presumption of Innocence

No observable content on presumption of innocence or criminal liability.

ND
Article 13 Freedom of Movement

No observable content on freedom of movement.

ND
Article 14 Asylum

No observable content on asylum.

ND
Article 15 Nationality

No observable content on nationality.

ND
Article 16 Marriage & Family

No observable content on marriage or family.

ND
Article 17 Property

No observable content on property rights.

ND
Article 18 Freedom of Thought

No observable content on freedom of thought, conscience, or religion.

ND
Article 20 Assembly & Association

No observable content on freedom of assembly or association.

ND
Article 21 Political Participation

No observable content on participation in government or public affairs.

ND
Article 22 Social Security

No observable content on social security or economic and social rights.

ND
Article 23 Work & Equal Pay

No observable content on right to work or labor conditions.

ND
Article 24 Rest & Leisure

No observable content on rest and leisure.

ND
Article 26 Education

No observable content on right to education.

ND
Article 30 No Destruction of Rights

No observable content that would restrict or interpret other UDHR rights.

Structural Channel
What the site does
Element Modifier Affects Note
Legal & Terms
Privacy
No privacy policy observable on domain; personal blog context.
Terms of Service
No ToS observable; personal blog without commercial terms.
Identity & Mission
Mission +0.15
Article 27 Article 19
Domain owner Leonardo de Moura is creator of Lean theorem prover; blog advocates for formal verification as public good, aligning with knowledge accessibility and scientific contribution values.
Editorial Code
No editorial code of conduct observable.
Ownership +0.10
Article 19
Author-owned personal blog; transparent single-perspective source with clear author authority in domain.
Access & Distribution
Access Model +0.05
Article 19 Article 27
Content freely accessible; no paywall or registration barrier; supports information access rights.
Ad/Tracking
No observable advertising or tracking infrastructure on personal blog.
Accessibility
No accessibility statement observable; standard HTML blog structure.
+0.60
Article 19 Freedom of Expression
High A: Freedom to impart scientific knowledge and information about software verification F: Framing AI-generated code as a transparency/accountability problem that requires open-source solutions
Structural
+0.60
Context Modifier
+0.30
SETL
-0.17

Blog is freely accessible without registration, paywall, or censorship mechanisms. Author maintains public identity and contact information. No observable restrictions on readership or distribution. Content indexed by search engines.

+0.55
Article 27 Cultural Participation
High A: Participation in scientific advancement and benefits of formal verification technology F: Framing AI-generated code verification as a knowledge commons that should benefit all
Structural
+0.55
Context Modifier
+0.20
SETL
+0.25

Author has contributed to scientific knowledge through Lean theorem prover (over 12 years development, 20-person team). Blog documents this contribution transparently. Lean ecosystem described as 'over 200,000 formalized theorems and 750 contributors'—a commons structure.

+0.40
Article 29 Duties to Community
Medium A: Community responsibility to maintain integrity and security of shared digital infrastructure
Structural
+0.40
Context Modifier
0.00
SETL
+0.15

Author and Lean community structure themselves as custodians of shared verification infrastructure, with transparent governance and contribution.

+0.35
Article 12 Privacy
Medium A: Privacy protection through supply-chain security in AI-generated code
Structural
+0.35
Context Modifier
0.00
SETL
+0.14

Blog structure supports privacy-preserving access to information (no tracking mentioned, freely accessible).

+0.35
Article 28 Social & International Order
Medium A: Social and international order required to realize verification rights
Structural
+0.35
Context Modifier
0.00
SETL
+0.14

Lean community is international: described as 'Research groups worldwide contribute to the ecosystem.' Open-source model enables cross-border collaboration.

+0.30
Article 25 Standard of Living
Medium A: Social and technical infrastructure for adequate standard of living (through secure, verified software)
Structural
+0.30
Context Modifier
0.00
SETL
+0.13

Blog provides accessible information on how to construct and maintain secure technical infrastructure.

+0.25
Preamble Preamble
Medium A: Advocates for mathematical proof as defense against systemic risk from AI-generated code
Structural
+0.25
Context Modifier
0.00
SETL
+0.19

Blog structure is openly authored and freely accessible; no barriers to information access. Author identity and domain expertise transparent.

+0.20
Article 1 Freedom, Equality, Brotherhood
Medium A: Equal dignity through equal access to verification technology
Structural
+0.20
Context Modifier
0.00
SETL
+0.17

Blog is freely accessible to all without registration or paywall; no discrimination in access to author's ideas or evidence.

+0.20
Article 3 Life, Liberty, Security
Medium A: Right to life and security protected through verified critical infrastructure
Structural
+0.20
Context Modifier
0.00
SETL
+0.11

Author maintains transparent, accountable presence with identified affiliation and expertise.

ND
Article 2 Non-Discrimination

Not applicable to this blog post.

ND
Article 4 No Slavery

Not applicable to this blog post.

ND
Article 5 No Torture

Not applicable to this blog post.

ND
Article 6 Legal Personhood

Not applicable to this blog post.

ND
Article 7 Equality Before Law

Not applicable to this blog post.

ND
Article 8 Right to Remedy

Not applicable to this blog post.

ND
Article 9 No Arbitrary Detention

Not applicable to this blog post.

ND
Article 10 Fair Hearing

Not applicable to this blog post.

ND
Article 11 Presumption of Innocence

Not applicable to this blog post.

ND
Article 13 Freedom of Movement

Not applicable to this blog post.

ND
Article 14 Asylum

Not applicable to this blog post.

ND
Article 15 Nationality

Not applicable to this blog post.

ND
Article 16 Marriage & Family

Not applicable to this blog post.

ND
Article 17 Property

Not applicable to this blog post.

ND
Article 18 Freedom of Thought

Not applicable to this blog post.

ND
Article 20 Assembly & Association

Not applicable to this blog post.

ND
Article 21 Political Participation

Not applicable to this blog post.

ND
Article 22 Social Security

Not applicable to this blog post.

ND
Article 23 Work & Equal Pay

Not applicable to this blog post.

ND
Article 24 Rest & Leisure

Not applicable to this blog post.

ND
Article 26 Education

Not applicable to this blog post.

ND
Article 30 No Destruction of Rights

Not applicable to this blog post.

Psychological Safety
experimental
How safe this content is to read — independent from rights stance. Scores are ordinal (rank-order only). Learn more
PSQ
+0.3
Per-model PSQ
L4P +0.4 L3P +0.5
Supplementary Signals
How this content communicates, beyond directional lean. Learn more
Epistemic Quality
How well-sourced and evidence-based is this content?
0.79 medium claims
Sources
0.8
Evidence
0.8
Uncertainty
0.7
Purpose
0.9
Propaganda Flags
3 manipulative rhetoric techniques found
3 techniques detected
appeal to fear
Content emphasizes catastrophic risks: 'Heartbleed exposed the private communications of millions of users... cost the industry hundreds of millions of dollars to remediate' and 'a single bug in OpenSSL... survived two years of code review.' These examples establish fear of hidden vulnerabilities.
appeal to authority
Repeated citations to authoritative figures: 'Microsoft's CTO predicts that 95% of all code will be AI-generated by 2030,' 'Chris Lattner, the creator of LLVM,' 'Fields Medalist Maryna Viazovska,' and references to Google DeepMind and major tech companies establish credibility.
bandwagon
Page documents broad adoption: 'Every major AI reasoning system that has achieved medal-level performance at the International Mathematical Olympiad used Lean. No competing platform was used by any of them.' This positions Lean adoption as inevitable consensus.
Emotional Tone
Emotional character: positive/negative, intensity, authority
urgent
Valence
-0.1
Arousal
0.7
Dominance
0.7
Transparency
Does the content identify its author and disclose interests?
0.78
✓ Author ✓ Conflicts
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.62 mixed
Reader Agency
0.6
Stakeholder Voice
Whose perspectives are represented in this content?
0.52 4 perspectives
Speaks: individualsinstitutioncorporation
About: governmentworkersmarginalized
Temporal Framing
Is this content looking backward, at the present, or forward?
prospective short term
Geographic Scope
What geographic area does this content cover?
global
United States, China, Europe
Complexity
How accessible is this content to a general audience?
technical high jargon domain specific
Longitudinal 915 HN snapshots · 152 evals
+1 0 −1 HN
Audit Trail 172 entries
2026-03-16 00:13 eval_success Evaluated: Moderate positive (0.48) - -
2026-03-16 00:13 model_divergence Cross-model spread 0.40 exceeds threshold (2 models) - -
2026-03-16 00:13 eval Evaluated by claude-haiku-4-5-20251001: +0.48 (Moderate positive) 16,167 tokens -0.02
2026-03-15 23:38 eval_success Evaluated: Moderate positive (0.50) - -
2026-03-15 23:38 model_divergence Cross-model spread 0.42 exceeds threshold (2 models) - -
2026-03-15 23:38 eval Evaluated by claude-haiku-4-5-20251001: +0.50 (Moderate positive) 16,810 tokens
2026-03-15 23:38 rater_validation_warn Validation warnings for model claude-haiku-4-5-20251001: 17W 48R - -
2026-03-14 17:38 eval_success PSQ evaluated: g-PSQ=0.440 (3 dims) - -
2026-03-14 17:38 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) +0.23
2026-03-14 17:26 eval_success Lite evaluated: Neutral (0.08) - -
2026-03-14 17:26 eval Evaluated by llama-4-scout-wai: +0.08 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-14 17:26 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-08 19:26 eval_success PSQ evaluated: g-PSQ=0.210 (3 dims) - -
2026-03-08 19:26 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-08 19:14 eval_success PSQ evaluated: g-PSQ=0.481 (3 dims) - -
2026-03-08 19:14 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) 0.00
2026-03-08 18:56 eval_success Lite evaluated: Neutral (0.08) - -
2026-03-08 18:56 eval Evaluated by llama-4-scout-wai: +0.08 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-08 18:56 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-08 18:47 eval_success Lite evaluated: Neutral (0.08) - -
2026-03-08 18:47 eval Evaluated by llama-3.3-70b-wai: +0.08 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-08 18:47 rater_validation_warn Lite validation warnings for model llama-3.3-70b-wai: 1W 0R - -
2026-03-08 16:37 eval_success PSQ evaluated: g-PSQ=0.210 (3 dims) - -
2026-03-08 16:37 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-08 16:25 eval_success PSQ evaluated: g-PSQ=0.481 (3 dims) - -
2026-03-08 16:25 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) 0.00
2026-03-08 16:05 eval_success Lite evaluated: Neutral (0.08) - -
2026-03-08 16:05 eval Evaluated by llama-4-scout-wai: +0.08 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-08 16:05 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-08 16:00 eval_success Lite evaluated: Neutral (0.08) - -
2026-03-08 16:00 eval Evaluated by llama-4-scout-wai: +0.08 (Neutral) +0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-08 16:00 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-08 15:50 eval Evaluated by llama-3.3-70b-wai: +0.08 (Neutral) -0.02
reasoning
Technical blog post on AI-generated software verification
2026-03-07 19:16 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-07 18:34 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) 0.00
2026-03-07 17:47 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-07 17:41 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) -0.23
2026-03-07 17:29 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) +0.03
2026-03-06 22:45 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) +0.23
2026-03-06 22:33 eval Evaluated by llama-3.3-70b-wai-psq: +0.45 (Moderate positive) 0.00
2026-03-06 17:54 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-06 17:39 eval Evaluated by llama-3.3-70b-wai-psq: +0.45 (Moderate positive) -0.03
2026-03-06 04:34 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) 0.00
2026-03-05 22:44 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-05 17:28 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) -0.00
2026-03-05 17:23 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) +0.00
2026-03-05 08:30 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-05 08:25 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive) 0.00
2026-03-05 07:27 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive) 0.00
2026-03-05 04:53 eval Evaluated by llama-4-scout-wai-psq: +0.21 (Mild positive)
2026-03-05 04:37 eval Evaluated by llama-3.3-70b-wai-psq: +0.48 (Moderate positive)
2026-03-05 04:16 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 04:03 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 03:41 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 03:19 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 02:59 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 02:44 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 02:17 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 02:12 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 02:05 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 01:35 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 01:26 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 00:56 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 00:47 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-05 00:12 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-05 00:10 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 23:35 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 23:33 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 23:30 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 22:53 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 22:49 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 22:48 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 22:44 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 22:06 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 22:05 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 21:32 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 21:30 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 21:27 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 21:24 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 20:46 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 20:41 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 20:36 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 20:12 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 19:56 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 19:22 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 19:18 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 19:07 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 18:16 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 18:06 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 16:49 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 16:44 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 16:39 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 16:05 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 15:56 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 15:24 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 15:19 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 15:18 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 14:40 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 14:36 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 14:02 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 13:57 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 13:24 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 13:20 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 12:49 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 12:43 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 12:38 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 12:10 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 12:00 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 11:55 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 11:30 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 11:07 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 10:48 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 10:24 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 10:13 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 10:08 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 09:50 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 09:27 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) -0.10
reasoning
Technical blog post on AI-generated software verification
2026-03-04 09:12 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 08:51 eval Evaluated by llama-3.3-70b-wai: +0.20 (Mild positive) +0.10
reasoning
Technical blog post on AI-generated software verification
2026-03-04 08:28 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 08:15 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 07:59 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) +0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 07:47 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) +0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 07:20 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 07:08 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 06:11 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 05:59 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 05:54 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 05:34 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 05:14 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 05:09 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 04:46 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) -0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 04:27 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 04:22 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 04:16 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 04:07 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) +0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 03:48 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 03:29 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 03:12 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 02:57 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 02:52 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 02:32 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 02:10 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) -0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 01:57 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) +0.04
reasoning
Technical blog post on AI-generated software verification
2026-03-04 01:30 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) +0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 01:18 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) -0.04
reasoning
Technical blog post on AI-generated software verification
2026-03-04 00:53 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 00:43 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-04 00:12 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-04 00:08 eval Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive) +0.04
reasoning
Technical blog post on AI-generated software verification
2026-03-03 23:32 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 23:27 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 23:26 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 23:21 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 22:51 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 22:50 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 22:44 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 22:24 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 22:16 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 21:26 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 21:17 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 20:55 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 20:44 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 20:20 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) 0.00
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 20:15 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) -0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 20:12 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 19:33 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral) +0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 19:31 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 18:48 eval Evaluated by llama-4-scout-wai: +0.04 (Neutral) -0.02
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 18:47 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral) 0.00
reasoning
Technical blog post on AI-generated software verification
2026-03-03 18:11 eval Evaluated by llama-4-scout-wai: +0.06 (Neutral)
reasoning
Technical blog post discussing AI-generated software and verification, with no explicit human rights discussion.
2026-03-03 18:10 eval Evaluated by llama-3.3-70b-wai: +0.06 (Neutral)
reasoning
Technical blog post on AI-generated software verification