We are changing our developer productivity experiment design (metr.org) · E: +0.39 · S: +0.34
87 points by ej88 5 days ago | 60 comments on HN | Moderate positive · Contested · Editorial · v3.7 · 2026-02-26 04:45:46
Summary Scientific Integrity & Research Transparency Advocates
This technical blog post documents METR's redesign of a developer productivity study following discovery of significant methodological flaws. The content exemplifies scientific transparency and intellectual honesty by openly acknowledging selection biases, measurement unreliability, and data limitations that undermine conclusions—prioritizing epistemic integrity over organizational credibility. The article supports human rights frameworks through open data access, transparent reporting, and commitment to building reliable evidence infrastructure for informed governance of AI systems.
Article Heatmap
Preamble: +0.30 · Article 19 (Freedom of Expression): +0.78 · Article 23 (Work & Equal Pay): +0.31 · Article 26 (Education): +0.26 · Article 27 (Cultural Participation): +0.63 · Article 28 (Social & International Order): +0.58
No Data: Articles 1-18, 20-22, 24, 25, 29, 30
Negative Neutral Positive No Data
Aggregates
Editorial Mean +0.39 Structural Mean +0.34
Weighted Mean +0.50 Unweighted Mean +0.48
Max +0.78 Article 19 Min +0.26 Article 26
Signal 6 No Data 25
Volatility 0.20 (Medium)
Negative 0 Channels E: 0.6 S: 0.4
SETL +0.16 Editorial-dominant
FW Ratio 58% 21 facts · 15 inferences
Evidence 15% coverage
2H 4M 1L 24 ND
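Several of the aggregates above can be recomputed from the six scored heatmap entries. The sketch below is a sanity check, not the site's actual pipeline: treating Volatility as a population standard deviation and FW Ratio as facts / (facts + inferences) are assumptions that happen to match the displayed figures. The Weighted Mean depends on weights the page does not state, so it is not reproduced.

```python
from statistics import mean, pstdev

# Scores copied from the Article Heatmap; "ND" articles are omitted.
scores = {
    "Preamble": 0.30,
    "Article 19": 0.78,
    "Article 23": 0.31,
    "Article 26": 0.26,
    "Article 27": 0.63,
    "Article 28": 0.58,
}
values = list(scores.values())

unweighted_mean = mean(values)
best = max(scores, key=scores.get)
worst = min(scores, key=scores.get)

# Assumption: "Volatility" is the population standard deviation of the scores.
volatility = pstdev(values)

# Assumption: "FW Ratio" is facts / (facts + inferences).
facts, inferences = 21, 15
fw_ratio = facts / (facts + inferences)

print(f"Unweighted mean {unweighted_mean:+.2f}")
print(f"Max {scores[best]:+.2f} {best}, Min {scores[worst]:+.2f} {worst}")
print(f"Volatility {volatility:.2f}")
print(f"FW ratio {fw_ratio:.0%}")
```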
Theme Radar
Foundation: 0.30 (1 article) · Security: 0.00 (0 articles) · Legal: 0.00 (0 articles) · Privacy & Movement: 0.00 (0 articles) · Personal: 0.00 (0 articles) · Expression: 0.78 (1 article) · Economic & Social: 0.31 (1 article) · Cultural: 0.45 (2 articles) · Order & Duties: 0.58 (1 article)
HN Discussion 11 top-level · 22 replies
ej88 2026-02-24 20:07 UTC link
Really interesting updates to their 2025 experiment.

Repeat devs from the original experiment went from 0-40% slowdown to now -10-40% speedup - and METR estimates this as a 'lower-bound'

more devs saying they don't even want to do 50% of their work without AI, even for $50/hr

30-50% of devs decided not to submit certain tasks without AI, missing the tasks with the highest uplift

it also seems like there is a skill gap - repeat devs from the first study are more productive with ai tools than newly recruited ones with variable experience

overall it seems like the high preference for devs to use AI is actually hurting METR's ability to judge their speedup, due to a refusal to do tasks without it. imo this is indirectly quite supportive for ai coding's productivity claims.

softwaredoug 2026-02-24 20:21 UTC link
I'm a bit perplexed by the developer selection effects.

I get that developers want to use AI. But are they also claiming there's not still a no/low-AI population of developers? Or that their means of selection don't find these developers?

Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?

sgillen 2026-02-24 22:12 UTC link
This is very interesting because I see a lot of AI detractors point to the original study as proof that AI is overhyped and nothing to worry about. In this new study the findings are essentially reversed (20% slowdown to 20% speedup).
arctic-true 2026-02-24 22:57 UTC link
Those developer quotes are tough to read. Rate limits are going to hit like a truck when the labs eventually need to make a profit.
camgunz 2026-02-24 23:00 UTC link
Unless this measures the entire SDLC longitudinally (like say, over a year) I'm not interested. I too can tell Claude Code to do things all day every day, but unless we have data on the defect rate it doesn't matter at all.
Bnjoroge 2026-02-24 23:12 UTC link
never been a better time to be a swe who doesn't use, or significantly limits the use of, AI agents
atleastoptimal 2026-02-24 23:17 UTC link
It's kind of funny that METR is known primarily for both the most bearish study on AI progress (the original 20% slowdown one), and the most bullish one on AI progress (the long-task horizon study showing exponential increase in duration of tasks AI models can accomplish with respect to date of release).

In either case, it seems people ended up bolstering their preexisting views on AI based on whichever study most affirmed them (for the former, that AI coding models didn't actually help and created a mirage of productivity that required more work to fix than was worth it; for the latter, that AI models were improving at an exponential rate and will invariably eclipse SWEs in all tasks in a deterministic amount of time).

I think the truth is somewhere in the middle. Just anecdotally we've seen multi-million dollar fortunes being minted by small teams developing using 90% AI-assisted coding. Anthropic claims they solely use agents to code and don't modify any code manually.

daxfohl 2026-02-24 23:41 UTC link
"I don't want to do this without AI" sounds like we're already well into the brain atrophy stage of this. Now what? (I'd think about it myself but....)
keeda 2026-02-25 00:26 UTC link
> When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

In fact, one of the developers in the original study later revealed on Twitter that he had already done exactly that during the study, i.e. filtered out tasks he preferred not to do without AI: https://xcancel.com/ruben_bloom/status/1943536052037390531

While this was only one developer (that we know of), given the N was 16 and he seems to have been one of the more AI-experienced devs, this could have had a non-trivial effect on the results.

The original study gets a lot of air-time from AI naysayers, let's see how much this follow-up gets ;-)
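The selection effect described in the quote can be illustrated with a toy calculation. Everything below is hypothetical: the task times are invented, the "withhold anything where AI is at least 2x faster" rule is an assumption, and a real trial observes each task under only one condition rather than both. The point is only that dropping high-uplift tasks pulls the measured speedup down.

```python
# Hypothetical tasks as (hours without AI, hours with AI). Not study data.
tasks = [
    (2.0, 2.2),   # AI slightly slower
    (3.0, 2.7),
    (4.0, 3.2),
    (6.0, 3.0),   # large uplift
    (20.0, 2.0),  # very large uplift
]

def speedup(task_list):
    """Total time without AI divided by total time with AI."""
    without_ai = sum(t[0] for t in task_list)
    with_ai = sum(t[1] for t in task_list)
    return without_ai / with_ai

def uplift(task):
    """Per-task ratio of time without AI to time with AI."""
    return task[0] / task[1]

# Assumption: developers withhold any task where AI is at least 2x faster.
submitted = [t for t in tasks if uplift(t) < 2.0]

full = speedup(tasks)          # what an unbiased sample would show
observed = speedup(submitted)  # what the study sees after self-selection

print(f"all tasks: {full:.2f}x, submitted tasks only: {observed:.2f}x")
```

With these made-up numbers the submitted subset shows a much smaller speedup than the full set, which is the direction of bias the article describes.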

tonymet 2026-02-25 01:37 UTC link
> "AI tools lead to worse productivity"

> The subjects are using ChatGPT 2.5 and copy-pasting code.

The reason AI hype seems to be so bipolar is that "AI" isn't one thing. Hundreds of models, dozens of tools. And to get something done well, a seasoned engineer needs to master half a dozen at a time.

hyfgfh 2026-02-25 04:17 UTC link
What worried me is that LLMs are becoming a crutch for overworked engineers. But instead of reducing the workload, they have raised expectations and consequently brought more aggressive deadlines, making it all worse overall
sgillen 2026-02-24 22:06 UTC link
The study was designed to have devs who are comfortable with AI perform 50% of tasks with AI and 50% without. So the problem is the population of "Developers who use AI regularly but are willing to do tasks without AI" is shrinking.

>> Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?

The developer sample size was small (16 people in the original study) and the task sample size is larger (~250 tasks). I think the worry is variance in developer productivity would totally wash out any signal.

selridge 2026-02-24 22:10 UTC link
Here is my read:

Developers are refusing to complete the survey or selecting themselves out because they (apparently) don’t want to complete the non-AI task.

They also saw selection effects from a large reduction in the pay for the study (which is an unfortunate confounder here), $150/hr -> $50/hr.

They guess this makes their estimates lower bounds, but the selection effect is complicated (which they acknowledge).

Overall this is a hard problem for them in the current state. It will be challenging to produce convincing year over year analysis under these conditions.

ej88 2026-02-24 22:59 UTC link
not enough people look at the slope, just the coords
selridge 2026-02-24 23:04 UTC link
I think their old findings were hard to treat as gospel just due to the kind of comparison + the sample, but this new result is probably much noisier.

It’s hard to make reliable, directional assumptions about the kind of self-selection and refusal they saw, even without worrying about the reward dropping 66%.

simonw 2026-02-24 23:17 UTC link
At this point the AI labs would pretty much have to form an illegal price fixing cartel in order to jack the prices up, they've been competing to drive down prices for so long.

They'd have to get the Chinese AI labs to go along with that price fixing too.

simonw 2026-02-24 23:18 UTC link
AI detractors loved that previous study so much. It seems to have been brought up in the majority of conversations about AI productivity over the past six months.

(Notable to me was how few other studies they cited, which I think is because studies showing AI productivity loss are quite uncommon.)

roxolotl 2026-02-25 00:00 UTC link
The finding of the first study was that people cannot judge their performance with these tools. So I don’t think the lack of individuals willing to work without them is indicative of productivity improvements. I think it’s indicative of them being enjoyable to use.
marcosdumay 2026-02-25 00:02 UTC link
"I avoid issues like AI can finish things in just 2 hours, but I have to spend 20 hours. I will feel so painful if the task is decided as AI-disallowed."

That really doesn't sound like the results they got, where developers may get up to twice as productive in the best scenario.

There's surely something scary there. And the lack of people ambivalent about AI isn't a certain indication it's well accepted as they think, it can just as easily be caused by polarization.

bitwize 2026-02-25 00:04 UTC link
AI will soon be an intrinsic part of the job. Now what? "Get your thumb out of your ass and learn [how to use AI]." —Eric S. Raymond
sjaiisba 2026-02-25 00:09 UTC link
> Anthropic claims they solely use agents to code and don't modify any code manually.

Have you used CC? It shows. They did not make their fortune off this, and it’s at least lost me a customer because of how sloppy it is. The model is good, and it’s why they have to gate access to it. I’d much rather use a different harness.

I do think you’re on to something though. As societal wealth further concentrates among the few, we’re going to get more and more slop for the rest of us because we have no money (relatively speaking). Agentic coding is here to stay because we as a society are forced to accept more and more slop. It’s already rampant, this is just automating it.

falcor84 2026-02-25 00:33 UTC link
I'm pretty sure that this was exactly the response to the first generation of devs who insisted on coding with a terminal instead of submitting punch cards like "real programmers".
falcor84 2026-02-25 00:36 UTC link
Do any of those companies collect and share data on their defect rates to give you a baseline to compare against?
sjaiisba 2026-02-25 00:41 UTC link
> 3. Regarding me specifically, I work on the LessWrong codebase which is technically open-source. I feel like calling myself an "open-source developer" has the wrong connotations, and makes it more sound like I contribute to a highly-used Python library or something as an upper-tier developer which I'm not

That’s very interesting! This kinda matches what I see at work:

- low performers love it. it really does make them output more (which includes bugs, etc. it’s causing some contention that’s yet to be resolved)

- some high performers love it. these were guys who are more into greenfield stuff and ok with 90% good. very smart, but just not interested in anything outside of going fast

- everyone else seems to be finding use out of it, but reviews are painful

fxwin 2026-02-25 00:45 UTC link
fwiw i think the interesting part about the original study wasn't so much the slowdown part, but the discrepancy between perceived and measured speedup/slowdown (which is the part i used to bring up frequently when talking to other devs)
Krei-se 2026-02-25 01:18 UTC link
great to see that wisdom and sanity is still found on yc
mock-possum 2026-02-25 03:08 UTC link
I don’t want to do work around the house without a fully charged battery for my ryobi either. I don’t want to go on a grocery run without my car. Using tools is not brain atrophy
pgwhalen 2026-02-25 03:24 UTC link
I really am quite in awe of Claude Code recently, so definitely not a naysayer, but this is a really important point. It’s so easy to create code, but am I shipping that much more to prod than I used to? A bit.

Obviously this highly depends on your company and your setup and risk tolerance and what not.

SpicyLemonZest 2026-02-25 03:52 UTC link
As one of the naysayers who talked a lot about the original study, I enthusiastically endorse any attempt at all to actually measure AI productivity. An increase from 20% slowdown to 20% speedup over the past year seems broadly consistent with my understanding of how things have gone. I think I remain classified as a "naysayer", though, because the "booster" case has gone from "I'm multiple times more productive" to "I never have to look at code my AI agents just handle everything" over the same period.
azan_ 2026-02-25 04:57 UTC link
Keep in mind that they make large profit on inference. Not enough to make up for losses on training but it won’t be a problem for Chinese labs which will just steal their weights.
judahmeek 2026-02-25 05:28 UTC link
There are some people participating in the study who will fire & forget instructions to Claude/Codex running in parallel worktrees, but would really struggle if they were required to work on their project without AI assistance.

So while some study participants probably are seeing an actual speedup because of the discipline with which they manage their codebase's structure & documentation, other study participants are actually getting worse at non-AI coding.

...and METR's study can't tell which is which because METR's study isn't using any sort of codebase quality metrics for grounding.

vessenes 2026-02-25 05:38 UTC link
For the thousandth time - they. make. a. profit. Inference margin is over 60%, today.

They are spending that money training ever-larger models, so they are cashflow negative, but under almost any sane GAAP treatment that does not allow one to write down all R&D upfront (capital costs of model training), they are profitable.

Should this matter to you? Only if you're making financial decisions that assume that somehow one day the "jig will be up" - i.e. please don't short these stocks when they float, or at least do so very judiciously.

vessenes 2026-02-25 05:40 UTC link
I like this. I've bought a lot of CnC flatpack furniture in my day, and also employed a number of excellent cabinet makers. Room for both.
Editorial Channel
What the content says
+0.60
Article 19 Freedom of Expression
High Advocacy Framing
Editorial
+0.60
SETL
+0.17

Content exemplifies freedom of expression through transparent reporting of research findings, methodological critiques, and honest acknowledgment of data limitations. Publication of contradictory or unfavorable results demonstrates editorial commitment to disclosure and intellectual integrity.

+0.45
Article 28 Social & International Order
High Advocacy Practice
Editorial
+0.45
SETL
+0.15

Content exemplifies effort to establish social order enabling human rights through rigorous research on AI capabilities. Study attempts to create reliable evidence base for informed decision-making about AI governance, directly supporting Article 28's right to social and international order enabling rights.

+0.40
Article 27 Cultural Participation
Medium Advocacy Practice
Editorial
+0.40
SETL
+0.14

Content demonstrates engagement with cultural and scientific advancement through research methodology designed to measure AI capabilities and impact. Commitment to open science and transparent reporting supports Article 27's right to share in scientific progress.

+0.35
Article 23 Work & Equal Pay
Medium Practice
Editorial
+0.35
SETL
+0.19

Content addresses working conditions through study design: researchers paid developers for participation ($50-150/hour) and allowed task selection. However, the article identifies how lower pay rates contributed to selection bias, implying awareness of work compensation's importance.

+0.30
Preamble Preamble
Medium Advocacy
Editorial
+0.30
SETL
ND

Content implicitly affirms values of scientific integrity, transparency about methodological flaws, and commitment to understanding AI's societal impacts—aligned with Preamble's emphasis on reason and justice as foundations for human rights.

+0.25
Article 26 Education
Medium Practice
Editorial
+0.25
SETL
+0.16

Content indirectly relates to education through focus on developer technical skill and knowledge-building. Study methodology emphasizes understanding and documenting AI capabilities, relevant to educational advancement.

ND · Article 1 Freedom, Equality, Brotherhood: Content does not directly engage with equal dignity or inherent rights.
ND · Article 2 Non-Discrimination: Content does not address discrimination or distinctions based on status.
ND · Article 3 Life, Liberty, Security: Content does not directly address right to life, liberty, or security.
ND · Article 4 No Slavery: Content does not engage with slavery or servitude.
ND · Article 5 No Torture: Content does not address torture or cruel treatment.
ND · Article 6 Legal Personhood: Content does not engage with right to recognition as a person.
ND · Article 7 Equality Before Law: Content does not address equality before law.
ND · Article 8 Right to Remedy: Content does not engage with right to legal remedy.
ND · Article 9 No Arbitrary Detention: Content does not address arbitrary arrest or detention.
ND · Article 10 Fair Hearing: Content does not engage with right to fair and public trial.
ND · Article 11 Presumption of Innocence: Content does not address criminal liability or retrospective law.
ND · Article 12 Privacy: Content does not engage with privacy, family, or home.
ND · Article 13 Freedom of Movement: Content does not address freedom of movement.
ND · Article 14 Asylum: Content does not engage with asylum or refuge.
ND · Article 15 Nationality: Content does not address right to nationality.
ND · Article 16 Marriage & Family: Content does not engage with marriage or family rights.
ND · Article 17 Property: Content does not address property rights.
ND · Article 18 Freedom of Thought: Content does not engage with freedom of thought or conscience.
ND · Article 20 Assembly & Association (Low Practice): Content does not directly address assembly or association.
ND · Article 21 Political Participation: Content does not directly address participation in government.
ND · Article 22 Social Security: Content does not engage with social security or welfare rights.
ND · Article 24 Rest & Leisure: Content does not address rest or leisure.
ND · Article 25 Standard of Living: Content does not directly engage with food, housing, or healthcare rights.
ND · Article 29 Duties to Community: Content does not directly address duties or limitations on rights exercise.
ND · Article 30 No Destruction of Rights: Content does not engage with prevention of rights destruction.

Structural Channel
What the site does
+0.55
Article 19 Freedom of Expression
High Advocacy Framing
Structural
+0.55
Context Modifier
+0.20
SETL
+0.17

Website structure enables open access to research data and full publication record; datasets are publicly available, supporting reader access to underlying evidence for independent verification.

+0.40
Article 28 Social & International Order
High Advocacy Practice
Structural
+0.40
Context Modifier
+0.15
SETL
+0.15

Organization operates as nonprofit research entity; open-access publishing model supports collective knowledge infrastructure. Transparent documentation of limitations enables informed public discourse.

+0.35
Article 27 Cultural Participation
Medium Advocacy Practice
Structural
+0.35
Context Modifier
+0.25
SETL
+0.14

Organization structure as nonprofit research entity and open data access support participation in scientific advancement; website provides research outputs and datasets freely.

+0.25
Article 23 Work & Equal Pay
Medium Practice
Structural
+0.25
Context Modifier
0.00
SETL
+0.19

Study structure involves researcher-developer payment relationships; article documents how compensation levels affect study participation, suggesting structural engagement with labor conditions.

+0.15
Article 26 Education
Medium Practice
Structural
+0.15
Context Modifier
+0.05
SETL
+0.16

Website includes 'Research' and 'Notes' sections suggesting knowledge dissemination; full datasets publicly available supports educational access, though article does not explicitly address education rights.
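The heatmap scores appear to be consistent with a simple blend of the two channels. The sketch below assumes the combined score is 0.6 × Editorial + 0.4 × Structural + Context Modifier, using the "Channels E: 0.6 S: 0.4" weights from the Aggregates panel. This rule is inferred from the displayed numbers, not documented by the site, and the function name `combined` is hypothetical.

```python
# Channel weights as shown in the Aggregates panel ("Channels E: 0.6 S: 0.4").
E_WEIGHT, S_WEIGHT = 0.6, 0.4

# (editorial, structural, context modifier), copied from the channel listings.
channels = {
    "Article 19": (0.60, 0.55, 0.20),
    "Article 23": (0.35, 0.25, 0.00),
    "Article 26": (0.25, 0.15, 0.05),
    "Article 27": (0.40, 0.35, 0.25),
    "Article 28": (0.45, 0.40, 0.15),
}

def combined(editorial, structural, modifier):
    # Assumed rule: weighted blend of the two channels plus the modifier.
    return E_WEIGHT * editorial + S_WEIGHT * structural + modifier

combined_scores = {
    article: round(combined(*parts), 2) for article, parts in channels.items()
}

for article, score in combined_scores.items():
    print(f"{article}: {score:+.2f}")
```

Under this assumption the five results match the heatmap values (+0.78, +0.31, +0.26, +0.63, +0.58). The Preamble has no structural score, and its heatmap value equals its editorial score of +0.30.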

ND · Preamble (Medium Advocacy): Content implicitly affirms values of scientific integrity, transparency about methodological flaws, and commitment to understanding AI's societal impacts, aligned with Preamble's emphasis on reason and justice as foundations for human rights.
ND · Article 1 Freedom, Equality, Brotherhood: No structural signals observable regarding equal treatment.
ND · Article 2 Non-Discrimination: Content does not address discrimination or distinctions based on status.
ND · Article 3 Life, Liberty, Security: Content does not directly address right to life, liberty, or security.
ND · Article 4 No Slavery: Content does not engage with slavery or servitude.
ND · Article 5 No Torture: Content does not address torture or cruel treatment.
ND · Article 6 Legal Personhood: Content does not engage with right to recognition as a person.
ND · Article 7 Equality Before Law: Content does not address equality before law.
ND · Article 8 Right to Remedy: Content does not engage with right to legal remedy.
ND · Article 9 No Arbitrary Detention: Content does not address arbitrary arrest or detention.
ND · Article 10 Fair Hearing: Content does not engage with right to fair and public trial.
ND · Article 11 Presumption of Innocence: Content does not address criminal liability or retrospective law.
ND · Article 12 Privacy: Content does not engage with privacy, family, or home.
ND · Article 13 Freedom of Movement: Content does not address freedom of movement.
ND · Article 14 Asylum: Content does not engage with asylum or refuge.
ND · Article 15 Nationality: Content does not address right to nationality.
ND · Article 16 Marriage & Family: Content does not engage with marriage or family rights.
ND · Article 17 Property: Content does not address property rights.
ND · Article 18 Freedom of Thought: Content does not engage with freedom of thought or conscience.
ND · Article 20 Assembly & Association (Low Practice): Website includes a 'Donate' and 'Careers' section, suggesting organizational structure that may facilitate participation and association, though this is minimal structural evidence.
ND · Article 21 Political Participation: Content does not directly address participation in government.
ND · Article 22 Social Security: Content does not engage with social security or welfare rights.
ND · Article 24 Rest & Leisure: Content does not address rest or leisure.
ND · Article 25 Standard of Living: Content does not directly engage with food, housing, or healthcare rights.
ND · Article 29 Duties to Community: Content does not directly address duties or limitations on rights exercise.
ND · Article 30 No Destruction of Rights: Content does not engage with prevention of rights destruction.

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.85 low claims
Sources
0.8
Evidence
0.8
Uncertainty
0.9
Purpose
0.8
Propaganda Flags
No manipulative rhetoric detected
0 techniques detected
Emotional Tone
Emotional character: positive/negative, intensity, authority
measured
Valence
+0.1
Arousal
0.3
Dominance
0.4
Transparency
Does the content identify its author and disclose interests?
0.50
✓ Author ✗ Conflicts ✗ Funding
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.58 mixed
Reader Agency
0.6
Stakeholder Voice
Whose perspectives are represented in this content?
0.62 3 perspectives
Speaks: institution, individuals
About: workers, institution
Temporal Framing
Is this content looking backward, at the present, or forward?
present short term
Geographic Scope
What geographic area does this content cover?
global
Complexity
How accessible is this content to a general audience?
technical medium jargon domain specific
Longitudinal · 7 evals
Audit Trail 27 entries
2026-02-28 14:27 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:27 model_divergence Cross-model spread 0.49 exceeds threshold (4 models) - -
2026-02-28 14:27 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Technical blog post
2026-02-28 14:21 model_divergence Cross-model spread 0.49 exceeds threshold (4 models) - -
2026-02-28 14:21 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:21 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Technical blog post
2026-02-26 22:40 eval_success Light evaluated: Neutral (0.00) - -
2026-02-26 22:40 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-26 20:07 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 20:04 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:03 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:02 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 20:02 eval_failure Evaluation failed: Error: Unknown model in registry: llama-4-scout-wai - -
2026-02-26 20:02 eval_failure Evaluation failed: Error: Unknown model in registry: llama-4-scout-wai - -
2026-02-26 20:02 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:26 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 17:24 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:23 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 08:59 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 08:56 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 08:55 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 08:55 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 08:55 dlq Dead-lettered after 1 attempts: We are changing our developer productivity experiment design - -
2026-02-26 08:27 eval Evaluated by deepseek-v3.2: +0.33 (Neutral) 11,532 tokens
2026-02-26 04:45 eval Evaluated by claude-haiku-4-5-20251001: +0.49 (Moderate positive) 13,919 tokens +0.07
2026-02-26 04:20 eval Evaluated by claude-haiku-4-5-20251001: +0.42 (Moderate positive) 13,688 tokens -0.02
2026-02-26 03:33 eval Evaluated by claude-haiku-4-5-20251001: +0.44 (Moderate positive) 14,745 tokens
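The "Cross-model spread 0.49" entries are consistent with taking the latest score per model and subtracting the minimum from the maximum. The sketch below assumes that definition. The threshold value is hypothetical, since the log only says the spread exceeded it, and reading the trailing column on the claude-haiku rows as the change from the previous run is also an assumption.

```python
# Latest score per model, copied from the audit trail entries.
latest = {
    "claude-haiku-4-5-20251001": 0.49,
    "deepseek-v3.2": 0.33,
    "llama-4-scout-wai": 0.00,
    "llama-3.3-70b-wai": 0.00,
}

# Assumed definition: spread = highest latest score minus lowest latest score.
spread = max(latest.values()) - min(latest.values())

# Hypothetical threshold; the real value is not shown in the log.
THRESHOLD = 0.30
diverged = spread > THRESHOLD

# Successive claude-haiku evaluations, oldest first.
haiku_runs = [0.44, 0.42, 0.49]
deltas = [round(b - a, 2) for a, b in zip(haiku_runs, haiku_runs[1:])]

print(f"spread {spread:.2f} across {len(latest)} models, diverged={diverged}")
print("run-to-run deltas:", deltas)
```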