-0.01 The First Fully General Computer Action Model (si.inc S:0.00 )
345 points by nee1r 7 days ago | 80 comments on HN | Neutral Editorial · v3.7 · 2026-02-26 00:31:01 0
Summary Technology Governance & Labor Rights Acknowledges
This technical product announcement for FDM-1, a foundation model for autonomous computer use, demonstrates mild positive engagement with free expression and information access (Article 19) through public research sharing and internet-scale data advocacy, but exhibits significant negative signals on privacy (Article 12), accessibility (Article 2), labor rights (Article 23), and social governance (Articles 28-29). The post frames technological capability advancement without addressing privacy consent for training data, worker displacement concerns, or institutional frameworks needed to govern autonomous systems deployment in high-stakes domains.
Article Heatmap
Preamble: -0.06
Article 1 (Freedom, Equality, Brotherhood): 0.00
Article 2 (Non-Discrimination): -0.30
Article 3 (Life, Liberty, Security): 0.00
Article 4 (No Slavery): 0.00
Article 5 (No Torture): 0.00
Article 6 (Legal Personhood): 0.00
Article 7 (Equality Before Law): 0.00
Article 8 (Right to Remedy): 0.00
Article 9 (No Arbitrary Detention): 0.00
Article 10 (Fair Hearing): 0.00
Article 11 (Presumption of Innocence): 0.00
Article 12 (Privacy): -0.19
Article 13 (Freedom of Movement): +0.12
Article 14 (Asylum): 0.00
Article 15 (Nationality): 0.00
Article 16 (Marriage & Family): 0.00
Article 17 (Property): 0.00
Article 18 (Freedom of Thought): 0.00
Article 19 (Freedom of Expression): +0.41
Article 20 (Assembly & Association): 0.00
Article 21 (Political Participation): 0.00
Article 22 (Social Security): 0.00
Article 23 (Work & Equal Pay): -0.14
Article 24 (Rest & Leisure): 0.00
Article 25 (Standard of Living): 0.00
Article 26 (Education): -0.16
Article 27 (Cultural Participation): +0.26
Article 28 (Social & International Order): -0.12
Article 29 (Duties to Community): -0.06
Article 30 (No Destruction of Rights): 0.00
Aggregates
Editorial Mean -0.01 Structural Mean 0.00
Weighted Mean -0.01 Unweighted Mean -0.01
Max +0.41 Article 19 Min -0.30 Article 2
Signal 31 No Data 0
Volatility 0.12 (Medium)
Negative 7 Channels E: 0.6 S: 0.4
SETL -0.04 Structural-dominant
FW Ratio 57% 41 facts · 31 inferences
Evidence 33% coverage
10M 21L
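The aggregate figures above can be reproduced from the per-article heatmap scores. The sketch below is a sanity check, not the site's actual scoring code: it assumes "Volatility" is the population standard deviation of the 31 scores and that "FW Ratio" is facts divided by facts plus inferences. Both assumptions happen to match the displayed values, but the page does not document either formula.

```python
from statistics import mean, pstdev

# Nonzero scores copied from the Article Heatmap; every other entry is 0.00.
NONZERO = {
    "Preamble": -0.06, "Article 2": -0.30, "Article 12": -0.19,
    "Article 13": 0.12, "Article 19": 0.41, "Article 23": -0.14,
    "Article 26": -0.16, "Article 27": 0.26, "Article 28": -0.12,
    "Article 29": -0.06,
}
LABELS = ["Preamble"] + [f"Article {i}" for i in range(1, 31)]
scores = [NONZERO.get(label, 0.0) for label in LABELS]

unweighted_mean = mean(scores)
volatility = pstdev(scores)      # assumption: population standard deviation
fw_ratio = 41 / (41 + 31)        # assumption: facts / (facts + inferences)
negatives = sum(1 for s in scores if s < 0)

print(f"mean={unweighted_mean:+.2f} volatility={volatility:.2f} "
      f"fw={fw_ratio:.0%} negative={negatives}")
```

Under these assumptions the script agrees with the dashboard's unweighted mean, volatility, FW ratio, and negative-article count.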
Theme Radar
Foundation: -0.12 (3 articles)
Security: 0.00 (3 articles)
Legal: 0.00 (6 articles)
Privacy & Movement: -0.02 (4 articles)
Personal: 0.00 (3 articles)
Expression: 0.14 (3 articles)
Economic & Social: -0.04 (4 articles)
Cultural: 0.05 (2 articles)
Order & Duties: -0.06 (3 articles)
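The theme values appear to be plain averages of the per-article scores within each group. The sketch below assumes the themes are consecutive article ranges (with the Preamble counted under Foundation), sized to match the article counts shown; that grouping is inferred from the numbers and is not stated on the page.

```python
from statistics import mean

# Nonzero heatmap scores keyed by article number; 0 stands for the Preamble.
NONZERO = {0: -0.06, 2: -0.30, 12: -0.19, 13: 0.12, 19: 0.41,
           23: -0.14, 26: -0.16, 27: 0.26, 28: -0.12, 29: -0.06}

# Assumed grouping: consecutive ranges whose sizes match the radar labels.
THEMES = {
    "Foundation": range(0, 3),           # Preamble, Articles 1-2
    "Security": range(3, 6),
    "Legal": range(6, 12),
    "Privacy & Movement": range(12, 16),
    "Personal": range(16, 19),
    "Expression": range(19, 22),
    "Economic & Social": range(22, 26),
    "Cultural": range(26, 28),
    "Order & Duties": range(28, 31),
}

theme_means = {
    name: mean(NONZERO.get(i, 0.0) for i in members)
    for name, members in THEMES.items()
}

for name, value in theme_means.items():
    print(f"{name}: {value:+.3f} ({len(THEMES[name])} articles)")
```

Each computed mean lands within rounding distance of the value displayed on the radar.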
HN Discussion 19 top-level · 14 replies
rio_popper 2026-02-23 17:06 UTC link
Curious about the masked diffusion IDM choice. They mention CTC loss and cross-entropy both underperformed — I'd love to see ablations on that. The claim that typos were "extremely common" with non-causal cross-entropy is interesting but hand-wavy without numbers.
ennucore 2026-02-23 17:06 UTC link
The car thing is very impressive. By the way, do you have plans to handle the computer’s audio output?
ennucore 2026-02-23 17:08 UTC link
How do you tokenize the mouse inputs?
nee1r 2026-02-23 17:10 UTC link
Hey guys! I’m Neel, been holed up in our south park office for the past year working on model training. excited to share our research!

This is a preview of a very different type of computer use model—we train on the internet. Specifically we have 11 million hours of computer video stored on our storage cluster (previously shared https://news.ycombinator.com/item?id=45438496 !) and the model can work in 30 FPS. Since we match the fundamental form factor of computer-use, we can get our model to do CAD, browse websites, and even drive a car using arrow keys. I’m super excited to see what our model can do as we scale more, it's a fun frontier to work on (not language models :) ).

The team and I will be online responding to the comments, so drop any questions.

clemvonstengel 2026-02-23 17:13 UTC link
I rly liked the point about ctrl-c only being able to be labelled retrocausally. I do think that with enough past context you should be able to know what was copied - in some sense the past does encode the future - but also an agentic decision is precisely the kind where the future is more informative than the past for reconstructing that decision.

It does make me wonder if you should have the inverse dynamics model split into specifically retrocausal and causal. You kind of do this already with the inverse and forward dynamics model, but the idea of a model that knows only about the future training in a feedback loop with a model that knows only about the past is kind of interesting.

I think you could just do a clever masking regime in your diffusion model to achieve the same effect without a whole architecture change.

ClaireBookworm 2026-02-23 17:23 UTC link
What sort of fine tuning data was needed to allow the model to self-drive? One hour of video of someone driving, or extra labeling?
kdrag0n 2026-02-23 17:34 UTC link
what tasks can the model do out of the box? was each of the examples a different fine tuned model?
aakashks 2026-02-23 18:42 UTC link
The video compression is very cool. And the small tricks like binning the mouse movements.

Wonder how much data is generalizable across different UIs? ie how good will the model be at using Figma if it’s never seen it before but has seen a lot of Photoshop

alyxya 2026-02-24 02:24 UTC link
This looks extremely impressive, really deserves more attention here.

Are the inverse dynamics and forward dynamics models trained separately? It sounds like if the inverse dynamics model is meant to extrapolate more training data, then perhaps all that means is it takes very little data to generalize directly with the forward dynamics model assuming the right architecture.

152334H 2026-02-24 14:22 UTC link
holy crap, this is so good. How did it get buried?
Obscura- 2026-02-25 22:01 UTC link
Amazing!
piva00 2026-02-25 22:27 UTC link
Just wanted to say: this is mighty impressive research.

Really interesting breakdown, proper nerdsniped into this, thanks for the refreshing AI news outside of language models :)

wasmainiac 2026-02-25 22:28 UTC link
Can it defeat captchas?
sp1nningaway 2026-02-25 22:28 UTC link
May I suggest a driving demo in a parking lot with a mannequin instead of a real world video where it drives way too close to a pedestrian?

Otherwise, very cool and exciting!

cs702 2026-02-25 22:37 UTC link
At first glance, this looks incredible to me. The authors train one model on 40K hours of computer-use video, previously labeled by contractors with keyboard and mouse actions, then use that model, in effect, to label 11M hours of computer-use video, which they use to train the computer-action model. The key advance is in compression. Quoting from the OP:

> [previous models] burn a million tokens to understand just one minute of 30 FPS computer data. Our video encoder encodes nearly 2 hours of video in the same number of tokens—that’s 50x more token-efficient than the previous state-of-the-art and 100x more token-efficient than OpenAI’s encoder.

While I was already aware that there are people working on new, more efficient "world models," this is the first one I've seen in action. I'm a bit in shock at how good it is, quite frankly.

I've added the OP, as well as a related 2018 paper on Behavioral Cloning from Observation (BCO) to my reading list.[a] So far, I've only skimmed the 2018 paper, but it's already evident that it's well-written. I'm no expert in deep RL, and I can understand it. BTW, "Behavioral Cloning from Observation" is a really good name, with an easy-to-remember acronym.

Thank you for sharing this on HN.

[a] https://arxiv.org/abs/1805.01954

akoboldfrying 2026-02-25 22:50 UTC link
My tech-informed but ML-ignorant take: This will soon be the biggest thing since ChatGPT.
bitwize 2026-02-25 23:30 UTC link
Looks like it's playing the special stages from Knuckles' Chaotix?
LorenDB 2026-02-25 23:42 UTC link
Nice that it can drive a car, but you could just use openpilot.
vessenes 2026-02-25 23:45 UTC link
dammmmmmnnnn - lots to like here. I'm impressed with the 80,000 parallel website fuzzing desktops. And the 30hz (everything). Amazing.
g413n 2026-02-23 17:12 UTC link
yeah we've done audio work in the past so we'll def merge the recipes at some point, long term should have full io that a human has (except maybe not generating video for video calls that seems a bit much)
nee1r 2026-02-23 17:14 UTC link
good question! we use exponential binning (map the mouse movements onto a plane with exponentially increasing tick marks https://si.inc/fdm1/exponential_binning.webp) but tried a bunch of other methods (linear creates too many tokens for the model to learn well). Polar coordinates seem like a better solution but empirically didn't work well because the tokens got too coarse too fast.
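For readers unfamiliar with the idea, here is a minimal sketch of exponential binning for mouse deltas. It is not the FDM-1 tokenizer: the base-2 bin edges, the `MAX_DELTA` cap, and the function names are all hypothetical choices made for illustration. Small movements get fine-grained bins, large movements share coarse ones, so the per-axis vocabulary stays small.

```python
MAX_DELTA = 1024  # hypothetical cap on per-step movement, in pixels


def bin_delta(delta: int) -> int:
    """Map a signed pixel delta to a signed bin index.

    Bin edges double each step: 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ...
    """
    magnitude = min(abs(delta), MAX_DELTA)
    index = magnitude.bit_length()  # exact floor(log2) + 1 for positive ints
    return index if delta >= 0 else -index


def unbin(index: int) -> int:
    """Return a representative delta (the bin's lower edge) for a bin index."""
    if index == 0:
        return 0
    magnitude = 1 << (abs(index) - 1)
    return magnitude if index > 0 else -magnitude


def tokenize_move(dx: int, dy: int) -> tuple[int, int]:
    """Tokenize one mouse movement as a pair of per-axis bin indices."""
    return bin_delta(dx), bin_delta(dy)


# Vocabulary per axis: 11 positive bins, 11 negative bins, and zero.
vocab_exponential = 2 * MAX_DELTA.bit_length() + 1
vocab_linear = 2 * MAX_DELTA + 1
print(tokenize_move(1, -300), vocab_exponential, vocab_linear)
```

With these assumed settings the per-axis vocabulary is 23 tokens instead of 2049 for one-pixel linear bins, which illustrates the "linear creates too many tokens" point above.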
g413n 2026-02-23 17:14 UTC link
we do exponential binning but fwiw I think we can do way better just hasn't been the main research area initially
nee1r 2026-02-23 17:15 UTC link
the main chain of experiments was trying causal => non-causal => non-causal with ctc and CE. i think a good intuition here is that you need a generative approach fundamentally because there definitely are multiple correct IDM labels.
nee1r 2026-02-23 17:27 UTC link
i actually drove the car (with arrow keys) around south park for ~45 minutes as finetuning data, no extra labelling other than that. think the car line graph is super cool because you actually see the videogame prior working
g413n 2026-02-23 17:29 UTC link
yeah we actually had some wacky ideas with ctc + a reverse-causal mask but diffusion does just make it all a bit more simple
g413n 2026-02-23 17:29 UTC link
relevant note is that we finetuned by having the human also use arrow keys which keeps it in-distribution but also slower to collect
g413n 2026-02-23 17:39 UTC link
it's a pretty general policy but this is all super early, it's great at exploring websites so fuzzing was easy, for CAD it has good enough base rates with the few-shot prompt when we do the repetitive stuff, and we gave it checkpoints on each step, the other stuff in the mosaic are just some of our favorite clips from internal evals
nee1r 2026-02-23 18:45 UTC link
this is honestly an issue for the inverse dynamics (for app specific shortcuts etc.) but for general UI learning we still see promising eval trends
nee1r 2026-02-24 17:46 UTC link
real
nee1r 2026-02-24 17:48 UTC link
thanks! the inverse dynamics model is trained first on 40k hours of data and then frozen to label all 11 million hours. yup! the idea is that it should take a small amount of data to generalize environment dynamics, then you can use a lot of data to understand actions.
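The train, freeze, then label recipe described above can be sketched with a toy stand-in. Everything here is hypothetical: frames are integers (a cursor x-position), the "IDM" is a lookup table rather than the masked diffusion model discussed in the thread, and the class and function names are invented for illustration.

```python
from collections import Counter, defaultdict


class ToyIDM:
    """Toy inverse dynamics model: infer the action between two frames."""

    def __init__(self):
        self.table = defaultdict(Counter)  # frame delta -> action votes
        self.frozen = False

    def fit(self, labeled):
        """Train on (frame_before, frame_after, action) triples."""
        if self.frozen:
            raise RuntimeError("IDM is frozen")
        for before, after, action in labeled:
            self.table[after - before][action] += 1

    def freeze(self):
        self.frozen = True

    def label(self, before, after):
        """Return the most likely action, or None if the delta is unseen."""
        votes = self.table.get(after - before)
        return votes.most_common(1)[0][0] if votes else None


def pseudo_label(idm, video):
    """Turn an unlabeled frame sequence into (frame, action) training pairs."""
    pairs = []
    for before, after in zip(video, video[1:]):
        action = idm.label(before, after)
        if action is not None:
            pairs.append((before, action))
    return pairs


# Small labeled set (stands in for the 40k contractor-labeled hours).
idm = ToyIDM()
idm.fit([(0, 1, "right"), (5, 4, "left"), (3, 3, "noop")])
idm.freeze()

# Large unlabeled set (stands in for the 11M hours of video).
training_pairs = pseudo_label(idm, [10, 11, 12, 12, 11])
print(training_pairs)
```

The resulting pairs are what a downstream action model would train on; in the real system that step is the large-scale training run rather than a list.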
yoyohello13 2026-02-25 22:28 UTC link
Too technical for HN
AndrewKemendo 2026-02-25 23:14 UTC link
This looks like a really promising approach

In particular the Forward rollout module is very important. It aligns your (effectively) world model with what it expects from the world, and keeping those in sync I think gives this the power it needs to be able to generate the state action pairs to continuously train semi supervised

dangoodmanUT 2026-02-25 23:15 UTC link
11 million hours of data is a lot, did you have to synthesize it at all, or was it purely collected?
Editorial Channel
What the content says
+0.30
Article 19 Freedom of Expression
Medium Advocacy Practice
Editorial
+0.30
SETL
+0.17

Content explicitly discusses information freedom: model trains 'unsupervised from the entirety of the internet' and post advocates for 'internet-scale video corpus' as necessary for AI development. Technical documentation and demos are publicly shared.

+0.20
Article 27 Cultural Participation
Medium Advocacy Practice
Editorial
+0.20
SETL
+0.14

Post describes technical innovation and scientific advancement; frames FDM-1 as enabling new capability in computer vision and AI research. Advocates for 'internet-scale video corpus' as necessary for scientific progress.

+0.10
Article 13 Freedom of Movement
Medium Practice
Editorial
+0.10
SETL
-0.09

Content is published openly and accessible globally; post discusses training on internet-scale video and knowledge sharing through technical publication.

0.00
Article 1 Freedom, Equality, Brotherhood
Low
Editorial
0.00
SETL
ND

No observable engagement with principles of human equality or dignity in context of technical development.

0.00
Article 3 Life, Liberty, Security
Low
Editorial
0.00
SETL
ND

No observable engagement with right to life or personal security in context of autonomous vehicle deployment.

0.00
Article 4 No Slavery
Low
Editorial
0.00
SETL
ND

No engagement with slavery or forced servitude; content does not address labor implications of automation.

0.00
Article 5 No Torture
Low
Editorial
0.00
SETL
ND

No engagement with freedom from torture or cruel punishment.

0.00
Article 6 Legal Personhood
Low
Editorial
0.00
SETL
ND

No engagement with legal personhood or rights recognition.

0.00
Article 7 Equality Before Law
Low
Editorial
0.00
SETL
ND

No engagement with equality before law.

0.00
Article 8 Right to Remedy
Low
Editorial
0.00
SETL
ND

No engagement with legal remedy or justice access.

0.00
Article 9 No Arbitrary Detention
Low
Editorial
0.00
SETL
ND

No engagement with arbitrary detention.

0.00
Article 10 Fair Hearing
Low
Editorial
0.00
SETL
ND

No engagement with fair trial or due process.

0.00
Article 11 Presumption of Innocence
Low
Editorial
0.00
SETL
ND

No engagement with criminal responsibility or ex post facto protection.

0.00
Article 14 Asylum
Low
Editorial
0.00
SETL
ND

No engagement with asylum or political refuge.

0.00
Article 15 Nationality
Low
Editorial
0.00
SETL
ND

No engagement with nationality or citizenship.

0.00
Article 16 Marriage & Family
Low
Editorial
0.00
SETL
ND

No engagement with marriage, family, or intimate relationships.

0.00
Article 17 Property
Low
Editorial
0.00
SETL
ND

No engagement with property rights or ownership.

0.00
Article 18 Freedom of Thought
Low
Editorial
0.00
SETL
ND

No engagement with freedom of thought, conscience, or belief.

0.00
Article 20 Assembly & Association
Low
Editorial
0.00
SETL
ND

No engagement with freedom of assembly or association.

0.00
Article 21 Political Participation
Low
Editorial
0.00
SETL
ND

No engagement with political participation or democratic governance.

0.00
Article 22 Social Security
Low
Editorial
0.00
SETL
ND

No engagement with social security or welfare rights.

0.00
Article 24 Rest & Leisure
Low
Editorial
0.00
SETL
ND

No engagement with rest, leisure, or reasonable working hours.

0.00
Article 25 Standard of Living
Low
Editorial
0.00
SETL
ND

No engagement with healthcare or adequate living standards.

0.00
Article 26 Education
Medium Practice
Editorial
0.00
SETL
+0.15

No observable discussion of education access or cultural participation in technical content.

0.00
Article 30 No Destruction of Rights
Low
Editorial
0.00
SETL
ND

No engagement with prevention of rights destruction.

-0.10
Preamble Preamble
Medium Framing
Editorial
-0.10
SETL
-0.10

Content framing emphasizes technical capability and product launch without engagement with dignity, human welfare, or social purpose dimensions reflected in UDHR Preamble.

-0.10
Article 29 Duties to Community
Medium Framing
Editorial
-0.10
SETL
-0.10

Post frames technical capability advancement without discussing duties or responsibilities. Omits discussion of human rights obligations, responsible deployment, or ethical constraints on model use.

-0.15
Article 2 Non-Discrimination
Medium Practice
Editorial
-0.15
SETL
0.00

Content does not address discrimination or equal protection; self-driving demo shows real-world deployment without discussion of accessibility or inclusivity.

-0.20
Article 23 Work & Equal Pay
Medium Practice
Editorial
-0.20
SETL
-0.17

Post emphasizes replacing 'contractor-annotated screenshots' with unsupervised learning, framing contractor labor as an obsolete cost center rather than recognizing worker dignity. Describes automation of 'CAD, finance, engineering, and eventually ML research' without discussion of labor displacement or worker rights.

-0.20
Article 28 Social & International Order
Medium Framing
Editorial
-0.20
SETL
-0.20

Post presents technical capability (autonomous driving, UI testing, CAD automation) without discussing social order or frameworks needed to govern deployment. Frames capabilities as inevitable ('coworker for CAD, finance, engineering') without addressing regulatory, ethical, or institutional structures required by Article 28.

-0.25
Article 12 Privacy
Medium Practice
Editorial
-0.25
SETL
-0.19

Post describes training on 11-million-hour screen recording dataset without addressing privacy consent or data subject notification. Frames data collection as technical achievement rather than privacy-sensitive activity.

Structural Channel
What the site does
+0.20
Article 19 Freedom of Expression
Medium Advocacy Practice
Structural
+0.20
Context Modifier
+0.15
SETL
+0.17

Blog post itself is public and freely accessible; DCP access model modifier (+0.05) reflects open-access technical content. Technical audience orientation somewhat limits accessibility but does not restrict information availability.

+0.15
Article 13 Freedom of Movement
Medium Practice
Structural
+0.15
Context Modifier
0.00
SETL
-0.09

Blog post is publicly readable without geographic restrictions; open technical content suggests support for freedom of movement through information access.

+0.10
Article 27 Cultural Participation
Medium Advocacy Practice
Structural
+0.10
Context Modifier
+0.10
SETL
+0.14

Public sharing of technical research and demonstrations supports scientific participation; DCP notes modest mission framing around technical capability (modifier +0.1 affects Article 19 and 27).

0.00
Preamble Preamble
Medium Framing
Structural
0.00
Context Modifier
0.00
SETL
-0.10

Public blog post with accessible reading on open platform; structural signals neither advance nor impede Preamble values.

0.00
Article 1 Freedom, Equality, Brotherhood
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural barriers or enablements observable regarding equal treatment.

0.00
Article 3 Life, Liberty, Security
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals regarding safety or life protection.

0.00
Article 4 No Slavery
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No observable structural signals related to forced labor.

0.00
Article 5 No Torture
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 6 Legal Personhood
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 7 Equality Before Law
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 8 Right to Remedy
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 9 No Arbitrary Detention
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 10 Fair Hearing
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 11 Presumption of Innocence
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 14 Asylum
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 15 Nationality
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 16 Marriage & Family
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 17 Property
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 18 Freedom of Thought
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 20 Assembly & Association
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 21 Political Participation
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 22 Social Security
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 24 Rest & Leisure
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 25 Standard of Living
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

0.00
Article 28 Social & International Order
Medium Framing
Structural
0.00
Context Modifier
0.00
SETL
-0.20

No observable structural engagement with social order principles; product deployment (autonomous vehicle demo) occurs without apparent coordination with governance frameworks.

0.00
Article 29 Duties to Community
Medium Framing
Structural
0.00
Context Modifier
0.00
SETL
-0.10

No observable structural signals regarding duties or community obligations.

0.00
Article 30 No Destruction of Rights
Low
Structural
0.00
Context Modifier
0.00
SETL
ND

No structural signals applicable.

-0.05
Article 23 Work & Equal Pay
Medium Practice
Structural
-0.05
Context Modifier
0.00
SETL
-0.17

As a technology product, FDM-1 is designed to automate computer-based work; no observable structural protections for affected workers in the post.

-0.10
Article 12 Privacy
Medium Practice
Structural
-0.10
Context Modifier
0.00
SETL
-0.19

No privacy policy visible on URL per DCP; JavaScript-heavy site may employ tracking. DCP notes potential privacy concerns about how the training dataset was sourced, but no on-domain policy is visible (DCP modifier: null, but affects Article 12).

-0.15
Article 2 Non-Discrimination
Medium Practice
Structural
-0.15
Context Modifier
-0.15
SETL
0.00

Video content embedded without captions; JavaScript-heavy player may exclude assistive technology users; inherited DCP modifier (-0.15) for accessibility applies.

-0.15
Article 26 Education
Medium Practice
Structural
-0.15
Context Modifier
-0.10
SETL
+0.15

Video-heavy content with JavaScript-intensive player and no captions; DCP accessibility modifier (-0.15) applies. Technical content is specialized and assumes domain knowledge, limiting education accessibility.
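The per-article heatmap scores appear to combine the two channels using the weights shown under Aggregates (E: 0.6, S: 0.4) plus the context modifier. That formula is inferred from the displayed numbers and is not documented on the page, so treat the sketch below as a consistency check rather than the site's scoring code.

```python
E_WEIGHT, S_WEIGHT = 0.6, 0.4  # channel weights shown under Aggregates


def combined(editorial: float, structural: float, modifier: float = 0.0) -> float:
    """Assumed combination rule: weighted channel scores plus context modifier."""
    return E_WEIGHT * editorial + S_WEIGHT * structural + modifier


# (editorial, structural, context modifier, heatmap score) copied from the page.
ROWS = {
    "Preamble":   (-0.10,  0.00,  0.00, -0.06),
    "Article 2":  (-0.15, -0.15, -0.15, -0.30),
    "Article 12": (-0.25, -0.10,  0.00, -0.19),
    "Article 13": ( 0.10,  0.15,  0.00,  0.12),
    "Article 19": ( 0.30,  0.20,  0.15,  0.41),
    "Article 23": (-0.20, -0.05,  0.00, -0.14),
    "Article 26": ( 0.00, -0.15, -0.10, -0.16),
    "Article 27": ( 0.20,  0.10,  0.10,  0.26),
    "Article 28": (-0.20,  0.00,  0.00, -0.12),
    "Article 29": (-0.10,  0.00,  0.00, -0.06),
}

mismatches = [
    name for name, (e, s, m, shown) in ROWS.items()
    if abs(combined(e, s, m) - shown) > 0.005
]
print("mismatches:", mismatches)
```

Under this assumed rule every nonzero article reproduces its heatmap value to two decimal places.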

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.58 high claims
Sources
0.6
Evidence
0.6
Uncertainty
0.3
Purpose
0.8
Propaganda Flags
3 manipulative rhetoric techniques found
3 techniques detected
appeal to authority
Claims of 'first model' and 'uniquely good' without peer review or third-party validation visible in post; authority rests on company assertion.
loaded language
Describes contractor labor as 'expensive' cost problem; frames elimination of human annotation work as technical progress without acknowledging worker impact.
causal oversimplification
Suggests internet-scale video corpus is necessary condition for AI competence without discussing alternatives, safeguards, or tradeoffs (privacy, consent, labor).
Emotional Tone
Emotional character: positive/negative, intensity, authority
celebratory
Valence
+0.7
Arousal
0.7
Dominance
0.8
Transparency
Does the content identify its author and disclose interests?
0.50
✗ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.28 problem only
Reader Agency
0.2
Stakeholder Voice
Whose perspectives are represented in this content?
0.25 2 perspectives
Speaks: corporation, institution
About: workers, contractors, users
Temporal Framing
Is this content looking backward, at the present, or forward?
present medium term
Geographic Scope
What geographic area does this content cover?
global
San Francisco, internet
Complexity
How accessible is this content to a general audience?
technical · high jargon · domain specific
Longitudinal 1883 HN snapshots · 8 evals
Audit Trail 28 entries
2026-02-28 14:29 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:29 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Technical post no rights stance
2026-02-27 16:34 eval_success Light evaluated: Neutral (0.00) - -
2026-02-27 16:34 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-26 21:17 eval_success Evaluated: Neutral (0.02) - -
2026-02-26 21:17 eval Evaluated by deepseek-v3.2: +0.02 (Neutral) 11,043 tokens
2026-02-26 20:26 dlq Dead-lettered after 1 attempts: The First Fully General Computer Action Model - -
2026-02-26 20:25 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:23 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:22 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:51 dlq Dead-lettered after 1 attempts: The First Fully General Computer Action Model - -
2026-02-26 17:49 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:48 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:47 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 15:05 rater_validation_fail Parse failure for model deepseek-v3.2: Error: Failed to parse OpenRouter JSON: SyntaxError: Expected ',' or '}' after property value in JSON at position 421 (line 14 column 4). Extracted text starts with: { "schema_version": "3.7", "ev - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: The First Fully General Computer Action Model - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: The First Fully General Computer Action Model - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: The First Fully General Computer Action Model - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=hermes-3-405b - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=mistral-small-3.1 - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 09:16 rate_limit OpenRouter rate limited (429) model=hermes-3-405b - -
2026-02-26 09:16 rate_limit OpenRouter rate limited (429) model=mistral-small-3.1 - -
2026-02-26 00:31 eval Evaluated by claude-haiku-4-5-20251001: -0.01 (Neutral) 13,760 tokens +0.07
2026-02-25 23:38 eval Evaluated by claude-haiku-4-5-20251001: -0.08 (Neutral) 13,224 tokens -0.02
2026-02-25 23:33 eval Evaluated by claude-haiku-4-5-20251001: -0.07 (Neutral) 13,063 tokens -0.10
2026-02-25 23:23 eval Evaluated by claude-haiku-4-5-20251001: +0.03 (Neutral) 12,944 tokens +0.18
2026-02-25 22:09 eval Evaluated by claude-haiku-4-5-20251001: -0.15 (Mild negative) 9,527 tokens