“Car Wash” test with 53 models (opper.ai)
371 points by felix089 10 days ago | 446 comments on HN ~lite vlite-2.0
Summary ~lite
The article reports on a “Car Wash” test run against 53 AI models to evaluate their reasoning abilities.
Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available
Longitudinal · 5 evals
Audit Trail 25 entries
2026-03-05 09:54 eval_success PSQ evaluated: g-PSQ=0.120 (3 dims) - -
2026-03-05 09:54 eval Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive)
2026-03-05 09:47 eval_success PSQ evaluated: g-PSQ=0.038 (3 dims) - -
2026-03-05 09:47 eval Evaluated by llama-3.3-70b-wai-psq: +0.04 (Neutral)
2026-02-28 07:02 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 07:02 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 07:02 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
reasoning
ED, neutral tech tutorial
2026-02-28 06:57 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 06:57 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 06:57 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
reasoning
ED, neutral tech tutorial
2026-02-28 06:47 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R - -
2026-02-28 06:47 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 06:47 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Technical blog post on AI models
2026-02-26 18:16 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:15 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:12 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:12 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:09 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:08 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:08 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:07 credit_exhausted Credit balance too low, retrying in 270s - -
2026-02-26 18:06 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:06 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:06 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -
2026-02-26 18:06 dlq Dead-lettered after 1 attempts: “Car Wash” test with 53 models - -