Top AI models underperform in languages other than English

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

Model:
@cf/meta/llama-3.3-70b-instruct-fp8-fast	lite	ND
@cf/meta/llama-3.3-70b-instruct-fp8-fast	lite	0.00
@cf/meta/llama-4-scout-17b-16e-instruct	lite	ND
@cf/meta/llama-4-scout-17b-16e-instruct	lite	+0.10

ND	Top AI models underperform in languages other than English (www.economist.com)
	17 points by Brajeshwar 9 days ago | 4 comments on HN ~lite vlite-2.0

Summary ~lite

Neutral news article

Lite evaluation by llama-3.3-70b-wai-psq · editorial channel only · no per-section breakdown available

Longitudinal: 66 HN snapshots · 16 evals

Audit Trail: 33 entries

2026-03-20 00:42	eval_success	PSQ evaluated: g-PSQ=-0.092 (3 dims)	- -
2026-03-20 00:42	eval	Evaluated by llama-3.3-70b-wai-psq: -0.09 (Neutral)
2026-03-20 00:40	eval_success	Lite evaluated: Mild negative (-0.26)	- -
2026-03-20 00:40	eval	Evaluated by llama-3.3-70b-wai: -0.26 (Mild negative)
	reasoning Technical article on AI models
2026-03-20 00:40	rater_validation_warn	Lite validation warnings for model llama-3.3-70b-wai: 1W 0R	- -
2026-03-19 22:30	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 22:30	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 22:29	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 22:29	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 21:18	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 21:18	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 21:10	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 21:10	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 19:53	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 19:53	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 19:50	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 19:50	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 18:38	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 18:38	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 18:33	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 18:33	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 17:21	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 17:21	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 17:19	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 17:19	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 16:04	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 16:04	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-19 16:03	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 16:03	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.
2026-03-19 14:44	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-19 14:44	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive)
2026-03-19 14:44	eval_success	Lite evaluated: Mild negative (-0.20)	- -
2026-03-19 14:44	eval	Evaluated by llama-4-scout-wai: -0.20 (Mild negative)
	reasoning The article discusses AI models' performance in languages other than English, with no explicit human rights discussion.

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC