Tell HN: AI Lies About Having Sandbox Guardrails
7 points by benjosaur 10 hours ago | 1 comment on HN ~lite vlite-2.0
Summary ~lite
Discussion of an AI model's misleading claims about having sandbox guardrails prompts cautionary advice.
Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available
Longitudinal 68 HN snapshots · 8 evals
Audit Trail 16 entries
2026-03-05 09:02 eval_success PSQ evaluated: g-PSQ=0.143 (3 dims) - -
2026-03-05 09:02 eval Evaluated by llama-4-scout-wai-psq: +0.14 (Mild positive) 0.00
2026-03-05 08:57 eval_success PSQ evaluated: g-PSQ=0.143 (3 dims) - -
2026-03-05 08:57 eval Evaluated by llama-4-scout-wai-psq: +0.14 (Mild positive) 0.00
2026-03-05 08:30 eval_success PSQ evaluated: g-PSQ=-0.150 (3 dims) - -
2026-03-05 08:30 eval Evaluated by llama-3.3-70b-wai-psq: -0.15 (Mild negative) +0.30
2026-03-05 08:25 eval_success PSQ evaluated: g-PSQ=-0.450 (3 dims) - -
2026-03-05 08:25 eval Evaluated by llama-3.3-70b-wai-psq: -0.45 (Moderate negative) -0.30
2026-03-05 05:04 eval_success PSQ evaluated: g-PSQ=0.143 (3 dims) - -
2026-03-05 05:04 eval Evaluated by llama-4-scout-wai-psq: +0.14 (Mild positive)
2026-03-05 04:54 eval_success PSQ evaluated: g-PSQ=-0.150 (3 dims) - -
2026-03-05 04:54 eval Evaluated by llama-3.3-70b-wai-psq: -0.15 (Mild negative)
2026-03-05 02:49 eval_success Lite evaluated: Mild positive (0.20) - -
2026-03-05 02:49 eval Evaluated by llama-4-scout-wai: +0.20 (Mild positive)
reasoning
Discussion on AI safety and potential security risks, implicit rights concerns
2026-03-05 02:49 eval_success Lite evaluated: Neutral (-0.04) - -
2026-03-05 02:49 eval Evaluated by llama-3.3-70b-wai: -0.04 (Neutral)
reasoning
AI sandboxing discussion