| 2026-03-05 16:43 | eval_success | PSQ evaluated: g-PSQ=-0.257 (3 dims) | - - |
| 2026-03-05 16:43 | eval | Evaluated by llama-4-scout-wai-psq: -0.26 (Mild negative) | |
| 2026-03-05 16:33 | eval_success | PSQ evaluated: g-PSQ=-0.474 (3 dims) | - - |
| 2026-03-05 16:32 | eval | Evaluated by llama-3.3-70b-wai-psq: -0.47 (Moderate negative) | |
| 2026-02-28 11:22 | model_divergence | Cross-model spread 0.30 exceeds threshold (2 models) | - - |
| 2026-02-28 11:22 | eval_success | Lite evaluated: Mild positive (0.10) | - - |
| 2026-02-28 11:22 | eval | Evaluated by llama-4-scout-wai: +0.10 (Mild positive) 0.00 | |
| reasoning Editorial complaining about Amazon's customer service |
| 2026-02-28 11:22 | rater_validation_warn | Lite validation warnings for model llama-4-scout-wai: 0W 1R | - - |
| 2026-02-28 11:17 | model_divergence | Cross-model spread 0.30 exceeds threshold (2 models) | - - |
| 2026-02-28 11:17 | eval_success | Lite evaluated: Moderate positive (0.40) | - - |
| 2026-02-28 11:17 | eval | Evaluated by llama-3.3-70b-wai: +0.40 (Moderate positive) 0.00 | |
| reasoning Exposing corporate abuse |
| 2026-02-28 11:17 | rater_validation_warn | Lite validation warnings for model llama-3.3-70b-wai: 0W 1R | - - |
| 2026-02-28 11:16 | model_divergence | Cross-model spread 0.30 exceeds threshold (2 models) | - - |
| 2026-02-28 11:16 | eval_success | Lite evaluated: Mild positive (0.10) | - - |
| 2026-02-28 11:16 | eval | Evaluated by llama-4-scout-wai: +0.10 (Mild positive) | |
| reasoning Editorial complaining about Amazon's customer service |
| 2026-02-28 11:16 | rater_validation_warn | Lite validation warnings for model llama-4-scout-wai: 0W 1R | - - |
| 2026-02-28 11:12 | rater_validation_warn | Lite validation warnings for model llama-3.3-70b-wai: 0W 1R | - - |
| 2026-02-28 11:12 | eval_success | Lite evaluated: Moderate positive (0.40) | - - |
| 2026-02-28 11:12 | eval | Evaluated by llama-3.3-70b-wai: +0.40 (Moderate positive) | |
| reasoning Exposing corporate abuse |