Longitudinal · 27 evals
[Chart: score axis from +1 to −1; legend: HN]
Audit Trail 47 entries
2026-03-01 15:54 eval_success Evaluated: Neutral (-0.01) - -
2026-03-01 15:54 model_divergence Cross-model spread 0.50 exceeds threshold (2 models) - -
2026-03-01 15:54 eval Evaluated by deepseek-v3.2: -0.01 (Neutral) 14,573 tokens -0.04
2026-03-01 15:54 rater_validation_warn Validation warnings for model deepseek-v3.2: 0W 2R - -
2026-02-28 16:08 eval_success Lite evaluated: Mild negative (-0.20) - -
2026-02-28 16:08 model_divergence Cross-model spread 0.85 exceeds threshold (4 models) - -
2026-02-28 16:08 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 13:46 eval_success Evaluated: Neutral (0.03) - -
2026-02-28 13:46 model_divergence Cross-model spread 0.85 exceeds threshold (4 models) - -
2026-02-28 13:46 eval Evaluated by deepseek-v3.2: +0.03 (Neutral) 14,968 tokens -0.11
2026-02-28 13:46 rater_validation_warn Validation warnings for model deepseek-v3.2: 1W 0R - -
2026-02-28 13:15 model_divergence Cross-model spread 0.85 exceeds threshold (3 models) - -
2026-02-28 13:15 eval_success Lite evaluated: Mild negative (-0.20) - -
2026-02-28 13:15 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 13:15 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 10:28 model_divergence Cross-model spread 0.85 exceeds threshold (3 models) - -
2026-02-28 10:28 eval_success Lite evaluated: Mild negative (-0.20) - -
2026-02-28 10:28 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 10:28 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 09:31 model_divergence Cross-model spread 0.85 exceeds threshold (3 models) - -
2026-02-28 09:31 eval_success Light evaluated: Mild negative (-0.20) - -
2026-02-28 09:31 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 09:31 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 08:02 model_divergence Cross-model spread 0.85 exceeds threshold (3 models) - -
2026-02-28 08:02 eval_success Light evaluated: Moderate positive (0.30) - -
2026-02-28 08:02 eval Evaluated by llama-3.3-70b-wai: +0.30 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 08:02 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R - -
2026-02-28 07:46 eval Evaluated by llama-3.3-70b-wai: +0.30 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 07:33 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 06:42 eval Evaluated by llama-3.3-70b-wai: +0.30 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 06:13 eval Evaluated by llama-3.3-70b-wai: +0.30 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 06:06 eval Evaluated by llama-3.3-70b-wai: +0.30 (Moderate positive) -0.20
reasoning: Exposing privacy abuse
2026-02-28 05:17 eval Evaluated by llama-4-scout-wai: -0.20 (Mild negative) -0.20
reasoning: ED, slightly negative lean on user data handling
2026-02-28 05:00 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 04:53 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 04:51 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) -0.20
reasoning: ED, slightly negative lean on user data handling
2026-02-28 04:19 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 03:46 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 03:23 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 03:17 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive) 0.00
reasoning: Exposing privacy abuse
2026-02-28 02:57 eval Evaluated by llama-4-scout-wai: +0.20 (Mild positive) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 02:55 eval Evaluated by llama-4-scout-wai: +0.20 (Mild positive) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 02:35 eval Evaluated by llama-4-scout-wai: +0.20 (Mild positive) 0.00
reasoning: ED, slightly negative lean on user data handling
2026-02-28 02:29 eval Evaluated by llama-4-scout-wai: +0.20 (Mild positive)
reasoning: ED, slightly negative lean on user data handling
2026-02-28 02:06 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive)
reasoning: Exposing privacy abuse
2026-02-28 01:18 eval Evaluated by claude-haiku-4-5: +0.65 (Strong positive)
2026-02-27 00:03 eval Evaluated by deepseek-v3.2: +0.14 (Mild positive) 14,007 tokens
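
The model_divergence entries report a "cross-model spread" without defining it. A minimal sketch, assuming the spread is the highest minus the lowest of each model's most recent score: that reading reproduces the 0.85 reported for the 4-model entry at 2026-02-28 13:46, but it is an inference from the rows above, not a documented formula. The function names are hypothetical, and the threshold is not stated in the log (the 0.50 entry only shows it is below 0.50), so it is left as a parameter. The 2-model entry of 2026-03-01 cannot be checked this way, because the log does not say which two models were compared.

```python
from typing import Mapping


def cross_model_spread(latest_scores: Mapping[str, float]) -> float:
    """Return max minus min over each model's latest score.

    Assumption: this is how the log's "cross-model spread" is computed.
    With fewer than two models there is nothing to compare, so return 0.0.
    """
    if len(latest_scores) < 2:
        return 0.0
    values = list(latest_scores.values())
    return max(values) - min(values)


def exceeds_threshold(latest_scores: Mapping[str, float], threshold: float) -> bool:
    """True when the spread is strictly above the given threshold.

    The real threshold is not given in the log; callers must supply one.
    """
    return cross_model_spread(latest_scores) > threshold


# Latest score per model as of the 2026-02-28 13:46 entries.
scores = {
    "claude-haiku-4-5": 0.65,
    "llama-3.3-70b-wai": 0.30,
    "llama-4-scout-wai": -0.20,
    "deepseek-v3.2": 0.03,
}

spread = cross_model_spread(scores)
print(f"{spread:.2f}")  # prints 0.85, matching the logged spread
```

Under this reading, the 0.85 spread comes entirely from claude-haiku-4-5 (+0.65) and llama-4-scout-wai (-0.20), so it stays constant while those two scores are unchanged, regardless of what the other models return in between.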