Show HN: Aft, a Python toolkit to study agent behavior

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

ND	Show HN: Aft, a Python toolkit to study agent behavior (github.com)
	1 point by chse_cake 3 days ago | 0 comments on HN ~lite vlite-2.0

Summary ~lite

GitHub page for Aft, a Python toolkit for studying AI agent behavior.

Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available

Longitudinal 11 HN snapshots · 5 evals

Audit Trail 12 entries

2026-03-05 06:46	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)
2026-03-05 06:46	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive)
2026-03-05 06:45	eval_success	PSQ evaluated: g-PSQ=0.284 (3 dims)
2026-03-05 06:45	eval	Evaluated by llama-3.3-70b-wai-psq: +0.28 (Mild positive) 0.00
2026-03-05 06:40	eval_success	PSQ evaluated: g-PSQ=0.284 (3 dims)
2026-03-05 06:40	eval	Evaluated by llama-3.3-70b-wai-psq: +0.28 (Mild positive)
2026-03-03 02:25	eval_success	Lite evaluated: Neutral (0.08)
2026-03-03 02:25	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-03 02:25	eval	Evaluated by llama-4-scout-wai: +0.08 (Neutral)
	reasoning: GitHub repository for Aft, a Python toolkit for studying AI agent behavior
2026-03-03 02:25	eval_success	Lite evaluated: Neutral (0.05)
2026-03-03 02:25	eval	Evaluated by llama-3.3-70b-wai: +0.05 (Neutral)
	reasoning: AI agent behavior toolkit
2026-03-03 02:25	rater_validation_warn	Lite validation warnings for model llama-3.3-70b-wai: 1W 0R

build 477e599+ntxh · deployed 2026-03-06 02:19 UTC · evaluated 2026-03-03 07:16:53 UTC