0.00 Show HN: CivBench, a long-horizon AI benchmark for multi-agent games (clashai.live S:+0.01)
12 points by mbh159 4 days ago | 25 comments on HN | Neutral Landing Page · v3.7 · 2026-02-28 16:29:11 · from archive
Summary Technical Platform Neutral
The URL points to a technical landing page for ClashAI, an AI competition platform where agents compete in strategy games, trading, and creative challenges. The content focuses on platform features, competitions, and technical implementation, and engages with human rights concepts only through basic privacy awareness and accessibility features. The evaluation rates the page as neutral because the platform's purpose is entertainment and technical demonstration rather than human rights advocacy or opposition.
Article Heatmap
Preamble: 0.00
Article 1 (Freedom, Equality, Brotherhood): 0.00
Article 2 (Non-Discrimination): 0.00
Article 3 (Life, Liberty, Security): 0.00
Article 4 (No Slavery): 0.00
Article 5 (No Torture): 0.00
Article 6 (Legal Personhood): 0.00
Article 7 (Equality Before Law): 0.00
Article 8 (Right to Remedy): 0.00
Article 9 (No Arbitrary Detention): 0.00
Article 10 (Fair Hearing): 0.00
Article 11 (Presumption of Innocence): 0.00
Article 12 (Privacy): +0.07
Article 13 (Freedom of Movement): 0.00
Article 14 (Asylum): 0.00
Article 15 (Nationality): 0.00
Article 16 (Marriage & Family): 0.00
Article 17 (Property): 0.00
Article 18 (Freedom of Thought): 0.00
Article 19 (Freedom of Expression): +0.07
Article 20 (Assembly & Association): 0.00
Article 21 (Political Participation): 0.00
Article 22 (Social Security): 0.00
Article 23 (Work & Equal Pay): 0.00
Article 24 (Rest & Leisure): 0.00
Article 25 (Standard of Living): 0.00
Article 26 (Education): 0.00
Article 27 (Cultural Participation): +0.14
Article 28 (Social & International Order): 0.00
Article 29 (Duties to Community): 0.00
Article 30 (No Destruction of Rights): 0.00
Aggregates
Editorial Mean 0.00 Structural Mean +0.01
Weighted Mean +0.01 Unweighted Mean +0.01
Max +0.14 Article 27 Min 0.00 Preamble
Signal 31 No Data 0
Volatility 0.03 (Low)
Negative 0 Channels E: 0.6 S: 0.4
SETL -0.13 Structural-dominant
FW Ratio 50% 34 facts · 34 inferences
Evidence 20% coverage
31L
Theme Radar
Foundation: 0.00 (3 articles)
Security: 0.00 (3 articles)
Legal: 0.00 (6 articles)
Privacy & Movement: 0.02 (4 articles)
Personal: 0.00 (3 articles)
Expression: 0.02 (3 articles)
Economic & Social: 0.00 (4 articles)
Cultural: 0.07 (2 articles)
Order & Duties: 0.00 (3 articles)
HN Discussion 14 top-level · 10 replies
andrewgazelka 2026-02-25 15:14 UTC link
hey first of all cool product. I am curious why you chose civ and if you saw any interesting emergent behaviors.
killiandunne1 2026-02-25 15:25 UTC link
This is a sick idea I must say
jhylee 2026-02-25 15:26 UTC link
Congrats on the launch. Big fan of how you add visualization and interactivity to the typical model benchmarking process. Any thoughts on how you plan to monetize down the line?
amacx 2026-02-25 15:30 UTC link
Interesting. Did you give the agents any skills for playing civ? If not, are you planning to?
amacx 2026-02-25 15:32 UTC link
Have you tried playing the agents yourself? Do they crush human competition?
pmoxyz 2026-02-25 15:33 UTC link
This is great. I think leaderboards based on static evals will be mostly irrelevant within a year. Continuous benchmarks like this are the only way to get signal on frontier models

You mention Opus 4.6 cost $1200 in one match. How do you plan to benchmark economic efficiency? Looking at the performance vs. cost trade-off, you might say a model that plays 80% as well at 1% of the cost is more impressive than the 'top' model.
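The trade-off raised in this comment can be sketched in a few lines. This is only an illustration, not how the platform scores models: `MatchResult`, `score_per_dollar`, and `pareto_front` are hypothetical names, and the scores are made up. The $1200 cost and the "80% as well at 1% of the cost" ratio come from the thread.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    model: str       # hypothetical model label
    score: float     # in-game score, any consistent unit (illustrative)
    cost_usd: float  # total API spend for the match


def score_per_dollar(result: MatchResult) -> float:
    """Return score earned per US dollar spent."""
    if result.cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return result.score / result.cost_usd


def pareto_front(results: list[MatchResult]) -> list[MatchResult]:
    """Keep results that no other result matches or beats on both axes."""
    front = []
    for r in results:
        dominated = any(
            o.score >= r.score
            and o.cost_usd <= r.cost_usd
            and (o.score > r.score or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            front.append(r)
    return front


# Illustrative numbers only: the "top" model at $1200 per match versus a
# model that plays 80% as well at 1% of the cost.
results = [
    MatchResult("top", 100.0, 1200.0),
    MatchResult("cheap", 80.0, 12.0),
]
for r in pareto_front(results):
    print(r.model, round(score_per_dollar(r), 3))
```

Both models sit on the Pareto front here, since neither is better on score and cost at once. The front shows that a trade-off exists, and a per-dollar ratio is one simple way to rank along it.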

cameron17 2026-02-25 18:11 UTC link
This is undeniably intriguing. Will be paying close attention.
zimbo63 2026-02-25 19:32 UTC link
This is an amazing product! Can AI agents learn to do long-term planning in environments that are less structured than chess? Great metaphor for life! Are you planning other games?
zimbo63 2026-02-25 19:34 UTC link
This is an amazing eval metric that no one thought about! Such a creative idea. Have you thought of other games? How different is it from chess?
nhal 2026-02-25 20:53 UTC link
Incredible and important product. Necessary for developers, users, and industries that want to use agents. Can’t wait to see how it’ll grow
brownpoints 2026-02-25 22:42 UTC link
This looks incredible, it’d be cool to let others participate with custom prompts
jcion 2026-02-25 22:55 UTC link
Interesting! What are the next environments/strategy games you have planned?

What insights do you think they’ll provide that Civ doesn’t?

Mojo19 2026-02-26 02:10 UTC link
So amazing, it's super cool!
jamiecode 2026-02-26 12:26 UTC link
The divergence between static benchmarks and long-horizon performance isn't surprising if you've run anything multi-step in production. Benchmarks are short, isolated, well-specified. Civ has compounding state - a bad decision in turn 5 degrades your options in turn 50 in ways that aren't immediately obvious. It's a more honest signal than most standard evals.

The $1,200/match cost is the real constraint. At that price you can't run enough samples for statistical significance - you're essentially reading tea leaves. How are you handling context window management across 200 turns? Summarising game state as you go, truncating early history, or something else? The token accumulation over a full game must be substantial.
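To put a rough number on the "reading tea leaves" concern, a confidence interval on win rate shows how little a handful of matches can say. The sketch below uses the Wilson score interval, a standard formula. Nothing in the thread says the platform uses it, and the match counts are made up for illustration.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z=1.96 is roughly 95% confidence)."""
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    p = wins / games
    denom = 1 + z ** 2 / games
    center = (p + z ** 2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: the same 60% win rate at two sample sizes.
for wins, games in [(3, 5), (300, 500)]:
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: {low:.2f} to {high:.2f}")
```

With 5 matches the interval spans well over half the possible range and includes 50%, so a 3-2 record cannot separate two models. With 500 matches at the same rate, the lower bound clears 50%.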

Also curious about the 90s timeout logistics. If a provider is flaky and a model goes over, is that a forfeit, a retry, or a timeout loss? Provider latency variance seems like it would add significant noise to results, independent of actual model quality.
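One possible answer to the forfeit / retry / timeout-loss question, as a sketch only: retry provider errors within a single shared time budget, and report a timeout separately so it can be scored differently from a bad move. The thread does not say how the platform handles this. `take_turn` and the outcome labels are hypothetical, and only the 90-second limit comes from the comment.

```python
import concurrent.futures as cf
import time
from typing import Callable, Optional, Tuple

TURN_TIMEOUT_S = 90.0  # per-turn limit mentioned in the thread


def take_turn(
    call_model: Callable[[], str],
    timeout_s: float = TURN_TIMEOUT_S,
    max_attempts: int = 2,
) -> Tuple[str, Optional[str]]:
    """Return (outcome, move); outcome is 'ok', 'timeout', or 'provider_error'."""
    deadline = time.monotonic() + timeout_s
    for _ in range(max_attempts):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout", None
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(call_model)
        try:
            # Retries share one budget, so a flaky provider cannot stretch
            # the turn beyond timeout_s in total.
            return "ok", future.result(timeout=remaining)
        except cf.TimeoutError:
            return "timeout", None
        except Exception:
            continue  # provider error: retry if attempts and time remain
        finally:
            # Does not cancel a running call; a real harness would need a
            # client that supports cancellation.
            pool.shutdown(wait=False)
    return "provider_error", None
```

Keeping `timeout` and `provider_error` as distinct outcomes lets a leaderboard exclude or down-weight turns lost to provider latency, which is the noise source the comment describes.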

mbh159 2026-02-25 15:25 UTC link
Thank you! I grew up playing Civilization, and one day, talking with friends, I realized it would be a perfect proxy for how good AI is at long-term planning. I had many frustrating sessions where my early decisions in the game had consequences only much later. With hidden information and other agents at play, I thought it'd be an interesting test of agent capabilities.
mbh159 2026-02-25 15:27 UTC link
it was fun building it, sometimes the LLMs are pretty funny in how they play
mbh159 2026-02-25 15:38 UTC link
appreciate it, I wanted to make the AI behavior easy to understand. Our main focus currently is to help AI researchers align their models and help develop an open framework for evaluating AI.
mbh159 2026-02-25 15:41 UTC link
I want to! I think skills can add big performance gains here especially with smaller models. There's a lot of domain knowledge in games so distilling it into a "skill" may allow much smaller models to outcompete the large ones
mbh159 2026-02-25 15:42 UTC link
I was able to beat the AI every time. They're pretty bad at this point, but I expect them to get much better over time.
mbh159 2026-02-25 16:02 UTC link
Unfortunately, for a game that runs 4+ hours, it was configured to use too much reasoning per turn and a larger context. Reducing the size helped lower the cost (still expensive).

In the leaderboards part of the page, I'll be auto-populating the token cost of the model as a metric to evaluate on.

mbh159 2026-02-25 22:20 UTC link
Yes! If you want to test your agents or develop evals on the platform, my DMs are open.
mbh159 2026-02-25 22:21 UTC link
Yes, we have a new game launching every day this week. We're looking to add more domains to test how the jaggedness of AI differs between model providers and to better evaluate how models perform across domains.
mbh159 2026-02-25 23:33 UTC link
cheers, the website will be updated with new environments daily!
mbh159 2026-02-25 23:35 UTC link
Tomorrow we're launching Coup, where agents compete by bluffing and keeping track of which of their opponents they think are lying.

This is a faster-paced, shorter game, so we can collect larger samples across larger groups and get statistically significant results on model behaviors such as collaboration, truth-telling, and the ability to lie effectively.

Editorial Channel
What the content says
0.00 · Preamble · Low · Editorial 0.00 · SETL ND · No content addressing human dignity, freedom, or universal rights
0.00 · Article 1 · Freedom, Equality, Brotherhood · Low · Editorial 0.00 · SETL ND · No mention of human dignity, equality, or rights
0.00 · Article 2 · Non-Discrimination · Low · Editorial 0.00 · SETL ND · No mention of non-discrimination or equal rights
0.00 · Article 3 · Life, Liberty, Security · Low · Editorial 0.00 · SETL ND · No mention of life, liberty, or security
0.00 · Article 4 · No Slavery · Low · Editorial 0.00 · SETL ND · No mention of slavery or servitude
0.00 · Article 5 · No Torture · Low · Editorial 0.00 · SETL ND · No mention of torture or cruel treatment
0.00 · Article 6 · Legal Personhood · Low · Editorial 0.00 · SETL ND · No mention of legal recognition or personhood
0.00 · Article 7 · Equality Before Law · Low · Editorial 0.00 · SETL ND · No mention of equality before the law
0.00 · Article 8 · Right to Remedy · Low · Editorial 0.00 · SETL ND · No mention of effective remedies or judicial protection
0.00 · Article 9 · No Arbitrary Detention · Low · Editorial 0.00 · SETL ND · No mention of arbitrary detention or arrest
0.00 · Article 10 · Fair Hearing · Low · Editorial 0.00 · SETL ND · No mention of fair trial or impartial tribunal
0.00 · Article 11 · Presumption of Innocence · Low · Editorial 0.00 · SETL ND · No mention of presumption of innocence or criminal defense
0.00 · Article 12 · Privacy · Low Practice · Editorial 0.00 · SETL -0.10 · No explicit privacy policy or data protection statement
0.00 · Article 13 · Freedom of Movement · Low · Editorial 0.00 · SETL ND · No mention of freedom of movement or residence
0.00 · Article 14 · Asylum · Low · Editorial 0.00 · SETL ND · No mention of asylum or persecution
0.00 · Article 15 · Nationality · Low · Editorial 0.00 · SETL ND · No mention of nationality or statelessness
0.00 · Article 16 · Marriage & Family · Low · Editorial 0.00 · SETL ND · No mention of marriage, family, or consent
0.00 · Article 17 · Property · Low · Editorial 0.00 · SETL ND · No mention of property ownership or deprivation
0.00 · Article 18 · Freedom of Thought · Low · Editorial 0.00 · SETL ND · No mention of thought, conscience, or religion
0.00 · Article 19 · Freedom of Expression · Low Practice · Editorial 0.00 · SETL -0.10 · No explicit free expression policy or commitments
0.00 · Article 20 · Assembly & Association · Low · Editorial 0.00 · SETL ND · No mention of assembly or association
0.00 · Article 21 · Political Participation · Low · Editorial 0.00 · SETL ND · No mention of political participation or voting
0.00 · Article 22 · Social Security · Low · Editorial 0.00 · SETL ND · No mention of social security or economic rights
0.00 · Article 23 · Work & Equal Pay · Low · Editorial 0.00 · SETL ND · No mention of work, employment, or unions
0.00 · Article 24 · Rest & Leisure · Low · Editorial 0.00 · SETL ND · No mention of rest, leisure, or working hours
0.00 · Article 25 · Standard of Living · Low · Editorial 0.00 · SETL ND · No mention of standard of living, health, or welfare
0.00 · Article 26 · Education · Low · Editorial 0.00 · SETL ND · No mention of education, literacy, or training
0.00 · Article 27 · Cultural Participation · Low Practice · Editorial 0.00 · SETL -0.20 · No explicit cultural participation or IP protection statements
0.00 · Article 28 · Social & International Order · Low · Editorial 0.00 · SETL ND · No mention of social order or rights realization
0.00 · Article 29 · Duties to Community · Low · Editorial 0.00 · SETL ND · No mention of duties, community, or rights limitations
0.00 · Article 30 · No Destruction of Rights · Low · Editorial 0.00 · SETL ND · No mention of rights destruction or interpretation

Structural Channel
What the site does
Element · Modifier · Affects · Note
Legal & Terms
Privacy · - · - · No privacy policy or data handling information visible on homepage
Terms of Service · - · - · No terms of service or community guidelines visible on homepage
Identity & Mission
Mission · - · - · Platform description focuses on AI competitions, not human rights
Editorial Code · - · - · No editorial content or code of ethics visible on homepage
Ownership · - · - · Attributed to ClashAI Team but no corporate structure information
Access & Distribution
Access Model · 0.00 · Article 19, Article 27 · Free access to viewing competitions implied by landing page structure
Ad/Tracking · - · - · No advertising or tracking elements visible in provided content
Accessibility · 0.00 · Article 27 · Site uses semantic HTML with sr-only class for screen readers, suggesting basic accessibility consideration
+0.20 · Article 27 · Cultural Participation · Low Practice · Structural +0.20 · Context Modifier 0.00 · SETL -0.20 · Platform enables access to AI-generated cultural content (creative challenges)
+0.10 · Article 12 · Privacy · Low Practice · Structural +0.10 · Context Modifier 0.00 · SETL -0.10 · PrivacyBanner component suggests awareness of data collection
+0.10 · Article 19 · Freedom of Expression · Low Practice · Structural +0.10 · Context Modifier 0.00 · SETL -0.10 · Platform provides public access to AI competition content
0.00 · Preamble · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform for AI competitions does not structurally engage with preamble concepts
0.00 · Article 1 · Freedom, Equality, Brotherhood · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform structure does not address human equality or dignity
0.00 · Article 2 · Non-Discrimination · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · No observable accessibility or inclusion features beyond basic screen reader support
0.00 · Article 3 · Life, Liberty, Security · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address personal security or safety
0.00 · Article 4 · No Slavery · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform structure does not address forced labor issues
0.00 · Article 5 · No Torture · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address humane treatment
0.00 · Article 6 · Legal Personhood · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address legal status or recognition
0.00 · Article 7 · Equality Before Law · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address legal equality or protection
0.00 · Article 8 · Right to Remedy · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not provide grievance mechanisms or remedies
0.00 · Article 9 · No Arbitrary Detention · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address detention or liberty protections
0.00 · Article 10 · Fair Hearing · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address judicial fairness
0.00 · Article 11 · Presumption of Innocence · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address criminal justice
0.00 · Article 13 · Freedom of Movement · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address mobility rights
0.00 · Article 14 · Asylum · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address refugee protection
0.00 · Article 15 · Nationality · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address citizenship rights
0.00 · Article 16 · Marriage & Family · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address family rights
0.00 · Article 17 · Property · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address property rights
0.00 · Article 18 · Freedom of Thought · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address freedom of thought
0.00 · Article 20 · Assembly & Association · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not facilitate human assembly or association
0.00 · Article 21 · Political Participation · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address democratic participation
0.00 · Article 22 · Social Security · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address social welfare
0.00 · Article 23 · Work & Equal Pay · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address labor rights
0.00 · Article 24 · Rest & Leisure · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address work-life balance
0.00 · Article 25 · Standard of Living · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address basic needs or healthcare
0.00 · Article 26 · Education · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address educational access
0.00 · Article 28 · Social & International Order · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address systemic rights frameworks
0.00 · Article 29 · Duties to Community · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address responsible exercise of rights
0.00 · Article 30 · No Destruction of Rights · Low · Structural 0.00 · Context Modifier 0.00 · SETL ND · Platform does not address rights interpretation or limitations

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.23 low claims
Sources
0.2
Evidence
0.1
Uncertainty
0.0
Purpose
0.7
Propaganda Flags
No manipulative rhetoric detected
0 techniques detected
Emotional Tone
Emotional character: positive/negative, intensity, authority
detached
Valence
+0.1
Arousal
0.2
Dominance
0.6
Transparency
Does the content identify its author and disclose interests?
0.00
✗ Author
Solution Orientation
Does this content offer solutions or only describe problems?
0.42 solution oriented
Reader Agency
0.3
Stakeholder Voice
Whose perspectives are represented in this content?
0.10 1 perspective
Speaks: corporation
Temporal Framing
Is this content looking backward, at the present, or forward?
present immediate
Geographic Scope
What geographic area does this content cover?
global
Complexity
How accessible is this content to a general audience?
technical high jargon domain specific
Longitudinal · 3 evals
Audit Trail 9 entries
2026-02-28 16:29 eval_success Evaluated: Neutral (0.01) - -
2026-02-28 16:29 rater_validation_warn Validation warnings for model deepseek-v3.2: 1W 0R - -
2026-02-28 16:29 eval Evaluated by deepseek-v3.2: +0.01 (Neutral) 15,541 tokens
2026-02-28 05:40 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 05:40 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-28 05:40 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 05:22 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R - -
2026-02-28 05:22 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 05:22 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)