+0.05 PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks (vibrantlabs.com S:0.00 )
38 points by shahules 4 days ago | 9 comments on HN | Neutral Editorial · v3.7 · 2026-02-26 00:19:18 0
Summary Free Expression & Academic Access Acknowledges
This URL hosts a technical research blog post on AI agent evaluation titled 'PA Bench: Evaluating Web Agents on Real World Personal Assistant Workflows.' The content demonstrates minimal human rights engagement, with observable signals only in Article 19 (free expression) and Article 26 (education) through open publication and access to research. The site operates a permissive platform for knowledge dissemination but contains no explicit human rights advocacy, frameworks, or protections.
Article Heatmap
Preamble: ND
Article 1 (Freedom, Equality, Brotherhood): ND
Article 2 (Non-Discrimination): ND
Article 3 (Life, Liberty, Security): ND
Article 4 (No Slavery): ND
Article 5 (No Torture): ND
Article 6 (Legal Personhood): ND
Article 7 (Equality Before Law): ND
Article 8 (Right to Remedy): ND
Article 9 (No Arbitrary Detention): ND
Article 10 (Fair Hearing): ND
Article 11 (Presumption of Innocence): ND
Article 12 (Privacy): ND
Article 13 (Freedom of Movement): ND
Article 14 (Asylum): ND
Article 15 (Nationality): ND
Article 16 (Marriage & Family): ND
Article 17 (Property): ND
Article 18 (Freedom of Thought): ND
Article 19 (Freedom of Expression): 0.00
Article 20 (Assembly & Association): ND
Article 21 (Political Participation): ND
Article 22 (Social Security): ND
Article 23 (Work & Equal Pay): ND
Article 24 (Rest & Leisure): ND
Article 25 (Standard of Living): ND
Article 26 (Education): +0.06
Article 27 (Cultural Participation): ND
Article 28 (Social & International Order): ND
Article 29 (Duties to Community): ND
Article 30 (No Destruction of Rights): ND
Negative Neutral Positive No Data
Aggregates
Editorial Mean +0.05 Structural Mean 0.00
Weighted Mean +0.03 Unweighted Mean +0.03
Max +0.06 Article 26 Min 0.00 Article 19
Signal 2 No Data 29
Volatility 0.03 (Low)
Negative 0 Channels E: 0.6 S: 0.4
SETL +0.10 Editorial-dominant
FW Ratio 56% 5 facts · 4 inferences
Evidence 4% coverage
2M 29 ND
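The aggregate figures above appear to be internally consistent with a simple weighted combination of the two channels. The sketch below is a reconstruction inferred from the displayed numbers only; the page does not state its formulas, so the function `combine`, the use of a population standard deviation for "Volatility", and the reading of "FW Ratio" as facts divided by facts plus inferences are all assumptions. The per-article inputs are the Editorial and Structural scores shown for Articles 19 and 26.

```python
# Hypothetical reconstruction of the dashboard aggregates.
# All formulas here are assumptions inferred from the displayed numbers,
# not the site's documented method.
from statistics import mean, pstdev

# Channel weights, read from "Channels E: 0.6 S: 0.4".
W_EDITORIAL, W_STRUCTURAL = 0.6, 0.4

# (editorial, structural) scores for the two articles that carry a signal.
scores = {
    "Article 19": (0.00, 0.00),
    "Article 26": (0.10, 0.00),
}


def combine(editorial: float, structural: float) -> float:
    """Weighted blend of the two channels (assumed formula)."""
    return W_EDITORIAL * editorial + W_STRUCTURAL * structural


# Per-article combined score, e.g. the heatmap value for Article 26.
combined = {name: combine(e, s) for name, (e, s) in scores.items()}

editorial_mean = mean(e for e, _ in scores.values())
structural_mean = mean(s for _, s in scores.values())
weighted_mean = combine(editorial_mean, structural_mean)
unweighted_mean = mean(combined.values())

# Assumed: "Volatility" is the population standard deviation of combined scores.
volatility = pstdev(combined.values())

# Assumed: "FW Ratio" is facts / (facts + inferences), from "5 facts · 4 inferences".
fw_ratio = 5 / (5 + 4)

for label, value in [
    ("Article 26 combined", combined["Article 26"]),
    ("Editorial mean", editorial_mean),
    ("Structural mean", structural_mean),
    ("Weighted mean", weighted_mean),
    ("Unweighted mean", unweighted_mean),
    ("Volatility", volatility),
]:
    print(f"{label}: {value:+.2f}")
print(f"FW ratio: {fw_ratio:.0%}")
```

Under these assumptions the rounded results match the values displayed for Article 26, the four means, Volatility, and FW Ratio. The "Evidence 4% coverage" figure is not reproduced here because its denominator is not visible on the page.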
Theme Radar
Foundation: 0.00 (0 articles) · Security: 0.00 (0 articles) · Legal: 0.00 (0 articles) · Privacy & Movement: 0.00 (0 articles) · Personal: 0.00 (0 articles) · Expression: 0.00 (1 article) · Economic & Social: 0.00 (0 articles) · Cultural: 0.06 (1 article) · Order & Duties: 0.00 (0 articles)
HN Discussion 3 top-level · 2 replies
abhijithneil 2026-02-25 22:04 UTC link
Is there a way to automate computer use with multiple computer-use agents from different providers, with some sort of routing setup so the best course of action can be chosen without hitting failures (e.g., permission issues in OpenAI could be rerouted to Gemini)?
mrorigo 2026-02-26 07:41 UTC link
I just don't get why you would want an agent to use the browser to do these mundane things (check email, work with calendar, etc.) when you can simply give it a few tools and save maybe six gazillion tokens per task.
AIorNot 2026-02-26 08:32 UTC link
Well, if these guys' computer action model works as they intended (ground-up video-trained model)

https://news.ycombinator.com/item?id=47125014

maybe this benchmark will be conquered far faster than expected

shahules 2026-02-25 23:05 UTC link
There are a few agents, like browser-use, skyvern, etc., that may provide this capability.
shenberg 2026-02-26 11:42 UTC link
Using existing enterprise apps probably - this solution is scalable for the vendor and it's easier to sell using existing software as-is than to start out by writing new custom tools.
Editorial Channel
What the content says
+0.10
Article 26 Education
Medium Advocacy
Editorial
+0.10
SETL
+0.10

Blog post appears to be technical research that could be educational in nature. No explicit educational content visible in provided text, but academic research dissemination serves educational function.

0.00
Article 19 Freedom of Expression
Medium Advocacy
Editorial
0.00
SETL
ND

Content is a technical research blog post on AI agents. Title and structure suggest neutral academic/technical presentation. No explicit advocacy for or restriction of free expression, but publication format demonstrates freedom to disseminate research findings.

ND
Preamble Preamble
ND

No observable content addressing dignity, rights, or foundational human rights principles.

ND
Article 1 Freedom, Equality, Brotherhood
ND

No discussion of equality, dignity, or freedom.

ND
Article 2 Non-Discrimination
ND

No content addressing discrimination or equal rights regardless of status.

ND
Article 3 Life, Liberty, Security
ND

No discussion of right to life, liberty, or security of person.

ND
Article 4 No Slavery
ND

No content addressing slavery or forced servitude.

ND
Article 5 No Torture
ND

No discussion of torture or cruel treatment.

ND
Article 6 Legal Personhood
ND

No content addressing legal personhood or right to recognition.

ND
Article 7 Equality Before Law
ND

No discussion of equal protection under law.

ND
Article 8 Right to Remedy
ND

No content addressing right to legal remedy.

ND
Article 9 No Arbitrary Detention
ND

No discussion of protection from arbitrary detention.

ND
Article 10 Fair Hearing
ND

No content addressing right to fair trial or due process.

ND
Article 11 Presumption of Innocence
ND

No discussion of legal presumptions or criminal liability.

ND
Article 12 Privacy
ND

No content addressing right to privacy, family, or correspondence.

ND
Article 13 Freedom of Movement
ND

No discussion of freedom of movement or residence.

ND
Article 14 Asylum
ND

No content addressing right to asylum or refuge.

ND
Article 15 Nationality
ND

No discussion of right to nationality or citizenship.

ND
Article 16 Marriage & Family
ND

No content addressing marriage, family, or property rights.

ND
Article 17 Property
ND

No discussion of property rights or protection.

ND
Article 18 Freedom of Thought
ND

No content addressing freedom of thought, conscience, or religion.

ND
Article 20 Assembly & Association
ND

No content addressing freedom of peaceful assembly or association.

ND
Article 21 Political Participation
ND

No discussion of political participation or democratic governance.

ND
Article 22 Social Security
ND

No content addressing social security or welfare rights.

ND
Article 23 Work & Equal Pay
ND

No discussion of right to work or employment.

ND
Article 24 Rest & Leisure
ND

No content addressing rest, leisure, or work limitations.

ND
Article 25 Standard of Living
ND

No discussion of adequate standard of living or health.

ND
Article 27 Cultural Participation
ND

No content addressing cultural participation or intellectual property.

ND
Article 28 Social & International Order
ND

No discussion of social order or international framework.

ND
Article 29 Duties to Community
ND

No content addressing community responsibility or limitations of rights.

ND
Article 30 No Destruction of Rights
ND

No content addressing prohibition of rights destruction.

Structural Channel
What the site does
0.00
Article 19 Freedom of Expression
Medium Advocacy
Structural
0.00
Context Modifier
0.00
SETL
ND

Website allows public access to technical research content without apparent barriers. No content moderation or censorship signals observed. Blog post is openly accessible.

0.00
Article 26 Education
Medium Advocacy
Structural
0.00
Context Modifier
0.00
SETL
+0.10

Website provides open access to technical research content. Free access to research materials supports educational access without discrimination.

ND
Preamble Preamble
ND

Site structure provides no signals regarding dignity or foundational rights commitments.

ND
Article 1 Freedom, Equality, Brotherhood
ND

No structural signals regarding equal treatment or dignity.

ND
Article 2 Non-Discrimination
ND

No observable non-discrimination practices in site design.

ND
Article 3 Life, Liberty, Security
ND

No structural engagement with security or safety provisions.

ND
Article 4 No Slavery
ND

No observable labor practice statements.

ND
Article 5 No Torture
ND

No structural signals regarding safety or protection from harm.

ND
Article 6 Legal Personhood
ND

No structural provisions for legal recognition or status.

ND
Article 7 Equality Before Law
ND

No observable equal protection or anti-discrimination structures.

ND
Article 8 Right to Remedy
ND

No accessible grievance or remedy procedures observable.

ND
Article 9 No Arbitrary Detention
ND

Not applicable to this content type.

ND
Article 10 Fair Hearing
ND

No dispute resolution or due process mechanisms observable.

ND
Article 11 Presumption of Innocence
ND

No structural engagement with legal process protections.

ND
Article 12 Privacy
ND

No privacy policy or data protection statements observed on page.

ND
Article 13 Freedom of Movement
ND

Not applicable to this content.

ND
Article 14 Asylum
ND

Not applicable to this platform.

ND
Article 15 Nationality
ND

No observable nationality-based structures.

ND
Article 16 Marriage & Family
ND

No structural engagement with family or property protections.

ND
Article 17 Property
ND

No observable property or ownership provisions.

ND
Article 18 Freedom of Thought
ND

No structural protections for belief systems observable.

ND
Article 20 Assembly & Association
ND

No community organizing or association features observable.

ND
Article 21 Political Participation
ND

No participation or governance mechanisms observable.

ND
Article 22 Social Security
ND

No observable social support provisions.

ND
Article 23 Work & Equal Pay
ND

No observable employment or labor practice policies.

ND
Article 24 Rest & Leisure
ND

No structural provisions for work-life balance or rest.

ND
Article 25 Standard of Living
ND

No observable health or welfare provisions.

ND
Article 27 Cultural Participation
ND

No observable cultural or intellectual property provisions.

ND
Article 28 Social & International Order
ND

No observable commitment to international rights frameworks.

ND
Article 29 Duties to Community
ND

No observable community-oriented governance provisions.

ND
Article 30 No Destruction of Rights
ND

No structural safeguards against rights elimination observable.

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.38 low claims
Sources
0.3
Evidence
0.4
Uncertainty
0.2
Purpose
0.7
Propaganda Flags
No manipulative rhetoric detected
0 techniques detected
Emotional Tone
Emotional character: positive/negative, intensity, authority
measured
Valence
+0.1
Arousal
0.2
Dominance
0.5
Transparency
Does the content identify its author and disclose interests?
0.00
✗ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.50 mixed
Reader Agency
0.5
Stakeholder Voice
Whose perspectives are represented in this content?
0.20 1 perspective
Speaks: institution
Temporal Framing
Is this content looking backward, at the present, or forward?
present unspecified
Geographic Scope
What geographic area does this content cover?
global
Complexity
How accessible is this content to a general audience?
technical high jargon domain specific
Longitudinal 1076 HN snapshots · 8 evals
Audit Trail 28 entries
2026-02-28 14:02 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:02 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
tech blog neutral
2026-02-27 16:34 eval_success Light evaluated: Neutral (0.00) - -
2026-02-27 16:34 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-26 20:27 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 20:24 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:23 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:22 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:51 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 17:49 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:48 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:47 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 15:38 eval_success Evaluated: Neutral (0.01) - -
2026-02-26 15:38 eval Evaluated by deepseek-v3.2: +0.01 (Neutral) 17,792 tokens
2026-02-26 09:20 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 09:19 dlq Dead-lettered after 1 attempts: PA Bench: Evaluating Frontier Models on Multi-Tab Pa Tasks - -
2026-02-26 09:18 rate_limit OpenRouter rate limited (429) model=mistral-small-3.1 - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=hermes-3-405b - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=qwen3-next-80b - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 09:17 rate_limit OpenRouter rate limited (429) model=hermes-3-405b - -
2026-02-26 00:19 eval Evaluated by claude-haiku-4-5-20251001: +0.03 (Neutral) 18,030 tokens +0.01
2026-02-25 23:29 eval Evaluated by claude-haiku-4-5-20251001: +0.02 (Neutral) 17,991 tokens -0.24
2026-02-25 23:02 eval Evaluated by claude-haiku-4-5-20251001: +0.26 (Mild positive) 18,422 tokens +0.16
2026-02-25 22:36 eval Evaluated by claude-haiku-4-5-20251001: +0.10 (Mild positive) 15,125 tokens +0.01
2026-02-25 22:08 eval Evaluated by claude-haiku-4-5-20251001: +0.09 (Neutral) 15,665 tokens