+0.37 Show HN: Steerling-8B, a language model that can explain any token it generates (www.guidelabs.ai, S: +0.36)
324 points by adebayoj 6 days ago | 90 comments on HN | Moderate positive · Contested Editorial · v3.7 · 2026-02-26 04:38:59
Summary Interpretability & Scientific Access Advocates
Guide Labs' release of Steerling-8B emphasizes interpretable AI as a mechanism for transparency, user agency, and scientific participation. The open-source distribution of model weights, code, and interactive tools directly advances freedom of expression (Article 19), scientific participation (Article 27), education (Article 26), and broader information access (Article 19). The model's design—enabling tracing of any output to input context, concepts, and training data—strengthens user capacity for critical examination of AI reasoning and supports data privacy awareness (Article 12). Overall, the content advocates for democratizing access to advanced AI technology and scientific understanding, though it does not explicitly address social welfare, labor, or other rights domains.
Article Heatmap
Preamble: +0.38 — Preamble
Article 1: +0.28 — Freedom, Equality, Brotherhood
Article 2: No Data — Non-Discrimination
Article 3: No Data — Life, Liberty, Security
Article 4: No Data — No Slavery
Article 5: No Data — No Torture
Article 6: No Data — Legal Personhood
Article 7: No Data — Equality Before Law
Article 8: No Data — Right to Remedy
Article 9: No Data — No Arbitrary Detention
Article 10: No Data — Fair Hearing
Article 11: No Data — Presumption of Innocence
Article 12: +0.37 — Privacy
Article 13: +0.47 — Freedom of Movement
Article 14: +0.28 — Asylum
Article 15: No Data — Nationality
Article 16: No Data — Marriage & Family
Article 17: +0.33 — Property
Article 18: +0.38 — Freedom of Thought
Article 19: +0.84 — Freedom of Expression
Article 20: +0.27 — Assembly & Association
Article 21: +0.33 — Political Participation
Article 22: +0.28 — Social Security
Article 23: No Data — Work & Equal Pay
Article 24: No Data — Rest & Leisure
Article 25: No Data — Standard of Living
Article 26: +0.62 — Education
Article 27: +0.87 — Cultural Participation
Article 28: +0.23 — Social & International Order
Article 29: +0.28 — Duties to Community
Article 30: No Data — No Destruction of Rights
Negative Neutral Positive No Data
Aggregates
Editorial Mean +0.37 Structural Mean +0.36
Weighted Mean +0.46 Unweighted Mean +0.41
Max +0.87 Article 27 Min +0.23 Article 28
Signal 15 No Data 16
Volatility 0.20 (Medium)
Negative 0 Channels E: 0.6 S: 0.4
SETL +0.01 Editorial-dominant
FW Ratio 59% 41 facts · 29 inferences
Evidence 25% coverage
2 High · 8 Medium · 5 Low · 16 No Data
Theme Radar
Foundation: 0.33 (2 articles) · Security: 0.00 (0 articles) · Legal: 0.00 (0 articles) · Privacy & Movement: 0.37 (3 articles) · Personal: 0.35 (2 articles) · Expression: 0.48 (3 articles) · Economic & Social: 0.28 (1 article) · Cultural: 0.74 (2 articles) · Order & Duties: 0.26 (2 articles)
HN Discussion 19 top-level · 21 replies
pbmango 2026-02-24 03:24 UTC link
This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it to be either solved or too out of reach to bother stopping and thinking about.
brendanashworth 2026-02-24 03:25 UTC link
Is there a reason people don't use SHAP [1] to interpret language models more often? The in-context attribution of outputs seems very similar.

[1] https://shap.readthedocs.io/en/latest/

great_psy 2026-02-24 03:49 UTC link
Maybe I’m not creative enough to see the potential, but what value does this bring?

Given the example I saw about CRISPR, what does this model give over a different, non-explaining model in the output? Does it really make me more confident in the output if I know the data came from Arxiv or Wikipedia?

I find the LLM outputs are subtly wrong, not obviously wrong.

gormen 2026-02-24 05:09 UTC link
Most interpretability methods fail for LLMs because they try to explain outputs without modeling the intent, constraints, or internal structure that produced them. Token‑level attribution is useful, but without a framework for how the model reasons, you’re still explaining shadows on the wall.
in-silico 2026-02-24 06:30 UTC link
Either I'm missing something or this is way overstated.

Steerling appears to be just a discrete diffusion model where the final hidden states are passed through a sparse autoencoder (a common interpretability layer) before the LM head.

They also use a loss that aligns the SAE's activations with labelled concepts? However, this is an example of "The Most Forbidden Technique" [1], and could make the model appear interpretable without the attributed concepts actually having a causal effect on the model's decisions.

1: https://thezvi.substack.com/p/the-most-forbidden-technique

7777777phil 2026-02-24 08:21 UTC link
If this decomposition actually holds, it's the first model where you could show a regulator why it produced a given output.
potato-peeler 2026-02-24 09:43 UTC link
Looks very interesting. Is there a published paper/article on your algorithm? Would like to take a stab at implementing this on my own.

I could find this [0], but not sure if that represents the entire system? (Apologies, I am not that well versed in ML)

[0] - https://www.guidelabs.ai/post/scaling-interpretable-models-8...

andy12_ 2026-02-24 09:53 UTC link
This seems really interesting. While Anthropic tried to use dictionary learning over an existing model to try to extract concepts, this almost feels like training the model alongside the dictionary itself (or rather, the model and the dictionary are intertwined).
deepdarkforest 2026-02-24 11:58 UTC link
Just wanted to say I think most interpretability research is just a smoke show nowadays, but this is actually the first one that I think has very serious potential. I love that the SAE is actually constrained and not just slapped on unsupervised post hoc.

How granular can you get the source data attribution? Down to, let's say, individual Wikipedia topics? Probably not URLs?

Would be interested to see this scale to 30/70B.

whinvik 2026-02-24 12:09 UTC link
Looks very interesting. Can you comment on why you think this model can give comparable performance with less training data?
crimsonnoodle58 2026-02-24 12:12 UTC link
So maybe one day we'll see coding agents like Claude Code create and update an ATTRIBUTION.md, citing all the open source projects and their licenses used to generate code in your project?
pu_pe 2026-02-24 12:24 UTC link
Looks neat and original, congrats!

I don't quite grasp how to interpret the training data attribution process. For example, it seems to say that for a given sentence like "They argued that humans tend to weigh losses more heavily than gains, leading to risk aversion", 24% is attributed to Wikipedia and 23% to Arxiv.

Does that mean that the concepts used in this sentence are also found in those datasets, and that's what's getting compared here? Or does it mean that you can track down which parts of the training data were interpolated to create that sentence?

ZeroAurora 2026-02-24 13:00 UTC link
Always happy to see improvements on explainable LLMs. Congrats!
rippeltippel 2026-02-24 13:03 UTC link
msteffen 2026-02-24 14:53 UTC link
In the recent HN thread announcing the new Gemini coding agent (https://news.ycombinator.com/item?id=47074735), a lot of people complained about Gemini’s tendency to do unwanted refactors, not perform requested actions, etc.

It made me cautiously optimistic that all of Anthropic’s work on alignment, which they did for AI safety, is actually the cause of Claude Code’s comparatively superior utility (and their present success). I wonder if future progress (maybe actual AGI?) lies in the direction of better and better alignment, so I think this is super cool and I’m suddenly really interested in experiments like this.

kamranjon 2026-02-24 15:02 UTC link
I'm really interested in using this but wonder if the unique architecture means that it will not be able to be converted to a GGUF and used by ollama or llama.cpp? I certainly would understand that the observability features would require some custom tweaks, but I'd just like to try it out on my local ai server (basically just ollama + tailscale) and see how it works as a regular model.
schopra909 2026-02-24 15:19 UTC link
This is very cool. Side note, I really dig the JavaScript animations on the causal block diffusion blog post. Made the concept immediately clear
killerstorm 2026-02-24 17:18 UTC link
This seems to be too coarse-grained to be useful: all sciency content will be "analytical" and associate with sources like ArXiv.

But there might be bad, malicious articles on ArXiv, so it doesn't really say anything about veracity.

Perhaps this might help to detect some problems like prompt injection - but then it might be more interesting to see those examples.

audunw 2026-02-25 06:55 UTC link
The one big thing missing from LLMs is the ability to express how confident it is in the truth of what it’s saying.

Perhaps this could be a step in that direction, if we can associate the attribution with likelihood of being true. E.g., Arxiv would be better than science fiction in that context. But what is the attribution if it hallucinates a citation? I'm guessing it would still be attributed to scientific sources. So it does nothing to fix the most damaging instances of hallucination?

voidhorse 2026-02-24 03:57 UTC link
It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise—you go from prompt-tweak witchcraft and divination to something more like science and precise method.
dwohnitmok 2026-02-24 04:40 UTC link
SHAP would be absurdly expensive to do for even tiny models (naive SHAP scales exponentially in the number of parameters; you can sample your coalitions to do better but those samples are going to be ridiculously sparse when you're talking about billions of parameters) and provides very little explanatory power for deep neural nets.

SHAP basically does point by point ablation across all possible subsets, which really doesn't make sense for LLMs. This is simultaneously too specific and too general.

It's too specific because interesting LLM behavior often requires talking about what ensembles of neurons do (e.g. "circuits" if you're of the mechanistic interpretability bent), and SHAP's parameter-by-parameter approach is completely incapable of explaining this. This is exacerbated by the fact that not all neurons are "semantically equal" in a deep network. Neurons in the deeper layers often do qualitatively different things than earlier layers, and the ways they compose can completely confuse SHAP.

It's too general because parameters often play many roles at once (one specific hypothesis here is the superposition hypothesis) and so you need some way of splitting up a single parameter into interpretable parts that SHAP doesn't do.

I don't know the specifics of what this particular model's approach is.

But SHAP unfortunately does not work for LLMs at all.
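The coalition-sampling idea described above can be made concrete with a small permutation-sampling estimator of Shapley values over input tokens. This is an illustrative sketch only; `toy_predict` is an invented stand-in for a real scorer (e.g. the log-probability of a target output with the excluded tokens masked):

```python
import random

def sampled_shapley(predict, tokens, n_samples=200, seed=0):
    """Estimate each token's Shapley value by sampling random permutations.

    `predict` maps a subset of tokens (a list) to a scalar score; each
    token is credited with its marginal contribution when added to the
    coalition of tokens that precede it in a random ordering.
    """
    rng = random.Random(seed)
    values = {t: 0.0 for t in tokens}
    for _ in range(n_samples):
        order = tokens[:]
        rng.shuffle(order)
        included, prev = [], predict([])
        for tok in order:
            included.append(tok)
            score = predict(included)
            values[tok] += score - prev  # marginal contribution
            prev = score
    return {t: v / n_samples for t, v in values.items()}

# Toy scorer: the score is 1.0 only when "hot" and "valve" co-occur,
# so those two tokens split the credit and "the" gets none.
def toy_predict(subset):
    return 1.0 if {"hot", "valve"} <= set(subset) else 0.0

attr = sampled_shapley(toy_predict, ["hot", "valve", "the"])
```

The exponential blow-up the parent describes is visible here: exact Shapley values would require every subset, while sampling trades that for estimator variance that grows quickly with input size.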

adebayoj 2026-02-24 07:56 UTC link
op here, I mostly agree with your comment! However, our model does more than this. For any chunk the model generates, it can answer: which concept, in the model's representations, was responsible for that token (or group of tokens). In fact, we can answer the question of what training data caused that output to be generated, too! We force this as a constraint in the architecture and in the loss function when you train the model. In fact, you can get the high-level reasons for a model's answer on complex problems.
adebayoj 2026-02-24 08:22 UTC link
Most interpretability techniques have yet to be shown to be useful for everyday model pipelines. However, the field is working hard to change this.
adebayoj 2026-02-24 08:33 UTC link
It does :) We constrained the model to do exactly this during training: https://www.guidelabs.ai/post/scaling-interpretable-models-8....
adebayoj 2026-02-24 08:51 UTC link
You are missing a few things, but you got some things right.

1) This is not an SAE in the way you think. It is a combination of a supervised + unsupervised layer that is constrained. An SAE is typically completely unsupervised and applied post hoc. Here, we supervise 33k of the concepts with concepts that we carefully curated. We then have an unsupervised component (similar to a top-k SAE) that we constrain to be independent from the supervised concepts. We don't do any of this post hoc, by the way; this is a key constraint. I'll get back to this. We train that unsupervised layer along with the model during pre-training.

2) Are the concepts or features causally influential for the output? We directly use the combination of the concepts for the lm head, which is a linear transform (with activation), so we can tell you, in closed form, the effect of ANY concept on the output logit for any token (or group of tokens) generated. It is not just causally related, it is constrained to do so.

3) Other points: we also make it so that you can trace the model outputs to the training data. This is an underrated interpretability knob. You know where, and what data, caused your model to learn a particular feature.

This is already a long comment, but I want to close on why our approach sidesteps all the issues with SAEs:
- If you train an SAE twice, on the same data + model, you'll get two different sets of features.
- In fact, there is no reason why the model should pick features that are causally influential for the output.
- ALL of these problems stem from the fact that the SAE is trained AFTER you already trained your model.
Training from scratch AND with supervision allows you to sidestep these issues, and even learn more disentangled representations.

Happy to more concretely justify the above. Great observations!
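For intuition, the closed-form attribution described in point 2 can be sketched in a few lines of numpy. This is not Steerling's actual code, just a toy illustration of the property that when logits are a linear function of concept activations, each concept's contribution to a logit is exact and the contributions sum to the logit:

```python
import numpy as np

rng = np.random.default_rng(0)
n_concepts, vocab = 6, 10

# Concept activations for one generated position (stand-in for the
# constrained supervised + unsupervised layer; here just random,
# non-negative values).
c = np.maximum(rng.normal(size=n_concepts), 0.0)

# Linear LM head: logits are a linear function of concept activations.
W = rng.normal(size=(vocab, n_concepts))
logits = W @ c

# Closed-form attribution: concept j's contribution to token t's logit
# is exactly W[t, j] * c[j], and the contributions sum to the logit.
token = int(np.argmax(logits))
contrib = W[token] * c
top_concept = int(np.argmax(np.abs(contrib)))
```

With a nonlinearity before the head (as the comment mentions), the decomposition above holds at whatever point the map to logits is linear; the toy keeps the head purely linear to make the identity exact.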

yorwba 2026-02-24 09:14 UTC link
I doubt that a regulator would be satisfied by the kinds of explanations this provides and the interventions it enables.

Suppose somebody put an LLM in charge of an industrial control system and it increased the temperature so much that it caused an accident. The input feature attribution analysis shows that the model was strongly influenced by the tokens describing the temperature control mechanism, concept attribution shows that it output tokens related to temperature, industrial processes and LLM tool-call syntax.

The operator proposes to fix this by rewriting the description and downweighting the temperature concept in the output, and a simulation shows that with these changes the model doesn't make the same decisions in this situation anymore. Should the regulator accept this explanation as sufficient to establish that the system is now safe?

If the controller has just a few parameters and responds approximately linearly to changes in its inputs, you can in principle guarantee that it'll stay within a safe zone. But LLMs have a huge number of parameters and by design highly nonlinear behavior. A simple explanation is unlikely to reflect model behavior accurately enough that you can trust its predictions to hold in arbitrary situations.

adebayoj 2026-02-24 10:00 UTC link
You are exactly right, it is guiding the model, during training, with concepts and the dictionary. This is important because dictionary learning for interpretability (post hoc) is not currently reliable: https://www.arxiv.org/abs/2602.14111
adebayoj 2026-02-24 10:01 UTC link
Yes, that is the post that has the most up to date details of the model architecture. Take a look at this: https://github.com/guidelabs/steerling. It has the scaffolding for what you need :)
adebayoj 2026-02-24 12:13 UTC link
Down to the very exact text chunk in a document! Check this out for an idea of what smaller versions of this style of model can do: https://www.guidelabs.ai/post/prism/. We'll have more to say soon about it. We can trace any generation to 11B chunks (not documents, but actual chunks in the training data).
theMMaI 2026-02-24 12:19 UTC link
Only if there's a commercial incentive to do so methinks. Just one of the things where I expect a legal catch-up is needed to get companies to do the right thing.
adebayoj 2026-02-24 12:21 UTC link
You got it exactly right :) And you can update the ATTRIBUTION.md to have it NOT rely on open-source projects that have been compromised. Imagine asking Claude Code to write a package/function in the style of a codebase that you care about, or forcing it to ALWAYS rely on some internal packages that you care about. The possibilities are endless when you insert such knobs into models.
adebayoj 2026-02-24 12:40 UTC link
Great questions. We weren't quite explicit about the training data attribution process; we'll discuss it in more detail in future work. We can track down which parts of the training data were interpolated to create that sentence. For those training data sentences, we then compare the concepts between the generated text and the training text.

We can attribute to exact sentences and chunks in the training data. For the first release, we are sharing only concept similarities. Over the coming weeks, we'll share and discuss how you can actually map to the exact training sentence and chunk with the model.

For a technical overview of how some of these models work, check this link out: https://www.guidelabs.ai/post/prism/
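A rough, hypothetical sketch of what "concept similarity" attribution could look like. This is invented for illustration; `attribute_by_concepts`, the source labels, and the percentage aggregation are assumptions, not Guide Labs' published method:

```python
import numpy as np

def attribute_by_concepts(gen_vec, chunk_vecs, sources):
    """Score training chunks by cosine similarity of concept activations,
    then aggregate the (non-negative) scores per source as percentages."""
    gen = gen_vec / np.linalg.norm(gen_vec)
    sims = {}
    for vec, src in zip(chunk_vecs, sources):
        s = max(float(gen @ (vec / np.linalg.norm(vec))), 0.0)
        sims[src] = sims.get(src, 0.0) + s
    total = sum(sims.values()) or 1.0
    return {src: 100.0 * s / total for src, s in sims.items()}

rng = np.random.default_rng(1)
gen_vec = rng.normal(size=8)
chunk_vecs = [gen_vec + 0.1 * rng.normal(size=8),  # near-duplicate chunk
              rng.normal(size=8),
              rng.normal(size=8)]
shares = attribute_by_concepts(gen_vec, chunk_vecs,
                               ["wikipedia", "arxiv", "arxiv"])
```

This would yield per-source percentages of the kind quoted upthread ("24% Wikipedia, 23% Arxiv"); the actual Steerling mechanism traces to exact chunks rather than a similarity heuristic.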

KingOfCoders 2026-02-24 13:30 UTC link
Not as long as all developers add an ATTRIBUTION.md citing all open source projects whose source they read, all companies they worked for and trained them, and all Stack Overflow answers they have used to write the code.
adebayoj 2026-02-24 13:36 UTC link
We train the model with `explanations`. Most training asks the model to predict the next token or group of tokens. Our training says, predict the next group of tokens (causal diffusion), but also these tokens should be about {sports/art/coding/etc}. So in addition to token supervision, the model gets concept level supervision. The model is forced to more quickly learn these high level concepts.
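The described training objective (next-token prediction plus concept-level supervision) can be sketched as a weighted sum of two cross-entropy losses. A minimal illustration; the `alpha` weighting and the shapes are assumptions, not details from the post:

```python
import numpy as np

def cross_entropy(logits, target):
    # Standard log-softmax cross-entropy for one example.
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def combined_loss(token_logits, target_token,
                  concept_logits, target_concept, alpha=0.5):
    """Token-prediction loss plus a concept-classification loss
    (e.g. "these tokens should be about sports/art/coding")."""
    return (cross_entropy(token_logits, target_token)
            + alpha * cross_entropy(concept_logits, target_concept))

# One toy position: 3-token vocabulary, 2 candidate concepts.
token_logits = np.array([0.1, 2.0, -1.0])
concept_logits = np.array([1.5, 0.0])
loss = combined_loss(token_logits, 1, concept_logits, 0)
```

The concept term is what gives the model gradient signal toward the curated concept labels during pre-training, rather than hoping they emerge unsupervised.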
abcd_f 2026-02-24 13:51 UTC link
TC still exists, huh?
monocasa 2026-02-24 15:10 UTC link
Not immediately, but it's not a much larger amount of work for llama than a new foundational model which typically has a tweaked compute graph.
rao-v 2026-02-24 15:13 UTC link
+1 this does seem to be a genuine attempt to actually build an interpretable model, so nice work!

Having said that, I worry that you run into Illusion of Consciousness issues where the model changes attribution from “sandbagging” to “unctuous” when you control its response, because the response is generated outside of the attribution modules (I don’t quite understand how cleanly everything flows through the concept modules and the residual). Either way this is a sophisticated problem to have. Would love to see if this can be trained to parity with modern 8B models.

idiotsecant 2026-02-24 15:38 UTC link
I wonder the opposite, if actual AGI would need to be less aligned. Alignment is basically the process of pruning interesting behavior out of the model to make a product.
jzig 2026-02-24 15:48 UTC link
What does alignment even mean? What is being aligned and what is it aligning to?
Macuyiko 2026-02-24 17:48 UTC link
The input attribution part is interesting though, but I do wonder to which extent that is just assigning some sort of SHAP values to the input tokens, in which case it should be pretty portable to any kind of model.
Editorial Channel
What the content says
+0.55
Article 19 Freedom of Expression
High Advocacy Framing
Editorial
+0.55
SETL
-0.25

Content strongly advocates for freedom of expression by releasing a powerful, 8B-parameter model with full transparency and no content restrictions, enabling any user to generate and express ideas without centralized editorial control.

+0.55
Article 27 Cultural Participation
High Advocacy Framing
Editorial
+0.55
SETL
-0.17

Content strongly advocates for participation in scientific advancement and cultural life by releasing interpretable model that enables scientific community to understand AI internals and contribute to knowledge about model behavior.

+0.45
Article 13 Freedom of Movement
Medium Framing Advocacy
Editorial
+0.45
SETL
-0.16

Content advocates for freedom of movement within information and concept spaces. Open-source model enables users to deploy and use the model across jurisdictions and contexts without centralized gatekeeping.

+0.45
Article 26 Education
Medium Advocacy Framing
Editorial
+0.45
SETL
-0.16

Content advocates for education and participation in scientific advancement by releasing interpretable AI technology that enables anyone to learn how large language models work and contribute to AI research.

+0.40
Preamble Preamble
Medium Advocacy Framing
Editorial
+0.40
SETL
+0.14

Content emphasizes human dignity through interpretability and transparency in AI systems, treating humans as capable of understanding and controlling AI behavior. Advocates for knowledge accessibility and scientific shared understanding.

+0.40
Article 18 Freedom of Thought
Medium Advocacy
Editorial
+0.40
SETL
+0.14

Content advocates for freedom of thought and belief by designing AI systems that make their reasoning transparent and auditable, enabling users to verify and potentially object to model outputs.

+0.35
Article 12 Privacy
Medium Framing
Editorial
+0.35
SETL
-0.14

Content demonstrates commitment to privacy by making training data provenance transparent and traceable, allowing users to understand what sources influenced model outputs.

+0.35
Article 17 Property
Medium Framing
Editorial
+0.35
SETL
+0.13

Content demonstrates commitment to property rights and data ownership transparency by making training data sources explicitly traceable, enabling users to understand intellectual property inputs.

+0.35
Article 21 Political Participation
Medium Advocacy
Editorial
+0.35
SETL
+0.13

Content advocates for participation in scientific decision-making by releasing detailed technical information about model architecture, performance metrics, and interpretability mechanisms, enabling users to evaluate and contribute to AI development.

+0.30
Article 1 Freedom, Equality, Brotherhood
Medium Framing
Editorial
+0.30
SETL
+0.12

Content implicitly affirms equal dignity by framing interpretability as a universal capability applicable to all users regardless of technical background, supporting equal participation in AI governance.

+0.30
Article 14 Asylum
Low Framing
Editorial
+0.30
SETL
+0.12

Content implicitly supports asylum and protection by providing transparent, open tools that any person can access and use, regardless of national origin or status.

+0.30
Article 22 Social Security
Low Framing
Editorial
+0.30
SETL
+0.12

Content implicitly supports social and economic rights through open-source release enabling anyone to participate in AI development and knowledge creation.

+0.30
Article 29 Duties to Community
Low Framing
Editorial
+0.30
SETL
+0.12

Content implicitly supports community responsibility by releasing interpretable AI that enables users to understand and verify model behavior, placing interpretability responsibility on both developer and user.

+0.25
Article 20 Assembly & Association
Low Framing
Editorial
+0.25
SETL
-0.12

Content implicitly supports freedom of peaceful assembly by providing transparent tools that enable collaborative development and shared scientific understanding around AI interpretability.

+0.25
Article 28 Social & International Order
Low Framing
Editorial
+0.25
SETL
+0.11

Content implicitly supports social order through transparent, interpretable AI that reduces risk of harmful model behavior going undetected or uncontrolled.

ND
Article 2 Non-Discrimination
ND

No observable content addressing discrimination or specific protected characteristics.

ND
Article 3 Life, Liberty, Security
ND

No content explicitly addressing right to life or security of person.

ND
Article 4 No Slavery
ND

No observable content related to slavery or servitude.

ND
Article 5 No Torture
ND

No content addressing torture or cruel treatment.

ND
Article 6 Legal Personhood
ND

No content addressing legal personhood or capacity.

ND
Article 7 Equality Before Law
ND

No content addressing equal protection or justice.

ND
Article 8 Right to Remedy
ND

No content addressing remedy for rights violations.

ND
Article 9 No Arbitrary Detention
ND

No observable content addressing arbitrary arrest or detention.

ND
Article 10 Fair Hearing
ND

No content directly addressing fair trial or judicial independence.

ND
Article 11 Presumption of Innocence
ND

No content addressing criminal liability or presumption of innocence.

ND
Article 15 Nationality
ND

No content addressing nationality or citizenship rights.

ND
Article 16 Marriage & Family
ND

No content addressing marriage, family, or related rights.

ND
Article 23 Work & Equal Pay
ND

No content directly addressing labor rights, wages, or working conditions.

ND
Article 24 Rest & Leisure
ND

No observable content addressing rest, leisure, or working time.

ND
Article 25 Standard of Living
ND

No content addressing healthcare, food, or living standards.

ND
Article 30 No Destruction of Rights
ND

No content observable that could be interpreted as violating or misapplying other UDHR provisions.

Structural Channel
What the site does
Element Modifier Affects Note
Legal & Terms
Privacy
No privacy policy or data handling disclosure observable on provided content.
Terms of Service
No terms of service or user agreement observable on provided content.
Identity & Mission
Mission +0.20
Article 27
Organization's mission emphasizes interpretability and transparency in AI systems, with open-source code and model weights released publicly, advancing shared scientific understanding.
Editorial Code
No editorial standards or corrections policy observable on provided content.
Ownership
Guide Labs identified as publisher/organization; private entity status not confirmed from provided content.
Access & Distribution
Access Model +0.25
Article 19 Article 27
Model weights available on HuggingFace, code on GitHub, and package on PyPI—all standard open-source distribution channels supporting broad access and participation.
Ad/Tracking
No advertising or tracking mechanisms observable in provided content.
Accessibility +0.15
Article 26
Interactive model explorer with keyboard navigation and semantic HTML structure supports accessibility. No alt-text provided for technical visualizations or chart images.
+0.65
Article 19 Freedom of Expression
High Advocacy Framing
Structural
+0.65
Context Modifier
+0.25
SETL
-0.25

Open-source base model with no built-in content filters, no mandatory safety fine-tuning, and open distribution channels structurally maximize expressive capability. Release includes code and weights enabling infinite instantiation.

+0.60
Article 27 Cultural Participation
High Advocacy Framing
Structural
+0.60
Context Modifier
+0.30
SETL
-0.17

Content strongly advocates for participation in scientific advancement and cultural life by releasing interpretable model that enables scientific community to understand AI internals and contribute to knowledge about model behavior.

+0.50
Article 13 Freedom of Movement
Medium Framing Advocacy
Structural
+0.50
Context Modifier
0.00
SETL
-0.16

Open-source distribution through multiple platforms (HuggingFace, GitHub, PyPI) removes geographic barriers to model access and use. No licensing restrictions observable.

+0.50
Article 26 Education
Medium Advocacy Framing
Structural
+0.50
Context Modifier
+0.15
SETL
-0.16

Interactive model explorer provides hands-on educational experience. Open-source code and model support learning and skill development. No paywalls or access restrictions.

+0.40
Article 12 Privacy
Medium Framing
Structural
+0.40
Context Modifier
0.00
SETL
-0.14

Interactive interface enables users to view training data attribution for any generated chunk, supporting privacy awareness and data source transparency.

+0.35
Preamble Preamble
Medium Advocacy Framing
Structural
+0.35
Context Modifier
0.00
SETL
+0.14

Open-source release of model weights, code, and packages on public platforms (HuggingFace, GitHub, PyPI) structurally enables broad participation in scientific understanding and AI development.

+0.35
Article 18 Freedom of Thought
Medium Advocacy
Structural
+0.35
Context Modifier
0.00
SETL
+0.14

Interactive interface allows users to inspect concept attributions and training data sources, structurally supporting scrutiny of model beliefs and reasoning.

+0.30
Article 17 Property
Medium Framing
Structural
+0.30
Context Modifier
0.00
SETL
+0.13

Training data attribution information is provided to all users through the interactive interface, supporting informed engagement with property/data lineage.

+0.30
Article 20 Assembly & Association
Low Framing
Structural
+0.30
Context Modifier
0.00
SETL
-0.12

GitHub code release and open-source model enable collaborative community development and group participation in AI research.

+0.30
Article 21 Political Participation
Medium Advocacy
Structural
+0.30
Context Modifier
0.00
SETL
+0.13

Interactive explorer and promised 'deep dives' invite public evaluation and scrutiny of model capabilities, supporting participatory understanding of AI systems.

+0.25
Article 1 Freedom, Equality, Brotherhood
Medium Framing
Structural
+0.25
Context Modifier
0.00
SETL
+0.12

Interactive explorer with keyboard navigation and semantic HTML supports equal access to the model's reasoning process.

+0.25
Article 14 Asylum
Low Framing
Structural
+0.25
Context Modifier
0.00
SETL
+0.12

No geographic, identity, or status barriers to model access observable.

+0.25
Article 22 Social Security
Low Framing
Structural
+0.25
Context Modifier
0.00
SETL
+0.12

Open-source model and code reduce barriers to participating in AI research, which can enable economic participation without requiring proprietary access.

+0.25
Article 29 Duties to Community
Low Framing
Structural
+0.25
Context Modifier
0.00
SETL
+0.12

Interactive explorer invites user responsibility in examining and understanding model outputs rather than accepting them uncritically.

+0.20
Article 28 Social & International Order
Low Framing
Structural
+0.20
Context Modifier
0.00
SETL
+0.11

Concept steering capability and training data provenance enable interventions to prevent harmful outputs, supporting social stability.

ND
Article 2 Non-Discrimination
ND

No discriminatory design patterns observable in release or access model.

ND
Article 3 Life, Liberty, Security
ND

Content does not directly engage structural security considerations.

ND
Article 4 No Slavery
ND

Not applicable to technical product release.

ND
Article 5 No Torture
ND

Not applicable to AI model release.

ND
Article 6 Legal Personhood
ND

Not applicable to product announcement.

ND
Article 7 Equality Before Law
ND

Not directly engaged in this release announcement.

ND
Article 8 Right to Remedy
ND

Not applicable to product release.

ND
Article 9 No Arbitrary Detention
ND

Not relevant to technical content.

ND
Article 10 Fair Hearing
ND

Not applicable to AI product announcement.

ND
Article 11 Presumption of Innocence

Not relevant to this content type.

ND
Article 15 Nationality

Not applicable to AI model release.

ND
Article 16 Marriage & Family

Not relevant to product announcement.

ND
Article 23 Work & Equal Pay

Not applicable to product release.

ND
Article 24 Rest & Leisure

Not relevant to AI model release.

ND
Article 25 Standard of Living

Not applicable to this content type.

ND
Article 30 No Destruction of Rights

No design patterns that restrict rights protections.

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.68 · medium claims
Sources
0.7
Evidence
0.7
Uncertainty
0.6
Purpose
0.8
Propaganda Flags
2 manipulative rhetoric techniques found
appeal to authority
Reference to academic paper 'Scaling Interpretable Models to 8B' and performance benchmarks against established models (LLaMA2-7B, Deepseek-7B) without full citations or links.
exaggeration
Claim of 'first interpretable language model' at 8B scale when other interpretability approaches exist; framing as unprecedented breakthrough.
Emotional Tone
Emotional character: positive/negative, intensity, authority
celebratory
Valence
+0.7
Arousal
0.6
Dominance
0.6
Transparency
Does the content identify its author and disclose interests?
0.50
✓ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.75 · solution oriented
Reader Agency
0.8
Stakeholder Voice
Whose perspectives are represented in this content?
0.45 · 2 perspectives
Speaks: institution, individuals
About: researchers, scientific_community
Temporal Framing
Is this content looking backward, at the present, or forward?
present · immediate
Geographic Scope
What geographic area does this content cover?
global
Complexity
How accessible is this content to a general audience?
technical · high jargon · domain-specific
Longitudinal · 7 evals
Audit Trail 27 entries
2026-02-28 14:33 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:33 model_divergence Cross-model spread 0.61 exceeds threshold (4 models) - -
2026-02-28 14:33 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech blog post
2026-02-28 14:28 model_divergence Cross-model spread 0.61 exceeds threshold (4 models) - -
2026-02-28 14:28 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:28 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Tech blog post
2026-02-26 22:41 eval_success Lite evaluated: Mild positive (0.10) - -
2026-02-26 22:41 eval Evaluated by llama-4-scout-wai: +0.10 (Mild positive)
2026-02-26 20:07 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 20:05 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:04 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 20:03 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 20:03 eval_failure Evaluation failed: Error: Unknown model in registry: llama-4-scout-wai - -
2026-02-26 20:03 eval_failure Evaluation failed: Error: Unknown model in registry: llama-4-scout-wai - -
2026-02-26 20:02 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:31 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 17:29 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:28 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 17:27 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-26 09:33 eval_success Evaluated: Neutral (0.61) - -
2026-02-26 09:33 eval Evaluated by deepseek-v3.2: +0.61 (Neutral) 14,082 tokens
2026-02-26 08:56 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 08:56 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 08:55 dlq Dead-lettered after 1 attempts: Show HN: Steerling-8B, a language model that can explain any token it generates - -
2026-02-26 04:39 eval Evaluated by claude-haiku-4-5-20251001: +0.46 (Moderate positive) 17,273 tokens +0.04
2026-02-26 04:31 eval Evaluated by claude-haiku-4-5-20251001: +0.43 (Moderate positive) 16,721 tokens -0.04
2026-02-26 04:24 eval Evaluated by claude-haiku-4-5-20251001: +0.46 (Moderate positive) 16,303 tokens
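The model_divergence entries in this trail flag a cross-model spread of 0.61 over 4 models. A minimal sketch of how that figure could be reproduced from the per-model scores recorded above, assuming spread is simply the maximum minus the minimum score (the exact formula is not documented here, and the score values are taken from the eval entries in this trail):

```python
# Per-model scores as recorded in the audit trail above.
scores = {
    "llama-3.3-70b-wai": 0.00,
    "llama-4-scout-wai": 0.10,
    "deepseek-v3.2": 0.61,
    "claude-haiku-4-5": 0.46,
}

# Assumed definition: spread = max score - min score across models.
spread = max(scores.values()) - min(scores.values())
print(f"cross-model spread: {spread:.2f}")  # prints "cross-model spread: 0.61"
```

With this definition, the 0.61 threshold breach is driven entirely by the gap between the llama-3.3-70b-wai score (0.00) and the deepseek-v3.2 score (+0.61).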