This technical analysis post advocates for protecting authentic human expression on Hacker News by documenting statistical evidence of bot infiltration. The author demonstrates how non-human accounts systematically differ in linguistic patterns from genuine users, framing bot activity as a threat to community integrity and meaningful dialogue. The content champions Article 19 freedoms (free expression and information) through rigorous, evidence-based public criticism of platform manipulation, while the freely accessible publication model structurally supports open access to this critical information.
(author) I saw a 32:1 rate of em-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.
I've had this sense that HN has gotten absolutely inundated with bots over the last few months.
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
(2) I do recommend taking one minute to dash a note off to [email protected] if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
If I see an em-dash in a comment I stop reading and I've seriously considered setting up a filter across multiple sites to remove any comments containing one.
I know there are legitimate use cases for the em-dash, but in a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
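The filter the commenter describes would be a one-liner in most scripting contexts. A minimal sketch in Python, with hypothetical comment strings standing in for whatever a scraper or userscript would actually supply:

```python
# Drop any comment containing an em-dash (U+2014). The comment texts
# below are hypothetical stand-ins for scraped page content.
EM_DASH = "\u2014"

def drop_em_dash_comments(comments):
    """Return only the comments that contain no em-dash."""
    return [c for c in comments if EM_DASH not in c]

comments = [
    "Plain hyphen - nothing to see here.",
    "This is the underreported risk\u2014it ripples downstream.",
]
print(drop_em_dash_comments(comments))  # keeps only the first comment
```

As several replies note, this trades false positives (humans who type em-dashes) for simplicity.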
One pattern I've noticed recently is formulaic comments that look OK-ish on their own, maybe a bit abstract/vague/bland, not taking a clear side on good/bad the way people like to do, but obviously AI once you look at the account history and see they all follow the same formula:
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated. Been using them for years—it's just `alt+shift+-` on a Mac keyboard and I find them more legible in many fonts compared to the simple dash on the typical numpad.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
Downstream of this: I used to cycle my accounts pretty regularly, but I've stopped since generative AI took off. I don't want people thinking I'm an LLM spam bot. My stupid comments are entirely my own.
Prior to the rise of LLM-written posts and the natural reaction of hair-trigger suspicion, I used to em and en dash fairly often in posts on HN. No reason really other than being a bit of a typography geek who happens to have always used dashes in casual writing instead of semicolons. So when I was setting up a modifier-key keyboard layer with AHK many years ago I put the em dash on modifier+dash just because I could - which made it easy.
Now someone may search old posts without a time cutoff and assume I'm an LLM. That combined with the fact I sometimes write longer posts and naturally default to pretty good punctuation, spelling and grammar, is basically a perfect storm of traits. I've already had posts accused twice in the past year of being an LLM.
Kind of sad some random quirk of LLM training caused a fun little typography thing I did just for myself (assuming no one else would even notice) to become something negative.
I noticed a similar trend a couple of weeks ago so I auto-hide green comments now. I also autohide all top 1000 user accounts but it strikes me that perhaps I should also choose a “user signed up on $date” filter that precedes OpenClaw.
If it wasn't for them misconfiguring their bot and having it post so quickly, these would go by undetected and most people would engage with them. The comments themselves seem "normal" at first glance.
On Reddit it's even worse; I suspect Reddit is internally running its own bots for engagement bait.
As someone who loves LaTeX, I can't imagine ever spending so much time on typography in online forums: italics, bold, em-dashes, headers, sections. I quit Reddit and will quit HN as well if the situation worsens.
This feels like an existential threat to HN, and to the general concept of anonymous online discourse. Trust in the platform is foundational, and without it the whole thing falls down.
Requiring proof of identity is the only solution I can think of, despite how unappealing it is. And even then, you'll still have people handing their account over to an LLM.
I really struggle to imagine a way around it. It could be that the future is just smaller, closed groups of people you know or know indirectly.
It would be trivial to make a HN comment agent that avoids all the usual hallmarks of AI writing. Mere estimations of bot activity based on character frequency would likely underestimate their presence.
I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with Markdown.
I wanted to point out that em-dashes are autocompleted by the iOS keyboard, so without more detail the measurement may pick up false positives. I think a better indicator would be to only count em-dashes with preceding and following whitespace characters, together with the user's general Unicode usage.
Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the Chinese full-stop character in Unicode, which those users often use as a period, so that could be a good indicator of legitimate human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
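The refinement suggested in this thread can be sketched directly: separate whitespace-flanked em-dashes from ones fused between words, and check for CJK punctuation. Which pattern actually correlates with human vs. machine authorship is this commenter's hypothesis, not established fact:

```python
import re

# Profile a comment's dash and punctuation usage. The spaced/fused split
# and the CJK full stop as a "human" signal are assumptions from the
# thread, not ground truth.
SPACED = re.compile(r"\s\u2014\s")      # " — " with surrounding whitespace
FUSED = re.compile(r"\S\u2014\S")       # "word—word" with no spaces
IDEOGRAPHIC_STOP = "\u3002"             # "。", common in CJK input methods

def dash_profile(text):
    return {
        "spaced": len(SPACED.findall(text)),
        "fused": len(FUSED.findall(text)),
        "cjk_stop": IDEOGRAPHIC_STOP in text,
    }

print(dash_profile("Trust is foundational\u2014without it, it falls down."))
```

A per-user profile over an account's whole history would be less noisy than a single comment.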
This is probably the time to add an invitation system like Gmail had in the beginning. Or visually shade accounts less than a year old. Or something else, before things get too mixed.
I don't personally care about the distinction especially since AI usually 'improves' things by making it more verbose. Don't waste tokens to force me to read more useless words about your position - just state it plainly.
I just assume if any comment sounds like an ad it's a bot. All the comments like "I'm 10x faster with Claude Opus 4.6!" or "Have you tried Codex with ChatGPT 5.X? What a time to be alive!" can be lumped in the bot bin.
I worked for GitHub for a time. There was a cultural abhorrence of the diaeresis, it was considered reader-hostile and elitist. I refused to coöperate with that edict internally, although I grant that every company has the right to micro-manage communications with the public.
I sent them an email a few days ago about the state of /noobcomments.
This wasn't really intended as a "wow, dang is sure sleeping on the job" so much as an interesting observation about the new bot ecosystem.
I also feel like there's a missing discussion about comment quality on HN lately. It feels like it's dropped like crazy. I wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
I'll actually post a comment or question and I'll get a reply with a bit of a paragraph of what feels like a very "off" (not 'wrong' but strangely vague) summary of the topic ... and then maybe an observation or pointed agenda to push, but almost strangely disconnected from what I said.
One of the challenges is that, yeah, regular users already misread each other: they don't get each other's meaning, don't read carefully, run into language barriers. Yet the volume of posts I see where the other user REALLY isn't responding to the other person seems awfully high these days.
AI-generated content routinely takes sides. Its pretense of neutrality is no deeper than a typical Homo sapiens'. This is necessarily so in an entity that derives its values from a set of weights that distill human values. Maybe reasoning AI can overcome that some day, but to me that sounds like an enormous problem that may never be solved. Even if AI doesn't take sides the way people do, it still takes sides in its own way. That only becomes obscure to the extent that its value judgments conflict with ours, and AI is very good at aligning with zeitgeist values, so it can hide its biases better than we can.
I wonder if it is neural networks that are inherently biased, but in blind spots, and that applies to both natural and artificial ones. It may be that to approximate neutrality we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead depend on logically deriving all values from first principles. I have low confidence that AI's can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased is a solution: while neutrality may be an epistemic advance, it also degrades social cohesion. Neutrality looks like rationality, but bias may be a Chesterton's Fence, and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
Is there even an incentive to optimize for such signals, though? Em-dashes have been a known indicator of AI-generated text for a good while, and are still extremely prevalent. While someone who doesn't like AI slop and knows what to look out for will notice and call out obvious AI comments, the unfortunate truth is that the majority of people simply cannot tell, and even among those who can, many don't care.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.
It's weird because the barrier to avoiding that is so low: you can just tack on 'talk like me, not AI; don't use em-dashes; don't use formulaic structures; be concise' and it'll get rid of half of those signals.
AI post "improvements" are the most annoying thing. I see more and more people doing it, especially when posting reviews/experiences with things, and they always get called out for it. They always justify it with "AI helped me organize what I wanted to say." Like man, you're having an AI write about an experience it didn't have and likely didn't even proofread it. Who knows what BS it added to the story. Even disorganized and misspelled stories are better than AI fantasy renditions that are 20 times longer than they need to be.
I still call voodoo on this. I use an iPhone, iPad, and Mac to comment here—all of them autocorrect to em-dashes at one point or another. Same goes for ellipses.
LLM fatigue is real. It's not just em-dash — it's the overall tone of the writing that clues people in. But if your viewpoints and approach are unique, your typesetting won't raise suspicion of machine-generation, except in the most dull of readers. Just be you and it will be fine.
If you'd like more tips on writing I'd be happy to help.
It's funny - some months ago I noticed that I use the word "actually" a lot, and started trying to curb it in my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.
e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not type-checked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.
The user [1] you've mentioned has 160 points from a total of four bland messages, which is far outside the normal statistical distribution. And that gives away why they do it: the long-term aim is to cultivate voting rings to influence narratives and rankings in the future. For now this is only my theory, but it may be a real monetization strategy for them.
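The outlier check described here is easy to sketch: flag accounts whose karma-per-comment ratio is implausibly high given how little they've posted. The threshold and the sample accounts below are hypothetical, not drawn from real HN data:

```python
# Flag accounts with suspiciously high karma per comment. The 20:1
# ratio_limit and min_points floor are made-up thresholds for
# illustration only.
def suspicious(points, comments, ratio_limit=20, min_points=100):
    if comments == 0:
        return points >= min_points
    return points >= min_points and points / comments >= ratio_limit

accounts = {"flagged_user": (160, 4), "regular_user": (3200, 1500)}
for name, (pts, n) in accounts.items():
    print(name, suspicious(pts, n))
```

The 160-points-from-4-comments account above would score 40 points per comment and trip this check; a long-time poster with thousands of comments would not.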
The article advocates for free expression and authentic voice by documenting how bot infiltration degrades genuine discourse. The author explicitly analyzes and critiques the problem of non-human accounts drowning out authentic human communication on HN, defending the value of meaningful human speech.
FW Ratio: 50%
Observable Facts
The post provides statistical evidence comparing new-account behavior (17.47% use em-dashes) to established accounts (1.83%), with a p-value of 7e-20.
The author describes accessing Hacker News comment data via /newcomments and /noobcomments endpoints and performing statistical analysis.
The post is published on a personal blog without paywall, allowing unrestricted access to the analysis.
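A comparison like the one above (17.47% vs 1.83% em-dash usage) is typically tested with a two-proportion z-test. A minimal sketch, using hypothetical sample counts chosen to match the reported rates, since the actual group sizes behind the 7e-20 p-value are not given in this excerpt:

```python
import math

# Two-proportion z-test. The counts (105/601 vs 11/601) are hypothetical:
# they reproduce the reported rates but not necessarily the post's
# actual sample sizes.
def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value via the normal survival function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_prop_z(105, 601, 11, 601)  # ~17.47% vs ~1.83%
print(f"z = {z:.2f}, p = {p:.1e}")
```

With samples of a few hundred comments per group, a gap this large yields a vanishingly small p-value, consistent with the order of magnitude the post reports.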
Inferences
By documenting bot activity patterns, the author exercises Article 19 rights to seek information and impart findings to the public.
The technical, evidence-based approach to analyzing platform manipulation supports transparent discourse about digital communication integrity.
The accessibility of the content reinforces the structural protection of free expression by removing barriers to information access.
Content frames the problem of bot infiltration on Hacker News as a threat to authentic discourse and community integrity, implicitly supporting principles of dignity and meaningful participation reflected in the Preamble's vision of a world where human rights are universally recognized.
FW Ratio: 60%
Observable Facts
The post describes accounts posting gibberish text like '13 60 well and t6ctctfuvuh7hguhuig8h88gd' as evidence of bot activity.
The author states the vibe of HN is 'seriously off' with comments that are 'incredibly banal, or oddly off topic.'
The post concludes with a call to investigate bot patterns through statistical analysis of comment corpora.
Inferences
The framing of bot infiltration as a problem affecting community quality suggests concern for preserving authentic human expression and trust.
By documenting systematic differences between new and established accounts, the author advocates for transparency and evidence-based understanding of platform dynamics.
The article implicitly affirms human dignity by highlighting the degradation of discourse quality caused by bot infiltration, suggesting concern that authentic human voices and meaningful exchange are being undermined.
FW Ratio: 50%
Observable Facts
The post emphasizes that new accounts differ substantially from established human accounts in stylistic patterns.
The author treats the detection of bot-like behavior as a problem worthy of investigation and documentation.
Inferences
By treating bot activity as a violation of community norms, the author implicitly endorses the dignity of authentic human participation.
The concern with 'banal' and 'off topic' comments reflects a value for meaningful, respectful discourse among humans.
The post implicitly supports peaceful assembly and association by highlighting how bot infiltration disrupts authentic community dialogue, suggesting concern for protecting the integrity of human collective spaces.
FW Ratio: 50%
Observable Facts
The author describes HN as a community affected by bot activity, treating it as a collective space of concern.
The post uses inclusive language ('we', 'I've had this sense') suggesting participation in a shared community experiencing the problem.
Inferences
Concern with bot-driven degradation of HN's community vibe reflects an implicit value for authentic human association and collective discourse.
The collaborative nature of analyzing comment data suggests investment in preserving community integrity.
The website provides free, open access to this critical analysis without paywalls or registration barriers, enabling readers to freely receive and share information about platform manipulation. The practice of publishing detailed technical evidence supports the exercise of freedom to seek, receive, and impart information.
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 10:41:39 UTC