This technical analysis post advocates for protecting authentic human expression on Hacker News by documenting statistical evidence of bot infiltration. The author demonstrates how non-human accounts systematically differ in linguistic patterns from genuine users, framing bot activity as a threat to community integrity and meaningful dialogue. The content champions Article 19 freedoms (free expression and information) through rigorous, evidence-based public criticism of platform manipulation, while the freely accessible publication model structurally supports open access to this critical information.
(author) I saw a 32:1 rate of em-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.
I've had this sense that HN has gotten absolutely inundated with bots over the last few months.
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
(2) I do recommend taking one minute to dash a note off to [email protected] if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
If I see an em-dash in a comment I stop reading and I've seriously considered setting up a filter across multiple sites to remove any comments containing one.
I know there are legitimate use cases for the em-dash, but in a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
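The filter the commenter describes would be a one-liner in most scripting contexts. A minimal sketch in Python, with hypothetical comment strings standing in for whatever a scraper or userscript would actually supply:

```python
# Drop any comment containing an em-dash (U+2014). The comment texts
# below are hypothetical stand-ins for scraped page content.
EM_DASH = "\u2014"

def drop_em_dash_comments(comments):
    """Return only the comments that contain no em-dash."""
    return [c for c in comments if EM_DASH not in c]

comments = [
    "Plain hyphen - nothing to see here.",
    "This is the underreported risk\u2014it ripples downstream.",
]
print(drop_em_dash_comments(comments))  # keeps only the first comment
```

As several replies note, this trades false positives (humans who type em-dashes) for simplicity.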
One pattern I've noticed recently is formulaic comments that look OK-ish on their own, maybe a bit abstract/vague/bland, not taking a clear side on good/bad the way people like to do, but obviously AI once you look at the account history and see they all follow the same formula:
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated. Been using them for years—it's just `alt+shift+-` on a Mac keyboard and I find them more legible in many fonts compared to the simple dash on the typical numpad.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
Downstream of this: I used to cycle my accounts pretty regularly, but I've stopped since generative AI took off. I don't want people thinking I'm an LLM spam bot. My stupid comments are entirely my own.
Prior to the rise of LLM-written posts and the natural reaction of hair-trigger suspicion, I used to em and en dash fairly often in posts on HN. No reason really other than being a bit of a typography geek who happens to have always used dashes in casual writing instead of semicolons. So when I was setting up a modifier-key keyboard layer with AHK many years ago I put the em dash on modifier+dash just because I could - which made it easy.
Now someone may search old posts without a time cutoff and assume I'm an LLM. That combined with the fact I sometimes write longer posts and naturally default to pretty good punctuation, spelling and grammar, is basically a perfect storm of traits. I've already had posts accused twice in the past year of being an LLM.
Kind of sad some random quirk of LLM training caused a fun little typography thing I did just for myself (assuming no one else would even notice) to become something negative.
I noticed a similar trend a couple of weeks ago so I auto-hide green comments now. I also autohide all top 1000 user accounts but it strikes me that perhaps I should also choose a “user signed up on $date” filter that precedes OpenClaw.
If it wasn't for them misconfiguring their bot and having it post so quickly, these would go by undetected and most people would engage with them. The comments themselves seem "normal" at first glance.
On Reddit it's even worse; I suspect Reddit is internally running its own bots for engagement bait.
As someone who loves LaTeX, I can't imagine ever spending so much time on typography in online forums: italics, bold, em-dashes, headers, sections. I quit Reddit and will quit HN as well if the situation worsens.
This feels like an existential threat to HN, and to the general concept of anonymous online discourse. Trust in the platform is foundational, and without it the whole thing falls down.
Requiring proof of identity is the only solution I can think of, despite how unappealing it is. And even then, you'll still have people handing their account over to an LLM.
I really struggle to imagine a way around it. It could be that the future is just smaller, closed groups of people you know or know indirectly.
It would be trivial to make a HN comment agent that avoids all the usual hallmarks of AI writing. Mere estimations of bot activity based on character frequency would likely underestimate their presence.
I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with Markdown.
I wanted to point out that em-dashes are autocompleted by the iOS keyboard, so without more detail the measurement may pick up false positives. I think a better indicator would be to only count em-dashes with preceding and following whitespace characters, together with the user's general Unicode usage.
Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the Chinese full-stop character in Unicode, which those users often use as a period, so that could be a good indicator of legitimate human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
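The refinement suggested in this thread can be sketched directly: separate whitespace-flanked em-dashes from ones fused between words, and check for CJK punctuation. Which pattern actually correlates with human vs. machine authorship is this commenter's hypothesis, not established fact:

```python
import re

# Profile a comment's dash and punctuation usage. The spaced/fused split
# and the CJK full stop as a "human" signal are assumptions from the
# thread, not ground truth.
SPACED = re.compile(r"\s\u2014\s")      # " — " with surrounding whitespace
FUSED = re.compile(r"\S\u2014\S")       # "word—word" with no spaces
IDEOGRAPHIC_STOP = "\u3002"             # "。", common in CJK input methods

def dash_profile(text):
    return {
        "spaced": len(SPACED.findall(text)),
        "fused": len(FUSED.findall(text)),
        "cjk_stop": IDEOGRAPHIC_STOP in text,
    }

print(dash_profile("Trust is foundational\u2014without it, it falls down."))
```

A per-user profile over an account's whole history would be less noisy than a single comment.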
This is probably the time to add an invitation system like Gmail had in the beginning. Or visually shade accounts less than a year old. Or something else, before things get too mixed.
I don't personally care about the distinction especially since AI usually 'improves' things by making it more verbose. Don't waste tokens to force me to read more useless words about your position - just state it plainly.
I just assume if any comment sounds like an ad it's a bot. All the comments like "I'm 10x faster with Claude Opus 4.6!" or "Have you tried Codex with ChatGPT 5.X? What a time to be alive!" can be lumped in the bot bin.
I worked for GitHub for a time. There was a cultural abhorrence of the diaeresis, it was considered reader-hostile and elitist. I refused to coöperate with that edict internally, although I grant that every company has the right to micro-manage communications with the public.
I sent them an email a few days ago about the state of /noobcomments.
This wasn't really intended as a "wow, dang is sure sleeping on the job" so much as an interesting observation about the new bot ecosystem.
I also feel like there's a missing discussion about comment quality on HN lately. It feels like it's dropped like crazy. I wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
I'll actually post a comment or question and I'll get a reply with a bit of a paragraph of what feels like a very "off" (not 'wrong' but strangely vague) summary of the topic ... and then maybe an observation or pointed agenda to push, but almost strangely disconnected from what I said.
One of the challenges is that, yeah, regular users already misread each other: they don't get each other's meaning, don't read carefully, run into language barriers. Yet the volume of posts I see where the other user REALLY isn't responding to the other person seems awfully high these days.
AI-generated content routinely takes sides. Its pretense of neutrality is no deeper than a typical Homo sapiens'. This is necessarily so in an entity that derives its values from a set of weights that distill human values. Maybe reasoning AI can overcome that some day, but to me that sounds like an enormous problem that may never be solved. Even if AI doesn't take sides the way people do, it still takes sides in its own way. That only becomes obscure to the extent that its value judgments conflict with ours, and AI is very good at aligning with zeitgeist values, so it can hide its biases better than we can.
I wonder if it is neural networks that are inherently biased, but in blind spots, and that applies to both natural and artificial ones. It may be that to approximate neutrality we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead depend on logically deriving all values from first principles. I have low confidence that AI's can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased is a solution: while neutrality may be an epistemic advance, it also degrades social cohesion. Neutrality looks like rationality, but bias may be a Chesterton's Fence, and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
Is there even an incentive to optimize for such signals, though? Em-dashes have been a known indicator of AI-generated text for a good while, and are still extremely prevalent. While someone who doesn't like AI slop and knows what to look out for will notice and call out obvious AI comments, the unfortunate truth is that the majority of people simply cannot tell, and even among those who can, many don't care.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.
It's weird because the barrier to avoiding that is so low: you can just tack on 'talk like me, not AI; don't use em-dashes; don't use formulaic structures; be concise' and it'll get rid of half of those signals.
AI post "improvements" are the most annoying thing. I see more and more people doing it, especially when posting reviews/experiences with things, and they always get called out for it. They always justify it with "AI helped me organize what I wanted to say." Like man, you're having an AI write about an experience it didn't have and likely didn't even proofread it. Who knows what BS it added to the story. Even disorganized and misspelled stories are better than AI fantasy renditions that are 20 times longer than they need to be.
I still call voodoo on this. I use an iPhone, iPad, and Mac to comment here—all of them autocorrect to em-dashes at one point or another. Same goes for ellipses.
LLM fatigue is real. It's not just em-dash — it's the overall tone of the writing that clues people in. But if your viewpoints and approach are unique, your typesetting won't raise suspicion of machine-generation, except in the most dull of readers. Just be you and it will be fine.
If you'd like more tips on writing I'd be happy to help.
It's funny - some months ago I noticed that I use the word "actually" a lot, and started trying to curb it in my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.
e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not type-checked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.
The user [1] you've mentioned has 160 points from a total of four bland messages, which is far outside the normal statistical distribution. And that gives away why they do it: the long-term aim is to cultivate voting rings to influence narratives and rankings in the future. For now this is only my theory, but it may be a real monetization strategy for them.
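The outlier check described here is easy to sketch: flag accounts whose karma-per-comment ratio is implausibly high given how little they've posted. The threshold and the sample accounts below are hypothetical, not drawn from real HN data:

```python
# Flag accounts with suspiciously high karma per comment. The 20:1
# ratio_limit and min_points floor are made-up thresholds for
# illustration only.
def suspicious(points, comments, ratio_limit=20, min_points=100):
    if comments == 0:
        return points >= min_points
    return points >= min_points and points / comments >= ratio_limit

accounts = {"flagged_user": (160, 4), "regular_user": (3200, 1500)}
for name, (pts, n) in accounts.items():
    print(name, suspicious(pts, n))
```

The 160-points-from-4-comments account above would score 40 points per comment and trip this check; a long-time poster with thousands of comments would not.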
The article advocates for free expression and authentic voice by documenting how bot infiltration degrades genuine discourse. The author explicitly analyzes and critiques the problem of non-human accounts drowning out authentic human communication on HN, defending the value of meaningful human speech.
FW Ratio: 50%
Observable Facts
The post provides statistical evidence comparing new-account behavior (17.47% use em-dashes) to established accounts (1.83%), with a p-value of 7e-20.
The author describes accessing Hacker News comment data via /newcomments and /noobcomments endpoints and performing statistical analysis.
The post is published on a personal blog without paywall, allowing unrestricted access to the analysis.
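A comparison like the one above (17.47% vs 1.83% em-dash usage) is typically tested with a two-proportion z-test. A minimal sketch, using hypothetical sample counts chosen to match the reported rates, since the actual group sizes behind the 7e-20 p-value are not given in this excerpt:

```python
import math

# Two-proportion z-test. The counts (105/601 vs 11/601) are hypothetical:
# they reproduce the reported rates but not necessarily the post's
# actual sample sizes.
def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value via the normal survival function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_prop_z(105, 601, 11, 601)  # ~17.47% vs ~1.83%
print(f"z = {z:.2f}, p = {p:.1e}")
```

With samples of a few hundred comments per group, a gap this large yields a vanishingly small p-value, consistent with the order of magnitude the post reports.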
Inferences
By documenting bot activity patterns, the author exercises Article 19 rights to seek information and impart findings to the public.
The technical, evidence-based approach to analyzing platform manipulation supports transparent discourse about digital communication integrity.
The accessibility of the content reinforces the structural protection of free expression by removing barriers to information access.
Content frames the problem of bot infiltration on Hacker News as a threat to authentic discourse and community integrity, implicitly supporting principles of dignity and meaningful participation reflected in the Preamble's vision of a world where human rights are universally recognized.
FW Ratio: 60%
Observable Facts
The post describes accounts posting gibberish text like '13 60 well and t6ctctfuvuh7hguhuig8h88gd' as evidence of bot activity.
The author states the vibe of HN is 'seriously off' with comments that are 'incredibly banal, or oddly off topic.'
The post concludes with a call to investigate bot patterns through statistical analysis of comment corpora.
Inferences
The framing of bot infiltration as a problem affecting community quality suggests concern for preserving authentic human expression and trust.
By documenting systematic differences between new and established accounts, the author advocates for transparency and evidence-based understanding of platform dynamics.
The article implicitly affirms human dignity by highlighting the degradation of discourse quality caused by bot infiltration, suggesting concern that authentic human voices and meaningful exchange are being undermined.
FW Ratio: 50%
Observable Facts
The post emphasizes that new accounts differ substantially from established human accounts in stylistic patterns.
The author treats the detection of bot-like behavior as a problem worthy of investigation and documentation.
Inferences
By treating bot activity as a violation of community norms, the author implicitly endorses the dignity of authentic human participation.
The concern with 'banal' and 'off topic' comments reflects a value for meaningful, respectful discourse among humans.
The post implicitly supports peaceful assembly and association by highlighting how bot infiltration disrupts authentic community dialogue, suggesting concern for protecting the integrity of human collective spaces.
FW Ratio: 50%
Observable Facts
The author describes HN as a community affected by bot activity, treating it as a collective space of concern.
The post uses inclusive language ('we', 'I've had this sense') suggesting participation in a shared community experiencing the problem.
Inferences
Concern with bot-driven degradation of HN's community vibe reflects an implicit value for authentic human association and collective discourse.
The collaborative nature of analyzing comment data suggests investment in preserving community integrity.
The website provides free, open access to this critical analysis without paywalls or registration barriers, enabling readers to freely receive and share information about platform manipulation. The practice of publishing detailed technical evidence supports the exercise of freedom to seek, receive, and impart information.
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 10:41:39 UTC