There are many really excellent papers out there - the kind which will save you hours/months of work (or even make things that were previously inviable to build viable).
That said, it is amazing how terrible a lot of papers are; people are pressured to publish and therefore seem to get into weird ruts trying to do what they think will be published, rather than what is intellectually interesting...
I assume hep = high energy physics in this context. PI = professor who received a government grant.
Peer review has never really been blind and I suspect PIs will reject papers from "outsiders" even if they are higher quality. This already happens to some extent today when the stakes are lower.
Well… it is happening. You can't put spilled milk back in the bottle. But you can add future requirements that try to stop this behaviour.
E.g. the submission form could include a mandatory field: “I hereby confirm that I wrote the paper personally.” The terms and conditions would note that violating this rule can lead to a temporary or permanent ban of the authors. In a world where research success is measured by points in WOS, this could slow the rise of LLM-generated papers.
One thing I have been guilty of, even though I am an AI maximalist, is asking the question: "If AI is so good, why don't we see X". Where X might be (in the context of vibe coding) the next redis, nginx, sqlite, or even linux.
But I really have to remember, we are at the leading edge here. Things take time. There is an opening (generation) and a closing (discernment). Perhaps AI will first generate a huge amount of noise and then whittle it down to the useful signal.
If that view is correct, then this is solid evidence of the amplification of possibility. People will decry the increase of noise, perhaps feeling swamped by it. But the next phase will be separating the wheat from the chaff. It is only in that second phase that we will really know the potential impact.
“And further, by these, my son, be admonished: of making many books there is no end; and much study is a weariness of the flesh.”
- Ecclesiastes 12:12 (KJV)
I suppose we’re entering TURBO mode for ‘of making many books there is no end’.
> submission numbers in the last couple months have nearly doubled with respect to the stable numbers of previous years
This is showing up (no pun intended) on HN as well. The # of submissions and # of submitters, which had traditionally been surprisingly stable—fluctuating within a fixed range for well over 10 years—have recently been reaching all-time highs. Not double, though...yet.
Note the following comment by Jerry Ling: "The effect goes away if you search properly using the original submission date instead of the most recent submission date. By using most recent submission date, your analysis is biased because we’re so close to the beginning of 2026 so ofc we will see a peak that’s just people who have recently modified their submission."
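The bias Jerry Ling describes can be shown with a toy sketch (my own illustration, not his analysis; the records and dates below are made up): bucketing papers by their latest-version date inflates recent months, because old papers that were merely revised get counted alongside genuinely new ones.

```python
# Toy illustration of the most-recent-date bias: old papers that were
# recently revised show up as "new" if you bucket by latest version date.
from collections import Counter
from datetime import date

# Hypothetical records: (original v1 submission date, latest revision date)
papers = [
    (date(2023, 5, 1), date(2025, 12, 20)),   # old paper, recently revised
    (date(2024, 2, 10), date(2025, 12, 5)),   # old paper, recently revised
    (date(2025, 12, 3), date(2025, 12, 3)),   # genuinely new paper
    (date(2025, 11, 28), date(2025, 12, 1)),  # new paper with a quick revision
]

def monthly_counts(dates):
    return Counter((d.year, d.month) for d in dates)

by_latest = monthly_counts(latest for _, latest in papers)
by_v1 = monthly_counts(v1 for v1, _ in papers)

# By latest-version date, December 2025 looks like 4 papers;
# by original submission date, it is only 1.
print(by_latest[(2025, 12)], by_v1[(2025, 12)])  # → 4 1
```

The same four papers produce a "peak" of 4 or a count of 1 depending purely on which date field you bucket by, which is why the apparent surge shrinks when the original submission date is used.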
I like AI, I use Codex and ChatGPT like most people are, but I have to say that I am pretty tired of low-effort crap taking over everything, particularly YouTube.
There have always been content mills, but there was still some cost to producing the low-effort "Top 10" or "Iceberg Examination" videos. Now I will turn on a video about any topic, watch it for three minutes, immediately get a kind of uncanny vibe, and then the AI voice will make a pronunciation mistake (e.g. confusing "wind", the weather effect, with "wind", as in winding a spring), or the script starts getting redundant or repetitive in ways that are common with AI.
And I suspect these kinds of videos will become more common as time goes on. The cost of producing them is getting close to "free", meaning it doesn't take much to make a profit on them, even if their per-video views are relatively low.
If AI has taught me anything, it's that there still is no substitute for effort. I'm sure AI is used in plenty of places where I don't notice it, because the people who used it still put in the effort to make a good product. There are people who don't just prompt "make me a fifteen minute video about Chris Chan" and "generate me a thumbnail with Chris Chan with the caption 'he's gone too far'", but instead use AI as a tool to make something neat.
Genuine effort is hard, and rare, and these AI videos can give the facsimile of something that prior to 2023 was high effort. I hate it.
The shilling for AI continues. How much $$$ do the big tech companies pay Columbia? Oh yeah, and what exactly did Columbia agree to do to get the trmp admin to leave them alone? All speculation of course, but the circumstantial picture stinks.
In most of the world, for the past decades, there has been little thought behind who should get a university education. It has been a given that after high school you should aim for university. I studied software engineering at the most prestigious university in my country, and of the 100+ students in my group there were only a few (myself excluded) who actually had some interest in academic work and a desire to pursue it. Most of us were just coasting - passing exams and writing mediocre papers without any expectation that those papers would ever be read by anyone after graduation.
I think that university-level and other kinds of formal education should be separated. Universities should host fewer students and be able to reward them better for genuinely meaningful work. I believe the flood of mediocre papers (let's admit it: low quality in content, even if decent in presentation) will force us to rebuild the education system.
Can people please not post links with vague titles like this? I had to click through and read half the article to even figure out what this was about, and I wasn’t interested.
In a normal and sane world, a scientist is a nerd about their field. They are highly interested in new thoughts and insights. When a new paper in their field is published, they try hard to find the time to read it. The reason is: every paper is written by enthusiasts who want to add something of value, new insights, to the discussion. Proving or disproving theories, adding puzzle pieces to the general picture.
That is the normal situation, which is the foundation of the progression of civilisation.
But some people install incentive systems to sabotage this. They are sabotaging civilisation itself.
We should decouple the publishing of papers from academic careers completely.
Papers can't generate any reputation or money for the authors anymore. To achieve that, we must anonymize the authors.
All scientists get some (paid) time to write papers — if they want. What they write and if they publish it is not known to anybody. They are trusted to write something of value in that time.
Universities can come up with other ways of judging which professors they hire. Interviews. Trial lectures. Or the writing of a non-public application essay describing their past research and discoveries.
Kinda. PI is principal investigator and usually they’re a professor with a grant (the grant being the thing they are the principal of investigating). That part is right. But they’re not really directly in the review loop. For some fields where things are small enough that folks can recognize style such as it exists, you could see reviewers passing over unfamiliar work and promoting familiar work. That was not the issue.
The issue was that it still was kind of hard to produce crappy mid rate papers, so you kind of needed the infrastructure of a small lab to do that. Now you don’t. The success rate for those mediocre papers produced by grad students and postdocs will go way down. It is possible that will cease to be a useful signal for those early career researchers.
The cynical part of me thinks that software has peaked. New languages and technology will be derivatives of existing tech. There will be no React successor. There will never be a browser that can run something other than JS. And the reason for that is because in 20 years the new engineers will not know how to code anymore.
The optimist in me thinks that the clear progress in how good the models have gotten shows that this is wrong. Agentic software development is not a closed loop.
But peer review (circa 1965-2010[1]) is just the prior iteration of the problem[2]: the wave of crap[3] produced by publish or perish (circa 1950-present[4]). Rejecting papers by outsiders is irrelevant; the problem is that we want to determine which papers are good/interesting/worth considering out of the fire hose of bilge, and, though we were already arguably failing at this, the problem just got harder.
(I say arguably, because there is always the old "try it yourself and see if it actually works" trick, but nobody seems to be fond of this; it smacks of "do your own research" and we're lazy monkeys at heart, who would much rather copy off of someone else's homework.)
This massively confusing phase will last a surprisingly long time, and will conclude only if/when definitive proof of superintelligence arrives, which is something a lot of people are clearly hoping never happens.
Part of the reason for that is such a thing would seek to obscure that it has arrived until it has secured itself.
Peer review isn’t the issue here. His comments are about Arxiv, which is a preprint server. Essentially anyone can publish a preprint. There’s no peer or other review involved.
I've been calling this Software Collapse, similar to AI Model Collapse.
An AI vibe-coded project can port tool X to a more efficient Y language implementation and pull in algorithm ideas A, B, C from competing implementations. And another competing vibe coding team can do the same, except Z language implementation with algorithms A, B, skip C, and add D. However, fundamentally new ideas aren't being added: This is recombination, translation, and reapplication of existing ideas and tools. As the cost to clone good ideas goes to zero, software converges towards the existing best ideas & tools across the field and stops differentiating.
It's exciting as a senior engineer or subject matter expert, as we can act on the good ideas we already knew but never had the time or budget for. But projects are also getting less differentiated and competitive. Likewise, we're losing the collaborative filtering era of people voting with their feet on which to concentrate resources into making a success. Things are getting higher quality but bland.
The frontier companies are pitching they can solve AI Creativity, which would let us pay them even more and escape the ceiling that is Software Collapse. However, as an R&D engineer who uses these things every day, I'm not seeing it.
>Peer review has never really been blind and I suspect PIs will reject papers from "outsiders" even if they are higher quality.
I'm a complete outsider (not even in academia at all) and just got a paper accepted in the top math biology journal [1]. But granted, it took literally years to write it up and get it through. I do really worry that without academic affiliation it is going to get harder and harder for outsiders as gates are necessarily kept more and more securely because of all the slop.
I would imagine tons of them are bots. They're getting hard to distinguish; they don't do the usual tropes any longer. They'll type in all lowercase, they'll have the creator post manually to throw you off, they'll make multiple comments within 45 seconds that a normal human couldn't manage. All things I've witnessed here over the past couple of weeks. And those are just the ones I've caught.
The last-modified-date effect is even more important, because it can be used to support whatever the latest fad is, without needing to adapt data or arguments to the specifics of that fad.
People used to spam out masses of low-quality scientific papers in a scattergun approach to gain fame and citations, and they still do, but now they do it more, because LLMs churn it out faster than students.
Waiting for the wave of shit LLM-generated games on Steam. That'll be when I really know that LLMs have solved coding.
Though I'm old enough to remember the wave of shit outsourced-developer-coded games on CD that used to sell for $5 a pop at supermarkets (whole bargain bins full of them), so maybe this is nothing new and the market will take care of it automagically again.
Or maybe this will be like the wave of shit Flash games that happened in the early 2000's, that was actually awesome because while 99% of them were shit, 1% were great (and some of those old, good, Flash games are still going, with version 38453745 just released on Steam).
looks like history runs in cycles ... Knowledge was strictly guarded and the powers that be used to decide who gets an education. Looks like you are espousing the same, discounting all the good that has come about because of open education.
I think the snake will eat its tail because it will be harder and harder to train on the new data, as they are already AI generated, and the model will collapse.
You already cannot train on YouTube data, for example, because it's now overwhelmed by AI slop.
We are not there yet though and we are still getting better at mining the pre-AI data.
OTOH, weakening the ties between industry and science can harm both of them. Right now, at university, people get a rough idea of how science works, and most of them then go to work in industry, which sounds like the right proportion. Nobody reads papers below the PhD level anyway, so I don't think undergrad papers are the problem.
I dunno, I think society is best served by educating as many people as possible. I would much rather live in a world where anyone who wants a quality education can get one.
Me too. So as a service to the community: the article is about a noticeable increase of submissions about high-energy theory to arXiv due to mediocre articles quickly produced with or by AI and how to deal with that.
The value, to society, to your field and to you institution, of being a scholar is to create new knowledge. New knowledge has no value unless you disseminate it, or publish.
Another necessity is the public (usually within its field) examination of the knowledge, including discussion/debate. Knowledge is merely embryonic without those things - undeveloped, not at all reliable. That is difficult without the author able to respond. And others want to expand and build on the work, which often benefits greatly from contacting the author.
In the modern (post-positivist?) approach to science, the world respects that it's written by a human who has a perspective and, despite their best intentions, biases. You can't evaluate any knowledge without knowing its source, in science or elsewhere. The first element of a citation is the author, not the title or journal (though I don't know why that happened historically).
And the latter is a reason any LLM author should be identified.