It's almost impossible to overstate the importance and impact of arXiv, the science repository that, for a time, almost single-handedly justified the existence of the internet. ArXiv (pronounced "archive" or "Arr-ex-eye-vee," depending on who you ask) is a preprint repository where, since 1991, scientists and researchers have announced "hey, I just wrote this" to the rest of the science world. Peer review moves glacially, but it is necessary. ArXiv requires only a quick once-over from a moderator instead of a painstaking review, so it adds an easy middle step between discovery and peer review, where all the latest discoveries and innovations can, cautiously, be treated with the urgency they deserve more or less instantly.
But the use of AI has wounded ArXiv, and it's bleeding. And it's not clear the bleeding can ever be stopped.
As a recent story in The Atlantic notes, arXiv creator and Cornell information science professor Paul Ginsparg has been fretting since the rise of ChatGPT that AI can be used to breach the slight but necessary barriers preventing the publication of junk on ArXiv. Last year, Ginsparg collaborated on an analysis of probable AI use in arXiv submissions. Rather horrifyingly, scientists evidently using LLMs to generate plausible-looking papers were more prolific than those who didn't use AI: posters of AI-written or AI-augmented work submitted 33 percent more papers.
AI can be used legitimately, the analysis says, for things like surmounting the language barrier. It continues:
"However, traditional signals of scientific quality such as language complexity are becoming unreliable indicators of merit, just as we are experiencing an upswing in the quantity of scientific work. As AI systems advance, they will challenge our fundamental assumptions about research quality, scholarly communication, and the nature of intellectual labor."
It's not just ArXiv. It's a rough time for the reliability of scholarship in general. An astonishing self-own published last week in Nature described the AI misadventure of a bumbling scientist working in Germany named Marcel Bucher, who had been using ChatGPT to generate emails, course information, lectures, and tests. As if that wasn't bad enough, ChatGPT was also helping him analyze responses from students and was being incorporated into interactive parts of his teaching. Then one day, Bucher tried to "temporarily" disable what he called the "data consent" option, and when ChatGPT suddenly deleted all the information he was storing exclusively in the app (that is, on OpenAI's servers), he whined in the pages of Nature that "two years of carefully structured academic work disappeared."
Widespread, AI-induced laziness on display in the exact area where rigor and attention to detail are expected and assumed is despair-inducing. It was safe to assume there was a problem when the number of publications spiked just months after ChatGPT was first released, but now, as The Atlantic points out, we're starting to get the details on the actual substance and scale of that problem: not so much the Bucher-like, AI-pilled individuals experiencing publish-or-perish anxiety and hurrying out a quickie fake paper, but industrial-scale fraud.
For instance, in cancer research, bad actors can prompt for boring papers that claim to document "the interactions between a tumor cell and just one protein of the many thousands that exist," The Atlantic notes. If a paper claims to be groundbreaking, it'll raise eyebrows, meaning the trick is more likely to be noticed; but if the fake conclusion of the fake cancer experiment is ho-hum, that slop is much more likely to see publication, even in a credible journal. All the better if it comes with AI-generated images of gel electrophoresis blobs that are also boring but lend additional plausibility at first glance.
In short, a flood of slop has arrived in science, and everyone has to get less lazy, from busy academics planning their lessons, to peer reviewers and ArXiv moderators. Otherwise, the repositories of knowledge that used to be among the few remaining trustworthy sources of information are about to be overwhelmed by the disease that has already, possibly irrevocably, infected them. And does 2026 feel like a time when anyone, anywhere, is getting less lazy?
