Are AI tools reshaping scientific prose on arXiv?

Tech culture has spent years imagining a future where AI writes papers for us, but the quieter tremor is in the way science talks to itself. A study led by Abdulkareem Alsudais at Prince Sattam bin Abdulaziz University in Saudi Arabia asks a simple, unsettling question: has the rise of accessible language models nudged the prose of science toward greater complexity? By peering at a colossal archive of abstracts on arXiv from 2010 through mid-2024, the researchers measure readability with four standard formulas and then look for tells that something different happened after ChatGPT burst onto the scene in late 2022. The answer, they find, is not a single dramatic rewrite but a steady acceleration toward denser, more intricate language, with a clearly noticeable shift beginning in 2023 and continuing into 2024. This is not a manifesto about robots replacing authors; it’s a window into how the tools we use to shape language might subtly reshape the texture and tempo of scientific communication.

Readability, in the strict, technical sense used here, is a proxy rather than a measure of understanding. It relies on surface features: how many characters a text contains, how many words and sentences it uses, how long those sentences are, and how many syllables are packed into its words. The researchers apply four formulas to every abstract: the Automated Readability Index (ARI), the Coleman–Liau Index (CLI), Flesch Reading Ease (FRE), and the Flesch–Kincaid Grade Level (FKGL). When you average thousands of abstracts across years and categories, you start to see trends that feel almost historical: the prose climbs in complexity year after year, and the post-2022 period carries a sharper inflection. Alsudais and his team, the authors of this data-rich, long-view analysis, treat language as a kind of geological layer, the imprint of how science writes itself into the record over time.
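
For readers who want the mechanics, the standard published forms of these formulas (the widely cited textbook definitions, not equations reproduced from the paper itself) are:

$$
\begin{aligned}
\mathrm{ARI} &= 4.71\,\frac{\text{characters}}{\text{words}} + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43,\\[4pt]
\mathrm{CLI} &= 0.0588\,L - 0.296\,S - 15.8,\\[4pt]
\mathrm{FRE} &= 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}},\\[4pt]
\mathrm{FKGL} &= 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59,
\end{aligned}
$$

where $L$ is the average number of letters per 100 words and $S$ the average number of sentences per 100 words. Higher ARI, CLI, and FKGL values signal more demanding text; FRE runs the other way, so a falling FRE score means a harder read.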

What makes this study particularly striking is not just the observation that complexity grows, but the way the authors triangulate the signal. They examine eight broad fields on arXiv and verify that the trend holds across disciplines, not just in one corner of science. They also test different ways of assigning publication timing, because arXiv abstracts come with a version trail rather than a single publish date. In other words, they ask whether the observed shift is an artifact of data wrangling or a real change in how scientists phrase their work. The result is a robust pattern: a steady rise in measured complexity since 2010, and a pronounced jump in 2023 and 2024 that aligns with the rising visibility of generative AI tools. The conclusion is careful but provocative: as AI-assisted writing becomes more common, the DNA of scientific prose may be changing in a detectable, measurable way.
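
To see what the timing question looks like in practice, here is a minimal sketch, assuming the format of the public arXiv metadata snapshot, in which each record carries a list of versions with creation timestamps; the field names and file name below are those of that public dump, not necessarily what the authors' pipeline used:

```python
import json
from datetime import datetime

def candidate_years(record):
    """Return two ways of dating one arXiv record: the year its first
    version (v1) appeared and the year of its most recent version."""
    # Timestamps in the public metadata dump look like "Mon, 2 Apr 2012 19:18:42 GMT".
    created = [datetime.strptime(v["created"], "%a, %d %b %Y %H:%M:%S %Z")
               for v in record["versions"]]
    return min(created).year, max(created).year

# Usage: count how often the two dating conventions land in different years.
total = disagreements = 0
with open("arxiv-metadata-oai-snapshot.json") as f:  # one JSON record per line
    for line in f:
        first_year, latest_year = candidate_years(json.loads(line))
        total += 1
        disagreements += first_year != latest_year

print(f"{disagreements} of {total} abstracts change year between v1 and the latest version")
```

If the headline trend survives both dating conventions, version churn is unlikely to be driving it, which is the spirit of the robustness check described above.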

Tracking readability through nearly fifteen years of arXiv abstracts

The dataset is ambitious: every abstract posted to arXiv from 2010 through June 7, 2024, across eight broad categories that cover physics, mathematics, computer science, economics, and more. Some papers carry multiple category tags, a reminder that modern research often sits at the crossroads of fields. For each abstract, the team computed the four readability scores. ARI, CLI, and the Flesch–Kincaid Grade Level translate textual features into a rough U.S. grade level, while Flesch Reading Ease runs in the opposite direction: the higher the score, the easier the text. The four metrics pull in slightly different directions, but they converge on a single, telling pattern: abstracts drift toward greater linguistic density over time, even as the volume of published work grows. In a field that prizes rapid communication, this drift is easy to miss in daily writing, but it accumulates when you examine nearly fifteen years of abstracts spanning the whole archive.
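
As an illustration of the scoring step, here is a minimal sketch using the Python textstat package, one convenient open-source implementation of these formulas; the study does not necessarily use this library:

```python
# pip install textstat
import textstat

abstract = (
    "We propose a framework for estimating readability trends in "
    "scientific abstracts and evaluate it on a large bibliographic corpus."
)

# Four surface-level readability metrics, matching the ones used in the study.
scores = {
    "ARI":  textstat.automated_readability_index(abstract),
    "CLI":  textstat.coleman_liau_index(abstract),
    "FRE":  textstat.flesch_reading_ease(abstract),
    "FKGL": textstat.flesch_kincaid_grade(abstract),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Averaged by year and by arXiv category, scores like these yield the long-run curves the study describes.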

One important nuance is that readability is not a moral or intellectual verdict. A more complex sentence structure can mirror deeper, more nuanced ideas. The authors stress this point to guard against oversimplifying a trend into a value judgment. Still, the data are consistent enough to raise meaningful questions about accessibility. If abstracts become denser, do potential readers—early-career researchers, scholars crossing disciplines, or students in related fields—face steeper barriers to discovering relevant work? The authors are careful not to blame AI for every wrinkle, but the timing invites a closer look at what tools that edit or generate text might be doing to the linguistic landscape of science.

From a methodological standpoint, the study also showcases how to extract signal from a noisy, messy dataset. Readability formulas rely on surface-level features rather than semantic meaning, so the researchers emphasize that higher ARI or CLI scores, or lower FRE scores, do not automatically imply that a paper is poorly written or intellectually weaker. They indicate only that the surface form of the prose has grown more demanding; the metrics say nothing about the quality or depth of the ideas underneath.