Frankenstein Citations: How AI is Undermining Academic Integrity

AI-fabricated sources, which experts have dubbed “confident hallucinations,” “phantom references,” or “Frankenstein citations,” risk becoming one of the 21st century’s greatest threats to academic science and education worldwide. The phenomenon stems from the widespread integration of large language models (LLMs) into everyday research practice and manifests as follows: a neural network generates bibliographic references to works that never existed, inserts quotes from made-up people, and cites figures that have nothing to do with reality.

High-Profile Cases

Between 2022 and 2026, AI-fabricated sources were documented in court filings, peer-reviewed journals, and the mass media. Four well-documented cases help gauge the scale of the threat.

Even before the mass proliferation of ChatGPT, in November 2022, Darren Hick, a professor at Furman University (South Carolina), published a post on his academic blog that quickly went viral. One of his students had submitted an essay on David Hume’s “paradox of horror.” The text was logical and stylistically sound, but neither the authors of the quotes nor the publications could be found in any database. An investigation revealed that the student had used an AI tool to generate the essay, and the tool had woven its hallucinations convincingly into the argument. Hick’s post became one of the first systematic descriptions of the “hallucinated bibliography” problem and prompted educators worldwide to introduce mandatory source verification for student papers.

In 2023, in Mata v. Avianca, Inc., attorney Steven Schwartz, representing the plaintiff in the US District Court for the Southern District of New York, filed a legal brief containing citations to six judicial precedents. All of them appeared relevant and were formatted strictly to the Bluebook standard. However, Judge P. Kevin Castel could not locate a single one of the cited cases in legal databases. During the proceedings, the attorney admitted that he had used ChatGPT to find precedents; the model had simply invented all six rulings, complete with plausible titles, citations, and even the names of the judges. Schwartz was fined $5,000; the court noted that “the submission of fictitious judicial decisions undermines the very foundation of justice.”

A couple of months later, the publisher Elsevier initiated the retraction of several publications in the journal Resources Policy. The catalyst was the discovery by readers and editors of text fragments clearly generated by a language model. Further investigation revealed that the bibliographies of these articles contained references to non-existent works. The titles of the papers looked thematically accurate, and the authors were real researchers, yet upon verification through CrossRef and Google Scholar, not a single entry could be confirmed. The incident sparked a heated debate within the academic community about the inadequacy of existing peer-review mechanisms in an era of easily accessible generative AI.

The fourth case dates to April 2026, when a fact-check debunked a claim that the eviction rate in Ireland had reached its highest level since the Great Famine. The fact-check showed that the claim rested on the selective use of statistical data, disregard for context, and numbers tailored to fit an emotionally charged narrative. This tactic, known as cherry-picking, is explained in detail in an article from the GFCN educational section.

Cherry-picking is a form of data manipulation in which only the facts that support a desired viewpoint are selected, while all others are ignored.

This case demonstrates how unscrupulous authors, whether in media, opinion writing, or academic texts, can combine “convenient” real data with entirely AI-fabricated sources, creating a powerful illusion of scientific validity.

The Fabrication Mechanism

To understand how “Frankenstein citations” arise, and why they blend so easily with cherry-picked real data, one must remember that large language models are neither search engines nor databases. Their primary task is to predict the most probable sequence of tokens (words or parts of words) based on the text corpus they were trained on. When a user requests a scientific citation, the model by itself does not consult an external repository; instead, it generates a response that statistically aligns with the prompt. It “knows” that the phrase “as the study showed” is highly likely to be followed by an “author, year” construction. It “remembers” that the name Smith correlates with a specific scientific field, and that the Journal of Cognitive Neuroscience is a relevant publication. By combining these elements, the model creates a plausible but fictitious artifact. A DOI (Digital Object Identifier) is generated in much the same way: the model reproduces the correct structure of the identifier, but the identifier points to nothing. This effect is precisely what is called a “confident hallucination.”
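The recombination described above can be illustrated with a deliberately tiny sketch. It is not how a real LLM is built: it swaps the neural network for a simple bigram counter, and the four “training” entries are invented for illustration and are not real publications. It only shows how always picking the statistically most likely next token can assemble a reference that appears nowhere in the training data.

```python
from collections import Counter, defaultdict

# Invented toy "training data" for illustration only: NOT real publications.
CORPUS = [
    "Smith 2019 Journal of Cognitive Neuroscience",
    "Smith 2021 Journal of Cognitive Neuroscience",
    "Lee 2021 Memory and Cognition",
    "Chen 2021 Journal of Vision",
]
END = "<end>"  # marker appended to every entry so generation knows when to stop


def train_bigrams(corpus):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, start, max_tokens=12):
    """Greedy decoding: always append the most frequent follower."""
    tokens = [start]
    while len(tokens) < max_tokens:
        followers = counts.get(tokens[-1])
        if not followers:
            break
        next_token = followers.most_common(1)[0][0]
        if next_token == END:
            break
        tokens.append(next_token)
    return " ".join(tokens)


model = train_bigrams(CORPUS)
citation = generate(model, "Lee")
print(citation)            # Lee 2021 Journal of Cognitive Neuroscience
print(citation in CORPUS)  # False: plausible, but never seen in training
```

Every individual step is the statistically “right” one, yet the result attributes to Lee an article in a journal where, in this toy corpus, Lee never published. The generator has no step at which it checks whether the assembled reference exists.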

Fact-Checking as an Academic Norm

Hundreds of cases worldwide show how broadly this phenomenon has spread: in law, medicine, economics, philosophy, and the media. In all instances, the fabricated sources went undetected at an early stage because they appeared formally correct. Consequently, academic integrity in the age of AI can no longer be limited to avoiding plagiarism; it demands the methodical verification of every single source. The standard applied to news publications, the verifiability of claims, must become the norm in student, scientific, and expert papers. The four cases described above illustrate that the cost of neglecting such verification is high, ranging from reputational damage and financial penalties to the retraction of scientific papers and the erosion of public trust.

The practical response to this challenge lies in clear verification procedures. The following minimal protocol can be incorporated into academic regulations and author guidelines:

1. Title search via Google Scholar, Scopus, Web of Science, or CrossRef: If a few minutes of searching returns no match, the reference is very likely a “Frankenstein citation.”

2. DOI validation: Go to doi.org and enter the identifier manually. If it cannot be found or resolves to an entirely different publication, treat the cited source as non-existent.

3. Author profile verification: Look up the cited researcher on ORCID (an international registry that assigns a unique 16-character identifier to each registered researcher), Google Scholar, or their institution’s website, and cross-check their list of publications. The absence of the article in a real scientist’s profile is sufficient grounds to exclude the citation.
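Parts of this protocol can be pre-screened automatically. The Python sketch below is one possible approach, not an official tool; the function names are illustrative. It checks a DOI’s structure with the pattern Crossref recommends for modern DOIs, asks doi.org whether the DOI is registered (on the assumption that the `https://doi.org/api/handles/` endpoint answers HTTP 200 for a registered DOI and 404 for an unknown one), and verifies the check character of an ORCID iD, which uses the ISO 7064 MOD 11-2 checksum. None of these checks confirms that a DOI points to the publication actually cited or that the article appears in the author’s profile; those comparisons still require a human reader.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Pattern Crossref recommends for matching modern DOIs (case-insensitive).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(doi: str) -> bool:
    """Structure check only: a fabricated DOI can pass this test."""
    return bool(DOI_PATTERN.match(doi.strip()))


def doi_is_registered(doi: str, timeout: float = 10.0) -> bool:
    """Ask doi.org whether the DOI exists. Requires network access.

    Assumption: https://doi.org/api/handles/<doi> returns HTTP 200 for a
    registered DOI and HTTP 404 for an unknown one.
    """
    url = "https://doi.org/api/handles/" + urllib.parse.quote(doi.strip())
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # outages or rate limits are not a verdict on the DOI


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", chars):
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


# Offline checks only; doi_is_registered() is not called here.
print(looks_like_doi("10.1000/182"))             # True: well-formed
print(looks_like_doi("doi-10.1000/182"))         # False: malformed
# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
print(orcid_checksum_ok("0000-0002-1825-0098"))  # False: wrong check digit
```

A reference that fails the structure or checksum test can be rejected immediately. A reference that passes has only cleared the cheapest hurdle and still needs the manual title and author checks from the list above.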

Furthermore, there are specialized AI tools, such as Consensus, Scite, and Elicit, that are designed to draw only on verified corpora. Using them significantly reduces the likelihood of incorporating “phantom” sources into a paper. Verifying the specific quotes an author uses to support their arguments is harder and more labor-intensive, but the cost of an error is far higher than the cost of checking: research or processes built on false premises can lead to billions in losses and even human casualties.

In the 2020s, non-existent bibliographic references produced by generative AI went from a hypothetical risk to a documented, practical problem. Plausible AI phantoms exploit a fundamental vulnerability in human psychology: the tendency to trust anything that looks formally correct. The only sustainable defense is the rigorous verification of every source, an approach that must become permanent in academic, legal, and media practice. Otherwise, “phantom authors” will eventually come to dominate the information space and irreversibly alter the sociocultural landscape, shifting humanity from an “economy of trust” to an “economy of lies.”