[ RESEARCH INTEGRITY ] February 2, 2024

What tools should we use in the fight against data fabrication?

Academic misconduct is surging, and data fabrication is stealing the spotlight. How do we tackle this issue and restore integrity?

What is data fabrication?

In scholarly publishing, a well-known bias toward positive results is setting the stage for a concerning trend: the widespread alteration of data to fit a manuscript's narrative. The issue is highlighted by a recent study from German researchers revealing that one in five journal articles may contain manipulated data.

Adding fuel to the fire are paper mills, profit-driven entities that produce fraudulent scholarly content. According to Clear Skies’ Adam Day, these mills accounted for an estimated 1.5% to 2% of all papers published in 2022, flooding journals with fake data.

Data fabrication, spanning from subtle changes to outright falsifications, threatens to damage scientific integrity and public trust in research.


Harvard’s data deception

Last year, Harvard professor Francesca Gino gained attention for all the wrong reasons. Initially known for her groundbreaking insights into dishonesty and its connections to networking, taxes, and more, Gino found herself under scrutiny when the blog Data Colada initiated an investigation, uncovering evidence of data fabrication.

The fraud extended beyond a single study, as a thorough examination of Gino's work spanning over a decade revealed instances of misconduct, including in papers published as recently as 2020. The Hartford, an insurance company collaborating on one implicated study, confirmed to NPR that the data they provided for the study had been inappropriately manipulated after submission to Gino. 

This case serves as a stark wake-up call about the state of research integrity. The irony is striking: a professor at one of the world's most prestigious universities engaged in misconduct, and her own research focused on the study of dishonesty.


ChatGPT: crafting fiction or facts?

In November 2023, a groundbreaking study shed light on the concerning landscape of data fabrication in research and the unsettling potential of new technologies to worsen the issue.

Published in JAMA Ophthalmology, the study revealed that when prompted to generate data supporting a specific conclusion, ChatGPT could use a set of parameters to produce semi-random datasets aligning with the desired outcomes. The investigation was inspired by the widespread use of ChatGPT for plagiarism.

Nature collaborated with researchers Jack Wilkinson and Zewen Lu and evaluated the dataset, revealing numerous errors, including mismatches in names and genders of 'patients' and a lack of correlation between pre- and post-operative vision capacities.

Yet, despite these flaws, the creation of fake data by ChatGPT raises alarming concerns about the potential trajectory of this technology.


Is AI the answer?

As we've emphasized, the issue of data fabrication looms large within scholarly publishing. According to the 2023 Research Integrity survey conducted by Morressier, 38% of respondents said data fabrication was the most common type of research misconduct, followed by plagiarism. The key question now facing our industry is: How do we confront this challenge head-on?

We have acknowledged that AI has the potential to worsen this issue, but that raises a question: if AI can be used to generate false data, why not harness its power to identify fabrications?

When data undergoes modification or is generated without proper evidence, AI tools may prove more useful than manual systems in detecting this misconduct. Armed with sophisticated algorithms, AI systems can rapidly analyze datasets, pinpointing unusual patterns that could signal rogue practices.
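To make "pinpointing unusual patterns" a little more concrete, here is a minimal sketch of one classical screening idea, terminal-digit analysis: in many measured datasets the last digit of each value is roughly uniform, so a strongly skewed distribution of last digits can be a reason to look closer. This is a plain statistical check rather than an AI system, it is not a description of any particular vendor's tooling, and the function names are hypothetical. A flagged dataset is only a prompt for human review, since rounding or instrument precision can produce the same signal.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CHI2_CRIT_DF9_P05 = 16.919


def last_digit_chi2(values):
    """Chi-square statistic comparing last-digit counts to a uniform spread.

    Assumes `values` are integer-like measurements (hypothetical input format).
    """
    digits = [abs(int(v)) % 10 for v in values]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one value")
    expected = n / 10  # each digit 0-9 is expected equally often
    counts = Counter(digits)
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


def looks_suspicious(values):
    """True when last digits deviate from uniform beyond the 5% threshold."""
    return last_digit_chi2(values) > CHI2_CRIT_DF9_P05


if __name__ == "__main__":
    evenly_spread = list(range(100))               # every last digit appears 10 times
    all_ending_in_5 = [10 * i + 5 for i in range(100)]
    print(looks_suspicious(evenly_spread))         # False
    print(looks_suspicious(all_ending_in_5))       # True
```

A check like this is cheap enough to run on every submitted dataset, which is where automated screening has an advantage over manual review; it works best with at least a few dozen values, because the chi-square approximation is unreliable on very small samples.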



At Morressier, we are acutely aware of the threat of data fabrication within our industry. In response, we provide AI-powered workflows and robust integrity checks, placing the tools in the hands of publishers, societies, and organizers to rigorously assess the quality of the content they share across every stage of the research lifecycle.

We proactively incorporate a comprehensive suite of third-party solutions to detect, address, and prevent fraud and uphold the highest standards of integrity.