Pillar A — Detection
The Psychology-Statistics Lens: Understanding the “Why” Behind the Fraud
Most forensic approaches ask “Is this data real?” We start with a
different question: “Why would someone fabricate this?” By combining
behavioral psychology with statistical forensics, we create a dual-layer detection system
that is both mathematically rigorous and psychologically informed.
Human beings are poor random number generators. When researchers fabricate data, their
cognitive biases leave statistical fingerprints — patterns of “too-clean”
results, terminal digit preferences, and distributions that lack the natural noise of
authentic measurement. Our approach detects these signatures using the techniques below; short, illustrative code sketches for the first four follow the list:
- GRIM Testing — Verifying whether reported means are mathematically possible given stated sample sizes
- SPRITE Reconstruction — Reverse-engineering plausible raw datasets from reported summary statistics
- P-Curve Analysis — Evaluating whether a body of work shows the distribution of p-values expected from genuine effects versus selective reporting
- Benford’s Law Application — Identifying unnatural digit distributions in large quantitative datasets
- Behavioral Pattern Analysis — Assessing “motivational context” — career pressure, grant cycles, and promotion timelines that correlate with anomalous output
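To make the GRIM logic concrete: a mean of integer-valued data must equal an integer total divided by the sample size, so many reported means are simply impossible. A minimal Python sketch (the function name and rounding convention are ours, not a reference implementation):

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM check: can a mean reported to `decimals` places arise as an
    integer total divided by n?"""
    implied = reported_mean * n
    # Only the two integer totals bracketing the implied sum can work.
    for total in (int(implied), int(implied) + 1):
        if round(total / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# A mean of 3.27 from 17 integer responses is impossible: 55/17 -> 3.24
# and 56/17 -> 3.29, so no integer total rounds to 3.27.
print(grim_consistent(3.27, 17))  # False
print(grim_consistent(3.29, 17))  # True
```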
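SPRITE goes a step further and asks whether any bounded integer dataset can produce the reported mean and SD together. Below is a simplified hill-climbing sketch; the published SPRITE procedure differs in detail, and the trial count and tolerance here are arbitrary choices of ours:

```python
import random
import statistics

def sprite_candidate(mean, sd, n, lo, hi, tries=20000, tol=0.005):
    """Search for one integer dataset on [lo, hi] whose mean and sample SD
    match the reported values. Returns a sorted list, or None on failure."""
    total = round(mean * n)              # the sum must be an integer (GRIM)
    data = [total // n] * n
    for i in range(total - sum(data)):   # distribute the remainder
        data[i] += 1
    for _ in range(tries):
        cur = statistics.stdev(data)
        if abs(cur - sd) < tol:
            return sorted(data)
        i, j = random.sample(range(n), 2)
        # Shift a (+1, -1) unit between two values: apart to raise the SD,
        # together to lower it; the mean never changes.
        if cur < sd:
            a, b = (i, j) if data[i] >= data[j] else (j, i)
        else:
            a, b = (j, i) if data[i] >= data[j] else (i, j)
        if data[a] + 1 <= hi and data[b] - 1 >= lo:
            data[a] += 1
            data[b] -= 1
    return None

# E.g., 1-7 Likert data: n = 25, reported mean 4.28, reported SD 1.21.
print(sprite_candidate(4.28, 1.21, 25, 1, 7))
```

A None return is only suggestive rather than conclusive, since the random search can stall even when a matching dataset exists.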
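The intuition behind p-curve: if significant results reflect real effects, p-values pile up near zero; if they reflect selective reporting of null effects, significant p-values are roughly uniform on (0, .05), so only about half should fall below .025. A deliberately simplified right-skew check (the full p-curve method uses pp-values and a Stouffer test; this binomial version is only a sketch):

```python
from math import comb

def right_skew_p(p_values, alpha=0.05):
    """One-sided binomial test: among significant p-values, are 'very
    small' ones (p < alpha/2) over-represented relative to the 50/50
    split expected when only selective reporting is at work?"""
    sig = [p for p in p_values if p < alpha]
    k = sum(1 for p in sig if p < alpha / 2)
    n = len(sig)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)

# Smaller outputs indicate stronger right skew (more evidential value).
print(right_skew_p([0.001, 0.003, 0.010, 0.004, 0.041]))  # 0.1875
print(right_skew_p([0.048, 0.049, 0.032, 0.044, 0.041]))  # 1.0 (no skew)
```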
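Benford's law predicts leading-digit frequencies P(d) = log10(1 + 1/d) for many naturally occurring datasets that span several orders of magnitude; fabricated figures often deviate. A minimal chi-square screen in pure Python (the cut-off in the comment is the standard chi-square critical value, but treating it as a fraud verdict would overreach):

```python
from collections import Counter
from math import log10

BENFORD = {d: log10(1 + 1 / d) for d in range(1, 10)}  # P(leading digit = d)

def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    n = len(digits)
    counts = Counter(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())

# With 8 degrees of freedom, a statistic above ~20.1 is significant at
# the .01 level; that warrants a closer look, not a verdict of fraud.
```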
Pillar B — Mapping
Bibliometric Network Analysis: Exposing Hidden Structures
Individual fraud is a symptom. Systemic manipulation is the disease. Pillar B uses
scientometric and bibliometric tools to map the networks, cartels, and mill operations
that operate beneath the surface of legitimate scholarly publishing.
Citation cartels — closed networks of editors, authors, or journals that artificially
inflate each other’s metrics — are invisible to traditional analysis. They only
become visible when you map the relational structure of citations across time and geography.
Paper mills are similarly hidden until you analyze co-authorship patterns, submission velocities,
and template reuse at scale. Two of the techniques below are sketched in code after the list.
- Co-Citation Clustering — Identifying journals and authors that cite each other at statistically improbable rates
- Reciprocity Ratio Analysis — Measuring bilateral citation exchange to flag coordinated inflation
- Authorship Network Mapping — Detecting “stranger clusters” — co-authors with no institutional, geographic, or disciplinary connection
- Submission Velocity Auditing — Flagging authors with publication rates that exceed human research capacity
- Template Fingerprinting — Identifying structural similarities across manuscripts that suggest automated or mill-based production
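As one concrete reading of the reciprocity idea: for each journal pair, measure how balanced the two directed citation flows are and how large they loom in each journal's total outgoing citations. The data, thresholds, and flagging rule below are hypothetical illustrations, not calibrated values:

```python
from itertools import combinations

# Hypothetical counts: cites[(a, b)] = citations from journal a to journal b.
cites = {
    ("J_Alpha", "J_Beta"): 240, ("J_Beta", "J_Alpha"): 225,
    ("J_Alpha", "J_Gamma"): 12, ("J_Gamma", "J_Alpha"): 9,
    ("J_Beta", "J_Gamma"): 15,  ("J_Gamma", "J_Beta"): 11,
}

def out_total(j):
    return sum(n for (src, _), n in cites.items() if src == j)

def flag_reciprocal_pairs(min_share=0.5, min_balance=0.8):
    """Flag journal pairs whose exchange is both heavily bilateral
    (balance = min/max of the two directed counts) and a large share
    of each journal's outgoing citations."""
    journals = {j for pair in cites for j in pair}
    flagged = []
    for a, b in combinations(sorted(journals), 2):
        ab, ba = cites.get((a, b), 0), cites.get((b, a), 0)
        if not (ab and ba):
            continue
        balance = min(ab, ba) / max(ab, ba)
        share = min(ab / out_total(a), ba / out_total(b))
        if balance >= min_balance and share >= min_share:
            flagged.append((a, b, round(balance, 2), round(share, 2)))
    return flagged

print(flag_reciprocal_pairs())  # [('J_Alpha', 'J_Beta', 0.94, 0.94)]
```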
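A standard way to operationalize template fingerprinting is shingling: represent each manuscript as a set of overlapping word n-grams and compare sets by Jaccard similarity. A minimal sketch; the 5-gram size is conventional but any flagging threshold would need field-specific calibration:

```python
def shingles(text, k=5):
    """Set of k-word shingles, lowercased, for near-duplicate detection."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Mill manuscripts often share boilerplate far beyond what a shared topic
# explains; pairwise Jaccard well above a field baseline is a flag.
ms1 = "the results of the present study demonstrate that the proposed model"
ms2 = "the results of the present study demonstrate that the novel framework"
print(round(jaccard(ms1, ms2), 2))  # 0.56
```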
Pillar C — Accountability
The RI² Index: Research Integrity Risk at the Institutional Level
Individual papers can be retracted. Individual researchers can be investigated. But what
about the institutions that create the conditions for misconduct? The RI² (Research
Integrity Risk Index), inspired by the frameworks proposed by Meho and Ioannidis, shifts
the lens from individual actors to institutional environments.
The RI² evaluates institutions across three core risk metrics:
- D-Rate (Delisted Journal Exposure) — What percentage of an institution’s total publications appear in journals that have been delisted from Scopus or Web of Science for quality concerns? A high D-Rate suggests systemic tolerance for low-quality publishing channels.
- R-Rate (Retraction Density) — What is the ratio of retracted papers to total output, normalized by field and institution size? The R-Rate reveals whether retractions are isolated incidents or structural patterns.
- S-Rate (Self-Citation Inflation) — What proportion of an institution’s citation impact is generated internally through self-citation? Elevated S-Rates suggest metric gaming at the organizational level.
Combined, these three metrics produce a composite RI² score that classifies institutions as Green (low risk), Amber (moderate risk / monitoring recommended), or Red (high risk / investigation warranted). A minimal, illustrative scoring sketch follows.
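To be clear about what follows: the caps, weights, and Green/Amber/Red thresholds below are placeholder assumptions for illustration, not the published RI² formula.

```python
def ri2_score(d_rate, r_rate_per_1k, s_rate, weights=(0.4, 0.4, 0.2)):
    """Illustrative composite: normalize each metric to [0, 1] against a
    cap, then take a weighted sum. Caps and weights are assumptions."""
    caps = (0.10, 5.0, 0.30)  # 10% delisted, 5 retractions/1k, 30% self-cites
    norm = [min(x / cap, 1.0)
            for x, cap in zip((d_rate, r_rate_per_1k, s_rate), caps)]
    return sum(w * x for w, x in zip(weights, norm))

def classify(score, amber=0.33, red=0.66):  # illustrative thresholds
    return "Red" if score >= red else "Amber" if score >= amber else "Green"

# A hypothetical institution: 6% of output in delisted journals,
# 2.1 retractions per 1,000 papers, 18% self-citation share.
s = ri2_score(0.06, 2.1, 0.18)
print(round(s, 2), classify(s))  # 0.53 Amber
```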