CitePlag - Citation-based Plagiarism Detection

CitePlag is the first plagiarism detection system to implement Citation-based Plagiarism Detection (CbPD), a novel approach that can detect even heavily disguised plagiarism in academic texts [2010].

Existing plagiarism detection software only examines literal text similarity and thus typically fails to detect disguised forms of plagiarism, such as paraphrases, translations, or idea plagiarism. CbPD addresses this shortcoming by additionally analyzing the placement of citations in the full text of documents to form a language-independent semantic “fingerprint” of document similarity [2011a].
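To illustrate why a citation-based fingerprint is language-independent, here is a minimal sketch in Python. It is not CitePlag's implementation. It assumes numeric in-text markers such as [1], and it assumes that each document's reference list has already been resolved to shared source identifiers; the function name citation_sequence and the identifiers "doi:A" and "doi:B" are hypothetical.

```python
import re

# Assumption: citations appear as numeric markers like [1] or [12].
CITATION_MARKER = re.compile(r"\[(\d+)\]")


def citation_sequence(text, reference_ids):
    """Return the source identifiers cited in `text`, in order of appearance.

    `reference_ids` maps a document-local marker number to an identifier
    that is the same across documents (hypothetical, e.g. a DOI).
    """
    sequence = []
    for match in CITATION_MARKER.finditer(text):
        marker = int(match.group(1))
        if marker in reference_ids:
            sequence.append(reference_ids[marker])
    return sequence


# Two toy documents in different languages that cite the same sources.
english = "Plagiarism is common [1]. Detection is hard [2], as noted in [1]."
german = "Plagiate sind verbreitet [3]. Erkennung ist schwierig [1], siehe [3]."

# Each document numbers its references differently.
english_refs = {1: "doi:A", 2: "doi:B"}
german_refs = {3: "doi:A", 1: "doi:B"}

seq_en = citation_sequence(english, english_refs)
seq_de = citation_sequence(german, german_refs)
```

In this toy case the two texts share no words, yet both reduce to the same ordered sequence of cited sources, which is the kind of signal a citation-based comparison can work with.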

CbPD can be applied to any text containing citations, including academic documents, scientific publications, patents, and legal cases. Because text-based methods only examine words (i.e., text overlap) to detect suspicious similarity, they typically fail to detect translated and strongly disguised plagiarism; CbPD overcomes this shortcoming.

Our observations confirmed that citation pattern similarity often remains detectable even if text has been translated or strongly paraphrased [2013]. Thus, in many instances, CbPD can detect plagiarism that traditional text-based approaches could not have identified automatically, for example, when text was sufficiently disguised by synonyms or word rearrangement, or when copied text was translated. Our analysis of the plagiarized doctoral thesis of Karl-Theodor zu Guttenberg [2011b] and an analysis of the VroniPlag Wiki [2014b] also confirmed that citation patterns in plagiarized texts often show suspicious similarities to the citation patterns in the original source document(s). An evaluation of the citation-based approach on a large-scale collection of over 200,000 scientific publications in the PubMed Central Open Access Subset demonstrated the practicability of the approach in a real-world setting and on a range of realistically disguised plagiarism forms [2014a].

For details and an in-depth analysis of the CbPD approach, refer to the doctoral dissertation of Bela Gipp, which is available as a book from Springer Vieweg Research [2014b] and for download here.

CitePlag implements several citation-based algorithms to analyze the citation patterns of publications. The screenshot above shows two publications visualized in the CitePlag prototype. Matching citations are highlighted and connected in a central column for quick document examination. The documents share no literal text similarity: the left publication is in English and the right one in Chinese. However, the overlap of citations is high, and the order in which sources are cited is nearly identical in several paragraphs. The English text contains only 7 citations (gray circles) that are not shared with the Chinese text.
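The two signals described for the screenshot, citation overlap and citation order, can be sketched as follows. This is an illustration only, not the code used in CitePlag: it uses a standard longest-common-subsequence dynamic program as a stand-in for the idea behind the Longest Common Citation Sequence named in [2011a]. The function names and the toy source identifiers are hypothetical.

```python
def shared_citations(seq_a, seq_b):
    """Return the set of sources cited in both documents (order ignored)."""
    return set(seq_a) & set(seq_b)


def longest_common_citation_sequence(seq_a, seq_b):
    """Return the longest run of citations that occurs in the same order
    in both documents, allowing non-matching citations in between."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # table[i][j] = length of the common subsequence of seq_a[:i] and seq_b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if seq_a[i - 1] == seq_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the sequence itself.
    result = []
    i, j = len(seq_a), len(seq_b)
    while i > 0 and j > 0:
        if seq_a[i - 1] == seq_b[j - 1]:
            result.append(seq_a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Toy citation sequences: "X" and "Y" are cited by only one document each.
english_doc = ["A", "B", "C", "X", "D", "E"]
chinese_doc = ["A", "B", "C", "D", "Y", "E"]

overlap = shared_citations(english_doc, chinese_doc)
common_order = longest_common_citation_sequence(english_doc, chinese_doc)
```

A long common sequence relative to the total number of citations suggests that two documents cite the same sources in nearly the same order, which is the pattern a reviewer would then inspect manually.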

Try CitePlag here.

CitePlag is available as open source on GitHub.

Media Coverage of our Research on Plagiarism Detection

National daily newspaper "Die Welt"

  • "Anyone who draws on other people's sources for their own texts must increasingly expect to be found out. Computer scientists have found a new way to track down plagiarism in medical studies."


National public radio broadcaster Deutschlandradio Kultur

  • "Simple 'copy and paste' is a thing of the past; today's plagiarists are more skillful. But the plagiarism hunters are hot on their heels, with ever more sophisticated software."


Regional daily newspaper Schwäbische Zeitung

  • "Konstanz professor Bela Gipp, together with a member of his staff, has developed a novel software for detecting plagiarism in the field of biomedicine."


National public radio broadcaster Deutschlandfunk

  • "Whether Karl-Theodor zu Guttenberg or Annette Schavan: copying is apparently widespread in academia. Besides thorough investigation, prevention is therefore also needed. Plagiarism researchers at the University of Konstanz think so too and have therefore launched the project 'Refairenz'."


University magazine uniKon

  • "'I want to point out that I'm not a plagiarist hunter', says Bela Gipp. 'My goal is to increase the effort of plagiarizing until it is no longer worth it.'"


National public radio broadcaster Deutschlandradio

  • "The CitePlag software is said to detect plagiarism better than other programs. Its trick: it searches the sources of the scientific ..."



Related Publications

[2014a] Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus

B. Gipp, N. Meuschke, and C. Breitinger

Journal of the American Society for Information Science and Technology, vol. 65, iss. 8, pp. 1527-1540, 2014



[2014b] Citation-based Plagiarism Detection - Detecting Disguised and Cross-language Plagiarism using Citation Pattern Analysis

B. Gipp

Springer Vieweg Research, 2014



[2013] Demonstration of Citation Pattern Analysis for Plagiarism Detection

B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nuernberger

Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 2013



[2011a] Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence

B. Gipp and N. Meuschke

Proceedings of the 11th ACM Symposium on Document Engineering (DocEng ’11), Mountain View, CA, USA, 2011



[2011b] Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag

B. Gipp, N. Meuschke, and J. Beel

Proceedings of the 11th Annual International ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11), Ottawa, Canada, 2011



[2010] Citation Based Plagiarism Detection – A New Approach to Identify Plagiarized Work Language Independently

B. Gipp and J. Beel

Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT’10), New York, NY, USA, 2010