Text Alignment System for Plagiarism Detection, version 2.0 (2015)

Version 1.0 of our system was the best-performing approach on the first corpus and the third best-performing on the second corpus in the Text Alignment task at the 2014 international competition PAN (Uncovering Plagiarism, Authorship, and Social Software Misuse). Version 2.0, an improved version of the 2014 system, participated in the PAN 2015 competition, for which no winner was announced.

The system is described in the following papers:

Version 2.0:

  1. Sanchez-Perez, M.A., Gelbukh, A.F., Sidorov, G. Dynamically adjustable approach through obfuscation type recognition. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015. CEUR Workshop Proceedings, vol. 1391, CEUR-WS.org, 2015, http://ceur-ws.org/Vol-1391/92-CR.pdf.

Version 1.0:

  1. Sanchez-Perez, M.A., Gelbukh, A., Sidorov, G. Adaptive algorithm for plagiarism detection: The best-performing approach at PAN 2014 text alignment competition. Lecture Notes in Computer Science, vol. 9283, Springer, 2015, pp. 402-413. doi: 10.1007/978-3-319-24027-5_42
  2. Sanchez-Perez, M., Sidorov, G., Gelbukh, A. The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014. In: L. Cappellato, N. Ferro, M. Halvey, W. Kraaij (eds.). Notebook for PAN at CLEF 2014. CLEF 2014 Working Notes, Sheffield, UK, September 15-18, 2014. CEUR Workshop Proceedings, ISSN 1613-0073, vol. 1180, CEUR-WS.org, 2014, pp. 1004-1011.
    (This paper may also be mistakenly indexed as “A Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014”.)
    An abstract of the same paper was published separately.

Any work that uses this data or software should cite the above-mentioned paper(s).

License: free for non-commercial academic purposes. Any publication that benefited from this data or software must state the origin of the data and software and cite the above-mentioned paper(s). We would be grateful if you let us know of your use of the data or software and of any citations of our papers. Any derived work should specify the original source and its authors and contain this license, including the publication references mentioned above. If you modify this corpus or software, correct errors in it, or add annotation or functionality to it, we would be grateful if you sent us the new version so that it can be made available from this site. See also the individual license files or comments in the specific files, if any.

Version 2.0 (current):

    This is the version submitted to PAN 2015 (no winner was announced that year).

    Download the Text Alignment System for Plagiarism Detection version 2.0: all files in one ZIP, or as separate files: license, readme, code.

Previous versions:

  1. Version 1.0 (superseded by version 2.0)