CITREC - Open Evaluation Framework for Citation-based and Text-based Similarity Measures

CITREC is an open evaluation framework for citation-based and text-based similarity measures. CITREC prepares the data of two formerly separate collections for citation-based analysis and provides the tools needed to evaluate similarity measures. The first collection is the PubMed Central Open Access Subset (PMC OAS); the second is the collection used for the Genomics Tracks at the Text REtrieval Conferences (TREC) ’06 and ’07 (see the overview paper for the TREC Gen collection).

CITREC extends the PMC OAS and TREC Genomics collections by providing:

  1. citation and reference information that includes the position of in-text citations for documents in both collections;
  2. code and pre-computed scores for 35 citation-based and text-based similarity measures;
  3. two gold standards based on Medical Subject Headings (MeSH) descriptors and the relevance feedback gathered for the TREC Genomics collection;
  4. a web-based system (Literature Recommendation Evaluator – LRE) for evaluating how well similarity measures identify documents that are relevant to user-defined information needs;
  5. tools to statistically analyze and compare the scores that individual similarity measures yield.

Demo System

The demo database (User: citrec_demo / Password: citrec) gives a first impression of the data that CITREC offers and the kinds of analysis the framework supports.

This Excel spreadsheet illustrates a possible evaluation using CITREC data. It compares the scores computed by different similarity measures as a function of the maximum Co-Citation score (i).
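
For readers unfamiliar with the Co-Citation score mentioned above, the following is a minimal sketch of the basic idea: two documents are co-cited once for every document that cites both of them. This is an illustration only, not CITREC's own code. The function name `co_citation_counts` and the toy data are hypothetical, and the sketch ignores refinements such as the position of in-text citations.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(references):
    """Count, for each pair of cited documents, how many citing documents cite both.

    `references` maps a citing document ID to the list of document IDs it cites.
    """
    counts = Counter()
    for cited in references.values():
        # De-duplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: citing document -> documents it cites.
toy_references = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Y"],
    "C": ["Y", "Z"],
}

scores = co_citation_counts(toy_references)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

In this toy example, X and Y are cited together by A and B, so their Co-Citation score is 2. X and Z appear together only in A, so their score is 1.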