Published December 1, 2021 | Version 1.0.0
Dataset Open

The CLEF-IP 2012 Test Collection

  • 1. Technische Universität Wien, Vienna, Austria
  • 2. TU Wien
  • 3. Information Retrieval Facility, Vienna, Austria
  • 4. SAIC-Frederick Inc., US
  • 5. University of Birmingham, UK
  • 6. Qatar Foundation, Qatar

Description

CLEF-IP: Cross-Language Evaluation Forum - Intellectual Property

The CLEF-IP track ran from 2009 to 2013 and aimed to investigate IR techniques for patent retrieval. The track uses a collection of more than 1.3 million patent documents (about 2.6 million files) derived from EPO (European Patent Office) sources and EuroPCT applications (more than 400,000 documents) published by WIPO (World Intellectual Property Organization). The collection contains documents in English, French and German, with at least 150,000 documents in each language, all published before 2001.

There were three tasks in 2012. The first was to find patent documents that are candidate prior art for a given claim taken from a patent document. The second task, flowchart recognition, asked participants to extract the information contained in flowchart images and return it in a predefined textual format. In the third task, chemical structure recognition, participants had to identify the location of the chemical structures depicted on images of patent pages and, for each of them, return the corresponding structure in a MOL file (a chemical structure file format).

Files

  1. Document Collection
    A set of XML files representing a total of over 1.3 million patent documents.
    NOTE: the document collection is the same as the one published for CLEF-IP 2011, excluding images.
  2. Topics and Answers
    Both the training and the test topic sets also contain the relevance assessments for the topics.

Files (13.6 GiB)

Name                                    Size
md5:388bbf99f7d156e356af97f523e1595b    13.3 GiB
md5:397c2aa4308508a4f88c74489b90adf3    353.3 MiB
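
The MD5 values listed with the files can be used to check that a download completed without corruption. The following is a minimal sketch in Python, not part of the original record: the function names and the example path are hypothetical, and only the two digests and their sizes are taken from the listing. The file is read in chunks so that the 13.3 GiB archive does not have to fit in memory.

```python
import hashlib

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "388bbf99f7d156e356af97f523e1595b",  # 13.3 GiB file
    "397c2aa4308508a4f88c74489b90adf3",  # 353.3 MiB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_file(path):
    """True if the file's MD5 matches one of the digests listed above."""
    return md5_of_file(path) in EXPECTED_MD5
```

A call such as is_listed_file("downloads/collection-archive") (a hypothetical path) returns True only when the local copy matches one of the two published digests. A mismatch usually means the download was truncated or altered and should be repeated.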
