Published November 30, 2021 | Version 1.0.0
Dataset | Open access

The CLEF-IP 2011 Test Collection

  • 1. Technische Universität Wien, Vienna, Austria
  • 2. TU Wien
  • 3. max-recall GmbH, Vienna, Austria
  • 4. Information Retrieval Facility, Vienna, Austria

Description

CLEF-IP: Cross-Language Evaluation Forum - Intellectual Property

The CLEF-IP track ran from 2009 to 2013 and investigated information retrieval (IR) techniques for patent retrieval. The track used a collection of more than 1.3 million patent documents (about 2.6 million files) derived from European Patent Office (EPO) sources and from EuroPCT applications (more than 400,000 documents) published by the World Intellectual Property Organization (WIPO). The collection contains documents in English, French and German, with at least 150,000 documents in each language, all published before 2001.

There were four tasks in 2011. The first was to find patent documents that are candidates to constitute prior art for a given document. The second was to classify a given document according to the International Patent Classification (IPC) system. The third was, like the first, to retrieve patent documents that are candidates for prior art, but the images contained in the patent documents had to be processed as well. The fourth was to classify patent images into nine classes in use at patent offices. Relevance judgements were produced from the patent citations and metadata.
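For the prior-art tasks, the citation-derived relevance judgements serve to score each system's ranked list of candidate documents per topic. The sketch below computes two standard IR measures, recall at a cutoff k and average precision. It assumes that a ranking is a list of unique document IDs and that the judgements for one topic are a set of relevant IDs; the IDs in the example are hypothetical, and these are not necessarily the exact measures or tooling the track used.

```python
def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents found in the top k of the ranking.

    Assumes `ranking` holds unique document IDs, best first, and
    `relevant` is the set of IDs judged relevant for the topic.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / len(relevant)


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero, because
    the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical topic: two relevant documents, retrieved at ranks 2 and 4.
ranking = ["doc-A", "doc-B", "doc-C", "doc-D"]
relevant = {"doc-B", "doc-D"}
print(recall_at_k(ranking, relevant, 2))    # 0.5
print(average_precision(ranking, relevant)) # 0.5
```

Recall matters especially in prior-art search, where missing a relevant citation is costly, which is why a recall-oriented measure is shown alongside average precision.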

Files

  1. Document Collection
    The corpus consists of two parts. The first is a set of XML files representing over 1.3 million patent documents; this collection is to be used for the first task. The second is a subset of the first to which images of the patent drawings have been added; this collection is to be used for the third task.
  2. Topics and Answers
    Both the training and the test topic sets also contain the relevance assessments for their topics.
  3. Guidelines
    Detailed explanations of how to work on each task using the corpus.
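Because the document collection is distributed as XML files in three languages, a first sanity check after unpacking is to parse every file and tally the documents per language. The sketch below does this with the Python standard library. It assumes the files end in `.xml` and that each document's root element carries its language in an attribute named `lang`; that attribute name is a hypothetical default, so check the guidelines for the actual schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def survey_collection(root_dir, lang_attr="lang"):
    """Walk root_dir recursively, parse every *.xml file and tally documents.

    Returns (n_parsed, n_failed, language_counts). `lang_attr` is an
    assumed attribute name on the root element, not a confirmed schema.
    """
    n_parsed = 0
    n_failed = 0
    languages = Counter()
    for path in Path(root_dir).rglob("*.xml"):
        try:
            doc_root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            # Count malformed or truncated files instead of aborting the walk.
            n_failed += 1
            continue
        n_parsed += 1
        # Documents without the attribute are tallied under "unknown".
        languages[doc_root.get(lang_attr, "unknown")] += 1
    return n_parsed, n_failed, languages
```

Usage: `survey_collection("path/to/unpacked/collection")`, where the path is a placeholder for wherever the archive was extracted. Comparing the per-language counts against the figure of at least 150,000 documents per language is a quick way to spot an incomplete extraction.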


Files (24.3 GiB)

MD5 checksum                        Size
4b825389321ba1e989d48eb2da170f99    23.4 GiB
c3b93131fb09c0df2b4eca3c2668d647    933.5 MiB
26f5799805463c60f3e206ac6a3ffb25    265.0 KiB
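Given the size of the largest archive, it is worth checking each download against its published MD5 checksum before unpacking. The sketch below hashes a file in 1 MiB chunks so that even the 23.4 GiB archive never has to fit in memory. The checksum string may be passed with or without the `md5:` prefix used in the listing; the local file name in the usage note is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with a published checksum such as 'md5:4b82...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Usage: `verify_download("downloaded_archive", "md5:4b825389321ba1e989d48eb2da170f99")` returns `True` only if the local copy matches the 23.4 GiB file's listed checksum; `downloaded_archive` stands in for the actual file name.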

Additional details

Related works