Published April 11, 2024 | Version 1.0.0
Resource type: Dataset | Access: Open

LongEval 2024 Train Collection

Description

The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries were issued by Qwant users and are based on selected trending topics. The documents were selected with respect to these queries using the Qwant click model. The collection also contains documents randomly selected from the Qwant index. All data were collected during January 2023.

In total, the collection contains 599 training queries with 9,785 corresponding relevance assessments derived from the Qwant click model. The document set consists of 2,049,729 web pages that were downloaded, cleaned, and filtered. In addition to the original French versions, the collection includes English translations of both the web pages and the queries. The collection serves as the official training collection for the 2024 LongEval Information Retrieval Lab (https://clef-longeval.github.io/), organised at CLEF.
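The description does not state the on-disk format of the relevance assessments. IR evaluation collections are often distributed as TREC-style qrels files, with one judgment per line: query ID, an unused iteration field, document ID, and relevance grade. Assuming that format (check it against the Readme before relying on it), the assessments could be loaded and counted with a sketch like the one below. The functions `load_trec_qrels` and `summarize` are hypothetical helpers, not part of the collection.

```python
from collections import defaultdict


def load_trec_qrels(lines):
    """Parse TREC-style qrels lines of the (assumed) form
    '<query_id> <iteration> <doc_id> <relevance>'.
    Returns a dict mapping query_id -> {doc_id: relevance}."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 4:
            raise ValueError(f"unexpected qrels line: {line!r}")
        query_id, _iteration, doc_id, relevance = parts
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


def summarize(qrels):
    """Return (number of judged queries, total number of judgments)."""
    return len(qrels), sum(len(docs) for docs in qrels.values())
```

If the format assumption holds, passing an open qrels file to `load_trec_qrels` and then calling `summarize` should give figures consistent with the query and assessment counts stated above.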

The data is released under the Qwant LongEval Attribution-NonCommercial-ShareAlike License.

Files

LongEval 2024 Train Collection Readme.pdf
Files (14.4 GiB in total)
Size | MD5 checksum
14.4 GiB | 406acc34ab03b0e85fd9973de5df624e
42.5 KiB | 6ae2d0833c6b9d7fb868b465a189b481
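For a download of this size, the listed MD5 checksums are a cheap way to confirm that a file arrived intact. The following is a minimal sketch using only Python's standard `hashlib`. It reads the file in chunks so the multi-gigabyte archive never has to fit in memory. The helper names and the `EXPECTED_MD5` labels are illustrative assumptions, since the file names are not reproduced in this listing.

```python
import hashlib

# Checksums copied from the file list; the dictionary keys are
# hypothetical labels, because the file names are not shown here.
EXPECTED_MD5 = {
    "14.4 GiB file": "406acc34ab03b0e85fd9973de5df624e",
    "42.5 KiB file": "6ae2d0833c6b9d7fb868b465a189b481",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify("path/to/downloaded_file", EXPECTED_MD5["14.4 GiB file"])` (placeholder path) returns `True` only if the local copy matches. On Linux, `md5sum <file>` prints the same digest from the command line.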
Created: April 12, 2024
Modified: April 16, 2024