Computer Science > Information Retrieval

arXiv:2206.02743 (cs)

[Submitted on 6 Jun 2022 (v1), last revised 12 Feb 2023 (this version, v3)]

Title: A Neural Corpus Indexer for Document Retrieval

Authors: Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Allen Sun, Weiwei Deng, Qi Zhang, Mao Yang

Abstract: Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network that unifies the training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a given query. To optimize the recall performance of NCI, we introduce a prefix-aware weight-adaptive decoder architecture and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks: compared to the best baseline method, NCI achieves relative improvements of +21.4% for Recall@1 on the NQ320k dataset and +16.8% for R-Precision on the TriviaQA dataset.
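For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how Recall@k, R-Precision, and a relative improvement are commonly computed. It uses the standard textbook definitions and is not taken from the paper's evaluation code; the function names, document identifiers, and scores below are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return len(set(ranked[:r]) & relevant) / r


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score (0.2 means +20%)."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical ranking and relevance judgments for a single query.
ranked_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}

print(recall_at_k(ranked_ids, relevant_ids, k=1))
print(recall_at_k(ranked_ids, relevant_ids, k=2))
print(r_precision(ranked_ids, relevant_ids))
print(relative_gain(0.6, 0.5))
```

When each query has exactly one relevant document, Recall@1 reduces to the fraction of queries whose top-ranked result is the relevant one. Benchmark-level scores are typically the mean of these per-query values.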

Comments:	19 pages, 6 figures, accepted by NeurIPS 2022
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2206.02743 [cs.IR]
	(or arXiv:2206.02743v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2206.02743

Submission history

From: Yingyan Hou
[v1] Mon, 6 Jun 2022 16:56:52 UTC (1,118 KB)
[v2] Fri, 14 Oct 2022 03:03:52 UTC (2,280 KB)
[v3] Sun, 12 Feb 2023 14:47:08 UTC (1,261 KB)
