Abstract
With the ever-increasing number of patent applications filed every year, effective and efficient systems for managing such tremendous amounts of data are becoming increasingly important. Patent retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of information retrieval (IR) concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper, we present a comprehensive review of PR methods and approaches. It is clear that recent successes and maturity in IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art performance in automatic PR is still around average in terms of recall. These observations motivate the need for interactive search tools which provide cognitive assistance to patent professionals with minimal effort. These tools must also be developed hand in hand with patent professionals, considering their practices and expectations. We additionally touch on tasks related to PR, such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.
Notes
It is required to achieve 100% recall at acceptable precision.
Acknowledgements
This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shalaby, W., Zadrozny, W. Patent retrieval: a literature review. Knowl Inf Syst 61, 631–660 (2019). https://doi.org/10.1007/s10115-018-1322-7
DOI: https://doi.org/10.1007/s10115-018-1322-7