Patent retrieval: a literature review

  • Survey Paper
Knowledge and Information Systems

Abstract

With the ever-increasing number of patent applications filed every year, effective and efficient systems for managing such large amounts of data have become essential. Patent retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of information retrieval (IR) concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper, we present a comprehensive review of PR methods and approaches. It is clear that the recent successes and maturity of IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art performance in automatic PR is still only around average in terms of recall. These observations motivate the need for interactive search tools that provide cognitive assistance to patent professionals while demanding minimal effort from them. These tools must also be developed hand in hand with patent professionals, taking their practices and expectations into account. We additionally touch on tasks related to PR, such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.

Notes

  1. http://www.uspto.gov/.

  2. http://www.epo.org/.

  3. http://www.wipo.int/portal/en/index.html.

  4. http://www.clef-initiative.eu/.

  5. http://www.ir-facility.org/.

  6. http://www.ifs.tuwien.ac.at/imp/marec.shtml.

  7. http://research.nii.ac.jp/ntcir.

  8. https://sites.google.com/site/patentdataproject/.

  9. http://rosencrantz.berkeley.edu/batchsql/.

  10. http://portal.uspto.gov/pair/PublicPair.

  11. https://www.google.com/googlebooks/uspto-patents.html.

  12. https://www.uspto.gov/learning-and-resources/bulk-data-products.

  13. It is required to achieve 100% recall at acceptable precision.

  14. http://www.uspto.gov/patents-application-process/appealing-patent-decisions/trials/inter-partes-review.

  15. http://www.uspto.gov/patents-application-process/appealing-patent-decisions/trials/post-grant-review.

  16. https://ptabtrials.uspto.gov.

Acknowledgements

This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Walid Shalaby.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shalaby, W., Zadrozny, W. Patent retrieval: a literature review. Knowl Inf Syst 61, 631–660 (2019). https://doi.org/10.1007/s10115-018-1322-7

  • DOI: https://doi.org/10.1007/s10115-018-1322-7
