Automated categorization in the international patent classification

Authors:

A. Törcsvári,

G. Karetka

ACM SIGIR Forum, Volume 37, Issue 1

Pages 10 - 25

https://doi.org/10.1145/945546.945547

Published: 01 April 2003

Abstract

A new reference collection of patent documents for training and testing automated categorization systems is established and described in detail. This collection is tailored for automating the attribution of international patent classification codes to patent applications and is made publicly available for future research work. We report the results of applying a variety of machine learning algorithms to the automated categorization of English-language patent documents. This procedure involves a complex hierarchical taxonomy, within which we classify documents into 114 classes and 451 subclasses. Several measures of categorization success are described and evaluated. We investigate how best to resolve the training problems related to the attribution of multiple classification codes to each patent document.
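As a rough illustration of the two taxonomy levels mentioned in the abstract, the sketch below derives the class and subclass from an IPC symbol and scores top predictions against documents that carry several codes. The function names `ipc_levels` and `top_hit_rate`, the sample symbols, and the scoring rule (a prediction counts as correct if it matches any of the assigned codes) are assumptions made for illustration; they are not taken from the article and may differ from the measures it defines.

```python
import re

# Leading part of an IPC symbol: section letter (A-H), two-digit class,
# subclass letter, e.g. "A01B" in "A01B 1/00".
_IPC_PREFIX = re.compile(r"^([A-H])(\d{2})([A-Z])")


def ipc_levels(symbol):
    """Return (section, class, subclass) for an IPC symbol such as 'A01B 1/00'."""
    match = _IPC_PREFIX.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, digits, letter = match.groups()
    ipc_class = section + digits
    return section, ipc_class, ipc_class + letter


def top_hit_rate(predictions, assigned_codes, level):
    """Fraction of documents whose top prediction matches any assigned code.

    level: 1 compares at class level, 2 at subclass level.
    Hypothetical measure for illustration, not the article's definition.
    """
    hits = 0
    for predicted, codes in zip(predictions, assigned_codes):
        targets = {ipc_levels(code)[level] for code in codes}
        if ipc_levels(predicted)[level] in targets:
            hits += 1
    return hits / len(predictions)


# Made-up example: two documents, the first carrying two codes.
predictions = ["A01B 1/00", "G06F 17/30"]
assigned = [["A01C 3/00", "B62D 1/00"], ["G06F 17/30"]]
class_rate = top_hit_rate(predictions, assigned, level=1)
subclass_rate = top_hit_rate(predictions, assigned, level=2)
```

In this made-up example the first prediction agrees with one of the assigned codes at class level (A01) but not at subclass level (A01B versus A01C), which is the kind of gap between coarse and fine levels that a hierarchical evaluation has to account for.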

Index Terms

Information systems > Information retrieval > Document representation

Published In

ACM SIGIR Forum Volume 37, Issue 1

Spring 2003

43 pages

ISSN:0163-5840

DOI:10.1145/945546

Copyright © 2003 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
