
Automated categorization in the international patent classification

Published: 01 April 2003

Abstract

A new reference collection of patent documents for training and testing automated categorization systems is established and described in detail. The collection is tailored to automating the assignment of International Patent Classification (IPC) codes to patent applications, and it is made publicly available for future research. We report the results of applying a variety of machine learning algorithms to the automated categorization of English-language patent documents. The task involves a complex hierarchical taxonomy, within which we classify documents into 114 classes and 451 subclasses. Several measures of categorization success are described and evaluated. We also investigate how best to handle the training problems that arise because each patent document may be assigned multiple classification codes.
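Both target levels mentioned above (classes and subclasses) can be read off the same IPC symbol, and a document carrying several codes has to be turned into a label set somehow before training. The following is a minimal Python sketch under stated assumptions: codes follow the usual IPC layout (a section letter A to H, a two-digit class number, a subclass letter, optionally followed by a group such as `1/00`), and the function names and the two strategies `main_only` and `all_codes` are hypothetical illustrations, not the paper's own method.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# IPC sections are the single letters A to H.
IPC_SECTIONS = "ABCDEFGH"

# Map a requested hierarchy level to the IpcSymbol field that holds it.
_LEVEL_FIELDS = {"section": "section", "class": "ipc_class", "subclass": "subclass"}


@dataclass(frozen=True)
class IpcSymbol:
    section: str    # e.g. "A"
    ipc_class: str  # e.g. "A01"
    subclass: str   # e.g. "A01B"


def parse_ipc(code: str) -> IpcSymbol:
    """Split the first four characters of an IPC code into its top three levels.

    Assumes codes such as "A01B" or "A01B 1/00"; any group part is ignored.
    """
    head = code.strip().upper()[:4]
    if (len(head) != 4 or head[0] not in IPC_SECTIONS
            or not head[1:3].isdigit() or not head[3].isalpha()):
        raise ValueError(f"not an IPC subclass symbol: {code!r}")
    return IpcSymbol(section=head[0], ipc_class=head[:3], subclass=head)


def training_labels(main: str, secondary: Iterable[str] = (),
                    level: str = "subclass", strategy: str = "main_only") -> Set[str]:
    """Derive the label set used when training on one document.

    Hypothetical strategies:
      "main_only": keep only the main (primary) code;
      "all_codes": treat the main code and every secondary code as positives.
    """
    if level not in _LEVEL_FIELDS:
        raise ValueError(f"unknown level: {level!r}")
    if strategy == "main_only":
        codes = [main]
    elif strategy == "all_codes":
        codes = [main, *secondary]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    field = _LEVEL_FIELDS[level]
    # Several codes may collapse to the same class, hence a set.
    return {getattr(parse_ipc(c), field) for c in codes}


if __name__ == "__main__":
    labels = training_labels("A01B 1/00", ["A01C 5/00", "B65D 81/00"],
                             level="class", strategy="all_codes")
    print(sorted(labels))
    # ['A01', 'B65']
```

With `main_only`, each document contributes one example of its primary category; with `all_codes`, it becomes a positive example for every category it carries, which adds training signal but may blur the boundaries between related categories. The sketch only makes the two options concrete; which works better is the empirical question the paper examines.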




    Reviews

    Fabrizio Sebastiani

    The availability of standard benchmarks (also known as test collections) is a key factor in the progress of disciplines, such as information retrieval, that are heavily based on the experimental method. Text categorization (TC), the subfield of information retrieval concerned with automatically building text classifiers from a training set of preclassified documents, is no exception: the explosion of TC research in the mid-1990s closely followed the appearance of TC benchmarks such as Reuters-21578 and OHSUMED. This paper announces the availability of World Intellectual Property Organization (WIPO)-alpha, a new benchmark for patent classification (the task of automatically classifying patent descriptions under a taxonomy of patent classes), and discusses a set of TC experiments performed on this benchmark using off-the-shelf TC packages. WIPO-alpha contains about 75,000 documents, classified under a subset of the International Patent Classification (IPC) taxonomy consisting of about 100 broad categories and 450 finer-grained ones. The reported experiments say nothing novel about the comparative performance of different TC systems; for example, the finding that support vector machines tend to outperform all other classification methods merely confirms a fact well known in TC. Nevertheless, the discussion of the newly available test collection is interesting and worthwhile. Patent classification is an important application of TC. Accuracy is of critical importance here, and the task is hard, because patent applicants often try to disguise the lack of novelty underlying their claimed inventions with nonstandard language, which puts text analysis software under added strain.

    This paper also reveals some nonstandard aspects of patent classification that TC research had not previously considered, such as the fact that a document may have a primary category and several secondary categories; this may call for new definitions of what accuracy means. The availability of this new benchmark is likely to encourage research into patent classification, and this paper will be an important reference for the field.

    Online Computing Reviews Service
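Sebastiani's review points out that a patent may carry one primary and several secondary categories, so "accuracy" needs a definition that accounts for both. A minimal Python sketch of three such measures, computed over a classifier's ranked guesses, is given below. The measure names and definitions (`top_prediction`, `three_guesses`, `all_categories`) are assumptions chosen for illustration, not a transcription of the measures the paper itself defines.

```python
from typing import Dict, Sequence, Tuple

# One evaluated document: (ranked predicted categories, main code, secondary codes).
Run = Tuple[Sequence[str], str, Sequence[str]]


def top_prediction(ranked: Sequence[str], main: str) -> bool:
    """Top-ranked guess equals the main (primary) category."""
    return len(ranked) > 0 and ranked[0] == main


def three_guesses(ranked: Sequence[str], main: str) -> bool:
    """Main category appears among the three highest-ranked guesses."""
    return main in ranked[:3]


def all_categories(ranked: Sequence[str], main: str, secondary: Sequence[str]) -> bool:
    """Top-ranked guess matches the main category or any secondary one."""
    return len(ranked) > 0 and ranked[0] in {main, *secondary}


def evaluate(runs: Sequence[Run]) -> Dict[str, float]:
    """Fraction of documents counted as correct under each measure."""
    if not runs:
        raise ValueError("no documents to evaluate")
    hits = {"top_prediction": 0, "three_guesses": 0, "all_categories": 0}
    for ranked, main, secondary in runs:
        # bool adds as 0 or 1
        hits["top_prediction"] += top_prediction(ranked, main)
        hits["three_guesses"] += three_guesses(ranked, main)
        hits["all_categories"] += all_categories(ranked, main, secondary)
    return {name: count / len(runs) for name, count in hits.items()}


if __name__ == "__main__":
    runs = [
        (["A01", "B65", "C07"], "B65", ["A01"]),  # main code is only the 2nd guess
        (["C07"], "C07", []),                     # exact top hit
    ]
    print(evaluate(runs))
    # {'top_prediction': 0.5, 'three_guesses': 1.0, 'all_categories': 1.0}
```

Each measure credits whole documents rather than averaging per category, so frequent classes dominate the totals; per-category (macro-averaged) variants would be a natural complement if rare IPC classes matter.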



    Published In

    ACM SIGIR Forum  Volume 37, Issue 1
    Spring 2003
    43 pages
    ISSN:0163-5840
    DOI:10.1145/945546

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 April 2003
    Published in SIGIR Volume 37, Issue 1


    Author Tags

    1. IPC taxonomy
    2. automated categorization
    3. patent
    4. support vector machines

    Qualifiers

    • Article


    Bibliometrics & Citations


    Article Metrics

    • Downloads (last 12 months): 93
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 16 Nov 2024.


    Cited By

    • (2024) Ön eğitimli Bert modeli ile patent sınıflandırılması [Patent classification with a pre-trained BERT model]. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 39:4, 2484-2496. DOI: 10.17341/gazimmfd.1292543. Online publication date: 20-May-2024.
    • (2024) SEA-PS. Journal of Information Science 50:4, 831-850. DOI: 10.1177/01655515221106651. Online publication date: 1-Aug-2024.
    • (2024) Multi-model Collaboration and Prompt-driven Patent Classification Methods. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms, 332-336. DOI: 10.1145/3690407.3690464. Online publication date: 21-Jun-2024.
    • (2024) Hierarchy-aware BERT-GCN Dual-Channel Global Model for Hierarchical Text Classification. 2024 4th International Conference on Neural Networks, Information and Communication (NNICE), 820-825. DOI: 10.1109/NNICE61279.2024.10498363. Online publication date: 19-Jan-2024.
    • (2024) Technological trajectory analysis in lithium battery manufacturing: Based on patent claims perspective. Journal of Energy Storage 98, 112894. DOI: 10.1016/j.est.2024.112894. Online publication date: Sep-2024.
    • (2024) Research on cross-lingual multi-label patent classification based on pre-trained model. Scientometrics 129:6, 3067-3087. DOI: 10.1007/s11192-024-05024-0. Online publication date: 6-May-2024.
    • (2023) Second Component of the LCI. In Converting Ideas to Innovation With Lean Canvas for Invention, 25-37. DOI: 10.4018/978-1-6684-8341-1.ch003. Online publication date: 13-Oct-2023.
    • (2023) IPC prediction of patent documents using neural network with attention for hierarchical structure. PLOS ONE 18:3, e0282361. DOI: 10.1371/journal.pone.0282361. Online publication date: 2-Mar-2023.
    • (2023) HmcNet: A General Approach for Hierarchical Multi-Label Classification. IEEE Transactions on Knowledge and Data Engineering 35:9, 8713-8728. DOI: 10.1109/TKDE.2022.3207511. Online publication date: 1-Sep-2023.
    • (2023) Hierarchical Multi-label Classifier Based on Transformer Encoder for Grassroots Social Network Governance. 2023 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT), 135-141. DOI: 10.1109/ICFEICT59519.2023.00033. Online publication date: May-2023.