Abstract
For historical reasons and because of writing conventions, the effects of medicines in Traditional Chinese Medicine (TCM) patents are often described using four-character phrases. Chinese word segmentation systems do not identify these phrases easily, which substantially degrades the results of patent analysis and mining. This paper proposes a semi-supervised learning method for collecting four-character effect phrases from the abstract texts of TCM patents; the collected phrases can enrich the lexicon of a Chinese word segmentation system and support semantic patent retrieval and analysis. The experimental results show the validity of the method.
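The abstract does not spell out the learning procedure, so the following is only a generic seed-based bootstrapping sketch of how four-character phrases might be collected from a small labeled seed set plus unlabeled text. It is not the authors' algorithm. The function names, the one-character left/right context patterns, and the two sample sentences are all assumptions made for illustration.

```python
import re

# Four consecutive CJK ideographs (assumed definition of a candidate phrase).
CJK4 = r"([\u4e00-\u9fff]{4})"


def contexts(text, phrase):
    """Return (left_char, right_char) pairs around each occurrence of phrase."""
    found = set()
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)
        if start > 0 and end < len(text):
            found.add((text[start - 1], text[end]))
        start = text.find(phrase, start + 1)
    return found


def collect_effect_phrases(texts, seeds, max_rounds=5):
    """Grow a phrase set from seeds by alternating two steps:
    learn contexts from known phrases, then harvest new four-character
    strings that occur in those contexts. Stops when nothing new is found."""
    phrases = set(seeds)
    for _ in range(max_rounds):
        # Step 1: learn contexts from the phrases known so far.
        patterns = set()
        for text in texts:
            for phrase in phrases:
                patterns |= contexts(text, phrase)
        # Step 2: harvest candidates that appear in a learned context.
        harvested = set()
        for left, right in patterns:
            # Lookarounds keep the delimiters out of the match itself.
            regex = re.compile(
                "(?<=" + re.escape(left) + ")" + CJK4 + "(?=" + re.escape(right) + ")"
            )
            for text in texts:
                harvested.update(m.group(1) for m in regex.finditer(text))
        if harvested <= phrases:
            break  # converged: no new phrases this round
        phrases |= harvested
    return phrases


if __name__ == "__main__":
    # Hypothetical abstract sentences, written for this example only.
    sample_texts = [
        "本发明具有清热解毒、活血化瘀的功效。",
        "该药物具有活血化瘀、消肿止痛的功效。",
    ]
    print(sorted(collect_effect_phrases(sample_texts, {"清热解毒"})))
```

Single-character contexts are deliberately crude and would admit noise on real patent text. A practical system would likely score or filter candidates before accepting them, which this sketch omits.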
Acknowledgments
This research is supported by the National Key Research and Development Program of China under grant number 2017YFC1405403, the National Natural Science Foundation of China under grant number 61075059, the Green Industry Technology Leading Project (product development category) of Hubei University of Technology under grant number CPYF2017008, the Natural Science Foundation of Anhui Province under grant number 1708085MF161, and the Key Project of Natural Science Research of Universities in Anhui under grant number KJ2015A236.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Na, D. et al. (2020). A Method of Collecting Four Character Medicine Effect Phrases in TCM Patents Based on Semi-supervised Learning. In: Barolli, L., Hussain, F., Ikeda, M. (eds) Complex, Intelligent, and Software Intensive Systems. CISIS 2019. Advances in Intelligent Systems and Computing, vol 993. Springer, Cham. https://doi.org/10.1007/978-3-030-22354-0_41
DOI: https://doi.org/10.1007/978-3-030-22354-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22353-3
Online ISBN: 978-3-030-22354-0
eBook Packages: Intelligent Technologies and Robotics (R0)