Abstract
An essential challenge in the knowledge-based information society is extracting useful information quickly from a large volume of literature. Because most existing data mining frameworks deal with structured input data, they face many limitations in analyzing unstructured scientific literature and extracting new information from it. This study proposes a scientific-knowledge processing framework that uses grid computing technology to achieve high performance in extracting important entities and their relations from the scientific literature. Since grid computing provides large-volume data storage and high-speed computing, the proposed framework can efficiently analyze a massive body of scientific literature and process knowledge. The workflow tool that we developed for the framework lets users easily design and execute complex applications composed of multiple scientific-knowledge processes. Experimental results showed that the proposed framework reduced working time by approximately 83% when the number of running nodes was assigned in accordance with the workload ratio of each step in the scientific-knowledge processes. Consequently, a large volume of scientific literature can be processed effectively by flexibly adjusting the number of computing nodes in the grid environment as the number of documents to be processed increases.
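The idea of assigning running nodes in accordance with the workload ratio of each step can be illustrated with a small sketch. This is not the framework's actual scheduler: the function name `allocate_nodes`, the step names, and the workload figures are hypothetical, and the largest-remainder rounding is only one reasonable way to turn ratios into whole node counts.

```python
from fractions import Fraction


def allocate_nodes(workload, total_nodes):
    """Split total_nodes across steps in proportion to each step's workload.

    Assumption (not from the paper): every step gets at least one node,
    and the remaining nodes are shared by the largest-remainder method.
    """
    if total_nodes < len(workload):
        raise ValueError("need at least one node per step")
    total = sum(workload.values())
    spare = total_nodes - len(workload)  # nodes left after one per step

    # Exact proportional share of the spare nodes for each step.
    shares = {step: Fraction(w) * spare / total for step, w in workload.items()}

    # One guaranteed node plus the whole part of the proportional share.
    alloc = {step: 1 + int(share) for step, share in shares.items()}

    # Hand out nodes lost to rounding, largest fractional part first.
    leftover = total_nodes - sum(alloc.values())
    by_remainder = sorted(
        shares, key=lambda s: shares[s] - int(shares[s]), reverse=True
    )
    for step in by_remainder[:leftover]:
        alloc[step] += 1
    return alloc


# Hypothetical relative workloads for three processing steps.
workload = {"term_recognition": 2, "entity_extraction": 3, "relation_extraction": 5}
print(allocate_nodes(workload, 13))
```

With these illustrative figures, the heaviest step receives the most nodes, and adding nodes to the grid only requires calling the function again with a larger `total_nodes`.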
Cite this article
Jeong, CH., Choi, YS., Chun, HW. et al. Grid-based framework for high-performance processing of scientific knowledge. Multimed Tools Appl 71, 783–798 (2014). https://doi.org/10.1007/s11042-013-1411-2
DOI: https://doi.org/10.1007/s11042-013-1411-2