From data to knowledge mining

Published online by Cambridge University Press: 17 April 2009

Ana Cristina Bicharra Garcia
Affiliation:
Laboratório de Documentação Ativa e Design Inteligente, Universidade Federal Fluminense, Fluminense, Brazil
Inhauma Ferraz
Affiliation:
Laboratório de Documentação Ativa e Design Inteligente, Universidade Federal Fluminense, Fluminense, Brazil
Adriana S. Vivacqua
Affiliation:
Laboratório de Documentação Ativa e Design Inteligente, Universidade Federal Fluminense, Fluminense, Brazil

Abstract

Most past approaches to data mining have been based on association rules. However, the simple application of association rules usually only changes the user's problem from dealing with millions of data points to dealing with thousands of rules. Although this may somewhat reduce the scale of the problem, it is not a completely satisfactory solution. This paper presents a new data mining technique, called knowledge cohesion (KC), which takes into account a domain ontology and the user's interest in exploring certain data sets to extract knowledge, in the form of semantic nets, from large data sets. The KC method has been successfully applied to mine causal relations from oil platform accident reports. In a comparison with association rule techniques for the same domain, KC has shown a significant improvement in the extraction of relevant knowledge, using processing complexity and knowledge manageability as the evaluation criteria.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2009

