Empirical investigation of dimension hierarchy sharing-based metrics for multidimensional schema understandability

Published: 25 May 2019

Abstract

Over the last few years, quality has gained considerable importance in the development of data warehouse systems. Predicting the understandability of multidimensional schemas could play a key role in controlling data warehouse quality at the early stages of development. In this area, some effort has been spent on defining structural metrics and identifying models for assessing the quality of these systems. Of the structural properties used to define metrics, dimension hierarchies and their sharing play a primary role in enhancing the analytical capabilities of multidimensional schemas, thereby affecting their quality. The authors have previously proposed structural metrics based on these aspects. The objective of this study is to apply principal component analysis (PCA) to determine whether our metrics are improvements over other existing metrics, and to apply logistic regression to study whether the metrics selected as relevant in the extracted principal components are, in combination, indicators of multidimensional schema understandability. The results of PCA confirm that our structural metrics based on the concept of sharing are different from other such metrics in the literature. Further, the metrics selected from the principal components can be used in combination to predict the understandability of data warehouse multidimensional schemas.
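The two-step analysis described in the abstract (PCA to identify relevant metrics, then logistic regression on the selected metrics) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the metric names, the eight schema rows, and the understandability labels are invented, and the power iteration and gradient descent routines are simple stand-ins for a statistics package. It is not the authors' data or procedure.

```python
import math

# Hypothetical data: one row per schema, one column per structural metric.
# Metric names and values are invented for illustration only.
metric_names = ["shared_hierarchies", "hierarchy_levels", "fact_dimension_links"]
rows = [
    [1, 2, 3], [2, 3, 3], [2, 2, 4], [3, 4, 5],
    [4, 4, 4], [5, 6, 6], [5, 5, 7], [6, 7, 6],
]
# Hypothetical labels: 1 = schema judged understandable, 0 = not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def standardize(data):
    """Scale each column to zero mean and unit (sample) variance."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in data]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(c, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of c."""
    d = len(c)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * cv[i] for i in range(d)), v

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression fitted by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys))
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys))
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Step 1: PCA on the correlation matrix; keep the first component only.
z = standardize(rows)
eigenvalue, loadings = first_component(correlation_matrix(z))
explained = eigenvalue / len(metric_names)  # share of total variance

# Step 2: pick the metric with the largest absolute loading as "relevant".
best = max(range(len(loadings)), key=lambda j: abs(loadings[j]))

# Step 3: logistic regression of understandability on the selected metric.
xs = [r[best] for r in z]
b0, b1 = fit_logistic(xs, labels)
preds = [1 if sigmoid(b0 + b1 * x) >= 0.5 else 0 for x in xs]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print("selected metric:", metric_names[best])
print("variance explained by first component:", round(explained, 3))
print("slope:", round(b1, 3), "training accuracy:", accuracy)
```

A real study would typically retain every component with an eigenvalue above a chosen threshold, use several selected metrics as joint predictors, and estimate accuracy with cross-validation rather than on the training rows; the sketch keeps a single component and a single predictor only to stay short.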


Published In

International Journal of Intelligent Engineering Informatics, Volume 7, Issue 2-3
January 2019
197 pages
ISSN:1758-8715
EISSN:1758-8723

Publisher

Inderscience Publishers

Geneva 15, Switzerland
