DOI: 10.1145/2804322.2804326
research-article

Testing data transformations in MapReduce programs

Published: 30 August 2015

Abstract

MapReduce is a parallel data processing paradigm designed to process large volumes of information in data-intensive applications such as Big Data environments. A characteristic of these applications is that they can have different data sources and data formats. For this reason, the inputs may contain poor-quality data that can produce a failure if the program does not properly handle the variety of input data. The output of these programs is obtained from a number of input transformations that represent the program logic. This paper proposes a testing technique called MRFlow that is based on data flow test criteria and oriented to the analysis of the transformations between the input and the output in order to detect defects in MapReduce programs. MRFlow is applied to several MapReduce programs and detects a number of defects.
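The abstract describes MRFlow only at a high level. As a rough illustration of the general idea behind data flow test criteria in a MapReduce setting (not MRFlow itself, whose exact criteria are not given here), the sketch below assumes a hypothetical in-memory max-temperature job over `city,temp` records. A variable such as `temp` is defined in the mapper and used in the reducer after the shuffle, so requiring every def-use pair to be exercised forces test inputs that drive specific transformations (a malformed record, a key with several values that both raise and keep the maximum). The function names `run_job` and `uncovered`, the pair labels, and the hand-written pair set `ALL_PAIRS` are assumptions made for this example; a real tool would derive the pairs from the program.

```python
from collections import defaultdict

# Hypothetical def-use pairs, labelled (variable, definition site, use site).
# They are hand-written for this toy job, not derived automatically.
ALL_PAIRS = {
    ("fields", "map.split", "map.check:false"),
    ("fields", "map.split", "map.check:true"),
    ("temp", "map.float", "reduce.init"),
    ("temp", "map.float", "reduce.cmp:true"),
    ("temp", "map.float", "reduce.cmp:false"),
}


def run_job(lines):
    """Run a toy max-temperature-per-city job in memory.

    Returns the job output and the set of def-use pairs the input exercised.
    """
    covered = set()

    # Map phase: "city,temp" -> (city, temp); malformed records are skipped.
    intermediate = []
    for line in lines:
        fields = line.split(",")                   # definition of fields
        if len(fields) != 2:                       # predicate use of fields
            covered.add(("fields", "map.split", "map.check:false"))
            continue
        covered.add(("fields", "map.split", "map.check:true"))
        temp = float(fields[1])                    # definition of temp
        intermediate.append((fields[0].strip(), temp))

    # Shuffle phase: group the mapped values by key.
    groups = defaultdict(list)
    for city, temp in intermediate:
        groups[city].append(temp)

    # Reduce phase: keep the maximum temperature per city.
    output = {}
    for city, temps in groups.items():
        best = temps[0]                            # computation use of temp
        covered.add(("temp", "map.float", "reduce.init"))
        for t in temps[1:]:
            if t > best:                           # predicate use of temp
                covered.add(("temp", "map.float", "reduce.cmp:true"))
                best = t
            else:
                covered.add(("temp", "map.float", "reduce.cmp:false"))
        output[city] = best
    return output, covered


def uncovered(test_suite):
    """Return the def-use pairs that no input in the test suite exercises."""
    covered = set()
    for lines in test_suite:
        _, pairs = run_job(lines)
        covered |= pairs
    return ALL_PAIRS - covered


if __name__ == "__main__":
    # A single well-formed record leaves several pairs uncovered.
    print(sorted(uncovered([["cityA,18.0"]])))
    # A malformed record plus values that both raise and keep the maximum.
    suite = [["cityA,18.0", "cityA,21.5", "cityA,20.0", "bad record"]]
    print(uncovered(suite))
    print(run_job(suite[0])[0])
```

With only `cityA,18.0`, the pairs for the malformed-record branch and for both outcomes of the reducer comparison remain uncovered; the second suite covers every labelled pair and produces a maximum of 21.5 for `cityA`. The pair set here is deliberately small; real data flow criteria such as all-uses would enumerate pairs systematically rather than by hand.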


Information

Published In

ACM Conferences
A-TEST 2015: Proceedings of the 6th International Workshop on Automating Test Case Design, Selection and Evaluation
August 2015
46 pages
ISBN:9781450338134
DOI:10.1145/2804322
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Flow Testing
  2. MapReduce programs
  3. Software Testing

Conference

ESEC/FSE'15
Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1
Reflects downloads up to 14 Dec 2024

Cited By

  • (2024) A method of test case set generation in the commutativity test of reduce functions. Science of Computer Programming 231:C. DOI: 10.1016/j.scico.2023.103006. Online publication date: 1-Jan-2024.
  • (2023) Big Data, Bigger Challenges: A Comparative Study of Performance Testing. 2023 Seventh International Conference on Image Information Processing (ICIIP), pp. 870-875. DOI: 10.1109/ICIIP61524.2023.10537762. Online publication date: 22-Nov-2023.
  • (2022) Comparative Analysis of Techniques for Big-data Performance Testing. 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 292-297. DOI: 10.1109/PDGC56933.2022.10053306. Online publication date: 25-Nov-2022.
  • (2022) TRANSMUT-Spark: Transformation mutation for Apache Spark. Software Testing, Verification and Reliability 32:8. DOI: 10.1002/stvr.1809. Online publication date: 10-Feb-2022.
  • (2020) Testing MapReduce program using Induction Method. 2020 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), pp. 1-5. DOI: 10.1109/SCEECS48394.2020.178. Online publication date: Feb-2020.
  • (2020) Mutation Operators for Large Scale Data Processing Programs in Spark. Advanced Information Systems Engineering, pp. 482-497. DOI: 10.1007/978-3-030-49435-3_30. Online publication date: 3-Jun-2020.
  • (2019) Testing MapReduce programs. Journal of Software: Evolution and Process 31:3. DOI: 10.1002/smr.2120. Online publication date: 25-Mar-2019.
  • (2018) Automatic Testing of Design Faults in MapReduce Applications. IEEE Transactions on Reliability 67:3, pp. 717-732. DOI: 10.1109/TR.2018.2802047. Online publication date: Sep-2018.
  • (2018) A Method-Level Test Generation Framework for Debugging Big Data Applications. 2018 IEEE International Conference on Big Data (Big Data), pp. 221-230. DOI: 10.1109/BigData.2018.8622248. Online publication date: Dec-2018.
  • (2017) Towards Ex Vivo Testing of MapReduce Applications. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp. 73-80. DOI: 10.1109/QRS.2017.17. Online publication date: Jul-2017.