Relational Equivalence Proofs Between Imperative and MapReduce Algorithms

Bernhard Beckert¹⁷,
Timo Bingmann¹⁷,
Moritz Kiefer¹⁷,
Peter Sanders¹⁷,
Mattias Ulbrich¹⁷ &
…
Alexander Weigl¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11294))

Included in the following conference series:

Working Conference on Verified Software: Theories, Tools, and Experiments

632 Accesses
7 Citations

Abstract

Distributed programming frameworks like MapReduce, Spark and Thrill, are widely used for the implementation of algorithms operating on large datasets. However, implementing in these frameworks is more demanding than coming up with sequential implementations. One way to achieve correctness of an optimized implementation is by deriving it from an existing imperative sequential algorithm description through a sequence of behavior-preserving transformations.

We present a novel approach for proving equivalence between imperative and deterministic MapReduce algorithms based on partitioning the equivalence proof into a sequence of equivalence proofs between intermediate programs with smaller differences. Our approach is based on the insight that proofs are best conducted using a combination of two kinds of steps: (1) uniform context-independent rewriting transformations; and (2) context-dependent flexible transformations that can be proved using relational reasoning with coupling invariants.

We demonstrate the feasibility of our approach by evaluating it on two prototypical algorithms commonly used as examples in MapReduce frameworks: k-means and PageRank. To carry out the proofs, we use a higher-order theorem prover with partial proof automation. The results show that our approach and its prototypical implementation enable equivalence proofs of non-trivial algorithms and could be automated to a large degree.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Representing MapReduce Optimisations in the Nested Relational Calculus

MapReduce: an infrastructure review and research insights

Article 08 June 2019

Cluster Computing in Zero Knowledge

References

Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 200–214. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21437-0_17
Chapter Google Scholar
Beckert, B., Bingmann, T., Kiefer, M., Sanders, P., Ulbrich, M., Weigl, A.: Relational Equivalence Proofs Between Imperative and MapReduce Algorithms. ArXiv e-prints, January 2018. arXiv:1801.08766
Beckert, B., Bingmann, T., Kiefer, M., Sanders, P., Ulbrich, M., Weigl, A.: Proving equivalence between imperative and mapreduce implementations using program transformations. In: Third Workshop Models for Formal Analysis of Real Systems and Sixth International Workshop on Verification and Program Transformation. Electronic Proceedings in Theoretical Computer Science, vol. 268, pp. 185–199. Open Publishing Association (2018)
Google Scholar
Bingmann, T., et al.: Thrill: high-performance algorithmic distributed batch data processing with C++. In: IEEE International Conference on Big Data, pp. 172–183. IEEE, December 2016. preprint arXiv:1608.05634
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998). https://doi.org/10.1016/S0169-7552(98)00110-X
Article Google Scholar
Chambers, C., et al.: FlumeJava: easy, efficient data-parallel pipelines. ACM SIGPLAN Notices 45(6), 363–375 (2010)
Article Google Scholar
Chen, Y.F., Hong, C.D., Lengál, O., Mu, S.C., Sinha, N., Wang, B.Y.: An Executable Sequential Specification for Spark Aggregation (2017). arXiv:1702.02439
Chen, Y.-F., Hong, C.-D., Sinha, N., Wang, B.-Y.: Commutativity of reducers. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 131–146. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_9
Chapter Google Scholar
Chen, Y., Song, L., Wu, Z.: The Commutativity Problem of the MapReduce Framework: A Transducer-based Approach. CoRR abs/1605.01497 (2016). arXiv:1605.01497
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Article Google Scholar
Elenbogen, D., Katz, S., Strichman, O.: Proving mutual termination. Form. Methods Syst. Des. 47(2), 204–229 (2015). https://doi.org/10.1007/s10703-015-0234-3
Article MATH Google Scholar
Felsing, D., Grebing, S., Klebanov, V., Rümmer, P., Ulbrich, M.: Automating regression verification. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pp. 349–360. ASE 2014. ACM, New York, NY, USA (2014)
Google Scholar
Godlin, B., Strichman, O.: Regression verification. In: Proceedings of the 46th Annual Design Automation Conference, pp. 466–471. DAC 2009. ACM, New York, NY, USA (2009)
Google Scholar
Grossman, S., Cohen, S., Itzhaky, S., Rinetzky, N., Sagiv, M.: Verifying equivalence of spark programs. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 282–300. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_15
Chapter Google Scholar
Hawblitzel, C., Kawaguchi, M., Lahiri, S., Rebêlo, H.: Mutual summaries: unifying program comparison techniques. In: Informal proceedings of BOOGIE 2011 workshop (2011). https://www.microsoft.com/en-us/research/publication/mutual-summaries-unifying-program-comparison-techniques/
Kahn, G.: Natural semantics. STACS 87, 22–39 (1987)
MathSciNet MATH Google Scholar
Kiefer, M., Klebanov, V., Ulbrich, M.: Relational program reasoning using compiler IR - combining static verification and dynamic analysis. J. Autom. Reason. (2017)
Google Scholar
Klebanov, V., Rümmer, P., Ulbrich, M.: Automating regression verification of pointer programs by predicate abstraction. J. Formal Methods Syst. Des. 52, 229–259 (2017)
Article Google Scholar
Lahiri, S.K., Hawblitzel, C., Kawaguchi, M., Rebêlo, H.: SYMDIFF: a language-agnostic semantic diff tool for imperative programs. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 712–717. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_54
Chapter Google Scholar
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
Article MathSciNet MATH Google Scholar
Radoi, C., Fink, S.J., Rabbah, R., Sridharan, M.: Translating Imperative Code to MapReduce. SIGPLAN Not. 49(10), 909–927 (2014)
Article Google Scholar
The Coq development team: The Coq proof assistant reference manual. LogiCal Project, version 8.6 (2004). http://coq.inria.fr
Verdoolaege, S., Janssens, G., Bruynooghe, M.: Equivalence checking of static affine programs using widening to handle recurrences. ACM Trans. Program. Lang. Syst. 34(3), 11:1–11:35 (2012). https://doi.org/10.1145/2362389.2362390
Article MATH Google Scholar
White, T.: Hadoop: The Definitive Guide. O’Reilly Media Inc., Sebastopol (2012)
Google Scholar
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pp. 10–10. HotCloud 2010, USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113

Download references

Author information

Authors and Affiliations

Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Bernhard Beckert, Timo Bingmann, Moritz Kiefer, Peter Sanders, Mattias Ulbrich & Alexander Weigl

Authors

Bernhard Beckert
View author publications
You can also search for this author in PubMed Google Scholar
Timo Bingmann
View author publications
You can also search for this author in PubMed Google Scholar
Moritz Kiefer
View author publications
You can also search for this author in PubMed Google Scholar
Peter Sanders
View author publications
You can also search for this author in PubMed Google Scholar
Mattias Ulbrich
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Weigl
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Alexander Weigl .

Editor information

Editors and Affiliations

Yale University, New Haven, CT, USA
Ruzica Piskac
Uppsala University, Uppsala, Sweden
Philipp Rümmer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Beckert, B., Bingmann, T., Kiefer, M., Sanders, P., Ulbrich, M., Weigl, A. (2018). Relational Equivalence Proofs Between Imperative and MapReduce Algorithms. In: Piskac, R., Rümmer, P. (eds) Verified Software. Theories, Tools, and Experiments. VSTTE 2018. Lecture Notes in Computer Science(), vol 11294. Springer, Cham. https://doi.org/10.1007/978-3-030-03592-1_14

Download citation

DOI: https://doi.org/10.1007/978-3-030-03592-1_14
Published: 24 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03591-4
Online ISBN: 978-3-030-03592-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Relational Equivalence Proofs Between Imperative and MapReduce Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Representing MapReduce Optimisations in the Nested Relational Calculus

MapReduce: an infrastructure review and research insights

Cluster Computing in Zero Knowledge

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Relational Equivalence Proofs Between Imperative and MapReduce Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Representing MapReduce Optimisations in the Nested Relational Calculus

MapReduce: an infrastructure review and research insights

Cluster Computing in Zero Knowledge

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation