Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2931037.2931071acmconferencesArticle/Chapter ViewAbstractPublication PagesisstaConference Proceedingsconference-collections
research-article

DSI: an evidence-based approach to identify dynamic data structures in C programs

Published: 18 July 2016 Publication History

Abstract

Comprehension of C programs containing pointer-based dynamic data structures can be a challenging task. To tackle this challenge we present Data Structure Investigator (DSI), a new dynamic analysis for automated data structure identification that targets C source code. Our technique first applies a novel abstraction on the evolving memory structures observed at runtime to discover data structure building blocks. By analyzing the interconnections between building blocks we are then able to identify, e.g., binary trees, doubly-linked lists, skip lists, and relationships between these such as nesting. Since the true shape of a data structure may be temporarily obscured by manipulation operations, we ensure robustness by first discovering and then reinforcing evidence for data structure observations. We show the utility of our DSI prototype implementation by applying it to both synthetic and real world examples. DSI outputs summarizations of the identified data structures, which will benefit software developers when maintaining (legacy) code and inform other applications such as memory visualization and program verification.

References

[1]
GNU bash v4.3.30. https://www.gnu.org/software/ bash/. Accessed: 17th May 2016.
[2]
libusb 1.0.20. http://www.libusb.info/. Accessed: 17th May 2016.
[3]
Linux Kernel 4.1 Cyclic DLL(include/linux/list.h). http://www.kernel.org/. Accessed: 17th May 2016.
[4]
Olden Benchmark v1.01. http://www.martincarlisle .com/olden.html. Accessed: 17th May 2016.
[5]
Predator/Forester GIT Repository. In particular, examples: lit1: tests/forester/dll-duplicate-sels.c, lit2: tests/skip-list/jonathan-skip-list.c, lit3: tests/forester/cav13_tests/slllistoftwoclists-linux.c, lit4: tests/predator-regre/test-0234.c. https://github.com/kdudka/predator. Accessed: 17th May 2016.
[6]
Supplemental Material for DSI. http://www. swt-bamberg.de/research/data-structures.html. Accessed: 17th May 2016.
[7]
The Computer Language Benchmarks Game: Binary Tree. https://benchmarksgame.alioth.debian.org/ u64q/program.php?test=binarytrees&lang=gcc&id=1, Contributed by Kevin Carson. Accessed: 17th May 2016.
[8]
E. E. Aftandilian, S. Kelley, C. Gramazio, N. Ricci, S. L. Su, and S. Z. Guyer. Heapviz: Interactive Heap Visualization for Program Understanding and Debugging. In SOFTVIS 2010, pages 53–62. ACM, 2010.
[9]
A. Braginsky and E. Petrank. Locality-Conscious Lock-Free Linked Lists. In ICDCN 2011, volume 6522 of LNCS, pages 107–118. Springer, 2011.
[10]
M. Brockschmidt, Y. Chen, B. Cook, P. Kohli, S. Krishna, D. Tarlow, and H. Zhu. Learning to Verify the Heap. Technical Report MSR-TR-2016-17, 2016.
[11]
J. Caballero, G. Grieco, M. Marron, Z. Lin, and D. Urbina. ARTISTE: Automatic Generation of Hybrid Data Structure Signatures from Binary Code Executions. Technical Report TR-IMDEA-SW-2012-001, IMDEA Software Institute, Spain, 2012.
[12]
P. Cousot and R. Cousot. Systematic Design of Program Analysis Frameworks. In POPL 1979, pages 269–282. ACM, 1979.
[13]
A. Cozzie, F. Stratton, H. Xue, and S. King. Digging for Data Structures. In OSDI 2008, pages 255–266. USENIX, 2008.
[14]
K. Dudka, L. Hol´ık, P. Peringer, M. Trt´ık, and T. Vojnar. From Low-Level Pointers to High-Level Containers. Technical Report FIT-TR-2015-03, Faculty of Information Technology, Brno University of Technology, October 2015.
[15]
K. Dudka, P. Peringer, and T. Vojnar. Byte-Precise Verification of Low-Level List Manipulation. In SAS 2013, volume 7935 of LNCS, pages 215–237. Springer, 2013.
[16]
I. Haller, A. Slowinska, and H. Bos. Scalable Data Structure Detection and Classification for C/C++ Binaries. Empirical Software Engineering, pages 1–33, 2015.
[17]
L. Hol´ık, O. Lengál, A. Rogalewicz, J. ˇ Simᡠcek, and T. Vojnar. Fully Automated Shape Analysis Based on Forest Automata. In CAV 2013, volume 8044 of LNCS, pages 740–755. Springer, 2013.
[18]
M. Jump and K. S. McKinley. Dynamic Shape Analysis via Degree Metrics. In ISMM 2009, pages 119–128. ACM, 2009.
[19]
C. Jung and N. Clark. DDT: Design and Evaluation of a Dynamic Program Analysis for Optimizing Data Structure Usage. In MICRO 2009, pages 56–66. IEEE, 2009.
[20]
C. Jung, S. Rus, B. P. Railing, N. Clark, and S. Pande. Brainy: Effective Selection of Data Structures. SIGPLAN Not., 46(6):86–97, June 2011.
[21]
J. Lee, T. Avgerinos, and D. Brumley. TIE: Principled Reverse Engineering of Types in Binary Programs. In NDSS 2011. The Internet Society, 2011.
[22]
Z. Lin, X. Zhang, and D. Xu. Automatic Reverse Engineering of Data Structures from Binary Execution. In NDSS 2010. The Internet Society, 2010.
[23]
M. Marron, C. Sanchez, Z. Su, and M. Fähndrich. Abstracting Runtime Heaps for Program Understanding. IEEE Trans. Softw. Eng., 39(6):774–786, 2013.
[24]
J. T. Mühlberg, D. H. White, M. Dodds, G. Lüttgen, and F. Piessens. Learning Assertions to Verify Linked-List Programs. In SEFM 2015, volume 9276 of LNCS, pages 37–52. Springer, 2015.
[25]
G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. In CC 2002, volume 2304 of LNCS, pages 213–228. Springer, 2002.
[26]
E. Raman and D. August. Recursive Data Structure Profiling. In MSP 2005, pages 5–14. ACM, 2005.
[27]
A. Slowinska, T. Stancescu, and H. Bos. Howard: A Dynamic Excavator for Reverse Engineering Data Structures. In NDSS 2011. The Internet Society, 2011.
[28]
M. Weiss. Data structures and algorithm analysis in C. Cummings, 1993.
[29]
D. H. White. dsOli: Data Structure Operation Location and Identification. In ICPC 2014, pages 48–52. ACM, 2014.
[30]
D. H. White and G. Lüttgen. Identifying Dynamic Data Structures by Learning Evolving Patterns in Memory. In TACAS 2013, volume 7795 of LNCS, pages 354–369. Springer, 2013.
[31]
J. Wolf. C von A bis Z. Galileo Computing, 2009.

Cited By

View all
  • (2022)Shape-analysis driven memory graph visualizationProceedings of the 30th IEEE/ACM International Conference on Program Comprehension10.1145/3524610.3527913(298-308)Online publication date: 16-May-2022
  • (2022)Heap Patterns for Memory Graph Visualization2022 Working Conference on Software Visualization (VISSOFT)10.1109/VISSOFT55257.2022.00025(162-166)Online publication date: Oct-2022
  • (2020)Data Structure Visualization on the Web2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378249(3272-3279)Online publication date: 10-Dec-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis
July 2016
452 pages
ISBN:9781450343909
DOI:10.1145/2931037
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data structure identification
  2. dynamic data structures
  3. pointer programs
  4. program comprehension

Qualifiers

  • Research-article

Conference

ISSTA '16
Sponsor:

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Upcoming Conference

ISSTA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Shape-analysis driven memory graph visualizationProceedings of the 30th IEEE/ACM International Conference on Program Comprehension10.1145/3524610.3527913(298-308)Online publication date: 16-May-2022
  • (2022)Heap Patterns for Memory Graph Visualization2022 Working Conference on Software Visualization (VISSOFT)10.1109/VISSOFT55257.2022.00025(162-166)Online publication date: Oct-2022
  • (2020)Data Structure Visualization on the Web2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378249(3272-3279)Online publication date: 10-Dec-2020
  • (2019)TypeMiner: Recovering Types in Binary Programs Using Machine LearningDetection of Intrusions and Malware, and Vulnerability Assessment10.1007/978-3-030-22038-9_14(288-308)Online publication date: 6-Jun-2019
  • (2018)Generating Inductive Shape Predicates for Runtime Checking and Formal VerificationLeveraging Applications of Formal Methods, Verification and Validation. Verification10.1007/978-3-030-03421-4_5(64-74)Online publication date: 30-Oct-2018
  • (2017)DSIbin: identifying dynamic data structures in C/C++ binariesProceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering10.5555/3155562.3155607(331-341)Online publication date: 30-Oct-2017
  • (2017)TravioliProceedings of the 39th International Conference on Software Engineering10.1109/ICSE.2017.50(473-483)Online publication date: 20-May-2017
  • (2017)DSIbin: Identifying dynamic data structures in C/C++ binaries2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE.2017.8115646(331-341)Online publication date: Oct-2017
  • (2016)POSTERProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security10.1145/2976749.2989041(1772-1774)Online publication date: 24-Oct-2016

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media