research-article

Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores

Published: 10 June 2019

Abstract

Software debugging is one of the most challenging aspects of embedded system development due to growing hardware and software complexity, limited visibility of system components, and tightening time-to-market constraints. To find software bugs faster, developers often rely on on-chip trace modules with large buffers that capture program execution traces with minimal interference to program execution. However, the high volume of trace data and the high cost of trace modules limit visibility into system operation to short program segments. This article introduces a new hardware/software technique for capturing and filtering read data value traces in multicores that enables complete reconstruction of parallel program execution. The technique tracks data reads in the data caches and exploits cache coherence protocol states to minimize the number of trace messages streamed out of the target platform to the software debugger. Its effectiveness is evaluated by analyzing the required trace port bandwidth and trace buffer sizes as a function of the data cache size and the number of processor cores. The results show that the proposed technique reduces the required trace port bandwidth by a factor of 12.2 to 73.9 compared to Nexus-like read data value tracing, enabling continuous on-the-fly data tracing at modest hardware cost.
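The abstract describes the filtering idea only at a high level. As a rough illustration, the Python sketch that follows models it under simplifying assumptions of our own, none of which come from the article: each core has a direct-mapped data cache, every cache line carries one hypothetical "known" bit per word, caches allocate on writes, and a write by one core invalidates copies held by the other cores, as an invalidation-based coherence protocol would. The class `ReadTraceFilter`, the constant `WORDS_PER_LINE`, and all parameters are hypothetical. This is not the authors' hardware design; it only shows why most repeated reads need not produce trace messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WORDS_PER_LINE = 4  # hypothetical cache-line size, in words


@dataclass
class CacheLine:
    tag: Optional[int] = None
    # One bit per word: True once the debugger can already infer the word's value.
    known: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_LINE)


class ReadTraceFilter:
    """Toy model of cache-assisted filtering of read data value traces.

    A read emits a trace message only when the debugger cannot infer its value
    from earlier messages or from replaying the core's own stores.
    """

    def __init__(self, num_cores: int, num_sets: int) -> None:
        self.num_sets = num_sets
        self.caches = [[CacheLine() for _ in range(num_sets)] for _ in range(num_cores)]
        self.messages: List[Tuple[int, int, int]] = []  # (core, address, value)
        self.reads = 0  # a Nexus-like scheme would emit one message per read

    def _locate(self, addr: int) -> Tuple[int, int, int]:
        block, word = divmod(addr, WORDS_PER_LINE)
        return block % self.num_sets, block // self.num_sets, word  # index, tag, word

    def _lookup(self, core: int, addr: int) -> Tuple[CacheLine, int]:
        index, tag, word = self._locate(addr)
        line = self.caches[core][index]
        if line.tag != tag:  # miss: allocate the line (evicting any other block)
            line = CacheLine(tag=tag)
            self.caches[core][index] = line
        return line, word

    def read(self, core: int, addr: int, value: int) -> None:
        self.reads += 1
        line, word = self._lookup(core, addr)
        if not line.known[word]:
            self.messages.append((core, addr, value))  # stream the value out
            line.known[word] = True

    def write(self, core: int, addr: int) -> None:
        # The debugger recomputes the stored value by replaying this core's
        # instructions, so no message is emitted; the word becomes known.
        line, word = self._lookup(core, addr)
        line.known[word] = True
        index, tag, _ = self._locate(addr)
        for other, cache in enumerate(self.caches):
            if other != core and cache[index].tag == tag:
                cache[index] = CacheLine()  # invalidate the remote copy


tracer = ReadTraceFilter(num_cores=2, num_sets=8)
for _ in range(3):
    tracer.read(0, 0x10, 7)  # only the first read of this word is traced
tracer.write(1, 0x10)        # core 1 writes; core 0's copy is invalidated
tracer.read(0, 0x10, 9)      # coherence miss: traced again
print(tracer.reads, len(tracer.messages))  # prints: 4 2
```

In this toy trace, four reads produce two trace messages, whereas a Nexus-like scheme that streams out every read value would emit four. In this model the second message is caused by the coherence invalidation: once core 1 writes the shared word, the value core 0 reads next cannot be inferred from core 0's own history, so it is traced again.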

Cited By

  • (2024) A comprehensive evaluation of interrupt measurement techniques for predictability in safety-critical systems. Proceedings of the 19th International Conference on Availability, Reliability and Security, 1-10. DOI: 10.1145/3664476.3670451. Online publication date: 30-Jul-2024.

Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 18, Issue 4
July 2019
217 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3340300
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2019
Accepted: 01 March 2019
Revised: 01 December 2018
Received: 01 July 2018
Published in TECS Volume 18, Issue 4

Author Tags

  1. Real-time embedded systems
  2. multicores
  3. program tracing
  4. software testing and debugging

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 1
Reflects downloads up to 14 Dec 2024.
