research-article

Software trustworthiness 2.0-A semantic web enabled global source code analysis approach

Published: 01 March 2014

Abstract

  • Introduction of a Semantic Web enabled global source code analysis infrastructure.
  • Novel source code analysis approach combining crowdsourcing and linked data.
  • Novel proactive approach to improve trustworthiness of software systems.
  • Case studies illustrating the applicability of the approach using different resources.

There has been an ongoing trend toward collaborative software development using open and shared source code published in large software repositories on the Internet. While traditional source code analysis techniques perform well in single-project contexts, new types of source code analysis techniques are emerging that focus on global source code analysis challenges. In this article, we discuss how the Semantic Web can become an enabling technology that provides a standardized, formal, and semantically rich representation for modeling and analyzing large global source code corpora. Furthermore, inference and other services provided by Semantic Web technologies can be used to support a variety of core source code analysis techniques, such as semantic code search, call graph construction, and clone detection. In this article, we introduce SeCold, the first publicly available online linked data source code dataset for software engineering researchers and practitioners. Along with its dataset, SeCold also provides some Semantic Web enabled core services to support the analysis of Internet-scale source code repositories. We illustrate through several examples how this linked data, combined with Semantic Web technologies, can be harvested for different source code analysis tasks to support software trustworthiness. For the case studies, we combine both our linked data set and Semantic Web enabled source code analysis services with knowledge extracted from StackOverflow, a crowdsourcing website. Through these case studies, we demonstrate that our approach is not only capable of crawling, processing, and scaling to traditional types of structured data (e.g., source code), but also supports emerging non-structured data sources, such as crowdsourced information (e.g., StackOverflow.com), to support a global source code analysis context.
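
The idea of supporting call graph construction through inference over linked source code facts can be illustrated with a minimal sketch. It is not SeCold's implementation or vocabulary: the `ex:` identifiers, the `ex:calls` predicate, and the helper names are hypothetical, and plain Python tuples stand in for RDF triples. The sketch stores facts as (subject, predicate, object) triples and derives indirect calls by computing the transitive closure of the direct call relation, which is the kind of result a reasoner or a transitive query could return.

```python
from collections import defaultdict

# Hypothetical source code facts as (subject, predicate, object) triples.
# The "ex:" names and the "ex:calls" predicate are illustrative assumptions,
# not the SeCold ontology.
TRIPLES = {
    ("ex:App.main", "ex:calls", "ex:Parser.parse"),
    ("ex:Parser.parse", "ex:calls", "ex:Lexer.next"),
    ("ex:Lexer.next", "ex:calls", "ex:Buffer.read"),
    ("ex:App.main", "rdf:type", "ex:Method"),
}


def transitive_closure(triples, predicate):
    """Return every (s, o) pair linked by one or more `predicate` edges."""
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            edges[s].add(o)

    closure = set()
    for start in list(edges):
        stack = list(edges[start])
        seen = set()
        while stack:  # depth-first walk; `seen` guards against cycles
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(edges.get(node, ()))
    return closure


def reachable_from(triples, predicate, source):
    """List everything `source` reaches through `predicate`, sorted by name."""
    pairs = transitive_closure(triples, predicate)
    return sorted(o for s, o in pairs if s == source)


if __name__ == "__main__":
    for callee in reachable_from(TRIPLES, "ex:calls", "ex:App.main"):
        print("ex:App.main transitively calls", callee)
```

Keeping the facts as uniform triples means the same closure routine could, under these assumptions, be reused for other transitive relations such as inheritance or clone relationships simply by changing the predicate.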

Cited By

  • (2018) Toward the development of a conventional time series based web error forecasting framework. Empirical Software Engineering 23:2 (570-644). DOI: 10.1007/s10664-017-9530-4. Online publication date: 1-Apr-2018

    Published In

    Journal of Systems and Software  Volume 89, Issue C
    March 2014
    207 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Global source code analysis
    2. Linked data
    3. Semantic Web
    4. Source code analysis
