The Use of National Strategic Reference Framework Data in Knowledge Graphs and Data Mining to Identify Red Flags
Figure 1. System Architecture.
Figure 2. VFP classes and their relations.
Figure 3. NSRF-GR Vocabulary classes and their relations.
Figure 4. Example of a project’s IRI viewed using the RDFBrowser.
Figure 5. Example of the kinds of points DBSCAN uses in a 3D feature space, with $MinPts = 4$. Points $P_1$ and $P_2$ are core points, because the area within radius $\varepsilon$ of each of them contains at least 4 points (including the point itself). Because they are all reachable from one another, they form a single cluster. Point $P_3$ is not a core point, but it is reachable from $P_1$ and thus belongs to the cluster as well. Point $P_r$ is a noise point that is neither a core point nor directly reachable.
Figure 6. 4-nearest neighbor distance plot.
Figure 7. Performance Indicators of projects represented by two principal components.
Figure 8. DBSCAN identified 71% of the rejected projects as Red Flags.
Abstract
1. Introduction
2. Materials and Methods
2.1. Overview
2.2. Data
- Projects: “A group of activities aiming at the realisation of a functionally complete and distinct result. Some projects may consist of other subprojects.” [3].
- Support-Grants: “An advantage in any form whatsoever conferred on a selective basis to organisations involved in economic activity private or public (’undertakings’) by national public authorities with the potential to distort competition and affect trade between member states of the European Union. The advantage can take different forms of assistance including the direct transfer of resources, such as grants and soft loans, and also indirect assistance, for example, relief from charges that an undertaking normally has to bear, such as a tax exemption or the provision of services, loans, at a favourable rate.” [3].
2.3. Semantic Data Modeling
2.4. NSRF Knowledge Graph and Data Retrieval
2.5. Performance Indicators
2.6. Density Based Clustering
- A point $p$ is a core point if at least $MinPts$ points (including $p$ itself) are within distance $\varepsilon$ of it. Those points are said to be directly reachable from $p$.
- A point $q$ is density reachable from a point $p$ with regard to $\varepsilon$ and $MinPts$ if there is a path of core points, starting at $p$ and leading to $q$, where each point of the path is directly reachable from the previous one.
- A point $p$ is density connected to a point $q$ with regard to $\varepsilon$ and $MinPts$ if there is a point $o$ such that both $p$ and $q$ are density reachable from $o$ with respect to $\varepsilon$ and $MinPts$.
- A group of density connected points forms a density based cluster, and points that are not reachable from any other point are outliers.
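The definitions above can be illustrated with a minimal, standard-library sketch of DBSCAN. It is not the implementation used in the paper (the reference list points to the R package dbscan); the function names `region_query` and `dbscan`, the sample points, and the values of `eps` and `min_pts` are all illustrative assumptions. Noise points are labeled 0 here purely as a convention of this sketch.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0 for noise, 1, 2, ... for clusters."""
    labels = [None] * len(points)  # None marks a point that was not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = 0  # provisionally noise; may turn out to be a border point
            continue
        cluster += 1  # points[i] is a core point and seeds a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is itself a core point: expand
                queue.extend(j_neighbors)
    return labels

# Hypothetical points in a 3D feature space (not taken from the NSRF data).
points = [
    (1.00, 1.00, 1.00),
    (0.98, 1.00, 1.00),
    (1.00, 0.97, 1.00),
    (1.00, 1.00, 0.95),
    (0.93, 1.00, 1.00),  # border point: only reachable through the second point
    (0.20, 0.30, 0.10),  # isolated point: noise
]
labels = dbscan(points, eps=0.06, min_pts=4)
print(labels)
```

In this toy input the first four points are core points, the fifth is a border point that joins their cluster, and the last one is left as noise.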
2.7. Red Flags
3. Results
4. User Requirements and Use Case Scenario
1. Rejected projects should raise Red Flags, so that project failure can be averted where possible.
2. Competent authorities should be assisted in organizing the monitoring process efficiently, without losing or misspending time, by not having to examine most of the non-problematic projects.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ministry of Economy and Finance and General Secretariat for Investments and Development. National Strategic Reference Framework 2007–2013. Available online: http://2007-2013.espa.gr/elibrary/NSRF%20document_english.pdf (accessed on 12 March 2018).
- Hellenic Parliament. Open Disposition and Further Use of Documents, Information and Data of the Public Sector, Amendment of Law 3448/2006 (A’57), Adjustment of the National Legislation to the Provisions of Directive 2013/37/EU of the European Parliament and of the Council, Further Strengthening of Clarity, Regulations of Issues Concerning the Entrance Competition to the National School of Public Administration and Local Government and Other Provisions. 2014. Available online: http://www.hellenicparliament.gr/en/Nomothetiko-Ergo/Anazitisi-Nomothetikou-Ergou?law_id=300b8ca3-3468-4893-9336-9973d3fa247d (accessed on 12 March 2018).
- Ministry of Economy & Development. What is ANAPTYKSI.gov.gr. Available online: http://2013.anaptyxi.gov.gr/Default.aspx?tabid=249&language=en-US (accessed on 12 March 2018).
- Fazekas, M.; Tóth, I.J. Corruption in EU Funds? Europe-Wide Evidence on the Corruption Effect of EU Funded Public Contracting. Available online: http://real.mtak.hu/80734/1/10.4324_9781315401867_14_u.pdf (accessed on 3 January 2021).
- Kenny, C.; Musatova, M. ’Red Flags Of Corruption’ In World Bank Projects: An Analysis Of Infrastructure Contracts; The World Bank: Washington, DC, USA, 2010.
- Apostolou, B.A.; Hassell, J.M.; Webber, S.A.; Sumners, G.E. The Relative Importance of Management Fraud Risk Factors. Behav. Res. Account. 2001, 13, 1–24.
- Hackenbrack, K. The effect of experience with different sized clients on auditor evaluations of fraudulent financial reporting indicators. Auditing 1993, 12, 99.
- Loebbecke, J.; Eining, M.; Willingham, J. Auditors’ experience with material irregularities: Frequency, nature, and detectability. Auditing J. Pract. Theory 1989, 9, 1–28.
- Majid, A.; Gul, F.A.; Tsui, J.S.L. An Analysis of Hong Kong Auditors’ Perceptions of the Importance of Selected Red Flag Factors in Risk Assessment. J. Bus. Ethics 2001, 32, 263–274.
- Mock, T.J.; Turner, J.L. Auditor Identification of Fraud Risk Factors and their Impact on Audit Programs. Int. J. Audit. 2005, 9, 59–77.
- Coram, P.; Ferguson, C.; Moroney, R. Internal audit, alternative internal audit structures and the level of misappropriation of assets fraud. Account. Financ. 2008, 48, 543–559.
- Liou, F. Fraudulent financial reporting detection and business failure prediction models: A comparison. Manag. Audit. J. 2008, 23, 650–662.
- Gullkvist, B.; Jokipii, A. Perceived importance of red flags across fraud types. Crit. Perspect. Account. 2013, 24, 44–61.
- Calderoni, F.; Milani, R.; Rotondi, M.; Carbone, C.; Savona, E.; Riccardi, M.; Mancuso, M. Public Procurement Risk Assessment Software for Authorities, 2018. Deliverable 4.4. of the DIGIWHIST project funded under the European Union’s Horizon 2020 research and innovation Programme under the G.A. No: 645852. Available online: https://digiwhist.eu/wp-content/uploads/2018/02/4.4-MET-Risk-Assessment-tool.pdf (accessed on 24 September 2019).
- Dimulescu, V.; Pop, R.; Doroftei, I.M. Risks of corruption and the management of EU funds in Romania. Rom. J. Political Sci. 2013, 13, 101–123.
- Beblavý, M.; Sičáková-Beblavá, E. The Changing Faces of Europeanisation: How Did the European Union Influence Corruption in Slovakia Before and After Accession? Eur.-Asia Stud. 2014, 66, 536–556.
- Fazekas, M.; Chvalkovska, J.; Skuhrovec, J.; Tóth, I.J.; King, L.P. Are EU funds a corruption risk? The impact of EU funds on grand corruption in Central and Eastern Europe. Anticorruption Frontline ANTICORRP Proj. 2013, 2, 68–89.
- Hajek, P.; Henriques, R. Mining corporate annual reports for intelligent detection of financial statement fraud – A comparative study of machine learning methods. Knowl.-Based Syst. 2017, 128, 139–152.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining KDD’96, Portland, OR, USA, 2–4 August 1996; AAAI Press: Menlo Park, CA, USA, 1996; pp. 226–231.
- Corizzo, R.; Pio, G.; Ceci, M.; Malerba, D. DENCAST: Distributed density-based clustering for multi-target regression. J. Big Data 2019, 6.
- Li, H.; Wang, W.; Huang, P.; Li, Q. Fault diagnosis of rolling bearing using symmetrized dot pattern and density-based clustering. Measurement 2020, 152, 107293.
- Thrun, M.; Ultsch, A. Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data. J. Classif. 2020.
- Ristoski, P.; Paulheim, H. Semantic Web in data mining and knowledge discovery: A comprehensive survey. J. Web Semant. 2016, 36, 1–22.
- Ministry of Economy & Development. Open Data API ANAPTYXI.gov.gr. Available online: http://2013.anaptyxi.gov.gr/Default.aspx?tabid=251&language=en-US (accessed on 12 December 2020).
- Chondrokostas, E.; Bratsas, C. Vocabulary of Fiscal Projects: VFP v1.1.0. 2019. Available online: https://doi.org/10.5281/zenodo.3242356 (accessed on 3 January 2021).
- Chondrokostas, E.; Bratsas, C. National Strategic Reference Framework Vocabulary: NSRF-GR v1.1.0. 2019. Available online: https://doi.org/10.5281/zenodo.3242355 (accessed on 3 January 2021).
- Knap, T.; Hanecák, P.; Klímek, J.; Mader, C.; Necaský, M.; Nuffelen, B.V.; Skoda, P. UnifiedViews: An ETL tool for RDF data management. Semant. Web 2018, 9, 661–676.
- Filippidis, P.; Karampatakis, S.; Koupidis, K.; Ioannidis, L.; Bratsas, C. The code lists case: Identifying and linking the key parts of fiscal datasets. In Proceedings of the 2016 11th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Thessaloniki, Greece, 20–21 October 2016; pp. 165–170.
- Koupidis, K.; Bratsas, C.; Karampatakis, S.; Martzopoulou, A.; Antoniou, I. Fiscal Knowledge discovery in Municipalities of Athens and Thessaloniki via Linked Open Data. In Proceedings of the 2016 11th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Thessaloniki, Greece, 20–21 October 2016; pp. 171–176.
- Karampatakis, S. RDFBrowser: An Open Source Linked Data Content Negotiator and HTML Description Generator. Available online: https://github.com/okgreece/RDFBrowser (accessed on 12 December 2020).
- Smith, M.; Haji Omar, N.; Iskandar Zulkarnain Sayd Idris, S.; Baharuddin, I. Auditors’ perception of fraud risk indicators: Malaysian evidence. Manag. Audit. J. 2005, 20, 73–85.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996, 96, 34.
- Kassambara, A. Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning; STHDA: Montpellier, France, 2017; Volume 1.
- Xie, J.; Xiong, Z.Y.; Zhang, Y.F.; Feng, Y.; Ma, J. Density core-based clustering algorithm with dynamic scanning radius. Knowl.-Based Syst. 2018, 142, 58–70.
- Zhou, Z.; Si, G.; Zhang, Y.; Zheng, K. Robust clustering by identifying the veins of clusters based on kernel density estimation. Knowl.-Based Syst. 2018, 159, 309–320.
- Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Min. Knowl. Discov. 1998, 2, 169–194.
- Office of the State Comptroller, State of New York. Red Flags for Fraud. Available online: https://www.osc.state.ny.us/localgov/pubs/red_flags_fraud.pdf (accessed on 12 December 2020).
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019.
- RStudio Team. RStudio: Integrated Development Environment for R; RStudio, Inc.: Boston, MA, USA, 2018.
- Chang, W.; Cheng, J.; Allaire, J.; Xie, Y.; McPherson, J. Shiny: Web Application Framework for R. Available online: https://cran.r-project.org/web/packages/shiny/index.html (accessed on 12 December 2020).
- Van Hage, W.R.; Tomi, K.; Graeler, B.; Davis, C.; Hoeksema, J.; Ruttenberg, A.; Bahls, D. SPARQL: SPARQL Client. Available online: https://cran.r-project.org/web/packages/SPARQL/SPARQL.pdf (accessed on 12 December 2020).
- Hahsler, M.; Piekenbrock, M. dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms. Available online: https://github.com/mhahsler/dbscan (accessed on 12 December 2020).
- Sievert, C. plotly for R; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016.
- Hafen, R.; Continuum Analytics, Inc. rbokeh: R Interface for Bokeh, R package version 0.5.0; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Xie, Y.; Cheng, J.; Tan, X. DT: A Wrapper of the JavaScript Library ’DataTables’, R package version 0.2; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Chang, W. Shinythemes: Themes for Shiny, R package version 1.1.1; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Attali, D. Shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds, R package version 0.9; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Chang, W.; Borges Ribeiro, B. Shinydashboard: Create Dashboards with ’Shiny’, R package version 0.5.3; R Foundation for Statistical Computing: Vienna, Austria, 2016.
- Altman, D.G.; Bland, J.M. Statistics Notes: Diagnostic tests 2: Predictive values. BMJ 1994, 309, 102.
- Berners-Lee, T. Design Issues: Linked Data. 2006. Available online: http://www.w3.org/DesignIssues/LinkedData.html (accessed on 12 December 2020).
- Bizer, C.; Heath, T.; Berners-Lee, T. Linked Data–The Story So Far. Int. J. Semantic Web Inf. Syst. 2009, 5, 1–22.
- Bratsas, C.; Alexiou, S.; Kontokostas, D.; Parapontis, I.; Antoniou, I.; Metakides, G. Greek Open Data in the Age of Linked Data: A Demonstration of LOD Internationalization. Available online: https://doi.org/10.2139/ssrn.2088076 (accessed on 3 January 2021).
- Kontokostas, D.; Bratsas, C.; Auer, S.; Hellmann, S.; Antoniou, I.; Metakides, G. Internationalization of Linked Data: The case of the Greek DBpedia edition. J. Web Semant. 2012, 15, 51–61.
- Kontokostas, D.; Bratsas, C.; Auer, S.; Hellmann, S.; Antoniou, I.; Metakides, G. Towards linked data internationalization-realizing the greek dbpedia. In Proceedings of the ACM WebSci’11, Koblenz, Germany, 14–17 June 2011.
- Gayo, J.E.L.; Prud’hommeaux, E.; Boneva, I.; Kontokostas, D. Validating RDF Data. Synth. Lect. Semant. Web Theory Technol. 2017, 7, 1–328.
- Melidis, A.; Deligiannis, A.; Priftis, A. Greece End-of-Term Report 2016–2018 Open Government Partnership; Independent Reporting Mechanism (IRM): Athens, Greece, 2019; pp. 70–71.
Class | Label | Subclass of |
---|---|---|
vfp:Project | Financial or fiscal Project. It may refer to a construction project or a grant. | foaf:Project |
vfp:Organization | Organization related to the project. It includes beneficiaries, contractors/implementers or any other bodies involved in the project. | vcard:Organization |
vfp:Document | Documents, images or URLs associated with the project. | foaf:Document |
vfp:Place | Place associated with the project. | dbo:Place |
MIS | Budget | Contracts | Payments | Start Date | End Date |
---|---|---|---|---|---|
200000 | 1,465,906 | 1,465,906 | 1,465,906 | 2009-09-16 | 2010-12-31 |
200010 | 25,346,422 | 25,346,422 | 25,346,422 | 2009-10-12 | 2016-12-31 |
200054 | 6,347,801 | 6,259,160 | 5,661,888 | 2009-03-23 | 2015-12-31 |
200056 | 19,495,000 | 18,934,124 | 18,934,124 | 2009-01-01 | 2015-11-30 |
200059 | 7,011,500 | 6,817,449 | 6,801,507 | 2009-08-14 | 2015-12-31 |
200065 | 9,152,263 | 9,152,263 | 9,152,263 | 2010-12-22 | 2015-11-30 |
200101 | 4,543,729 | 3,165,602 | 754,299 | 2012-07-19 | 2015-12-31 |
200111 | 421,780 | 421,780 | 421,780 | 2010-04-29 | 2011-06-29 |
200112 | 173,720 | 173,720 | 173,720 | 2010-04-01 | 2011-06-30 |
200115 | 55,000 | 55,000 | 55,000 | 2010-09-01 | 2013-12-31 |
MIS | Completion | Payment Completion | Contract Completion |
---|---|---|---|
491704 | 0.81 | 0.84 | 0.97 |
524889 | 0.81 | 1.00 | 0.81 |
524944 | 0.81 | 1.00 | 0.81 |
525053 | 0.81 | 0.81 | 1.00 |
525097 | 0.81 | 0.81 | 1.00 |
216685 | 0.80 | 1.00 | 0.80 |
216686 | 0.80 | 1.00 | 0.80 |
217143 | 0.80 | 1.00 | 0.80 |
217183 | 0.80 | 0.80 | 1.00 |
270967 | 0.80 | 1.00 | 0.80 |
Performance Indicators | Mean | SD | Min | Max |
---|---|---|---|---|
Completion | 0.89 | 0.23 | 0 | 1.61 |
Payment Completion | 0.94 | 0.19 | 0 | 1.61 |
Contract Completion | 0.94 | 0.17 | 0 | 6.91 |
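As a rough illustration of how three such indicators could be derived from the Budget, Contracts and Payments columns, the sketch below uses ratio definitions that are an assumption of this example, not a statement of the paper's formulas: Completion = Payments / Budget, Payment Completion = Payments / Contracts, and Contract Completion = Contracts / Budget. Under these assumed definitions, Completion equals the product of the other two, which agrees with the rounded indicator rows tabulated above (for example 0.84 × 0.97 ≈ 0.81). The `Project` class, the `ratio` helper and its zero-denominator rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Project:
    mis: int          # project identifier (MIS code)
    budget: float
    contracts: float
    payments: float

def ratio(numerator, denominator):
    """Safe division; returning 0.0 for a zero denominator is an assumption."""
    return numerator / denominator if denominator else 0.0

def indicators(p):
    """ASSUMED indicator definitions, see the lead-in paragraph."""
    return {
        "completion": ratio(p.payments, p.budget),
        "payment_completion": ratio(p.payments, p.contracts),
        "contract_completion": ratio(p.contracts, p.budget),
    }

# Two rows copied from the project table (MIS, Budget, Contracts, Payments).
projects = [
    Project(200054, 6_347_801, 6_259_160, 5_661_888),
    Project(200101, 4_543_729, 3_165_602, 754_299),
]

results = {p.mis: indicators(p) for p in projects}
for mis, values in results.items():
    print(mis, {name: round(value, 2) for name, value in values.items()})
```

A three-column output of this kind is the sort of feature space on which a density based clustering could then be run.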
Cluster | Members |
---|---|
1 | 8150 |
4 | 510 |
0 | 506 |
2 | 286 |
10 | 163 |
3 | 153 |
9 | 139 |
29 | 132 |
6 | 113 |
17 | 107 |
Flag | Rejected | Not Rejected | Total |
---|---|---|---|
Red Flag | TP = 312 | FP = 3096 | 3408 |
No Red Flag | FN = 126 | TN = 8024 | 8150 |
Total | 438 | 11,120 | 11,558 |
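The counts in this table are enough to recompute the usual diagnostic-test measures. The sketch below is an illustration only: the function name `diagnostic_metrics` is hypothetical, and the choice of measures (sensitivity, specificity and the two predictive values) is an assumption about what a reader may want to check, not a list of the measures reported in the paper. The sensitivity it yields, 312 / 438 ≈ 0.71, corresponds to the 71% of rejected projects that were raised as Red Flags.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard measures derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of rejected projects that were flagged
        "specificity": tn / (tn + fp),  # share of non-rejected projects not flagged
        "ppv": tp / (tp + fp),          # share of flagged projects that were rejected
        "npv": tn / (tn + fn),          # share of unflagged projects not rejected
    }

# Counts copied from the table above.
tp, fp, fn, tn = 312, 3096, 126, 8024
metrics = diagnostic_metrics(tp, fp, fn, tn)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The low positive predictive value next to the high negative predictive value reflects the fact that rejected projects make up only 438 of the 11,558 projects.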
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bratsas, C.; Chondrokostas, E.; Koupidis, K.; Antoniou, I. The Use of National Strategic Reference Framework Data in Knowledge Graphs and Data Mining to Identify Red Flags. Data 2021, 6, 2. https://doi.org/10.3390/data6010002