Abstract
In public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being filled in properly. In practice, however, these attributes are very often missing, which prevents the computation of red flags. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When applied to public procurement data, PANG outperforms the other methods and identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.
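To make the distinction between general and induced subgraph patterns concrete, the following is a minimal sketch in pure Python. It is not the PANG implementation: the graph encoding, the node labels "s" and "b", and the helper names `make_graph`, `occurs` and `to_feature_vector` are all hypothetical, and the brute-force matching is only practical for very small graphs. It shows how a pattern can occur in a graph as a general subgraph without occurring as an induced one, and how such occurrences could be turned into a binary vector for a supervised classifier.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build a tiny labeled undirected graph (hypothetical encoding)."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}


def occurs(pattern, graph, induced=False):
    """Brute-force test of whether `pattern` occurs in `graph`.

    induced=False: every pattern edge must be present in the graph.
    induced=True: in addition, pattern nodes that are NOT adjacent
    must be mapped to graph nodes that are not adjacent either.
    """
    p_nodes = list(pattern["labels"])
    g_nodes = list(graph["labels"])
    # Try every injective mapping of pattern nodes onto graph nodes.
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(pattern["labels"][u] != graph["labels"][mapping[u]]
               for u in p_nodes):
            continue  # node labels must match
        ok = True
        for i, u in enumerate(p_nodes):
            for v in p_nodes[i + 1:]:
                p_edge = frozenset((u, v)) in pattern["edges"]
                g_edge = frozenset((mapping[u], mapping[v])) in graph["edges"]
                if p_edge and not g_edge:
                    ok = False  # a pattern edge is missing in the graph
                if induced and g_edge and not p_edge:
                    ok = False  # extra graph edge breaks the induced match
        if ok:
            return True
    return False


def to_feature_vector(graph, patterns, induced=False):
    """Binary vector: 1 if the pattern occurs in the graph, else 0."""
    return [int(occurs(p, graph, induced)) for p in patterns]


# Pattern: a path s - b - s with no edge between the two "s" nodes.
path = make_graph({0: "s", 1: "b", 2: "s"}, [(0, 1), (1, 2)])
# Graph: a triangle with the same labels, so the two "s" nodes are linked.
triangle = make_graph({"x": "s", "y": "b", "z": "s"},
                      [("x", "y"), ("y", "z"), ("x", "z")])

general_match = occurs(path, triangle, induced=False)
induced_match = occurs(path, triangle, induced=True)
```

Here `general_match` is true while `induced_match` is false, because the extra edge between the two "s" nodes is ignored by general matching but ruled out by induced matching. Vectors produced by `to_feature_vector` over a set of patterns could then be passed to any standard classifier.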
Acknowledgments
This work was supported by Agorantic (FR 3621), and the ANR under grant number ANR-19-CE38-0004 for the DeCoMaP project.
Ethics declarations
Ethical Implications
Anomaly detection can have ethical implications, for instance if the methods are used to discriminate against certain individuals. In this respect, however, our PANG methodological framework does not present any more risk than the supervised classification methods developed in machine learning.
Moreover, this work takes place in the framework of a project aiming, among other things, at proposing ways of automatically red flagging contracts and economic agents depending on fraud risk. Therefore, the method that we propose is meant to be used by public authorities to better regulate public procurement and the management of the related open data.
Finally, the data used in this article are publicly shared, and were collected from a public open data repository handled by the European Union. They do not contain any personal information, and cannot be used directly to infer any personal information, as they only describe the economic transactions of companies and public institutions regarding public procurement.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Potin, L., Figueiredo, R., Labatut, V., Largeron, C. (2023). Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_5
DOI: https://doi.org/10.1007/978-3-031-43427-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science (R0)