Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14174)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

In public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being filled in properly. In practice, however, these attributes are very often missing, which prevents the red flags from being computed. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages the relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When PANG is applied to public procurement data, its predictions are superior to those of other methods, and it identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.
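The distinction between general and induced subgraph patterns mentioned above can be illustrated with a minimal sketch. It is not the PANG implementation: the helper names `make_graph`, `occurs`, and `features` are hypothetical, node and edge attributes are ignored, and the brute-force matching is only practical for toy graphs. A pattern occurs as a general subgraph if all of its edges can be mapped onto edges of the graph; it occurs as an induced subgraph only if, in addition, pattern nodes that are not adjacent are mapped to graph nodes that are not adjacent either.

```python
from itertools import combinations, permutations


def make_graph(pairs):
    """Build an undirected graph as (sorted node list, set of frozenset edges)."""
    nodes = sorted({n for pair in pairs for n in pair})
    edges = {frozenset(pair) for pair in pairs}
    return nodes, edges


def occurs(pattern, graph, induced):
    """Brute-force test of whether `pattern` occurs in `graph`.

    With induced=False, every pattern edge must map to a graph edge.
    With induced=True, non-edges of the pattern must also map to non-edges.
    """
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        ok = True
        for u, v in combinations(p_nodes, 2):
            in_pattern = frozenset((u, v)) in p_edges
            in_graph = frozenset((mapping[u], mapping[v])) in g_edges
            if in_pattern and not in_graph:
                ok = False  # a pattern edge is missing in the graph
            if induced and in_graph and not in_pattern:
                ok = False  # the graph has an extra edge among the matched nodes
        if ok:
            return True
    return False


def features(graph, patterns, induced):
    """Binary vector: one entry per pattern, 1 if the pattern occurs in the graph."""
    return [int(occurs(p, graph, induced)) for p in patterns]


# Toy patterns: a 3-node path and a triangle.
path = make_graph([("a", "b"), ("b", "c")])
tri = make_graph([("a", "b"), ("b", "c"), ("a", "c")])
patterns = [path, tri]

# Toy graphs: a triangle and a 4-node chain.
triangle = make_graph([(1, 2), (2, 3), (1, 3)])
chain = make_graph([(1, 2), (2, 3), (3, 4)])

print(features(triangle, patterns, induced=False))
print(features(triangle, patterns, induced=True))
print(features(chain, patterns, induced=True))
```

In this toy setting, the path pattern is found in the triangle graph as a general subgraph but not as an induced one, so the two pattern types yield different feature vectors for the same graph. Vectors of this kind could then be passed to any standard supervised classifier; which classifier to use is left open here.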

Notes

1. https://github.com/CompNet/Pang/releases/tag/v1.0.0

Acknowledgments

This work was supported by Agorantic (FR 3621), and the ANR under grant number ANR-19-CE38-0004 for the DeCoMaP project.

Author information

Authors and Affiliations

Laboratoire Informatique d’Avignon – UPR 4128, 84911, Avignon, France
Lucas Potin, Rosa Figueiredo & Vincent Labatut
Laboratoire Hubert Curien – UMR 5516, 42023, Saint-Etienne, France
Christine Largeron

Corresponding author

Correspondence to Lucas Potin.

Editor information

Editors and Affiliations

CENTAI, Turin, Italy
Gianmarco De Francisci Morales
NYU and Two Sigma, New York, NY, USA
Claudia Perlich
Netflix, Los Angeles, CA, USA
Natali Ruchansky
Telefonica Research, Barcelona, Spain
Nicolas Kourtellis
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Ethics declarations

Ethical Implications

Anomaly detection can have ethical implications, for instance if the methods are used to discriminate against certain individuals. In this respect, however, our PANG methodological framework does not present any more risk than the supervised classification methods developed in machine learning.

Moreover, this work takes place in the framework of a project aiming, among other things, at proposing ways of automatically red flagging contracts and economic agents depending on fraud risk. Therefore, the method that we propose is meant to be used by public authorities to better regulate public procurement and the management of the related open data.

Finally, the data used in this article are publicly shared, and were collected from a public open data repository handled by the European Union. They do not contain any personal information, and cannot be used directly to infer any personal information, as they only describe the economic transactions of companies and public institutions regarding public procurement.

Cite this paper

Potin, L., Figueiredo, R., Labatut, V., Largeron, C. (2023). Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_5

DOI: https://doi.org/10.1007/978-3-031-43427-3_5
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science, Computer Science (R0)
