
Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (ECML PKDD 2023)

Abstract

In the context of public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being properly filled in. However, these attributes are very often missing in practice, which prevents the red flags from being computed. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When PANG is applied to public procurement data, its predictions are better than those of other methods, and it identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.
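The difference between a general (not necessarily induced) subgraph pattern and an induced one, and the idea of describing each graph by the patterns it contains, can be illustrated with a small sketch. This is not PANG's implementation (the authors' code is linked in the Notes) and it performs no pattern mining: the candidate patterns are written by hand. It assumes node-labelled undirected graphs stored as a label dictionary plus a set of `frozenset` edges, ignores edge attributes, and the helper names `occurs` and `pattern_vector` are hypothetical. The binary occurrence vector is one common pattern-based input for a downstream classifier, assumed here rather than taken from the paper.

```python
from itertools import combinations, permutations

# Assumed representation (illustrative only): a node-labelled undirected graph is a
# pair (labels, edges) where labels maps node id -> label and edges is a set of
# frozenset({u, v}) pairs. Edge attributes are ignored in this sketch.


def occurs(pattern, graph, induced):
    """Return True if `pattern` occurs in `graph` (brute force, tiny graphs only).

    induced=False: every pattern edge must map to a graph edge (general subgraph).
    induced=True: additionally, mapped node pairs that are NOT adjacent in the
    pattern must not be adjacent in the graph (induced subgraph).
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Node labels must agree under the mapping.
        if any(p_labels[u] != g_labels[mapping[u]] for u in p_nodes):
            continue
        # Every pattern edge must be present in the graph.
        if not all(frozenset(mapping[x] for x in e) in g_edges for e in p_edges):
            continue
        # Induced case: reject if the graph has an extra edge between mapped nodes.
        if induced and any(
            frozenset((mapping[u], mapping[v])) in g_edges
            for u, v in combinations(p_nodes, 2)
            if frozenset((u, v)) not in p_edges
        ):
            continue
        return True
    return False


def pattern_vector(graph, patterns, induced):
    """Binary feature vector: 1 if the i-th pattern occurs in `graph`, else 0."""
    return [int(occurs(p, graph, induced)) for p in patterns]


if __name__ == "__main__":
    # Triangle A-B-C versus the path A-B-C.
    triangle = ({0: "A", 1: "B", 2: "C"},
                {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})})
    path = ({0: "A", 1: "B", 2: "C"}, {frozenset({0, 1}), frozenset({1, 2})})
    print(pattern_vector(triangle, [path, triangle], induced=False))  # [1, 1]
    print(pattern_vector(triangle, [path, triangle], induced=True))   # [0, 1]
```

Run as a script, this prints `[1, 1]` and then `[0, 1]`: the path A-B-C occurs in the triangle as a general subgraph, but not as an induced one, because the triangle also contains the A-C edge between the matched nodes. The permutation search is exponential in pattern size; practical systems rely on dedicated matchers such as VF2, and NetworkX's `GraphMatcher` exposes both `subgraph_is_isomorphic` (induced) and `subgraph_is_monomorphic` (non-induced).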

Notes

  1. https://github.com/CompNet/Pang/releases/tag/v1.0.0

Acknowledgments

This work was supported by Agorantic (FR 3621), and the ANR under grant number ANR-19-CE38-0004 for the DeCoMaP project.

Author information

Corresponding author

Correspondence to Lucas Potin.

Ethics declarations

Ethical Implications

Anomaly detection can have ethical implications, for instance if the methods are used to discriminate against certain individuals. In this respect, however, our PANG methodological framework presents no more risk than the supervised classification methods developed in machine learning.

Moreover, this work is part of a project that aims, among other things, to propose ways of automatically red-flagging contracts and economic agents according to their fraud risk. The proposed method is therefore meant to be used by public authorities to better regulate public procurement and manage the related open data.

Finally, the data used in this article are publicly shared and were collected from a public open data repository managed by the European Union. They contain no personal information and cannot be used directly to infer any, as they only describe the economic transactions of companies and public institutions related to public procurement.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Potin, L., Figueiredo, R., Labatut, V., Largeron, C. (2023). Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_5

  • DOI: https://doi.org/10.1007/978-3-031-43427-3_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43426-6

  • Online ISBN: 978-3-031-43427-3

  • eBook Packages: Computer Science, Computer Science (R0)
