research-article

CaPE: Category Preserving Embeddings for Similarity-Search in Financial Graphs

Authors:

Pranav Poduval,

Karamjit Singh,

Pranay GuptaAuthors Info & Claims

ICAIF '22: Proceedings of the Third ACM International Conference on AI in Finance

Pages 420 - 427

https://doi.org/10.1145/3533271.3561788

Published: 26 October 2022 Publication History

Abstract

Similarity-search is an important problem to solve for the payment industry having user-merchant interaction data. It finds out merchants similar to a given merchant and solves various tasks like peer-set generation, recommendation, community detection, and anomaly detection. Recent works have shown that by leveraging interaction data, Graph Neural Networks (GNNs) can be used to generate node embeddings for entities like a merchant, which can be further used for such similarity-search tasks. However, most of the real-world financial data come with high cardinality categorical features such as city, industry, super-industries, etc. which are fed to the GNNs in a one-hot encoded manner. Current GNN algorithms are not designed to work for such sparse features which makes it difficult for them to learn these sparse features preserving embeddings. In this work, we propose CaPE, a Category Preserving Embedding generation method which preserves the high cardinality feature information in the embeddings. We have designed CaPE to preserve other important numerical feature information as well. We compare CaPE with the latest GNN algorithms for embedding generation methods to showcase its superiority in peer set generation tasks on real-world datasets, both external as well as internal (synthetically generated). We also compared our method for a downstream task like link prediction.

References

[1]

Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16 (2002), 321–357.

[2]

Belur V Dasarathy. 1991. Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Tutorial(1991).

[3]

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems 29 (2016), 3844–3852.

[4]

Kaize Ding, Yichuan Li, Jundong Li, Chenghao Liu, and Huan Liu. 2019. Feature interaction-aware graph neural networks. arXiv preprint arXiv:1908.07110(2019).

[5]

Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 135–144.

Digital Library

[6]

David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. arXiv preprint arXiv:1509.09292(2015).

[7]

Ming Gao, Leihui Chen, Xiangnan He, and Aoying Zhou. 2018. Bine: Bipartite network embedding. In The 41st international ACM SIGIR conference on research & development in information retrieval. 715–724.

Digital Library

[8]

Mihajlo Grbovic and Haibin Cheng. 2018. Real-time personalization using embeddings for search ranking at airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 311–320.

Digital Library

[9]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855–864.

Digital Library

[10]

William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025–1035.

[11]

Chaoyang He, Tian Xie, Yu Rong, Wenbing Huang, Junzhou Huang, Xiang Ren, and Cyrus Shahabi. 2019. Cascade-BGNN: Toward Efficient Self-supervised Representation Learning on Large-scale Bipartite Graphs. arXiv preprint arXiv:1906.11994(2019).

[12]

Vassilis N Ioannidis, Da Zheng, and George Karypis. 2020. PanRep: Universal node embeddings for heterogeneous graphs. (2020).

[13]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with gpus. IEEE Transactions on Big Data(2019).

[14]

Anish Khazane, Jonathan Rider, Max Serpe, Antonia Gogoglou, Keegan Hines, C Bayan Bruss, and Richard Serpe. 2019. Deeptrax: Embedding graphs of financial transactions. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 126–133.

[15]

Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907(2016).

[16]

Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907(2016).

[17]

Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, S Hamid Rezatofighi, and Silvio Savarese. 2019. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. arXiv preprint arXiv:1907.03395(2019).

[18]

Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. 2019. Pytorch-biggraph: A large-scale graph embedding system. arXiv preprint arXiv:1903.12287(2019).

[19]

Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P Trevino, Jiliang Tang, and Huan Liu. 2017. Feature selection: A data perspective. ACM computing surveys (CSUR) 50, 6 (2017), 1–45.

[20]

Cheng-Yuan Liou, Wei-Chen Cheng, Jiun-Wei Liou, and Daw-Ran Liou. 2014. Autoencoder for words. Neurocomputing 139(2014), 84–96.

Digital Library

[21]

Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information theory 28, 2 (1982), 129–137.

Digital Library

[22]

Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International conference on machine learning. PMLR, 2014–2023.

[23]

Karl Pearson. 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin philosophical magazine and journal of science 2, 11 (1901), 559–572.

[24]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701–710.

Digital Library

[25]

Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. 2019. Session-based social recommendation via dynamic graph attention networks. In Proceedings of the Twelfth ACM international conference on web search and data mining. 555–563.

Digital Library

[26]

Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. 1067–1077.

Digital Library

[27]

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903(2017).

[28]

Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 839–848.

Digital Library

[29]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 974–983.

Digital Library

Cited By

Index Terms

CaPE: Category Preserving Embeddings for Similarity-Search in Financial Graphs
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Learning settings
      1. Semi-supervised learning settings

Recommendations

Probabilistic embeddings of bounded genus graphs into planar graphs
SCG '07: Proceedings of the twenty-third annual symposium on Computational geometry

A probabilistic C-embedding of a (guest) metric M into a collection of(host) metrics M'₁, ..., M'_k is a randomized mapping F of M intoone of the M'₁, ..., M'_k such that, for any two points p,q in theguest metric: The distance between F(p) and F(q) in ...
Learning Backward Compatible Embeddings
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product ...
Nearest-neighbor-preserving embeddings

In this article we introduce the notion of nearest-neighbor-preserving embeddings. These are randomized embeddings between two metric spaces which preserve the (approximate) nearest-neighbors. We give two examples of such embeddings for Euclidean ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICAIF '22: Proceedings of the Third ACM International Conference on AI in Finance

November 2022

527 pages

ISBN:9781450393768

DOI:10.1145/3533271

Editors:
Daniele Magazzeni
J.P. Morgan AI Research
,
Senthil Kumar
Capital One
,
Rahul Savani
University of Liverpool
,
Renyuan Xu
University of Southern California
,
Carmine Ventre
King's College London
,
Blanka Horvath
University of Oxford
,
Ruimeng Hu
University of California Santa Barbara
,
Tucker Balch
J.P. Morgan AI Research
,
Francesca Toni
Imperial College London

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICAIF '22

Sponsor:

ACM

ICAIF '22: 3rd ACM International Conference on AI in Finance

November 2 - 4, 2022

NY, New York, USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
111
Total Downloads

Downloads (Last 12 months)23
Downloads (Last 6 weeks)1

Reflects downloads up to 13 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents