Privacy-preserving spam filtering using homomorphic and functional encryption

Published: 01 January 2023

Abstract

Conventional spam classification requires end-users to reveal the content of incoming emails to a classifier so that text analysis can be performed. New cryptographic primitives, by contrast, allow this classification task to be performed on encrypted emails without revealing their contents, thereby preserving user data privacy. In this paper, we construct a spam classification framework that enables the classification of encrypted emails. Our model is based on a neural network with a quadratic network component and a multi-layer perceptron network component. The quadratic network architecture is compatible with the operation of an existing quadratic functional encryption scheme. To protect email content privacy, we propose two spam classification solutions, based on homomorphic encryption (HE) and functional encryption (FE), that enable our classifiers to predict the label of encrypted emails. The evaluation results on real-world spam datasets indicate that our proposed spam classification solutions achieve accuracies over 95%. Our performance study and security analysis show the pros and cons of each proposed solution. For instance, the FE solution predicts the label of an encrypted email in less than 31 s, whereas the HE solution takes up to 265 s to do so. Nonetheless, the HE solution is not prone to the potential information leakage that affects the FE solution.
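
To make the two-component model in the abstract more concrete, the following is a minimal plaintext sketch of a quadratic component feeding a multi-layer perceptron. It is an illustration under stated assumptions, not the paper's implementation: the names `quadratic_layer`, `mlp`, and `classify`, the ReLU and sigmoid activations, the 0.5 threshold, and all toy weights are hypothetical, and no encryption is performed. The point it shows is that each output of the quadratic component has the form x^T Q x, which is the class of functions a quadratic functional encryption scheme can evaluate on an encrypted input.

```python
import math

# Plaintext sketch only. Layer sizes, activations, and weights are
# illustrative assumptions, not values taken from the paper.

def quadratic_layer(x, Qs):
    """Return one value x^T Q x for each square matrix Q in Qs."""
    out = []
    for Q in Qs:
        total = 0.0
        for i, xi in enumerate(x):
            for j, xj in enumerate(x):
                total += xi * Q[i][j] * xj
        out.append(total)
    return out

def mlp(h, W, b, v, c):
    """One hidden ReLU layer followed by a sigmoid output unit."""
    hidden = [
        max(0.0, sum(w * hi for w, hi in zip(row, h)) + bi)
        for row, bi in zip(W, b)
    ]
    logit = sum(vi * hd for vi, hd in zip(v, hidden)) + c
    return 1.0 / (1.0 + math.exp(-logit))

def classify(x, Qs, W, b, v, c, threshold=0.5):
    """Compose the quadratic component with the MLP and threshold the score."""
    score = mlp(quadratic_layer(x, Qs), W, b, v, c)
    return ("spam" if score >= threshold else "ham"), score

# Toy usage: a 3-dimensional feature vector and two quadratic units.
x = [1.0, 0.0, 2.0]
Qs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # squared norm of x
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # product x[0] * x[1]
]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [-2.0, 0.0]
v = [1.0, 1.0]
c = -1.0

label, score = classify(x, Qs, W, b, v, c)
print(label)
```

In a deployment along the lines the abstract describes, the quadratic values would be obtained from an encrypted email rather than computed on plaintext features as above. The abstract's remark about potential information leakage in the FE solution concerns what such intermediate values may reveal.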

Published In

Computer Communications, Volume 197, Issue C
Jan 2023
307 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Spam filtering
  2. Deep neural networks
  3. Functional encryption
  4. Homomorphic encryption

Qualifiers

  • Research-article
