Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD)

Published: 01 October 2006

Abstract

We propose an effective approach to detecting phishing Web pages that uses the Earth Mover's Distance (EMD) to measure the visual similarity of Web pages. We first convert the Web pages involved into low-resolution images and then represent each image by a signature built from color and coordinate features. We use EMD to compute the distance between the signatures of the page images, and we train an EMD threshold vector for classifying a Web page as phishing or normal. Large-scale experiments on 10,281 suspected Web pages show high classification precision and phishing recall, with time performance suitable for an online enterprise solution. We also compare our method with two others to demonstrate its advantages. Finally, we have built a real system that is already in use online and has caught many real phishing cases.
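
The pipeline outlined in the abstract (low-resolution image, color-and-coordinate signature, EMD, threshold) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the quantisation level `LEVELS`, the equal weighting of color and position in `ground_distance`, the reading of the "threshold vector" as one EMD threshold per protected page, and every function name are hypothetical choices made for this example. The EMD itself is computed as a small transportation problem solved by successive shortest paths, using only the standard library.

```python
from math import sqrt, inf

LEVELS = 4  # assumed number of quantisation levels per colour channel


def signature(pixels, width, height):
    """Signature of a low-resolution image given as row-major (r, g, b) tuples.

    Returns [(feature, weight)], where feature = (quantised colour, centroid
    normalised to [0, 1)) and weight = number of pixels with that colour.
    """
    step = 256 // LEVELS
    buckets = {}
    for i, (r, g, b) in enumerate(pixels):
        color = (r // step, g // step, b // step)
        n, sx, sy = buckets.get(color, (0, 0, 0))
        buckets[color] = (n + 1, sx + i % width, sy + i // width)
    return [((c, (sx / n / width, sy / n / height)), n)
            for c, (n, sx, sy) in sorted(buckets.items())]


def ground_distance(f1, f2):
    """Assumed ground distance: equal mix of normalised colour and position gaps."""
    (c1, p1), (c2, p2) = f1, f2
    dc = sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2))) / (sqrt(3) * (LEVELS - 1))
    dp = sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2))) / sqrt(2)
    return 0.5 * dc + 0.5 * dp


def emd(sig_a, sig_b, dist=ground_distance):
    """EMD between two signatures with integer weights (min-cost flow)."""
    n, m = len(sig_a), len(sig_b)
    src, snk = n + m, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, (fa, wa) in enumerate(sig_a):
        add_edge(src, i, wa, 0.0)
        for j, (fb, wb) in enumerate(sig_b):
            add_edge(i, n + j, min(wa, wb), dist(fa, fb))
    for j, (_, wb) in enumerate(sig_b):
        add_edge(n + j, snk, wb, 0.0)

    total_flow, total_cost = 0, 0.0
    while True:
        # Bellman-Ford shortest path in the residual graph.
        best = [inf] * len(graph)
        prev = [None] * len(graph)
        best[src] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if best[u] == inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and best[u] + cost < best[v] - 1e-12:
                        best[v], prev[v], changed = best[u] + cost, (u, k), True
            if not changed:
                break
        if prev[snk] is None:
            break  # no augmenting path left: the smaller signature is fully moved
        # Find the bottleneck along the path, then push that much flow.
        push, v = inf, snk
        while v != src:
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = snk
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_flow += push
        total_cost += push * best[snk]
    return total_cost / total_flow if total_flow else 0.0


def looks_like_phishing(suspect_sig, protected_sigs, thresholds):
    """Flag the suspect if its EMD to any protected page is within that page's threshold."""
    return any(emd(suspect_sig, p) <= t for p, t in zip(protected_sigs, thresholds))


red = signature([(255, 0, 0)] * 4, 2, 2)
blue = signature([(0, 0, 255)] * 4, 2, 2)
print(emd(red, red), emd(red, blue))
```

In practice the pixel lists would come from rendering each page and downscaling the screenshot, and the per-page thresholds would be learned from labelled examples rather than set by hand; both steps are left out of this sketch.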

Published In

IEEE Transactions on Dependable and Secure Computing, Volume 3, Issue 4
October 2006
133 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Antiphishing
  2. Earth Mover's Distance
  3. visual assessment

Qualifiers

  • Research-article
