
Distilling Word Embeddings: An Encoding Approach

Published: 24 October 2016
DOI: 10.1145/2983323.2983888

Abstract

Distilling knowledge from a well-trained, cumbersome network into a small one has recently become a new research topic, as lightweight neural networks with high performance are in particular demand in resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, so that model complexity is reduced by a large margin while high accuracy is retained, achieving a good compromise between efficiency and performance. Experiments show that distilling knowledge from cumbersome embeddings outperforms directly training neural networks with small embeddings.
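The abstract describes the encoding approach only at a high level, so the following is a minimal, hedged sketch of one plausible reading: a small encoding (projection) layer is trained on top of frozen high-dimensional embeddings together with a task objective, so the low-dimensional encoded vectors absorb task-specific knowledge. The PyTorch framing, the class and parameter names (DistillingEncoder, small_dim), and the mean-pooled toy classifier are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch only -- NOT the authors' released code. PyTorch setup and the
# toy bag-of-words task head are assumptions made for illustration.
import torch
import torch.nn as nn

class DistillingEncoder(nn.Module):
    """Learn a low-dimensional encoding of frozen high-dimensional embeddings,
    trained end-to-end with a task objective so the small vectors keep
    task-specific knowledge."""
    def __init__(self, big_embeddings: torch.Tensor, small_dim: int, num_classes: int):
        super().__init__()
        _, big_dim = big_embeddings.shape
        # Cumbersome (teacher) embeddings stay fixed during distillation.
        self.big = nn.Embedding.from_pretrained(big_embeddings, freeze=True)
        # Encoding layer: a learned projection down to the small dimension.
        self.encode = nn.Sequential(nn.Linear(big_dim, small_dim), nn.Tanh())
        # Toy task head (mean-pooled classifier), just to drive the training signal.
        self.classifier = nn.Linear(small_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        small_vecs = self.encode(self.big(token_ids))  # (batch, seq, small_dim)
        pooled = small_vecs.mean(dim=1)                # crude sentence representation
        return self.classifier(pooled)

# Toy usage: 300-d "cumbersome" embeddings distilled into 50-d ones.
big = torch.randn(10_000, 300)
model = DistillingEncoder(big, small_dim=50, num_classes=2)
logits = model(torch.randint(0, 10_000, (8, 20)))      # batch of 8 sequences, length 20
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()
# After training, model.encode(model.big.weight) yields a distilled small
# embedding table that could initialize a compact downstream model.
```

The point the abstract emphasizes is that the projection is trained with the task loss rather than as a generic dimensionality reduction, so what the small embeddings retain is task-specific knowledge.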

Index Terms

  1. Distilling Word Embeddings: An Encoding Approach

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
    October 2016
    2566 pages
    ISBN:9781450340731
    DOI:10.1145/2983323
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. model compression
    2. neural networks
    3. word embeddings

    Qualifiers

    • Short-paper

    Conference

    CIKM'16: ACM Conference on Information and Knowledge Management
    October 24-28, 2016
    Indianapolis, Indiana, USA

    Acceptance Rates

    CIKM '16 paper acceptance rate: 160 of 701 submissions (23%)
    Overall acceptance rate: 1,861 of 8,427 submissions (22%)

    Cited By

    • (2024) Exploring the Learning Difficulty of Data: Theory and Measure. ACM Transactions on Knowledge Discovery from Data 18(4): 1-37. DOI: 10.1145/3636512. Online publication date: 13-Feb-2024.
    • (2022) Morphologically-Aware Vocabulary Reduction of Word Embeddings. 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 56-63. DOI: 10.1109/WI-IAT55865.2022.00018. Online publication date: Nov-2022.
    • (2021) Knowledge Distillation: A Survey. International Journal of Computer Vision. DOI: 10.1007/s11263-021-01453-z. Online publication date: 22-Mar-2021.
    • (2020) Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Proceedings of the 13th International Conference on Web Search and Data Mining, 690-698. DOI: 10.1145/3336191.3371792. Online publication date: 20-Jan-2020.
    • (2020) Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation. IEEE Access 8: 206638-206645. DOI: 10.1109/ACCESS.2020.3037821. Online publication date: 2020.
    • (2019) The pupil has become the master. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 3439-3445. DOI: 10.5555/3367471.3367519. Online publication date: 10-Aug-2019.
    • (2018) Adversarial Distillation for Efficient Recommendation with External Knowledge. ACM Transactions on Information Systems 37(1): 1-28. DOI: 10.1145/3281659. Online publication date: 13-Dec-2018.
