Effect of Word Sense Disambiguation on Neural Machine Translation: A Case Study in Korean
ABSTRACT With the advent of robust deep learning, neural machine translation (NMT) has achieved
great progress and recently become the dominant paradigm in machine translation (MT). However, it is
still confronted with the challenge of word ambiguities that force NMT to choose among several translation
candidates that represent different senses of an input word. This research presents a case study using Korean
word sense disambiguation (WSD) to improve NMT performance. First, we constructed a Korean lexical
semantic network (LSN) as a large-scale lexical semantic knowledge base. Then, based on the Korean LSN,
we built a Korean WSD preprocessor that can annotate the correct sense of Korean words in the training
corpus. Finally, we conducted a series of translation experiments using Korean-English, Korean-French,
Korean-Spanish, and Korean-Japanese language pairs. The experimental results show that our Korean
WSD system can significantly improve the translation quality of NMT in terms of the BLEU, TER, and
DLRATIO metrics. On average, it improved the precision by 2.94 BLEU points and improved translation
error prevention by 4.04 TER points and 4.51 DLRATIO points for all the language pairs.
INDEX TERMS Lexical semantic network, neural machine translation, word sense disambiguation.
2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 6, 2018, p. 38512.
Q.-P. Nguyen et al.: Effect of WSD on NMT: Case Study in Korean
Initially, we constructed an LSN named UWordMap for Korean. UWordMap consists of a noun network and a predicate network with a hierarchical structure for hyponymy relations. The noun and verb networks are connected through subcategorization information. To the best of our knowledge, UWordMap is currently the biggest and most comprehensive Korean LSN, containing nearly 500,000 nouns, verbs, adjectives, and adverbs. We then applied UWordMap to build a fast and accurate Korean WSD system.1

We conducted a series of bi-directional translation experiments with the Korean-English, Korean-French, Korean-Spanish, and Korean-Japanese language pairs. The experimental results show that our Korean WSD system can significantly improve NMT translation quality in terms of the BLEU, TER, and DLRATIO metrics. On average, it improved precision by 9.68 and 5.46 BLEU points for translation from and to Korean, respectively. It also improved translation error prevention by 8.9 TER points and 8.0 DLRATIO points for all the tested systems.

II. RELATED WORK
Early research tried to prove that WSD could benefit MT, but Carpuat and Wu [10] reported negative results from integrating a Chinese WSD system into a Chinese-to-English word-based statistical MT (SMT) system. Their WSD predicted Chinese word senses using the HowNet dictionary and then projected the predicted senses into English glosses.

Instead of predicting the senses of ambiguous source words, Vickrey et al. [11] reformulated the WSD task for SMT as predicting possible target translations. Carpuat and Wu [12] integrated multi-word phrasal WSD models into a phrase-based SMT. Both experiments led to the conclusion that WSD can improve SMT.

Following those successful integrations of WSD into SMT, others considered applying WSD systems to MT using several methods. Xiong and Zhang [13] proposed a sense-based SMT model. Su et al. [14] used a graph-based framework for collective lexical selection. Both experimented on Chinese-to-English translation and achieved improvements in translation quality.

In addition to Chinese-English translation, WSD systems have been successfully integrated into translation of other language pairs. Using word senses as contextual features in maxent-based models enhanced the quality of an English-Portuguese SMT [15]. A Czech-English phrase-based SMT was improved using a verb-only WSD [16]. A WordNet-based WSD was successful in an English-Slovene SMT [17].

Recently, Rios et al. [18] proposed a method to improve WSD in NMT by adding sense information to word embeddings and extracting lexical semantic chains from the training data. Liu et al. [19] also added context-awareness to word embeddings. In their experimental results, the NMT systems failed to translate ambiguous words, and their WSD improved the quality of the translation results. Both methods customized translation models to learn additional information, which might lead to low performance. In particular, increasing the size of the training corpus and using deeper layers could cause performance to diminish exponentially.

In contrast to the previous research, we did not modify the NMT model. Instead, we propose a fast and accurate WSD system that can run independently. Our WSD acts as a preprocessor that annotates Korean texts with sense-codes before they are input into the NMT system. The sense-codes are not additional information; instead, they transform ambiguous words: tagging a single word with different sense-codes creates new, distinct words and consequently removes ambiguous words.

III. UWordMap — A KOREAN LSN
Because an LSN serves as an essential and useful knowledge resource in various natural language processing systems, especially systems dealing with semantics, many researchers have tried to construct one for each language; examples include the Princeton WordNet [20] for English, EuroWordNet [21] and BalkaNet [22] for various European languages, and HowNet [23] for Chinese. Several projects have been conducted to build a Korean LSN, but most of them are based on existing non-Korean LSNs. KorLex [24] and the Korean Thesaurus [25] were based on WordNet, and CoreNet [26] was developed by mapping the NTT Goidaikei Japanese hierarchical lexical system to Korean word senses. Some Korean LSNs were designed for specific tasks; for instance, the ETRI lexical concept network [27] was designed for question-answering systems.

The Korean LSN UWordMap was manually constructed as a large-scale lexical semantic knowledge base. In UWordMap, every node corresponds to a certain sense and has a unique sense-code that represents each distinct sense of its associated lexicon. The lexicons and their sense-codes were extracted from SKLD.

To construct UWordMap, we first established a lexical network for nouns; it has a hierarchical structure for hyponymy relations. Then, we constructed a lexical network for predicates, which not only has a hierarchical structure for hyponymy relations but also provides subcategorization information to connect each predicate with the lexical network for nouns. Furthermore, we also defined combination relations between adverbs and predicates.

UWordMap can be used through its application-programming interface [28] or our online service.2 In this research, we used only the lexical network for nouns and the subcategorization information for predicates to improve the accuracy of our WSD systems. Hence, we next describe the structure of the lexical network for nouns and the subcategorization information for predicates, which are shown in FIGURE 1.
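The preprocessing idea described above, rewriting each ambiguous word as a sense-tagged token so that the NMT system sees distinct vocabulary items, can be sketched as follows. This is a minimal illustration: the tokenization, the tiny sense inventory, and the cue-based disambiguator are hypothetical stand-ins for the authors' UWordMap-based WSD system.

```python
# Toy sketch of sense-code tagging as an NMT preprocessing step.
# The sense inventory and cues below are hypothetical; the paper's
# real WSD assigns sense-codes from UWordMap/SKLD.

def tag_senses(tokens, disambiguate):
    """Replace each ambiguous token with 'token_<sense-code>'.

    disambiguate(tokens, i) returns a sense-code string for the token
    at position i, or None if the token is unambiguous.
    """
    tagged = []
    for i, tok in enumerate(tokens):
        code = disambiguate(tokens, i)
        tagged.append(f"{tok}_{code}" if code else tok)
    return tagged

# Hypothetical mini-inventory for the homograph "bae"
# (pear / boat / stomach), resolved by the contiguous predicate cue.
CUES = {"meog": "01", "ta": "02", "apeu": "03"}

def toy_disambiguator(tokens, i):
    if tokens[i] != "bae":
        return None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    for cue, code in CUES.items():
        if nxt.startswith(cue):
            return code
    return "00"  # sense could not be determined

print(tag_senses(["bae", "meog-go", "bae", "ta-ss-da"], toy_disambiguator))
# -> ['bae_01', 'meog-go', 'bae_02', 'ta-ss-da']
```

Because "bae_01" and "bae_02" are now different whitespace-delimited tokens, the downstream NMT system no longer has to disambiguate them itself.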
A. LEXICAL NETWORK FOR NOUNS
• For terminologies, the relation relies on existing terminological classification systems.

After establishing the hyponymy relations, we used the lexical semantic model of Cruse [29] to examine the LNN and ensure that its structure observes the IS-A relation.

TABLE 1. Top-level nodes of the lexical network for nouns.

3) SYNONYMY RELATION ESTABLISHMENT
Synonymy relations in the LNN are classified into two categories: the absolute synonymy relation (ASR) and the partial synonymy relation (PSR). The ASR applies when two or more words have the same meaning, and the PSR connects words with similar meanings. In more detail, the ASR is divided into six types: standard absolute synonymy, misused words, dialect words, North Korean words, archaic words, and short form–original form words. The PSR is divided into eight types: standard partial synonymy, refining words, aspirated sound words, honorific words, familiar speech, jargon, terminology, and designation words.

4) ANTONYM RELATION ESTABLISHMENT
We established the antonym relation for word pairs with opposite meanings. Following lexical semantics, we divided the antonym relation into three kinds: complementary antonym, gradable antonym, and relative antonym.

B. LEXICAL NETWORK FOR PREDICATES
Korean verbs and adjectives have similar grammatical constructions. Korean grammar does not require a verb "to be" (e.g., am, are, is) for an adjective to construct a sentence. Korean adjectives thus function as stative verbs that play the predicate role in sentences [30]. Hence, we arranged verbs and adjectives into a single lexical network for predicates (LNP).

In addition to having a hierarchical structure for hyponymy relations, the LNP also contains subcategorization information about the syntactic categories that are construed as arguments. Each predicate associates with its arguments to form a predicate-argument structure, which we used to define connections between predicates and the least common subsumers (LCS) of the LNN. Subcategorization information is essential for generating semantic connections between the LNN and the LNP in UWordMap.

We constructed subcategorization information for all predicates registered in SKLD by extracting the arguments from the example sentences of each predicate in SKLD. Based on the arguments, we connected the predicates with the LCS of the LNN. For instance, in FIGURE 1, the predicate "meog-da" (to eat) was connected to LCS such as "eum-sig-mul" (food), "yag" (drug), "meog-i" (feed), and "aeg-che" (liquid) through the sentence pattern "eul" (object case marker). However, some hyponyms of "yag" cannot be connected to the verb "meog-da" (i.e., "ba-leu-neun-yag" (liniment), "ju-sa-yag" (injection), and "but-i-neun-yag" (medicated plaster)). We marked those cases with a constrained relation [N_OBJ].

The principles for constructing the subcategorization information for predicates are as follows.
1- Refer to the example sentences of each predicate in SKLD to construct predicate-argument structures. For instance, from the examples of the predicate "meog-da_0201" (to eat) in SKLD, we extracted the sentence pattern "eul" and the combinative argument nouns {bab (rice), sul (alcohol), yag (drug), mul (water), eum-sig (food), mo-i (feed), bo-yag (analeptic)}.
2- Connect the predicate with the argument nouns' upper-level nodes in the LNN. For instance, in the LNN, the upper-level nodes of those argument nouns are {eum-sig (food), eum-lyo (drink), mul-jil (material), aeg-che (liquid), eum-sig-mul (food), meog-i (feed), yag (drug)}, respectively (FIGURE 1). We connected the predicate only to the upper-level nodes that do not violate the exceptional cases below.
3- Handle exceptional cases in which the predicate cannot connect to an upper-level node.
• If the upper-level node has child nodes that are not suitable for the predicate, connect the predicate with the argument noun itself. For instance, "mul-jil" is the upper-level node of the argument noun "yag," but "mul-jil" also has children whose referents cannot be eaten. So, we connected the predicate "meog-da_020101" directly with the argument "yag."
• If a predicate has two or more argument nouns that are in hyponymy relations with each other, connect the predicate with the upper-level node of the hypernym. For instance, the argument noun "eum-sig" is a hypernym of the argument noun "bab" (FIGURE 1). Therefore, we connected "meog-da_0201" with "eum-sig-mul," which is the upper-level node of "eum-sig."
• If argument nouns are in the same branch of a similar semantic field, connect the predicate with the common upper-level node. For instance, FIGURE 2 illustrates an upper-level node common to three argument nouns.
• If the argument nouns are homographs, connect the predicate with upper-level nodes depending on their semantics.
• If the argument noun is a top-level node or its upper-level node is unsuitable for the predicate, connect the predicate directly with the argument noun.
Following these principles, the subcategorization information was constructed with the numbers of predicates and LCS shown in the last column of Table 2.

C. CURRENT STATUS OF UWordMap
UWordMap now contains more than 474,000 words (nouns, verbs, adjectives, and adverbs), which is 92.2% of the words in SKLD. We compared UWordMap and existing Korean
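The core of these connection principles, link each predicate to an argument noun's upper-level node unless that node is unsuitable, in which case link the argument noun itself, can be sketched in a few lines. The miniature hierarchy and the suitability test below are hypothetical illustrations, not actual UWordMap data.

```python
# Minimal sketch of principle 2 plus the first exceptional case of
# principle 3: each argument noun is normally replaced by its
# upper-level (hypernym) node, but if that node also covers nouns the
# predicate cannot take, the predicate links to the argument itself.
# The tiny hierarchy and the unsuitable set are invented examples.

HYPERNYM = {"bab": "eum-sig", "eum-sig": "eum-sig-mul", "yag": "mul-jil"}

def connect_predicate(arg_nouns, hypernym, suitable):
    """Return the set of LCS nodes a predicate should connect to."""
    lcs = set()
    for noun in arg_nouns:
        upper = hypernym.get(noun)
        if upper is not None and suitable(upper):
            lcs.add(upper)
        else:
            lcs.add(noun)  # exceptional case: link the argument itself
    return lcs

# "mul-jil" (material) has children that cannot be eaten, so it is
# unsuitable for "meog-da" (to eat).
unsuitable = {"mul-jil"}
print(connect_predicate(["bab", "yag"], HYPERNYM,
                        lambda n: n not in unsuitable))
# connects to "eum-sig" (via the hypernym) and "yag" (exceptional case)
```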
TABLE 3. An example of candidate generation.

to the content word "meog-eoss" (ate). According to this assumption, the sense of "sa-gwa" can be identified based on the first two syllables of the following eojeol "meog-eoss." This assumption considers only the surface forms of the eojeols. Hence, the conditional probability PRight is denoted as PRight_Surf and computed using Equation (3) based on the entire current eojeol and the first two syllables of the right-contiguous eojeol.

"meog_02/VV" occurs many times in the training corpus. The verb "meog_02/VV" is the word stem of the eojeol "meog-ja-myeon." Based on the word stems of verbs and adjectives, we can partly solve the missing-data problem and determine the correct senses of the contiguous nouns.

A surface form of a verb or adjective can be analyzed into several kinds of word stems. For instance, the surface form "gan-da" is analyzed into four kinds of word stems, as shown in Table 4. In that case, PRight is denoted as PRight_Stem and calculated by selecting the word stem of the right-contiguous eojeol that maximizes the conditional probability:

PRight_Stem = argmax_k P(c_{i,j} | w_i, v_{i+1,k})    (5)

where v_{i+1,k} is the k-th word stem of the right-contiguous eojeol (i.e., the i-th eojeol is the current eojeol and the (i+1)-th is the right-contiguous eojeol). k can be mapped to 1, 2, 3, and 4 for the example in Table 4.

TABLE 4. Analyzing the surface form of the eojeol "gan-da" into word stems.
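Equation (5) can be sketched as a count-based argmax over the stem analyses of the right-contiguous eojeol. The senses, stems, and co-occurrence counts below are invented for illustration; the paper estimates these probabilities from the sense-tagged training corpus.

```python
# Count-based sketch of Equation (5): among the word-stem analyses of
# the right-contiguous eojeol, pick the sense that maximizes
# P(sense | stem), estimated from co-occurrence counts.
# The senses, stems, and counts below are hypothetical.

def best_sense(sense_candidates, stem_candidates, counts):
    """Return the sense maximizing count(sense, stem) / count(stem)."""
    best_sense_id, best_p = None, 0.0
    for stem in stem_candidates:
        denom = sum(counts.get((s, stem), 0) for s in sense_candidates)
        if denom == 0:
            continue  # this stem never co-occurred with any candidate
        for sense in sense_candidates:
            p = counts.get((sense, stem), 0) / denom
            if p > best_p:
                best_sense_id, best_p = sense, p
    return best_sense_id

# Hypothetical counts for the homograph "sa-gwa" (apple vs. apology)
# followed by an eojeol whose stem analyses include "meog_02" (eat).
counts = {
    ("sa-gwa_05", "meog_02"): 42,  # apple ... eat
    ("sa-gwa_08", "meog_02"): 1,   # apology ... eat
}
print(best_sense(["sa-gwa_05", "sa-gwa_08"],
                 ["meog_02", "meo_03"], counts))
# -> sa-gwa_05
```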
with the data missing from the training corpus. Each noun
can combine with many different predicates to make up
sentences. Given all the possible combinations of nouns
and predicates, the training corpus cannot contain them
all. That lack of data in the training corpus is one of
the main challenges faced by the corpus-based approach
to WSD.
B. KNOWLEDGE-BASED WSD
To address that problem, we propose a knowledge-based method using the UWordMap LSN. UWordMap contains a hierarchical structure network for nouns and subcategorization information for predicates. In the subcategorization information, predicates are connected with the LCS of the hierarchical noun network through sentence patterns. Table 5 gives an example of the subcategorization information in UWordMap. Based on those connections, we can determine the correct senses of nouns and predicates.

TABLE 5. Example of subcategorization information in UWordMap.

Thus, UWordMap provides a way to complement the training corpus by generating sentences from the subcategorization information. From the subcategorization information, we extract the LCS (i.e., nouns), predicates, and sentence patterns and then arrange them according to Korean sentence structure to generate sentences [35]. For instance, from the subcategorization information in Table 5, we can generate the following sentences to supplement the training corpus.

gil_0101/NNG eul/JKO geod-da_02/VV.
geoli_0101/NNG eul/JKO geod-da_02/VV.
gong-won_03/NNG eul/JKO geod-da_02/VV.
baeg-seong_0001/NNG e-ge-seo/SRC geod-da_04/VV.
si-heom-jang_0001/NNG e-seo/LOC geod-da_04/VV.
...

Furthermore, the training corpus can be expanded into the hyponyms of the LCS on the LSN. The LCS were replaced with their hyponyms to generate a series of sentences with the same predicates. FIGURE 3 gives the hierarchical network of the noun "gil_0101" (street, road). In the subcategorization information (Table 5), "gil_0101" is directly connected to the verb "geod-da_02" (to walk). However, its hyponyms (i.e., mo-laes-gil_01, san-gil_02, mi-lo_0101, na-mus-gil_01, . . . in FIGURE 3) are not connected to the verb "geod-da_02". We can still generate a series of sentences by connecting the verb "geod-da_02" and the hyponyms, such as

mo-laes-gil_01/NNG eul/JKO geod-da_02/VV.
san-gil_02/NNG eul/JKO geod-da_02/VV.
na-mus-gil_01/NNG eul/JKO geod-da_02/VV.
...

FIGURE 3. Hierarchical network of the noun "gil_0101" in UWordMap.

The noun "gil_0101" has 421 direct hyponyms (level-1 hyponyms), each of which has other hyponyms at level 2. Therefore, the number of sentences generated is large. Expanding the training corpus in this way would make it huge and degrade performance, reducing the system's practicality.

Instead of generating sentences to complement the training corpus, we replace the noun with its hypernym while the sentence is being examined. When using both the surface form and the word stem of an eojeol fails to identify its sense, the hypernym is looked up and used instead. If the hypernym still cannot identify the sense of the eojeol, we continue looking up the hypernym of the hypernym in a looping process that runs until the sense is identified or the hypernym is a top-level node (FIGURE 4).

To improve the performance of the loop process, we build paths (hypernym paths) from each noun to a top-level node. Because each noun has only one hypernym, each noun has only one hypernym path. The average length of the hypernym paths is 10, and the maximum length is 17. For example, "na-mus-gil_01 > san-gil_02 > gil_0101 > gong-gan_0502" is a hypernym path created from the hierarchical network shown in FIGURE 3. Storing the hypernym paths in the database reduces the volume of the training corpus and the complexity of looking up hypernyms in the loop process. All the processes we propose for determining word sense are shown in FIGURE 4.

C. EXPERIMENTS AND RESULTS
We conducted our experiments on the Sejong corpus [36] and the UWordMap LSN. The Sejong corpus includes 11 million eojeols tagged with POS and sense-codes that are identical to
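The hypernym back-off loop can be sketched as follows. The path and the attested (noun, predicate) table are toy examples built from the "gil_0101" illustration; the real system walks precomputed hypernym paths stored in the UWordMap database.

```python
# Sketch of the hypernym back-off loop (FIGURE 4): when the current
# noun cannot be disambiguated from the corpus statistics, walk up its
# (unique) hypernym path until a sense is identified or the top-level
# node is reached. The path and lookup table below are toy examples.

def sense_via_hypernyms(noun, predicate, hypernym_path, known_pairs):
    """Try (noun, predicate) first; on failure back off along the path."""
    for node in [noun] + hypernym_path.get(noun, []):
        if (node, predicate) in known_pairs:
            return known_pairs[(node, predicate)]
    return None  # reached the top-level node without finding a sense

# Hypernym path from FIGURE 3:
# na-mus-gil_01 > san-gil_02 > gil_0101 > gong-gan_0502
PATHS = {"na-mus-gil_01": ["san-gil_02", "gil_0101", "gong-gan_0502"]}

# Suppose the corpus only attests "gil_0101" with "geod-da" (to walk).
KNOWN = {("gil_0101", "geod-da"): "geod-da_02"}

print(sense_via_hypernyms("na-mus-gil_01", "geod-da", PATHS, KNOWN))
# -> geod-da_02
```

Because each noun has exactly one hypernym path, this lookup is a short linear walk (on average 10 steps, at most 17 in UWordMap) rather than a search over the generated-sentence space.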
h_i = [(→h_i)^T; (←h_i)^T]^T, i.e., each source annotation concatenates the forward and backward encoder hidden states.

B. DECODER
The decoder is a forward RNN that generates the target sentence y = (y_1, y_2, . . . , y_Ty), y_i ∈ R^Ky, where Ty is the length of the target sentence and Ky is the vocabulary size of the target language. Word y_i is predicted by the conditional probability

p(y_i | {y_1, . . . , y_{i-1}}, x) = g(y_{i-1}, s_i, c_i)    (13)

The hidden state is first initialized with s_0 = tanh(W_s h_1) and then calculated at each time step i by

s_i = (1 − z_i) ◦ s_{i−1} + z_i ◦ s̃_i    (14)

where

s̃_i = tanh(W E y_{i−1} + U [r_i ◦ s_{i−1}] + C c_i)    (15)
z_i = σ(W_z E y_{i−1} + U_z s_{i−1} + C_z c_i)    (16)
r_i = σ(W_r E y_{i−1} + U_r s_{i−1} + C_r c_i)    (17)

E is the word-embedding matrix of the target language, and W_∗, U_∗, and C_∗ are weight matrices. The context vector c_i is calculated from the source annotations by

c_i = Σ_{j=1}^{Tx} α_ij h_j, where α_ij = exp(e_ij) / Σ_{k=1}^{Tx} exp(e_ik)    (18)

B. INTEGRATING KOREAN WSD INTO THE CORPORA
Using the Korean WSD system described in Section IV, the Korean words in both the training and testing sets were tagged with sense-codes before they were input into the NMT systems. The Korean WSD system thus works as a preprocessor for MT systems. Table 8 gives an example of a Korean sentence tagged with sense-codes. Because MT systems delimit words by the white spaces between them, the sense-code tagging transforms homographic words into distinct words, eliminating the ambiguous words from the Korean dataset.

The Korean WSD system changed the numbers of tokens and vocabulary items (i.e., the types of tokens) in the Korean dataset, as shown in Table 9. As explained in detail above, the Korean WSD includes two steps. The first step analyzes

4 https://krdict.korean.go.kr
5 http://www.statmt.org/moses
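Equation (18) is a softmax-weighted sum, which can be sketched in pure Python. The alignment scores and annotation vectors below are illustrative values, not learned quantities.

```python
# Pure-Python sketch of Equation (18): the context vector c_i is the
# attention-weighted sum of the source annotations h_j, with weights
# alpha_ij computed as a softmax over the alignment scores e_ij.
import math

def context_vector(e_i, H):
    """e_i: list of Tx alignment scores; H: list of Tx annotation vectors."""
    m = max(e_i)
    exps = [math.exp(e - m) for e in e_i]      # numerically stable softmax
    total = sum(exps)
    alphas = [x / total for x in exps]         # alpha_ij, sums to 1
    dim = len(H[0])
    return [sum(a * h[d] for a, h in zip(alphas, H)) for d in range(dim)]

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # Tx = 3 annotations, d = 2
print(context_vector([0.0, 0.0, 0.0], H))  # uniform weights: mean of rows
```

With equal scores the context vector is the mean of the annotations; as one score dominates, c_i converges to that single annotation, which is exactly the soft-selection behavior the attention mechanism provides.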
TABLE 9. Korean dataset after applying WSD.

TABLE 10. Translation results.
C. SETUP
We implemented our NMT systems on the open framework
OpenNMT [46], which is a sequence-to-sequence model
described in Section V. The systems were set with the following parameters: word-embedding dimension = 500, hidden layers = 2 × 500 RNNs, input feeding enabled, and 13 training epochs.
We used those NMT systems for bi-directional translation
of the following language pairs: Korean-English, Korean-
French, Korean-Japanese, and Korean-Spanish. To separately
evaluate the effectiveness of our morphological analysis and sense-code tagging, we used three systems (Baseline, Morphology, and WSD) for each direction. The Baseline systems were trained with the originally collected corpora given in Table 7. The Morphology systems were trained with the Korean corpus that had been morphologically analyzed. In the WSD systems, the Korean training corpus was both morphologically analyzed and tagged with sense-codes. Altogether, we had 24 translation systems, as shown in Table 10.

D. RESULTS
We used the BLEU, TER, and DLRATIO evaluation metrics to measure the translation quality. BLEU (Bi-Lingual Evaluation Understudy) [47] measures the precision of an MT system by comparing the n-grams of a candidate translation with those in the corresponding reference and counting the number of matches. In this research, we use the BLEU metric with 4-grams. TER (Translation Error Rate) [48] is an error metric for MT that measures the number of edits required to change a system output into one of the references. DLRATIO [49] (Damerau-Levenshtein edit distance ratio) measures the edit distance between two sequences.

Table 10 shows the results of the 24 systems in terms of their BLEU, TER, and DLRATIO scores. All three metrics demonstrate that both the Morphology and WSD systems improved the translation quality for all four language pairs and both translation directions.

The Morphology systems improved the results of the Baseline systems for all the language pairs by an average of 6.41 and 2.85 BLEU points for translation from and to Korean, respectively. Morphological complexity causes a critical data sparsity problem when translating into or from Korean [50]. The data sparsity increases the number of out-of-vocabulary words and reduces the probability of the occurrence of each word in the training corpus. For instance, NMT systems treat the morphological variants of the Korean verb "to go" as completely different words: "ga-da," "gan-da," "ga-yo," and "gab-ni-da." Hence, the Korean morphological analysis can improve the translation results. The disproportionate improvement in the two translation directions occurred because we applied the morphological analysis only to the Korean side. Therefore, the improvement in translations from Korean is more significant than that in the reverse direction.
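The edit-distance idea behind the TER and DLRATIO metrics can be illustrated with the restricted Damerau-Levenshtein distance over token sequences. This sketch computes the raw distance only; the DLRATIO score reported in the paper is a normalized ratio of such a distance, and TER additionally counts shift operations.

```python
# Restricted (optimal-string-alignment) Damerau-Levenshtein distance:
# the minimum number of insertions, deletions, substitutions, and
# adjacent transpositions turning sequence a into sequence b.

def damerau_levenshtein(a, b):
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

# A hypothesis that mistranslates one homograph differs from the
# reference by a single substitution (example sentences are invented).
hyp = "the boat was eaten".split()
ref = "the pear was eaten".split()
print(damerau_levenshtein(hyp, ref))  # -> 1
```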
The Korean sense-code tagging helped the NMT systems correctly align words in the parallel corpus as well as choose correct words for an input sentence. Therefore, the performance of the WSD systems further improved by an average of 3.27 and 2.61 BLEU points for all the language pairs when translating from and to Korean, respectively. In comparison with the Baseline systems, the WSD systems improved the translated results for all language pairs by an average of 9.68 and 5.46 BLEU points for translations from and to Korean, respectively. In summary, the proposed Korean WSD can remarkably improve the translation quality of NMT systems.

The TER and DLRATIO metrics provide more evidence that the proposed Korean WSD system can improve the translation quality of NMT. The results in Table 10 show that the proposed Korean WSD system improved the NMT performance by an average of 9.1 TER and 9.8 DLRATIO error points when translating from Korean to the four different languages. In the reverse direction, the proposed Korean WSD improved the performance by an average of 8.8 TER and 6.3 DLRATIO error points for all NMT systems. In particular, the Korean sense-code tagging improved translation error prevention by 4.04 TER points and 4.51 DLRATIO points for all the language pairs. In short, the proposed Korean WSD can considerably reduce NMT errors.

Furthermore, we examined some well-known MT systems to see how they handle the Korean WSD problem. We input the sentence used in Section I, "bae-leul meog-go bae-leul tass-deo-ni bae-ga a-pass-da," into Google Translate, Microsoft Bing Translator, and Naver Papago. The translated results are shown in Table 11. Google Translate correctly translated the second and third "bae" but incorrectly translated the first "bae." Microsoft Translator could not distinguish the different meanings of "bae," and it missed a clause. Papago also could not distinguish the different meanings of "bae" in this sentence. None of them translated this sentence correctly.

TABLE 11. Korean-to-English translation examples.

VII. CONCLUSION
In this research, we have presented the following three accomplishments:
• We constructed the biggest and most comprehensive LSN for the Korean language — UWordMap, which is useful not only for MT but also for various fields in Korean language processing, such as information retrieval and semantic webs.
• We proposed a method for building a fast and accurate Korean WSD system based on UWordMap.
• The experimental results from bi-directional translation between language pairs (Korean-English, Korean-French, Korean-Spanish, and Korean-Japanese) demonstrate that the proposed Korean WSD system significantly improved NMT results.

In the future, we plan to complete UWordMap with all the words contained in SKLD. We further intend to insert neologisms into UWordMap because adding more words will make the proposed Korean WSD system more accurate. Because the quality of an NMT system depends on the training corpus, we also plan to collect more data related to Korean. Additionally, we intend to study the application of syntactic parsing attentional models to NMT systems.

REFERENCES
[1] L. Bentivogli, A. Bisazza, M. Cettolo, and M. Federico, "Neural versus phrase-based machine translation quality: A case study," in Proc. EMNLP, Austin, TX, USA, 2016, pp. 257–267.
[2] M. Junczys-Dowmunt, T. Dwojak, and H. Hoang, "Is neural machine translation ready for deployment? A case study on 30 translation directions," in Proc. IWSLT, Seattle, WA, USA, 2016, pp. 1–8.
[3] J. Crego et al. (Oct. 2016). "SYSTRAN's pure neural machine translation systems." [Online]. Available: https://arxiv.org/abs/1610.05540
[4] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, Montreal, PQ, Canada, 2014, pp. 3104–3112.
[5] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. EMNLP, Doha, Qatar, 2014, pp. 1724–1734.
[6] N. Kalchbrenner and P. Blunsom, "Recurrent continuous translation models," in Proc. EMNLP, Seattle, WA, USA, 2013, pp. 1700–1709.
[7] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, San Diego, CA, USA, 2015.
[8] H. Choi, K. Cho, and Y. Bengio, "Context-dependent word representation for neural machine translation," Comput. Speech Lang., vol. 45, pp. 149–160, Sep. 2017.
[9] R. Marvin and P. Koehn, "Exploring word sense disambiguation abilities of neural machine translation systems," in Proc. AMTA, Boston, MA, USA, 2018, pp. 125–131.
[10] M. Carpuat and D. Wu, "Word sense disambiguation vs. statistical machine translation," in Proc. 43rd Annu. Meeting ACL, Ann Arbor, MI, USA, 2005, pp. 387–394.
[11] D. Vickrey, L. Biewald, M. Teyssier, and D. Koller, "Word-sense disambiguation for machine translation," in Proc. HLT/EMNLP, Vancouver, BC, Canada, 2005, pp. 771–778.
[12] M. Carpuat and D. Wu, "Improving statistical machine translation using word sense disambiguation," in Proc. EMNLP-CoNLL, Prague, Czech Republic, 2007, pp. 61–72.
[13] D. Xiong and M. Zhang, "A sense-based translation model for statistical machine translation," in Proc. 52nd Annu. Meeting ACL, Baltimore, MD, USA, 2014, pp. 1459–1469.
[14] J. Su, D. Xiong, S. Huang, X. Han, and J. Yao, "Graph-based collective lexical selection for statistical machine translation," in Proc. EMNLP, Lisbon, Portugal, 2015, pp. 1238–1247.
[15] S. Neale, L. Gomes, E. Agirre, O. L. de Lacalle, and A. Branco, "Word sense-aware machine translation: Including senses as contextual features for improved translation models," in Proc. LREC, Ljubljana, Slovenia, 2016, pp. 2777–2783.
[16] R. Sudarikov, O. Dušek, M. Holub, O. Bojar, and V. Kríž, "Verb sense disambiguation in machine translation," in Proc. COLING, Osaka, Japan, 2016, pp. 42–50.
[17] Š. Vintar and D. Fišer, "Using WordNet-based word sense disambiguation to improve MT performance," in Hybrid Approaches to Machine Translation. Springer, 2016, pp. 191–205.
[18] A. Rios, L. Mascarell, and R. Sennrich, "Improving word sense disambiguation in neural machine translation with sense embeddings," in Proc. WMT, Copenhagen, Denmark, 2017, pp. 11–19.
[19] F. Liu, H. Lu, and G. Neubig. (Mar. 2018). "Handling homographs in neural machine translation." [Online]. Available: https://arxiv.org/abs/1708.06510
[20] G. A. Miller, "WordNet: A lexical database for English," Commun. ACM, vol. 38, no. 11, pp. 39–41, 1995.
[21] P. Vossen, "Introduction to EuroWordNet," Comput. Humanities, vol. 32, nos. 2–3, pp. 73–89, Mar. 1998.
[22] D. Tufis, D. Cristea, and S. Stamou, "BalkaNet: Aims, methods, results and perspectives. A general overview," Romanian J. Inf. Sci. Technol., vol. 7, nos. 1–2, pp. 9–43, 2004.
[23] Z. Dong and Q. Dong, HowNet and the Computation of Meaning. River Edge, NJ, USA: World Scientific, 2006.
[24] A. S. Yoon et al., "Construction of Korean WordNet," J. KIISE, Softw. Appl., vol. 36, no. 1, pp. 92–126, 2009.
[25] C. K. Lee and G. B. Lee, "Using WordNet for the automatic construction of Korean thesaurus," in Proc. 11th Annu. Conf. Hum. Lang. Technol., 1999, pp. 156–163.
[26] K.-S. Choi, "CoreNet: Chinese-Japanese-Korean WordNet with shared semantic hierarchy," in Proc. NLP-KE, Beijing, China, 2003, pp. 767–770.
[27] M. Choi, J. Hur, and M.-G. Jang, "Constructing Korean lexical concept network for encyclopedia question-answering system," in Proc. ECON, Busan, South Korea, 2004, pp. 3115–3119.
[28] Y. J. Bae and C. Y. Ock, "Introduction to the Korean word map (UWordMap) and API," in Proc. 26th Annu. Conf. Human Lang. Technol., 2014, pp. 27–31.
[29] D. A. Cruse, Lexical Semantics. Cambridge, U.K.: Cambridge Univ. Press, 1986.
[30] M. J. Kim, "Does Korean have adjectives?" MIT Work. Papers Linguistics, vol. 43, pp. 71–89, 2002.
[31] A. S. Yoon, "Korean WordNet, KorLex 2.0—A language resource for semantic processing and knowledge engineering," in Proc. HAN-GEUL, vol. 295, 2012, pp. 163–201.
[32] S. Ikehara et al., "The semantic system, volume 1 of Goi–Taikei—A Japanese Lexicon," Iwanami Shoten, Tokyo, Japan, Tech. Rep., 1997.
[33] J. C. Shin and C. Y. Ock, "Korean homograph tagging model based on sub-word conditional probability," KIPS Trans. Softw. Data Eng., vol. 3, no. 10, pp. 407–420, 2014.
[34] J. C. Shin and C. Y. Ock, "A Korean morphological analyzer using a pre-analyzed partial word-phrase dictionary," KIISE, Softw. Appl., vol. 39,
[44] B. Zhang, D. Xiong, J. Su, and H. Duan, "A context-aware recurrent encoder for neural machine translation," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 12, pp. 2424–2432, Dec. 2017.
[45] J. Su et al., "A hierarchy-to-sequence attentional neural machine translation model," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 26, no. 3, pp. 623–632, Mar. 2018.
[46] G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush. (2017). "OpenNMT: Open-source toolkit for neural machine translation." [Online]. Available: https://arxiv.org/abs/1701.02810
[47] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. ACL, 2002, pp. 311–318.
[48] M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul, "A study of translation edit rate with targeted human annotation," in Proc. Assoc. MT Americas, Boston, MA, USA, 2006, pp. 223–231.
[49] G. V. Bard, "Spelling-error tolerant, order-independent pass-phrases via the Damerau-Levenshtein string-edit distance metric," in Proc. ACSW, Ballarat, VIC, Australia, 2007, pp. 117–124.
[50] Q. P. Nguyen, J. C. Shin, and C. Y. Ock, "Korean morphological analysis for Korean-Vietnamese statistical machine translation," Elect. Sci. Technol., vol. 15, no. 4, pp. 413–419, 2017.

QUANG-PHUOC NGUYEN received the B.S. degree in information technology from the University of Natural Sciences, part of Vietnam National University, Ho Chi Minh City, Vietnam, in 2005, and the M.S. degree in information technology from Konkuk University, Seoul, South Korea, in 2010. He is currently pursuing the Ph.D. degree with the University of Ulsan, Ulsan, South Korea. His research interests include natural language processing, machine learning, and machine translation.

ANH-DUNG VO received the B.S. degree in computer science from the Hanoi University of Science and Technology, Vietnam, in 2010, and the M.S. degree in information technology from the University of Ulsan, Ulsan, South Korea, in 2013, where he is currently pursuing the Ph.D. degree. His research interests include natural language processing, machine learning, and sentiment analysis.

JOON-CHOUL SHIN received the B.S., M.Sc.,
no. 5, pp. 415–424, 2012. and Ph.D. degrees in information technology from
[35] J. C. Shin and C. Y. Ock, ‘‘Improvement of Korean homograph disam- the University of Ulsan, Ulsan, South Korea,
biguation using Korean lexical semantic network (UWordMap),’’ J. KIISE, in 2007, 2009, and 2014, respectively. He is cur-
vol. 43, no. 1, pp. 71–79, 2016. rently a Post-Doctoral Researcher with the Uni-
[36] H. Kim, ‘‘Korean national corpus in the 21st century Sejong project,’’ in versity of Ulsan. His research interests include
Proc. NIJL, Tokyo, Japan, 2006, pp. 49–54. Korean language processing, document clustering,
[37] J. C. Shin and C. Y. Ock, ‘‘A stage transition model for Korean part- and software engineering.
of-speech and homograph tagging,’’ KIISE, Softw. Appl., vol. 39, no. 11,
pp. 889–901, 2012.
[38] M. Y. Kang, B. Kim, and J. S. Lee, ‘‘Word sense disambiguation using CHEOL-YOUNG OCK received the B.S., M.S.,
embedded word space,’’ J. Comput. Sci. Eng., vol. 11, no. 1, pp. 32–38, and Ph.D. degrees in computer engineering from
Mar. 2017.
the National University of Seoul, South Korea,
[39] J. H. Min, J. W. Jeon, K. H. Song, and Y. S. Kim, ‘‘A study on word sense
in 1982, 1984, and 1993, respectively, and the
disambiguation using bidirectional recurrent neural network for Korean
Honorary Doctorate degree from the School of IT,
language,’’ J. Korea Soc. Comput. Inf., vol. 22, no. 4, pp. 41–49, 2017.
National University of Mongolia, in 2007. He has
[40] M. A. Castano, F. Casacuberta, and E. Vidal, ‘‘Machine translation using
neural networks and finite-state models,’’ in Proc. 7th Inter. Conf. TMI,
been a Visiting Professor with the Russia Tomsk
1997, pp. 160–167. Institute, Russia, in 1994, and Glasgow University,
[41] M. L. Forcada and R. P. Ñeco, ‘‘Recursive hetero-associative memories U.K., in 1996. He was a Chairman of sigHCLT
for translation,’’ in Proc. Inter. Work-Conf. Artif. Neural Netw., Berlin, (2007–2008) in KIISE, South Korea. He has been
Germany, 1997, pp. 453–462. a Visiting Researcher with the National Institute of Korean Language, South
[42] M.-T. Luong, H. Pham, and C. D. Manning, ‘‘Effective approaches to Korea, since 2008. He is currently a Professor with the School of IT Con-
attention-based neural machine translation,’’ in Proc. EMNLP, Lisbon, vergence, University of Ulsan, South Korea. He has been constructing the
Portugal, 2015, pp. 1412–1421. Ulsan Word Map since 2002. His research interests include natural language
[43] S. Jean, O. Firat, K. Cho, R. Memisevic, and Y. Bengio, ‘‘Montreal neural processing, machine learning, and text mining. He received the Medal for
machine translation systems for WMT’15,’’ in Proc. 10th Workshop SMT, Korean development from the Korean Government in 2016.
Lisbon, Portugal, 2015, pp. 134–140.