Bidirectional Long Short-Term Memory For Automatic English To Kannada Back-Transliteration
1 Introduction
identical alphabet set. However, for languages that use non-identical alphabet sets, words have to be transliterated, that is, rendered in the alphabet of the native language.
The majority of multilingual web users tend to write their native languages in Roman script on social media platforms. In spite of several recognized transliteration standards, there is a strong inclination toward unofficial transliteration conventions on many websites, social media platforms, and blogs. Numerous issues, such as spelling variation, diphthongs, doubled letters, and recurring constructions, have to be handled while transcribing.
Neural networks are a rapidly advancing approach to machine learning [1, 2] and have shown promising performance on a variety of tasks such as image recognition, speech processing, natural language processing, and cognitive modeling. The approach involves training a neural network model for a specific task. This paper demonstrates the application of neural networks to machine transliteration between English and Kannada, two linguistically distant and widely spoken languages.
The rest of this paper is arranged as follows. Section 2 describes prior work in this area. An introduction to LSTM and BLSTM is given in Sect. 3. The methodology adopted to build the corpus is presented in Sect. 4. The proposed transliteration network and experimental setup are described in Sect. 5. Section 6 provides details of the results obtained, followed by the conclusion and future work of the proposed method.
2 Previous Research
Research on Indic languages in the context of social media is quite ample, with numerous studies concentrating on code-switching, which has become a familiar phenomenon. A few substantial works address transliteration or, more precisely, back-transliteration of Indic languages [3, 4]. A shared task that included back-transliteration of Romanized Indic-language words to their native scripts was run in 2014 [5, 6]. In many areas, including machine transliteration, end-to-end deep learning models have become a good alternative to more traditional statistical approaches. A Deep Belief Network (DBN) was developed to transliterate from English to Tamil with a restricted corpus [7] and obtained an accuracy of 79.46%. A character-level attention-based encoder in deep learning was proposed to develop a transliteration model for English–Persian [8]; the model achieved good accuracy, with a BLEU score of 76.4.
In [9], the authors proposed transliteration of English to Malayalam using phonemes. An English–Malayalam pronunciation dictionary was used to map English graphemes to Malayalam phonemes. Performance of the model was fairly good for phonemes in the pronunciation dictionary; however, it suffered from an out-of-vocabulary issue when a word was not in the dictionary.
The most essential requisite of a transliterator is to retain the phonetic structure of the source language after transliteration into the target language. Different transliteration techniques for Indian languages have been proposed. In [10], input text was split into phonemes, which were classified using the Support Vector Machine (SVM) algorithm. Most of the methods adopted features such as n-grams [11], Unicode mapping [12], or a combination-based approach combining phoneme extraction and n-grams [13, 14].
Antony et al. [15–17] proposed Named Entity (NE) transliteration techniques from English to Kannada. In [15, 16], the authors adopted a statistical approach using widely available tools such as Moses and GIZA++, which yielded an accuracy of about 89.27% for English names; the system was also evaluated by comparison with the Google transliterator. A training corpus of 40,000 Named Entities was used to train an SVM [17], which obtained an accuracy of 87% on a 1000-word test dataset.
3 LSTM and BLSTM

RNNs have been employed to produce promising results on a variety of tasks, including language modeling [18] and speech recognition. An RNN predicts the present output based on the preserved memories of its past information. RNNs are designed to capture information from sequences or time-series data.
An RNN [19] comprises an input layer, a hidden layer, and an output layer, where each cell preserves a memory of previous time steps. Figure 1 demonstrates a simple RNN model where X_0, X_1, X_2 are the inputs at timestamps t_0, t_1, t_2 and the hidden-layer units are h_0, h_1, h_2.
The new state h(t) of the RNN at time t is a function of its previous state h(t−1) and the input X(t) at time t. The output Y(t) of the hidden-layer units at time t is calculated using the new state and the weight matrix. The computations of the hidden and output layers of the RNN are as follows:
h(t) = g_h(W_i X(t) + W_R h(t−1) + b_h)   (1)
Y(t) = g_y(W_Y h(t) + b_y)   (2)
where W_Y, W_R, and W_i are weights that are calculated during the training phase, g_h and g_y are activation functions computed using Eqs. (3) and (4) respectively, and b_h and b_y are biases. The RNN uses the backpropagation algorithm, applied at every timestamp; this is commonly known as Backpropagation Through Time (BPTT). The dimensionality of the output layer is the same as the number of labels, and the output characterizes the likelihood distribution over labels at time t.
g_h(z) = 1 / (1 + e^(−z))   (3)
g_y(z_m) = e^(z_m) / Σ_k e^(z_k)   (4)
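Equations (1)–(4) can be sketched as a single forward step in NumPy. The dimensions and random weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_i, W_R, W_Y, b_h, b_y):
    """One forward step of the simple RNN in Eqs. (1)-(4)."""
    # Eq. (1): new hidden state, with logistic sigmoid g_h from Eq. (3)
    z = W_i @ x_t + W_R @ h_prev + b_h
    h_t = 1.0 / (1.0 + np.exp(-z))
    # Eq. (2): output, with softmax g_y from Eq. (4)
    s = W_Y @ h_t + b_y
    e = np.exp(s - s.max())          # shift for numerical stability
    y_t = e / e.sum()
    return h_t, y_t

# Toy dimensions: 4-dim input, 3-dim hidden state, 2 output labels
rng = np.random.default_rng(0)
h, y = rnn_step(rng.normal(size=4), np.zeros(3),
                rng.normal(size=(3, 4)), rng.normal(size=(3, 3)),
                rng.normal(size=(2, 3)), np.zeros(3), np.zeros(2))
```

As Eq. (4) implies, the softmax output y sums to one, so it can be read as a likelihood distribution over the labels at time t.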
where
σ = logistic sigmoid function
i = input gate
f = forget gate
o = output gate
c = cell vectors
h = hidden vector
W = weight matrix
LSTM is implemented as follows:
• The primary step in the LSTM is to decide which data to discard from the cell state. This decision is made by a sigmoid layer called the "forget gate layer". It is a function of h_{t−1} and x_t, as shown in Eq. (5), and outputs a number between 0 and 1 for each number in the cell state c_{t−1}: a 1 represents "completely keep this" while a 0 represents "completely get rid of this."
f_t = σ(W_f [h_{t−1}, x_t] + b_f)   (5)
• The second step is to decide which new data to store. This has two parts: the hidden state from the previous timestamp and the new input are passed through a sigmoid function called the "input gate layer" to obtain i_t, as shown in Eq. (6); next, the same inputs are passed through a tanh function. Both parts are combined with f_t from the previous step as in Eq. (7).
i_t = σ(W_i [h_{t−1}, x_t] + b_i)   (6)
c_t = f_t c_{t−1} + i_t tanh(W_c [h_{t−1}, x_t] + b_c)   (7)
• The last step is to obtain the output using Eqs. (8) and (9), which is based on the cell state. First, a sigmoid layer decides which parts of the cell state are to be output. Then, a tanh function pushes the cell-state values between −1 and 1, and the result is multiplied by the output of the sigmoid gate, so that only the decided parts form the output.
o_t = σ(W_o [h_{t−1}, x_t] + b_o)   (8)
h_t = o_t tanh(c_t)   (9)
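The three steps above can be sketched as one LSTM cell update in NumPy. The dictionary-based weight layout and toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following Eqs. (5)-(9).

    W and b hold per-gate weights/biases applied to the concatenated
    vector [h_{t-1}, x_t]; the keys 'f', 'i', 'c', 'o' are illustrative.
    """
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W['f'] @ hx + b['f'])                        # Eq. (5): forget gate
    i_t = sigmoid(W['i'] @ hx + b['i'])                        # Eq. (6): input gate
    c_t = f_t * c_prev + i_t * np.tanh(W['c'] @ hx + b['c'])   # Eq. (7): new cell state
    o_t = sigmoid(W['o'] @ hx + b['o'])                        # Eq. (8): output gate
    h_t = o_t * np.tanh(c_t)                                   # Eq. (9): new hidden state
    return h_t, c_t

# Toy example: 4-dim input, 3-dim hidden/cell state
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(3, 7)) for k in 'fico'}
b = {k: np.zeros(3) for k in 'fico'}
h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), W, b)
```

Because h_t is a sigmoid gate times a tanh, every component of the hidden state stays strictly between −1 and 1.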
282 B. S. Sowmya Lakshmi and B. R. Shambhavi
A sequence learning task requires both previous and forthcoming input features at time t. Hence, a BLSTM network is used to utilize past and upcoming features at a given time t. The hidden layer of a BLSTM network [20] contains a sequence of forward and backward recurrent components connected to the identical output layer. Figure 3 shows a simple BLSTM network with four input units X_0 to X_3. The network's hidden layer has four recurrent components h_0 to h_3 in the forward direction and four recurrent components h_0 to h_3 in the backward direction, which together predict the outputs Y_0 to Y_3 by forming an acyclic graph. For most text processing tasks, a BLSTM provides reasonable results in predicting a sequence of data.
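The bidirectional wiring can be sketched as follows; a simple tanh cell stands in for the full LSTM cell, and a single weight set is reused for both directions to keep the sketch short (a real BLSTM learns separate forward and backward weights):

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_x, W_h):
    # Simple tanh recurrent cell standing in for an LSTM cell.
    return np.tanh(W_x @ x_t + W_h @ h_prev)

def bidirectional(xs, W_x, W_h, hidden=3):
    """Run the cell forward and backward over the sequence and
    concatenate both hidden states at every timestep, so each output
    sees past context (forward pass) and future context (backward pass)."""
    T = len(xs)
    fwd, bwd = [None] * T, [None] * T
    h = np.zeros(hidden)
    for t in range(T):                  # forward direction
        h = rnn_cell(xs[t], h, W_x, W_h)
        fwd[t] = h
    h = np.zeros(hidden)
    for t in reversed(range(T)):        # backward direction
        h = rnn_cell(xs[t], h, W_x, W_h)
        bwd[t] = h
    # Both directions connect to the same output layer, so concatenate.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(2)
xs = [rng.normal(size=4) for _ in range(5)]
states = bidirectional(xs, rng.normal(size=(3, 4)), rng.normal(size=(3, 3)))
```

Each per-timestep state is twice the hidden size, since forward and backward components are concatenated before the output layer.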
4 Dataset Collection
• The majority of the bilingual words were collected from music lyrics websites that host song lyrics in Kannada script along with the corresponding lyrics in Romanized Kannada. Non-Kannada words, punctuation, and vocalized words in the song lyrics were removed. The obtained list covers viable syllable patterns in Kannada and contributed around 70% of the training data.
• The subsequent share of the corpus consisted of manually transliterated NEs.
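The cleaning step described above might look like the following sketch. The `clean_pairs` helper, the punctuation set, and the assumption that the two lyric lines are word-aligned are all hypothetical, not details from the paper:

```python
import re

# The Kannada script occupies the Unicode block U+0C80..U+0CFF.
KANNADA = re.compile(r'^[\u0C80-\u0CFF]+$')
ROMAN = re.compile(r'^[A-Za-z]+$')

def clean_pairs(kannada_line, roman_line):
    """Keep only (Romanized, Kannada) word pairs where each side is purely
    in its expected script; punctuation and mixed tokens are dropped.
    Assumes the two lines are word-aligned, which real lyric pages
    may not guarantee."""
    pairs = []
    for kn, ro in zip(kannada_line.split(), roman_line.split()):
        kn, ro = kn.strip('.,!?'), ro.strip('.,!?')
        if KANNADA.match(kn) and ROMAN.match(ro):
            pairs.append((ro.lower(), kn))
    return pairs

pairs = clean_pairs('ಸೇತುವೆ !', 'Setuve !')  # the '!' token is filtered out
```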
5 Experiments
5.1 Setup
The proposed approach is implemented in Python; the packages used are NumPy and the neural network toolkit Keras with TensorFlow as the backend. The network parameters are set up as in Table 1.
6 Results
The model was tested on a dataset of around 3,000 words collected from random websites. The test dataset contains Romanized words and their transliterations in Kannada script, which are kept as references to compare against the results.
A snapshot of the results obtained from the model is shown in Table 2. The correctness of the transliteration is measured by the Accuracy (ACC), or equivalently the Word Error Rate (WER), yielded by a transliteration model. For completeness, transliteration results obtained by RNN and LSTM networks trained on the same dataset are reported in Table 3.
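Word-level ACC can be computed as an exact-match rate over the test set; this small sketch (with placeholder strings rather than real test words) shows the metric, with WER as its complement:

```python
def word_accuracy(predictions, references):
    """Word-level accuracy (ACC): the fraction of words whose transliteration
    exactly matches the reference. Word Error Rate is then WER = 1 - ACC."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Placeholder strings: two of four predictions match their references.
acc = word_accuracy(['a', 'b', 'c', 'd'], ['a', 'b', 'x', 'y'])  # 0.5
wer = 1.0 - acc                                                  # 0.5
```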
Romanized input   Result
Setuve            Correct
Aggalikeya        Correct
Sadbhava          Incorrect
Anaupacharika     Incorrect
References
1. Kalchbrenner, N., & Blunsom, P. (2013). Recurrent continuous translation models. In Proceed-
ings of the 2013 Conference on Empirical Methods in Natural Language Processing.
2. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural
networks. In Advances in neural information processing systems.
3. Sharma, A., & Rattan, D. (2017). Machine transliteration for Indian languages: A review.
International Journal, 8(8).
4. Dhore, M. L., Dhore, R. M., Rathod, P. H. (2015). Survey on machine transliteration and
machine learning models. International Journal on Natural Language Computing (IJNLC),
4(2).
5. Sequiera, R. D., Rao, S. S., Shambavi, B. R. (2014). Word-level language identification and
back transliteration of romanized text: A shared task report by BMSCE. In Shared Task System
Description in MSRI FIRE Working Notes.
6. Choudhury, M. et al. (2014). Overview of fire 2014 track on transliterated search. Proceedings
of FIRE, 68–89.
7. Sanjanaashree, P. (2014). Joint layer based deep learning framework for bilingual machine
transliteration. In 2014 International Conference on Advances in Computing, Communications
and Informatics (ICACCI). IEEE.
8. Mahsuli, M. M., & Safabakhsh, R. (2017). English to Persian transliteration using attention-
based approach in deep learning. In 2017 Iranian Conference on Electrical Engineering (ICEE).
IEEE.
9. Sunitha, C., & Jaya, A. (2015). A phoneme based model for english to malayalam translitera-
tion. In 2015 International Conference on Innovation Information in Computing Technologies
(ICIICT). IEEE.
10. Rathod, P. H., Dhore, M. L., & Dhore, R. M. (2013). Hindi and Marathi to English machine
transliteration using SVM. International Journal on Natural Language Computing, 2(4),
55–71.
11. Jindal, S. (2015, May). N-gram machine translation system for English to Punjabi translit-
eration. International Journal of Advances in Electronics and Computer Science, 2(5). ISSN
2393-2835.
12. AL-Farjat, A. H. (2012). Automatic transliteration among indic scripts using code mapping
formula. European Scientific Journal (ESJ), 8(11).
13. Dasgupta, T., Sinha, M., & Basu, A. (2013). A joint source channel model for the English to
Bengali back transliteration. In Mining intelligence and knowledge exploration (pp. 751–760).
Cham: Springer.
14. Dhindsa, B. K., & Sharma, D. V. (2017). English to Hindi transliteration system using
combination-based approach. International Journal, 8(8).
15. Antony, P. J., Ajith, V. P., & Soman, K. P. (2010). Statistical method for English to Kannada
transliteration. In Information Processing and Management (pp. 356–362). Berlin, Heidelberg:
Springer.
16. Reddy, M. V., & Hanumanthappa, M. (2011). English to Kannada/Telugu name transliteration
in CLIR: A statistical Approach. International Journal of Machine Intelligence, 3(4).
17. Antony, P. J., Ajith, V. P., & Soman, K. P. (2010). Kernel method for English to Kannada
transliteration. In 2010 International Conference on Recent Trends in Information, Telecom-
munication and Computing (ITC). IEEE.
18. Mikolov, T. et al. (2011). Extensions of recurrent neural network language model. In 2011
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
19. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
20. Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Trans-
actions on Signal Processing, 45(11), 2673–2681.