
OFF-LINE YORUBA HANDWRITING SENTENCE RECOGNITION USING CHERIET RECOGNITION ALGORITHM

2017

Handwriting is the human way of communicating through written media. Communication technology has changed considerably, and handwriting offers an attractive and efficient way to interact with computers, enhancing human-computer interaction. This research focuses on Yoruba handwritten sentence recognition using the Cheriet algorithm. Many handwriting recognition systems have been developed, but those for Yoruba only handle individual words, so there is a need for a Yoruba handwritten sentence recognition system. The recognition process was carried out in four stages: data acquisition, image pre-processing, feature extraction and recognition. Fifty handwritten samples were acquired from literate indigenous writers and subjected to image pre-processing to enhance the quality of the digitized images. Features of the pre-processed images were extracted using the SURF algorithm, and the extracted feature vectors were passed to the Cheriet algorithm for recognition. The system was evaluated on its recognition rate and achieved a recognition rate of 92.8%.

Yahaya Abdullahi, Department of Computer Science, Kwara State University, Malete
Jumoke F. Ajao, Department of Computer Science, Kwara State University, Malete
May 2017

I. INTRODUCTION
Handwriting is the human way of communicating through written media. Communication technology has changed considerably, and handwriting offers an attractive and efficient way to interact with computers: tools now exist that accept input in the form of handwriting data. Such tools also require a method to recognize that input, so research on handwriting recognition has become important in providing solutions to these problems.
Individuals differ in their writing patterns, so computers need to recognize handwriting regardless of how it is written (Lulu, Widodo and Cipta 2014). Handwriting recognition can be classified as either off-line or on-line. In off-line recognition, the writing is usually captured optically by a scanner, and the completed writing is available as an image. In an on-line system, the two-dimensional coordinates of successive points are represented as a function of time, and the order of the strokes made by the writer is also available (Anita, 2012). Several applications, including mail sorting, bank processing, document reading and postal address recognition, require off-line handwriting recognition systems. As a result, off-line handwriting recognition remains an active research area, with newer techniques being explored to improve recognition accuracy. This research uses an off-line system, and the adopted technique yielded comparably high recognition accuracy. Although many highly accurate systems have been developed to recognize handwritten numerals and characters, such systems have not been extended to Yoruba handwritten sentence recognition. This has been ascribed to the difficult nature of Yoruba handwriting, including the diversity of character patterns, the ambiguity and illegibility of characters, and the overlapping of many characters in Yoruba words caused by diacritical marks (Gader, 2013). This research evaluates the recognition of Yoruba handwritten sentences using the Cheriet algorithm.

Statement of the Problem
Most Yoruba handwriting recognition systems developed so far cater only for Yoruba characters and words. Recognition needs to be extended to Yoruba sentences, so this research focuses on recognizing Yoruba sentences using the Cheriet algorithm.
Aim of the Study
The aim of this research is to develop a system that can recognize Yoruba handwritten sentences using the Cheriet algorithm.

Objectives of the Study
i. To acquire the Yoruba sentences to be recognized.
ii. To design a recognition system for the Yoruba sentences.
iii. To implement the design in (ii) using the Java programming language.
iv. To evaluate the recognition system using the recognition rate.

Significance of the Study
To save the indigenous language from extinction, it is important to incorporate Yoruba recognition into information technology.

Scope of the Study
This research focuses on off-line handwriting recognition, with samples collected from literate Yoruba writers. The system corrects mistakes, i.e. irrelevant information in the handwritten words arising from writer dependency. It was developed as a standalone application that anyone can install and use on a personal computer (PC).

II. Yoruba Language
Yoruba is a language spoken in West Africa, mainly in Nigeria. The number of Yoruba speakers is approaching 30 million (Mikael, 2007). Yoruba comprises several dialects, which in the Yoruba land of Nigeria can be classified into five major dialect areas: Northwest, Northeast, Central, Southwest and Southeast (Adetugbọ, 1982). Clear boundaries cannot be drawn between them, as peripheral areas of a dialect region often share features with adjoining dialects. The pronunciation of the letters without diacritics corresponds more or less to their International Phonetic Alphabet equivalents, except for the labial-velar stops [k͡p] and [ɡ͡b], in which both consonants are pronounced simultaneously rather than sequentially. The diacritic underneath vowels indicates an open vowel, pronounced with the root of the tongue retracted (so “ẹ” is pronounced [ɛ̙] and “ọ” is [ɔ̙]).
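Because the under-dot described above can combine with a tone mark on the same vowel, the text a recognizer outputs has to encode several diacritic layers per letter. As an illustration only (the system in this work was implemented in Java), Unicode canonical decomposition (NFD) separates a precomposed Yoruba letter into its base character, an optional under-mark (U+0323 COMBINING DOT BELOW, or U+0329 COMBINING VERTICAL LINE BELOW in the alternative spelling) and an optional tone mark (U+0301 acute for high, U+0300 grave for low, U+0304 macron for mid). The helper name `split_tiers` is hypothetical.

```python
from __future__ import annotations

import unicodedata

UNDER_DOT = "\u0323"            # COMBINING DOT BELOW, as in ẹ, ọ, ṣ
VERTICAL_LINE_BELOW = "\u0329"  # COMBINING VERTICAL LINE BELOW, alternative form e̩, o̩, s̩
TONE_MARKS = {
    "\u0301": "high",  # COMBINING ACUTE ACCENT
    "\u0300": "low",   # COMBINING GRAVE ACCENT
    "\u0304": "mid",   # COMBINING MACRON
}


def split_tiers(letter: str) -> tuple[str, bool, str | None]:
    """Split one Yoruba letter into (character tier, under-dot tier, tonal tier).

    Hypothetical helper: NFD yields the base letter first, followed by its
    combining marks in canonical order (marks below before marks above).
    """
    decomposed = unicodedata.normalize("NFD", letter)
    base, marks = decomposed[0], decomposed[1:]
    has_under_mark = UNDER_DOT in marks or VERTICAL_LINE_BELOW in marks
    tone = next((TONE_MARKS[m] for m in marks if m in TONE_MARKS), None)
    return base, has_under_mark, tone


# Escapes avoid ambiguity between precomposed and combining forms:
# U+1EB9 = ẹ, U+1ECD = ọ, U+1E63 = ṣ, U+00E1 = á.
for letter in ["\u1eb9\u0301", "\u1ecd\u0300", "\u1e63", "\u00e1", "e"]:
    print(letter, split_tiers(letter))
```

Working on the decomposed form means the same function handles both precomposed input and input typed as base letter plus combining marks.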
“ṣ” represents a post-alveolar consonant [ʃ] like the English “sh”, “y” represents a palatal approximant like the English “y”, and “j” a voiced palatal plosive [ɟ], as is common in many African orthographies. The Yoruba alphabet can be broken down into three tiers:
i. The tonal tier
ii. The under-dot tier
iii. The character tier and the digraph

Figure 2.1: The current orthography of the Yoruba alphabet without diacritical marks
Uppercase: A B D E Ẹ F G Gb H I J K L M N O Ọ P R S Ṣ T U W Y
Lowercase: a b d e ẹ f g gb h i j k l m n o ọ p r s ṣ t u w y

Figure 2.2: The current orthography of the Yoruba alphabet with diacritical marks
Uppercase: Á À Ā É È Ē Ẹ́ Ẹ̀ Ẹ̄ Í Ì Ī Ó Ò Ō Ọ́ Ọ̀ Ọ̄ Ú Ù Ū Ṣ
Lowercase: á à ā é è ē ẹ́ ẹ̀ ẹ̄ í ì ī ó ò ō ọ́ ọ̀ ọ̄ ú ù ū ṣ
Alternative forms with a vertical line below: Ẹ/E̩, Ẹ́/É̩, Ẹ̀/È̩, Ẹ̄/Ē̩, Ọ/O̩, Ọ́/Ó̩, Ọ̀/Ò̩, Ọ̄/Ō̩, Ṣ/S̩; ẹ/e̩, ẹ́/é̩, ẹ̀/è̩, ẹ̄/ē̩, ọ/o̩, ọ́/ó̩, ọ̀/ò̩, ọ̄/ō̩, ṣ/s̩

III. Literature Review
Researchers have used many different approaches for both the segmentation and recognition tasks of word recognition. Some have used conventional, heuristic techniques for both sentence segmentation and recognition. Jumoke F., Stephen O., Elijah O. and Odetunji O. used an information-theory approach to evaluate the quality of pre-processing attributes in Yoruba handwritten word recognition, and obtained an entropy measure at the pre-processing stage close to the entropy of the original information. Hong Lee and Brijesh Verma used a binary segmentation algorithm to segment English cursive handwriting and achieved a recognition rate of 83.1%. Ajao J. F., Jimoh R. G. and Olabiyisi used an artificial neural network in a handwriting recognition system for destination addresses on envelopes and achieved a recognition rate of 96%. Blumenstein and Verma (2013) used a neuro-heuristic algorithm in their research on new segmentation algorithms for handwritten word recognition and obtained results of up to 76.52% on a test set of segmentation point patterns. Femwa O.
conducted research on the development of a writer-independent on-line handwritten character recognition system using a modified hybrid neural network (geometrical and statistical features) and obtained a recognition rate of 96%. Vijay Laxmi Sahu and Babita Kubde worked on off-line handwritten character recognition techniques using neural networks and achieved an accuracy of 97%.

METHODOLOGY
Data Acquisition
Data were acquired from adult indigenous writers, and the recognition of the Yoruba sentences was computed. The approach aims to recognize these Yoruba sentences using the Cheriet algorithm after they pass through the pre-processing stages. This provides the information required in the pre-processing stage as a preparatory step for the handwritten Yoruba recognition system. Samples of the handwritten Yoruba sentences collected from indigenous writers are presented in Figure 3.1. This research applies word recognition with the Cheriet algorithm, in which characters are recognized using a pre-processed character database. Figure 3.2 shows the system flowchart for Yoruba sentence recognition.

Figure 3.1: Samples of the Yoruba sentences collected

Figure 3.2: System flowchart (START → load sentence from image folder → pre-processing: convert image to grayscale, binarize grayscale image → feature extraction → recognition → recognized? No: not recognized; Yes: display sentence, open as Notepad or open as MS Word → STOP)

Pre-Processing
The aim of the pre-processing stage is to remove all elements in the word image that are not useful for the recognition process (Benouareth, 2008). Generic image pre-processing involves a series of stages carried out to extract important information from an image and reduce unwanted information, so as to enhance the recognition process.
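The two pre-processing steps listed next, grayscale conversion with equation (1) and Niblack binarization with equations (2) to (4), can be sketched with NumPy as follows. This is a minimal illustration, not the authors' Java implementation: the 15-pixel window and k = -0.2 are assumed defaults (a negative k suits dark ink on a light background), and the function names are hypothetical. Window sums are taken from integral images so that each pixel's local mean m and variance B cost constant time.

```python
import numpy as np


def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Equation (1): X = 0.299R + 0.587G + 0.114B for an (H, W, 3) RGB array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def niblack_binarize(gray: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """Niblack thresholding, equations (2)-(4); returns True where a pixel is ink.

    `window` (odd side length) and `k` are assumed defaults, not values from the paper.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    r = window // 2
    # Window bounds for every row and column, clipped at the image border.
    ys, xs = np.arange(h), np.arange(w)
    y0 = np.clip(ys - r, 0, h)[:, None]
    y1 = np.clip(ys + r + 1, 0, h)[:, None]
    x0 = np.clip(xs - r, 0, w)[None, :]
    x1 = np.clip(xs + r + 1, 0, w)[None, :]

    def window_sum(values: np.ndarray) -> np.ndarray:
        # Integral image with a leading zero row/column: I[i, j] = values[:i, :j].sum()
        integral = np.pad(values.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
        return integral[y1, x1] - integral[y0, x1] - integral[y1, x0] + integral[y0, x0]

    n_p = (y1 - y0) * (x1 - x0)               # NP: number of pixels in each window
    m = window_sum(gray) / n_p                # local mean m
    B = window_sum(gray ** 2) / n_p - m ** 2  # local variance B, equation (4)
    s = np.sqrt(np.maximum(B, 0.0))           # clamp tiny negative rounding errors
    threshold = m + k * s                     # T_Niblack, equation (2)
    return gray < threshold
```

In practice the RGB array could come from Pillow, e.g. `np.asarray(Image.open(path).convert("RGB"))`. Because k is negative, a pixel in a uniform white region is never marked as ink, while a stroke pixel is compared against a threshold pulled just below its neighbourhood mean.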
The pre-processing steps applied in the Yoruba sentence recognition system are as follows:

i. Conversion of RGB to grayscale: the image scanned into the computer is converted to a grayscale image, retaining the detail needed from the image. Given that the RGB value of a color is (R, G, B), where R, G and B are integers between 0 and 255, the grayscale weighted average X is given by:

$$X = 0.299R + 0.587G + 0.114B \tag{1}$$

ii. Binarization: binarization transforms a grayscale image into a black-and-white image by thresholding, here inspired by Niblack's algorithm. Niblack's method calculates a pixel-wise threshold by sliding a rectangular window over the gray-level image, as given by equations (2) to (4):

$$T_{\text{Niblack}} = m + k\,s \tag{2}$$

$$T_{\text{Niblack}} = m + k\sqrt{\frac{1}{NP}\sum_{i}\left(p_i - m\right)^2} \tag{3}$$

$$T_{\text{Niblack}} = m + k\sqrt{\frac{\sum_{i} p_i^2}{NP} - m^2} = m + k\sqrt{B} \tag{4}$$

where NP is the number of pixels p_i in the window, m is their average value, s is their standard deviation, B = \sum_i p_i^2 / NP - m^2 is their variance (not the blue channel of equation 1), and k is a fixed constant.

Binarization Code
The listing given for this step is labelled as the Niblack algorithm, but it applies one fixed global threshold of 127 to every pixel rather than the local threshold of equation (2). It is shown here updated for Python 3 and Pillow:

```python
import sys

from PIL import Image


def binarize(in_file, out_file, threshold=127):
    # Load the image and convert it to 8-bit grayscale ("L" mode).
    im = Image.open(in_file).convert("L")
    # Pixels brighter than the fixed threshold become white (255); all others black (0).
    bw = im.point(lambda p: 255 if p > threshold else 0)
    bw.save(out_file)
    return bw


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} <input image> <output image>")
        sys.exit(1)
    binarize(sys.argv[1], sys.argv[2]).show()
```

Proposed Recognition Algorithm Method
The original grayscale image is binarized using the Cheriet algorithm, which automatically selects a threshold value for the given image. Finally, the image is converted to skeleton format to accommodate a variety of writing devices and pen tilts and to suppress extra data. The proposed recognition algorithm is explained in Figure 3.2.
Step 1. Take a sentence image from the database.
Step 2. Perform pre-processing on the image.
Step 2.1 Convert the sentence image to grayscale.
Step 2.2 Binarize the grayscale image using the Niblack approach.
Step 3.
Perform recognition of the sentence.

Because the proposed recognition technique is simple, it is very fast and performs well in most cases. However, it has a limitation in recognizing some characters with diacritical marks, such as m, n, u, v and w, for which over-segmentation occurs and the technique fails to find accurate character boundaries.

Cheriet Algorithm
In this algorithm, each word is first decomposed by contour following into two parts: the main body of the letters and the secondary parts (the diacritics). The main body then undergoes a horizontal decomposition into two areas, the median zone and the overflowing zones, as shown in Figure 3.3. The overflowing zones contain ascenders and descenders, while the median zone contains valleys; valleys are the horizontal segments connecting adjacent peaks. As the figures show, the Cheriet algorithm works well except for a few characters with an under-dot, where over-segmentation occurs. It is therefore necessary to integrate this technique with an intelligent method to increase its performance. It is worth mentioning that over-segmentation is minimal and occurs for only a few characters, which lessens the burden on the algorithm and therefore increases processing speed.

Figure 3.3: (1) Exterior contour extraction; (2) Decomposition into median and overflowing zones (panels: original image, exterior contours, overlapping zones)

Word Extraction
This stage removes the white space around each word and extracts the effective image area by moving through the black pixels; when it finds a gap of white space wider than 100 pixels, the current word is taken to be finished and a new word begins.

Feature Extraction
The main goal of feature extraction is to reduce the data dimensionality and properly represent the original data in feature space.
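The horizontal decomposition into median and overflowing zones, and the word-extraction rule based on runs of white space, can both be approximated with projection profiles of the binary image. The sketch below is a simplification under stated assumptions, not the Cheriet algorithm itself: it treats as the median zone the rows whose ink count is at least an assumed fraction (0.5) of the densest row, and it splits words wherever a run of blank columns exceeds the 100-pixel gap given in the text. Both function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def median_zone(ink: np.ndarray, ratio: float = 0.5) -> tuple[int, int]:
    """Return (top, bottom) row indices of the median zone of a binary word image.

    `ink` is a boolean array (True = ink). Rows holding at least `ratio` times the
    ink of the densest row are taken as the median zone; rows above `top` and
    below `bottom` are the overflowing zones (ascenders and descenders).
    The `ratio` rule is an assumed heuristic, not Cheriet's exact procedure.
    """
    profile = ink.sum(axis=1)  # horizontal projection: ink pixels per row
    dense_rows = np.flatnonzero(profile >= ratio * profile.max())
    return int(dense_rows[0]), int(dense_rows[-1])


def extract_words(ink: np.ndarray, min_gap: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, of the words in a sentence image.

    A word ends when more than `min_gap` consecutive blank columns follow it,
    mirroring the 100-pixel white-space rule described in the text.
    """
    column_has_ink = ink.any(axis=0)  # vertical projection
    words: list[tuple[int, int]] = []
    start = last = None
    for col, inked in enumerate(column_has_ink):
        if not inked:
            continue
        if start is None:
            start = col
        elif col - last - 1 > min_gap:  # blank run since the last inked column
            words.append((start, last + 1))
            start = col
        last = col
    if start is not None:
        words.append((start, last + 1))
    return words
```

Each returned span is cropped to its outermost inked columns, which removes the surrounding white space as the word-extraction stage requires.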
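Feature extraction in this work uses SURF, whose detection and description steps are given as equations (5) and (6) in the SURF description. A simplified NumPy sketch of both follows. It is an illustration under assumptions, not a full SURF implementation: real SURF approximates the Gaussian second derivatives with box filters on an integral image (weighting L_xy by about 0.9), assigns an orientation, and uses Haar wavelet responses. Here the derivatives are finite differences of a Gaussian-smoothed image, so the plain determinant L_xx L_yy - L_xy^2 is used, and the descriptor uses pixel differences on a patch assumed to be already normalized. The default σ = 1.2 is the scale SURF associates with its smallest 9×9 filter; all function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing; kernel truncated at 3*sigma, borders edge-padded."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(img, dtype=np.float64), radius, mode="edge")
    smooth_rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, smooth_rows, kernel, mode="valid")


def hessian_determinant(img: np.ndarray, sigma: float = 1.2) -> np.ndarray:
    """det(H) of equation (5) at every pixel; blob-like features give strong maxima."""
    smooth = gaussian_blur(img, sigma)
    L_y, L_x = np.gradient(smooth)  # np.gradient returns d/d(rows), d/d(cols)
    L_xy, L_xx = np.gradient(L_x)
    L_yy, _ = np.gradient(L_y)
    return L_xx * L_yy - L_xy ** 2


def describe(patch: np.ndarray) -> np.ndarray:
    """64-D descriptor of equation (6): (sum dx, sum |dx|, sum dy, sum |dy|) per 4x4 subregion.

    Assumes the patch is already scale- and rotation-normalised around an interest
    point; pixel differences stand in for SURF's Haar wavelet responses.
    """
    d_y, d_x = np.gradient(np.asarray(patch, dtype=np.float64))
    h, w = d_x.shape
    parts = []
    for i in range(4):
        for j in range(4):
            rows = slice(i * h // 4, (i + 1) * h // 4)
            cols = slice(j * w // 4, (j + 1) * w // 4)
            sx, sy = d_x[rows, cols], d_y[rows, cols]
            parts += [sx.sum(), np.abs(sx).sum(), sy.sum(), np.abs(sy).sum()]
    v = np.array(parts)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # unit length gives contrast invariance
```

Interest points would be taken as local maxima of `hessian_determinant`, and a patch around each point passed to `describe`; matching then compares descriptors, for example by Euclidean distance.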
Features useful for the classification process can be simple features, such as RGB values in color images, or complex features, such as energies from the Fourier or wavelet transform of a time series. The feature extraction process usually consists of three steps:
1. Feature construction, in which features are constructed from linear or non-linear combinations of raw features.
2. Feature selection, done using techniques such as relevancy ranking of individual features.
3. Feature reduction, used to reduce the number of features, especially when too many features are selected compared with the number of feature vectors.
These three steps are not all mandatory in the feature extraction process.

Algorithm for SURF

```text
do
    do
        do for all interest areas in the given input image
            calculate Hessian matrix H (5×5)
        end
        identify two interest areas with the same determinant value
        mark as a feature
    end
    divide the feature (interest area) into 4×4 subareas
    find the deviation along the x and y axes (estimating the transformation)
    get the angle of deviation as the trace of the Hessian matrix
    recover the original image by inverse transformation
end
```

The SURF Algorithm
1. Detection: automatically identify interesting features using the Hessian matrix

$$\mathcal{H}(X,\sigma) = \begin{bmatrix} L_{xx}(X,\sigma) & L_{xy}(X,\sigma) \\ L_{xy}(X,\sigma) & L_{yy}(X,\sigma) \end{bmatrix} \tag{5}$$

where

$$L_{xx}(X,\sigma) = I(X) * \frac{\partial^2 g(\sigma)}{\partial x^2}, \qquad L_{xy}(X,\sigma) = I(X) * \frac{\partial^2 g(\sigma)}{\partial x\,\partial y},$$

L_{yy}(X,σ) is defined analogously, * denotes convolution and g(σ) is a Gaussian of scale σ.

2. Description: each interest point should have a unique description that does not depend on the feature's scale and rotation. Each of the 4×4 subregions around the point contributes

$$v = \left(\sum d_x,\ \sum |d_x|,\ \sum d_y,\ \sum |d_y|\right) \tag{6}$$

where d_x and d_y are the wavelet responses in the x and y directions.

3. Matching: given an input image, determine which objects it contains, and possibly a transformation of each object, based on predetermined interest points. The Laplacian is the trace of the Hessian:

$$\nabla^2 L = \operatorname{tr}(\mathcal{H}) = L_{xx}(X,\sigma) + L_{yy}(X,\sigma) \tag{7}$$

Performance Evaluation of the System
The performance of the system is measured by the recognition rate achieved by the developed system.
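Equations (8) and (9) can be checked against the per-sentence counts reported in Table 4.1. The sketch below (the function name is hypothetical) computes the per-sentence rates and the pooled overall rate; pooling all 250 tested samples gives 232 recognized, which is the 92.8% overall recognition rate reported in the abstract.

```python
def recognition_rates(recognized: int, tested: int) -> tuple:
    """Return (R.R, N.R) in percent, following equations (8) and (9)."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    not_recognized = tested - recognized
    return 100.0 * recognized / tested, 100.0 * not_recognized / tested


# (tested, recognized) per sentence, taken from Table 4.1.
results = {
    "Gbogbo Okunrin ati Obirin": (50, 48),
    "Ifemi otito ni ajoke": (50, 49),
    "Baba ati iya": (50, 50),
    "Nibolowa lati ana": (50, 45),
    "ó sì ye ki̇ won ó má a hù wà sí": (50, 40),
}

for sentence, (tested, recognized) in results.items():
    rr, nr = recognition_rates(recognized, tested)
    print(f"{sentence}: R.R = {rr:.1f}%, N.R = {nr:.1f}%")

# Overall rates pool the sums over all tested samples, as in equations (8) and (9).
total_tested = sum(t for t, _ in results.values())
total_recognized = sum(r for _, r in results.values())
overall_rr, overall_nr = recognition_rates(total_recognized, total_tested)
print(f"Overall: {total_recognized}/{total_tested}, R.R = {overall_rr:.1f}%, N.R = {overall_nr:.1f}%")
```

Because every sentence was tested on the same number of samples (50), the pooled rate equals the mean of the five per-sentence rates; with unequal sample counts the two would differ, and equations (8) and (9) call for the pooled form.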
The Cheriet algorithm used in the development of the system was evaluated by a recognition test measuring the percentage of sentences recognized and not recognized by the system. The performance metrics used for testing are given below. The overall recognition rate is:

$$R.R = \frac{\sum RE}{\sum TS} \times 100 \tag{8}$$

where RE is a recognized sample, TS is a tested sample and R.R is the recognition rate. The overall non-recognition rate is:

$$N.R = \frac{\sum NT}{\sum TS} \times 100 \tag{9}$$

where NT is a not-recognized sample and TS is a tested sample.

RESULT AND DISCUSSION
Testing and Result
Tables 4.1 and 4.2 describe the samples gathered and tested, and the results obtained from each test.

Discussion of Results
This research achieved its intended purpose, based on the results obtained. Many researchers mention recognition and segmentation as part of their overall systems, but few report their findings at the segmentation level. This research has focused on this important area and has produced results that can readily be compared with those of other researchers in the field:

| Author | Problem Solved | Technique/Method | Sample Collected | Result Obtained |
|---|---|---|---|---|
| Blumenstein and Verma | New segmentation algorithms for handwritten word recognition | Neuro-heuristic algorithm | Set of segmentation point patterns | Up to 76.52% |
| Jumoke F., Stephen O., Elijah O. and Oladayo O. | Offline Yoruba handwritten word recognition | Hidden Markov Model approach | Handwritten Yoruba words | Recognition rate of 95.6% |
| Han and Sethi | Segmentation of words on 50 envelopes | Heuristic algorithm | 50 envelopes from real mail pieces | Accuracy of 85.7% |
| Eastwood et al. | Recognition of handwriting from CEDAR | Otsu algorithm | Handwritten samples gathered from CD-ROM | Accuracy of 75.9% |
| Yahaya Abdullahi | Off-line recognition of Yoruba sentences | Cheriet algorithm | 50 handwritten Yoruba sentences | Accuracy of 92.8% |

Table 4.1: Sentences gathered and results from using the Cheriet algorithm

| Tested Sentence | Samples | Recognized | Not Recognized | R.R (%) | N.R (%) |
|---|---|---|---|---|---|
| Gbogbo Okunrin ati Obirin | 50 | 48 | 2 | 96 | 4 |
| Ifemi otito ni ajoke | 50 | 49 | 1 | 98 | 2 |
| Baba ati iya | 50 | 50 | 0 | 100 | 0 |
| Nibolowa lati ana | 50 | 45 | 5 | 90 | 10 |
| ó sì ye ki̇ won ó má a hù wà sí | 50 | 40 | 10 | 80 | 20 |

Table 4.2: Samples of the sentences gathered (handwritten images not reproduced): Ifemi otito ni ajoke; Gbogbo Okunrin ati Obirin; Baba ati iya; Nibolowa lati ana; ó sì ye ki̇ won ó má a hù wà sí

III. SUMMARY, CONCLUSION AND RECOMMENDATIONS
Summary
The objective of image pre-processing in handwriting recognition is to ensure restoration of the original image. In the present work, a recognition method was devised that adapts information theory and related techniques to develop a robust and accurate Yoruba sentence recognition system. Considering all the aspects discussed in the previous sections, the next steps towards better Yoruba handwritten sentence recognition systems are clear: the recognition module should be able to determine which pre-processing stages need to be performed on the handwriting samples, which will enhance the performance of the Yoruba recognition system.

Conclusion
An intelligent recognition algorithm has been presented in this research, producing good results.
It was used to recognize different hand-printed words gathered from different indigenous writers. With some modifications, further testing will be conducted so that the technique can be used as part of a larger system. It is hoped that further research will be dedicated to analyzing and improving the results of this important procedure.

Recommendations
In future work, the recognition technique will be improved in a number of ways. Under-recognition was noticeable in some words, so the algorithm will be modified to produce fewer incorrect recognition points while recovering more correct ones. This can be achieved by looking for more features or by enhancing the current feature detection methods. More sentences will also be used in training and testing, and finally the technique will be integrated into a complete handwriting recognition system.

IV. REFERENCES
Anita Jindal, Renu Dhir and Rajneesh Rani (2012), “Diagonal Features and SVM Classifier for Handwritten Gurumukhi Character Recognition,” International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, Issue 5, ISSN: 2277-128X.
Asworo E. (2003), “Comparison Between Kohonen Neural Network and Learning Vector Quantization Methods on Real Time Handwriting Recognition System,” Institute of Technology Surabaya.
Bamgbose A. (1976), Yoruba Orthography, Ibadan University Press, pp. 15-27.
Bamgboṣe Ayọ (1965), Yoruba Orthography, Ibadan: Ibadan University Press.
Benouareth A., Ennaji I. and Sellami M. (2008), “Arabic Handwritten Word Recognition Using HMMs with Explicit State Duration,” EURASIP Journal on Advances in Signal Processing, Article ID 247354.
Bozinovic R. M. and Srihari S. N. (1989), “Off-Line Cursive Script Word Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 11, pp. 68-83.
Brunelli R. (2009), Template Matching Techniques in Computer Vision: Theory and Practice, Wiley, ISBN 978-0-470-51706-2.
Casey R. G. and Lecolinet E. (1996), “A Survey of Methods and Strategies in Character Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, 18(7), pp. 690-706.
Cheriet M., Suen C. Y., Legault R., Nadal C. and Lam L. (1993), “Building a New Generation of Handwriting Recognition Systems,” Pattern Recognition Letters, Vol. 14, pp. 305-315.
Gader P. D. (1996), “Fusion of Handwritten Word Classifiers,” Pattern Recognition Letters, Vol. 17, pp. 577-584.
Gader P., Whalen M., Ganzberger M. and Hepp D. (1995), “Handprinted Word Recognition on a NIST Data Set,” Machine Vision and Applications, Vol. 8, pp. 31-40.
Ibrahim A. and Odejobi O. (2016), “A System for the Recognition of Handwritten Yoruba Characters,” Obafemi Awolowo University, Ile-Ife, Nigeria. AGIS.
Jumoke F. Ajao, Stephen O. Olabiyisi, Elijah O. Omidiora and Oladayo O. Okediran (2012), “Hidden Markov Model Approach for Online Yoruba Handwritten Word Recognition,” British Journal of Mathematics & Computer Science, 18(6): 1-20, Article no. BJMCS.27567, ISSN: 2231-0851.
Lulu C. Munggaran, Suryarini Widodo and Cipta A. M. (2014), “Handwritten Pattern Recognition Using Kohonen Neural Network Based on Pixel Character,” International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 5, No. 11.
Martin G. L., Rashid M. and Pittman J. A. (1993), “Integrated Segmentation and Recognition through Exhaustive Scans or Learned Saccadic Jumps,” Int’l J. Pattern Recognition and Artificial Intelligence, Vol. 7, pp. 831-847.
Mikael Parkvall (2007), “Världens 100 största språk 2007” (The World's 100 Largest Languages in 2007), in Nationalencyklopedin.
Mineichi Kudo and Jack Sklansky (2000), “Comparison of Algorithms that Select Features for Pattern Classifiers,” Pattern Recognition, 33(1): 25-41, doi:10.1016/S0031-3203(99)00041-2.
Omnigot A. (2013), “The Yoruba alphabets and it pronunciation,” accessed at http://www.omniglot.com/writing/yoruba.htm