
1 Introduction

Handwriting recognition is one of the most challenging fields in computer vision. It can be performed in two ways: online and offline. Online recognition uses the sequential writing order [4] and instantaneous temporal information, whereas offline recognition is based on visual features and pixel information only. Offline recognition is more challenging due to the lack of temporal information and the absence of the instantaneous pen trajectory.

Our work proposes a new ranking process that can guide automatic offline handwriting classifiers. In addition to the challenges mentioned above, applying such classifiers to a large database increases the difficulty of the classification process and lowers the overall accuracy. Classification can thus be seen as a challenging competition whose goal is to match a test image with the correct class from the database. Having all database classes compete in this process is the main complication that degrades classifier performance. The target of the proposed approach is to ease this competition by selecting subsets of database classes, smaller than the whole database, on which the classifier is applied. This reduction in the number of candidate classes benefits both classification performance and complexity: each test image then participates in the classification process with only a selected subset of database classes.

The proposed approach starts by extracting five regional features [5] from the input test image and the database classes. Based on the extracted features, decision trees are applied to match the input test image with a set of matching database classes, which are grouped into one set. A new ranking algorithm based on the pyramid histogram of gradients (PHoG) [29] and the Kullback-Leibler divergence measure [17] is then applied to sort these matching classes relative to each input test image. Sorting the classes from nearest to furthest enables the selection of a smaller subset representing the best nearest-neighbor classes of the test image. Our objective is to use these subsets, instead of the whole database, in the final classification stage to vote for the correct class of the test image. By the end of the testing stage, the approach has split the large database into smaller sets of similar classes, constructing a data-mined version of the database.

We tested our system on the IFN/ENIT Arabic handwriting database [26]. Arabic is one of the major languages [4] and poses many challenges, as briefly discussed in [1]. The following sections summarize related work, the proposed approach, the experiments, and finally the conclusions.

2 Related Work

Different recognition systems have been proposed in the field of Arabic handwriting recognition; this section summarizes similar offline handwritten recognition systems. Pyramid histograms of gradients (PHoG) were extracted as features by Saïdani and Echi [29] to recognize Arabic/Latin and machine-printed/handwritten databases. A survey was provided by Lawgali [18], and various classifiers have been applied in this field.

Some systems use a sliding window to compute different types of features. Hicham et al. [11] proposed a solution that does not segment words into PAWs and achieved an 80.26% success rate on an Arabic-Numbers database and 78.95% accuracy on the IFN/ENIT database [26] using local densities and statistical features. AlKhateeb et al. [3] computed intensity and structure-like features; a re-ranking method using a Gaussian-like function improved the accuracy to 89.24%. Jayech et al. [12] extracted statistical and structural features, applied a synchronous Multi-Stream Hidden Markov Model (MSHMM), and achieved a 91.1% recognition rate on the IFN/ENIT database. MSHMMs were also applied by Mezghani et al. [22], Maqqor et al. [21], Pechwitz and Maergner [27], and Kessentini et al. [14]. Mezghani et al. [22] automatically recognize handwritten and printed Arabic and French words using Gaussian Mixture Models (GMMs). Kessentini et al. [14] worked offline with another multi-stream approach combining density- and contour-based features; the system achieved 79.8% on the IFN/ENIT database. A confidence- and margin-based discriminative approach by Dreuw et al. [6] used the maximum mutual information (MMI) and minimum phone error (MPE) criteria [9, 10, 25]; MMI outperformed ML with a 33% improvement in word error rate on the IFN/ENIT database.

Other approaches recorded effective results using SVMs, such as Al-Dmour and Abuhelaleh [2], whose system was applied to a subset of the IFN/ENIT database and achieved an 85% recognition rate using SURF features. Khaissidi et al. [15] proposed an unsupervised segmentation-free method using histograms of oriented gradients (HOG) as features; the system achieved an average precision of 68.4% on the Ibn Sina dataset [23]. An SVM classifier with HOG features was also applied, using Sobel edge detectors, by Elfakir et al. [7].

Mozaffari and Bahar [24] discriminated Farsi/Arabic handwritten words from machine-printed ones using histogram features; the system achieved a 97% success rate on a database of about 3000 printed/handwritten words. Leila et al. [19] computed invariant pseudo-Zernike moment features and applied a Fuzzy ARTMAP neural network, achieving a 93.8% success rate on a database of 96 different classes written by 100 different writers.

3 The Proposed System

This paper proposes a new ranking algorithm that can precede the final classification stages of a pattern recognition process. The proposed system is composed of two separate stages. The first stage applies decision trees, as shown in Fig. 1, to match the test image with a set of database classes. In the second stage, pyramid histogram of gradients (PHoG) [29] features are computed, followed by the Kullback-Leibler (KL) divergence measure [17], to implement a new ranking algorithm. The ranking process sorts the classes of the chosen matching set; this sorting reduces computational complexity by shrinking the part of the database involved in the final classification stage. This part is a subset of the best nearest-neighbor classes chosen from the matching set after sorting.

3.1 Preprocessing

During this stage, input images are prepared for the subsequent computations. Our images are binary images with a white background and black handwritten Arabic words. Negating the images [8] is necessary to concentrate the computations on the minority of ink pixels, which become white after negation. Normalizing the word size is essential to unify the computation environment across all database images.
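As a minimal sketch of this stage (the target size and the nearest-neighbour resampling are our assumptions; the paper does not state its normalization parameters), negation and size normalization can be written as:

```python
import numpy as np

def preprocess(img, target_shape=(8, 16)):
    """Negate a binary word image and normalize its size.

    `img` is a 2-D 0/1 array with white (1) background and black (0)
    handwriting; after negation the ink pixels become the foreground (1).
    Size normalization here is plain nearest-neighbour resampling.
    """
    ink = 1 - img  # negation: concentrate later computations on ink pixels
    rows = np.linspace(0, ink.shape[0] - 1, target_shape[0]).round().astype(int)
    cols = np.linspace(0, ink.shape[1] - 1, target_shape[1]).round().astype(int)
    return ink[np.ix_(rows, cols)]

# Toy 4x8 "word": background 1, ink 0.
word = np.ones((4, 8), dtype=int)
word[1:3, 2:6] = 0
norm = preprocess(word, target_shape=(8, 16))
print(norm.shape)  # (8, 16)
```

All normalized images then share one shape, so the later feature vectors are directly comparable.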

3.2 Decision Trees Construction

This stage builds a tree-like model of decisions, constructing a separate decision tree per test image. Decision trees are a well-known data mining technique [28], used here to derive a strategy that determines a particular set of matching classes for each test image. Performance is evaluated by how well the test image is matched with the correct set of classes: a match is considered correct when the decided set includes the correct class of the input test image. This is measured by the true positive rate, also known as sensitivity [20]. In our proposed solution, the search strategy is based on five computed regional values proposed by [5]: the number of pieces of Arabic words (PAWs), the number of holes, the extent, the eccentricity, and the word orientation [5]. A sample image with its computed number of holes and PAWs is shown in Fig. 2.

Fig. 1.
figure 1

The decision trees construction stage.

Fig. 2.
figure 2

A sample image after computing number of holes and PAWs.
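The two counts illustrated in Fig. 2 can be sketched as follows, under the usual conventions (our assumption) that PAWs are 8-connected ink components and holes are 4-connected background components that do not touch the image border:

```python
from collections import deque

def _components(cells, neighbours):
    """Group a set of (row, col) cells into connected components (BFS)."""
    seen, comps = set(), []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [start], deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in neighbours:
                nxt = (r + dr, c + dc)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    comp.append(nxt)
                    queue.append(nxt)
        comps.append(comp)
    return comps

def paws_and_holes(img):
    """img: list of 0/1 rows, 1 = ink. Returns (#PAWs, #holes)."""
    h, w = len(img), len(img[0])
    ink = {(r, c) for r in range(h) for c in range(w) if img[r][c]}
    bg = {(r, c) for r in range(h) for c in range(w) if not img[r][c]}
    eight = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    four = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    paws = len(_components(ink, eight))
    # A hole is a background component lying strictly inside the image.
    holes = sum(1 for comp in _components(bg, four)
                if all(0 < r < h - 1 and 0 < c < w - 1 for r, c in comp))
    return paws, holes

# A closed loop (one hole) plus a separate dot: 2 PAWs, 1 hole.
word = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
print(paws_and_holes(word))  # (2, 1)
```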

Eccentricity [13] measures the aspect ratio, as shown in Fig. 3. Extent [5] is the ratio between the area of the object and the area of its bounding box. Finally, the word orientation is the angle between the x-axis and the major axis of the word's bounding ellipse.

$$\begin{aligned} ecc = \frac{2c}{b}, \quad c^2 = a^2 - b^2 \end{aligned}$$
(1)
Fig. 3.
figure 3

Eccentricity; F1 and F2 are the two foci of the ellipse.

Equation (1) defines the eccentricity ecc in terms of the distances a, b, and c shown in Fig. 3. The ranges of the five computed feature values are used to find the matching set of classes per test image using decision trees. This process excludes a large number of training samples, namely those not included in the matching set, whose inclusion in the final classification would increase the recognition complexity and negatively affect the accuracy.
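A worked instance of Eq. (1) (we follow the paper's formula as stated; note that the classical ellipse eccentricity would be c/a):

```python
import math

def eccentricity(a, b):
    """Eq. (1): c^2 = a^2 - b^2, then ecc = 2c / b."""
    c = math.sqrt(a * a - b * b)
    return 2.0 * c / b

# With a = 5 and b = 3, c = 4, so ecc = 8/3 ≈ 2.667.
print(eccentricity(5, 3))
```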

After applying this stage to the whole database, the database is data-mined: it is split into smaller sets of similar classes characterized by their common computed features.

3.3 The Ranking Stage

An overview of the ranking stage is shown in Fig. 4. This stage begins by computing the PHoG features and then applies the Kullback-Leibler divergence measure to sort the members of the matching set relative to the input image. The output of this stage is a subset of the best nearest neighbors that can be used for the final classification, as shown in Fig. 4.

Fig. 4.
figure 4

The ranking stage.

3.3.1 Pyramid Histogram of Gradients

The histogram of oriented gradients (HOG) [29, 30] is a feature descriptor that represents edge information; it is based on counting occurrences of gradient orientations in an image's region of interest (ROI). The pyramid histogram of gradients (PHoG) is a more powerful gradient feature than the ordinary HOG. PHoG extraction computes histograms of gradients in localized portions of the image's ROI; it is an extension that captures finer and more discriminative details than the ordinary HOG. Each level is constructed by contracting the edges of the level below. The first-level description corresponds to the original input data, i.e., the original HOG. Higher levels partition the image into connected sub-graphs, which are subsets of the image's pixels. In every partition, features are computed independently of the others on the same level. The union of the sequential feature levels propagates information up and down in the form of a pyramid, as shown in Fig. 5.

Considering L levels and an angle range of 0-360° quantized into eight discrete values, the following algorithm, introduced by Saidani et al. [30], is applied:

  1. Divide the image into smaller cells (connected components).

  2. For each cell, compute the histogram of oriented gradients.

  3. Discretize it into the pre-defined angular bins.

  4. Normalize the 8-bin histograms of adjacent cells.
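The steps above can be sketched as follows; we assume a regular 2^l x 2^l grid partitioning per level and L1 normalization of the concatenated descriptor, since the exact cell layout is not fully specified:

```python
import numpy as np

def phog(img, levels=3, bins=8):
    """Sketch of the PHoG steps: gradients, 8-bin orientation
    histograms per cell, concatenated over pyramid levels."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                        # step 2: gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 360.0)
    bin_idx = np.minimum((ang / (360.0 / bins)).astype(int), bins - 1)  # step 3

    feats = []
    for level in range(levels):
        n = 2 ** level                               # step 1: n x n cells
        rows = np.array_split(np.arange(img.shape[0]), n)
        cols = np.array_split(np.arange(img.shape[1]), n)
        for r in rows:
            for c in cols:
                cell_bins = bin_idx[np.ix_(r, c)].ravel()
                cell_mag = mag[np.ix_(r, c)].ravel()
                hist = np.bincount(cell_bins, weights=cell_mag,
                                   minlength=bins)
                feats.append(hist)
    vec = np.concatenate(feats)
    s = vec.sum()
    return vec / s if s > 0 else vec                 # step 4: normalization

rng = np.random.default_rng(0)
desc = phog(rng.random((32, 32)), levels=3)
print(desc.shape)  # (168,) = 8 bins x (1 + 4 + 16) cells
```

Each added level multiplies the number of cells by four, so the descriptor length grows as 8·(1 + 4 + 16 + …) with the level L.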

In the experiments, we applied PHoG at different levels L, as shown in Fig. 5a-c, to study the effect of varying the level on the final classifier performance.

Fig. 5.
figure 5

Pyramid histogram of gradients at different levels.

3.3.2 Kullback-Leibler Divergence Measure

The Kullback-Leibler divergence [17] measures the disparity between any two probability distribution functions. It is an information-based measure of divergence between given distributions p(x) and q(x) defined over a variable x. In our proposed approach, we treat the PHoGs as distribution functions over gradient values.

$$\begin{aligned} D_{KL} (p(x) \parallel q(x))=\sum _{x \in X} p(x)\ln {(\frac{p(x)}{ q(x)})} \end{aligned}$$
(2)

The divergence \( D_{KL} \) between the PHoG feature vector of each test image and the PHoGs of the training samples is defined by Eq. (2), where p(x) is the feature vector of the test sample, q(x) is the feature vector of a training sample, and the gradients are represented by the variable x. The smallest divergence measures vote for the most probable membership classes of the input test image. This measure can be seen as the difference between the cross-entropy of q(x) relative to p(x) and the self-entropy [31] of p(x). The cross-entropy is always larger than the self-entropy unless p(x) and q(x) are identical, in which case the two are equal and the difference is zero, representing the minimum divergence.

The classes of the same group are ranked from nearest to furthest according to the computed divergence measure. This ranking excludes a further part of the matching set corresponding to each test image by passing only a subset of the best nearest neighbors to the later classification.
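A compact sketch of Eq. (2) and the nearest-to-furthest sort (the epsilon smoothing of zero bins and the helper names are our assumptions):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) per Eq. (2); eps guards against zero bins
    (the paper does not state its smoothing scheme)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def rank_matching_set(test_vec, class_vecs, k=None):
    """Sort the matching set nearest-to-furthest by divergence from the
    test image's PHoG vector, keeping the k best nearest neighbours."""
    ranked = sorted(class_vecs,
                    key=lambda c: kl_divergence(test_vec, class_vecs[c]))
    return ranked[:k] if k else ranked

test = [0.5, 0.3, 0.2]
classes = {"A": [0.5, 0.3, 0.2],   # identical to test -> divergence 0
           "B": [0.4, 0.4, 0.2],
           "C": [0.1, 0.1, 0.8]}
print(rank_matching_set(test, classes, k=2))  # ['A', 'B']
```

Only the k nearest classes survive the ranking, which is exactly the subset passed on to the final classifier.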

4 Experiments

The proposed approach is applied to the IFN/ENIT database [26], a database of handwritten Arabic words written by 411 different writers. The database is divided into four sets, a, b, c, and d, which is the standard division stated on the database's official website [26]. We used sets a, b, and c for training and tested the approach on set d, as done previously in [3, 12] and [6]. Set d contains 859 different classes, each representing one city name; the number of samples varies from one class to another.

4.1 Decision Trees Sensitivity

Decision trees were applied first to match each test image with a set of matching classes. The matching sets' sizes differ per sample; the frequencies of the set sizes are represented by the solid line in Fig. 6. The sensitivity [20] of this process was measured as the frequency of correct matching between a test image and its matching set, where a match is correct when the class of the test image is included in the matched set. The sensitivity per set size is represented by the dotted line in Fig. 6. The difference between the areas under the curves (AUC) of the solid and dotted graphs in Fig. 6 is 13.4% of the whole AUC.

Fig. 6.
figure 6

Matching process sensitivity.

The IFN/ENIT database [26] has varying ratios of training to testing samples, which affects the sensitivity of the matching. The probability of matching the correct set versus the ratio of training to testing samples is shown in Fig. 7: the larger the ratio, the better the matching results, which positively affects system performance.

Fig. 7.
figure 7

Effect of number of training samples on sensitivity.

4.2 The Ranking Process

Figure 8 relates the ranks of the correct classes inside their matching sets after sorting to the success rates that can be achieved. After sorting, the correct class tends to appear in the early positions. Saturation of the graph means that the expected cumulative success rate approaches 100%. This ranking result allows the later classification stage to be applied to a smaller subset of the database: shrinking the matching set reduces the computational complexity by passing only a subset of the best nearest neighbors to the last classification stage.

Fig. 8.
figure 8

Effect of ranking on the expected success rate.

The expected success rate is measured from the distribution of the desired class ranks at different PHoG levels after applying the ranking approach. The labels 30, 20, and 5 in Fig. 8a-c represent the ratio of training to testing samples. Increasing the PHoG level affects the ranking positively, i.e., the rank of the correct class moves to earlier positions and the graph saturates faster. Studying the relation between the expected success rate and the correct classes' ranks indicates the minimum subset length that should be passed to the final classification stages. We applied different PHoG levels, and earlier saturation is achieved at the higher levels. A higher ratio of training to testing samples outperforms smaller ratios, as shown in Fig. 8c.

4.3 Complexity Analysis

The complexity of decision tree construction is O(h), where h is the height of the tree, which equals 5 in the proposed system according to the 5 features computed in this stage. During testing, the decision tree construction stage yields a number of sets grouping similar database classes together, so the database can be stored as sets of similar classes along with the computed features that distinguish them, instead of the raw data. This new data-mined database has two properties: first, many sets may share common classes, yet each set is a unique combination of database classes; second, some sets are subsets of other, bigger sets. These two properties enable the system to build a binary tree-like model relating smaller subsets to bigger supersets. The matching process for new test images can then use this tree-like model, with decision tree pruning, to decide on the matching set; this is done with complexity O(log(v)), where v is the number of constructed sets in the database. The biggest set size in Fig. 6 is 423, which is 49.2% of the whole database size, and the weighted average matching set size is 172.6 classes. The complexity of the KL divergence method is O(L), where L here denotes the number of classes in the chosen matching set. For the ranking algorithm applied in the second stage, the worst case is a correct-class rank of 100 inside its matching set, as shown in Fig. 8, when the \(5^{th}\)-level PHoG is applied. Accordingly, we can pass only the 100 best nearest neighbors to the third stage instead of the 859 classes of the database. Passing only 11% of the total database classes to the final classification again reduces the recognition complexity and increases the recognition accuracy.

Finally, we tested the effect of our approach on a support vector machine (SVM) classifier to check accuracy. The complexity of a linear SVM is \(O(z^3)\) [16], where z is the length of the final subset voted for in stage 2, which equals 100 in the worst case. The high SVM complexity is thus compensated by classifying each test image using a small set of best nearest neighbors instead of the whole dataset.

4.4 Comparison with Similar Systems

Table 1 compares our approach with similar systems. These systems propose holistic approaches, using the whole word for training and testing without segmentation into PAWs, as our approach does. Trained on sets a, b, and c and tested on set d, the system achieves average recognition rates of 96.4% for level-5 PHoG, 92.3% for level-3 PHoG, and 80% for level-1 PHoG using the linear SVM, without considering false set matching. Taking false set matching into account, the word error rate (WER) is 16.6% when the system is applied to the 859 classes of set d, which compares favorably with [2], who achieve a 15% WER on only 18 classes of the same database.

Table 1. A comparison with similar systems on IFN-ENIT database.

5 Conclusions

Automatic handwriting recognition systems suffer from various complexities and variations in character shapes and handwriting styles. A ranking algorithm is proposed as a solution to some of these complications; ranking is done by measuring the KL divergence of PHoG features. We conclude that a higher PHoG level L and a larger number of training samples relative to testing samples make the applied ranking method more effective.