
1 Introduction

Handwriting recognition is one of the most challenging fields in computer vision. It can be performed in two ways: online and offline. Online recognition uses the sequential writing order [4] and instantaneous temporal information, whereas offline recognition is based on visual features and pixel information only. Offline recognition is more challenging due to the lack of temporal information and the absence of the instantaneous pen trajectory.

Our work proposes a new ranking process that can guide automatic offline handwriting classifiers. In addition to the challenges mentioned above, applying such classifiers to a large database increases the difficulty of the classification process and lowers the overall accuracy. Classification can thus be seen as a challenging competition whose goal is to match a test image with the correct class from the database. Having all database classes compete in this process is the main complication that degrades classifier performance. The target of the proposed approach is to ease this competition by selecting subsets of database classes, smaller than the whole database, on which the classifier is applied. This reduction in the number of candidate classes benefits both classification performance and complexity: each test image then participates in the classification process with only a selected subset of database classes.

The proposed approach starts by extracting five regional features [5] from the input test image and the database classes. Based on the extracted features, decision trees are applied to match the input test image with a set of matching database classes, which are grouped into one set. A new ranking algorithm based on the pyramid histogram of gradients (PHoG) [29] and the Kullback-Leibler divergence measure [17] is then applied to sort these matching classes relative to each input test image. Sorting the classes from nearest to furthest enables the selection of a smaller subset representing the best nearest-neighbor classes of the test image. Our objective is to use these subsets, instead of the whole database, in the final classification stage to vote for the correct class of the test image. By the end of the testing stage, the approach has split the large database into smaller sets of similar classes, constructing a data-mined version of the database.

We tested our system on the IFN/ENIT Arabic handwriting database [26]. Arabic is one of the major languages [4] and poses many challenges, as briefly discussed in [1]. The following sections summarize related work, the proposed approach, the experiments, and finally the conclusions.

2 Related Work

Different recognition systems have been proposed in the field of Arabic handwriting recognition; this section summarizes similar offline handwritten recognition systems. Pyramid histograms of gradients (PHoG) were extracted as features by Saïdani and Echi [29] to recognize Arabic/Latin and machine-printed/handwritten databases. A survey was provided by Lawgali [18], and various classifiers have been applied in this field.

Some systems use a sliding window to compute different types of features. Hicham et al. [11] proposed a solution that does not segment words into PAWs and achieved an 80.26% success rate on an Arabic-Numbers database and 78.95% accuracy on the IFN/ENIT database [26] using local densities and statistical features. AlKhateeb et al. [3] computed intensity and structure-like features; a re-ranking method using a Gaussian-like function improved the accuracy to 89.24%. Jayech et al. [12] extracted statistical and structural features, applied a synchronous Multi-Stream Hidden Markov Model (MSHMM), and achieved a 91.1% recognition rate on the IFN/ENIT database. MSHMMs were also applied by Mezghani et al. [22], Maqqor et al. [21], Pechwitz and Maergner [27], and Kessentini et al. [14]. Mezghani et al. [22] automatically recognize handwritten and printed Arabic and French words using Gaussian Mixture Models (GMMs). Kessentini et al. [14] worked offline with another multi-stream approach combining density- and contour-based features; the system achieved 79.8% on the IFN/ENIT database. A confidence- and margin-based discriminative approach by Dreuw et al. [6] used the maximum mutual information (MMI) and minimum phone error (MPE) criteria [9, 10, 25]; MMI outperformed ML with a 33% improvement in word error rate on the IFN/ENIT database.

Other approaches recorded effective results using SVMs, such as Al-Dmour and Abuhelaleh [2], whose system was applied to a subset of the IFN/ENIT database and achieved an 85% recognition rate using SURF features. Khaissidi et al. [15] proposed an unsupervised segmentation-free method using histograms of oriented gradients (HOG) as features; the system achieved an average precision of 68.4% on the Ibn Sina dataset [23]. An SVM classifier with HOG features was also applied, using Sobel edge detectors, by Elfakir et al. [7].

Mozaffari and Bahar [24] discriminated Farsi/Arabic handwritten words from machine-printed ones using histogram features; the system achieved a 97% success rate on a database of about 3000 printed/handwritten words. Leila et al. [19] computed invariant pseudo-Zernike moment features and applied a Fuzzy ARTMAP neural network, achieving a 93.8% success rate on a database of 96 different classes written by 100 different writers.

3 The Proposed System

This paper proposes a new ranking algorithm that can precede the final classification stages of a pattern recognition process. The proposed system is composed of two separate stages. The first stage applies decision trees, as shown in Fig. 1, to match the test image with a set of database classes. In the second stage, pyramid histogram of gradients (PHoG) [29] features are computed, followed by the Kullback-Leibler (KL) divergence measure [17], to implement a new ranking algorithm. The ranking process sorts the classes of the chosen matching set; this sorting reduces computational complexity by shrinking the part of the database involved in the final classification stage. This part is a subset of the best nearest-neighbor classes chosen from the matching set after sorting.

3.1 Preprocessing

During this stage, input images are prepared for the subsequent computations. Our images are binary images with a white background and black handwritten Arabic words. Negating the images [8] is necessary to concentrate the computations on the minority of ink pixels, which become white after negation. Normalizing the word size is essential to unify the computation environment across all database images.
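As a minimal sketch of this stage (the target size and the nearest-neighbour resampling are our assumptions; the paper does not state its normalization parameters), negation and size normalization can be written as:

```python
import numpy as np

def preprocess(img, target_shape=(8, 16)):
    """Negate a binary word image and normalize its size.

    `img` is a 2-D 0/1 array with white (1) background and black (0)
    handwriting; after negation the ink pixels become the foreground (1).
    Size normalization here is plain nearest-neighbour resampling.
    """
    ink = 1 - img  # negation: concentrate later computations on ink pixels
    rows = np.linspace(0, ink.shape[0] - 1, target_shape[0]).round().astype(int)
    cols = np.linspace(0, ink.shape[1] - 1, target_shape[1]).round().astype(int)
    return ink[np.ix_(rows, cols)]

# Toy 4x8 "word": background 1, ink 0.
word = np.ones((4, 8), dtype=int)
word[1:3, 2:6] = 0
norm = preprocess(word, target_shape=(8, 16))
print(norm.shape)  # (8, 16)
```

All normalized images then share one shape, so the later feature vectors are directly comparable.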

3.2 Decision Trees Construction

This stage builds a tree-like model of decisions, constructing a separate decision tree per test image. Decision trees are a well-known data mining technique [28], used here to derive a strategy that determines a particular set of matching classes for each test image. Performance is evaluated by how well the test image is matched with the correct set of classes: a match is considered correct when the decided set includes the correct class of the input test image. This is measured by the true positive rate, also known as sensitivity [20]. In our proposed solution, the search strategy is based on five computed regional values proposed by [5]: the number of pieces of Arabic words (PAWs), the number of holes, the extent, the eccentricity, and the word orientation [5]. A sample image with its computed number of holes and PAWs is shown in Fig. 2.

Fig. 1.
figure 1

The decision trees construction stage.

Fig. 2.
figure 2

A sample image after computing number of holes and PAWs.
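The two counts illustrated in Fig. 2 can be sketched as follows, under the usual conventions (our assumption) that PAWs are 8-connected ink components and holes are 4-connected background components that do not touch the image border:

```python
from collections import deque

def _components(cells, neighbours):
    """Group a set of (row, col) cells into connected components (BFS)."""
    seen, comps = set(), []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [start], deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in neighbours:
                nxt = (r + dr, c + dc)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    comp.append(nxt)
                    queue.append(nxt)
        comps.append(comp)
    return comps

def paws_and_holes(img):
    """img: list of 0/1 rows, 1 = ink. Returns (#PAWs, #holes)."""
    h, w = len(img), len(img[0])
    ink = {(r, c) for r in range(h) for c in range(w) if img[r][c]}
    bg = {(r, c) for r in range(h) for c in range(w) if not img[r][c]}
    eight = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    four = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    paws = len(_components(ink, eight))
    # A hole is a background component lying strictly inside the image.
    holes = sum(1 for comp in _components(bg, four)
                if all(0 < r < h - 1 and 0 < c < w - 1 for r, c in comp))
    return paws, holes

# A closed loop (one hole) plus a separate dot: 2 PAWs, 1 hole.
word = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
print(paws_and_holes(word))  # (2, 1)
```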

Eccentricity [13] measures the aspect ratio, as shown in Fig. 3. Extent [5] is the ratio between the area of the object and the area of its bounding box. Finally, the word orientation is the angle between the x-axis and the major axis of the word's bounding ellipse.

$$\begin{aligned} ecc = \frac{2c}{b}, \quad c^2 = a^2 - b^2 \end{aligned}$$
(1)
Fig. 3.
figure 3

Eccentricity; F1 and F2 are the two foci of the ellipse.

Equation (1) defines the eccentricity ecc in terms of the distances a, b, and c shown in Fig. 3. The ranges of the five computed feature values are used to find the matching set of classes per test image using decision trees. This process excludes a large number of training samples, namely those not included in the matching set, whose inclusion in the final classification would increase the recognition complexity and negatively affect the accuracy.
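A worked instance of Eq. (1) (we follow the paper's formula as stated; note that the classical ellipse eccentricity would be c/a):

```python
import math

def eccentricity(a, b):
    """Eq. (1): c^2 = a^2 - b^2, then ecc = 2c / b."""
    c = math.sqrt(a * a - b * b)
    return 2.0 * c / b

# With a = 5 and b = 3, c = 4, so ecc = 8/3 ≈ 2.667.
print(eccentricity(5, 3))
```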

After applying this stage to the whole database, the database is data-mined: it is split into smaller sets of similar classes characterized by their common computed features.

3.3 The Ranking Stage

An overview of the ranking stage is shown in Fig. 4. This stage begins by computing the PHoG features and then applies the Kullback-Leibler divergence measure to sort the members of the matching set relative to the input image. The output of this stage is a subset of the best nearest neighbors that can be used for the final classification, as shown in Fig. 4.

Fig. 4.
figure 4

The ranking stage.

3.3.1 Pyramid Histogram of Gradients

The histogram of oriented gradients (HOG) [29, 30] is a feature descriptor that represents edge information; it is based on counting occurrences of gradient orientations in an image's region of interest (ROI). The pyramid histogram of gradients (PHoG) is a more powerful gradient feature than the ordinary HOG. PHoG extraction computes histograms of gradients in localized portions of the image's ROI; it is an extension that captures finer and more discriminative details than the ordinary HOG. Each level is constructed by contracting the edges of the level below. The first-level description corresponds to the original input data, i.e., the original HOG. Higher levels partition the image into connected sub-graphs, which are subsets of the image's pixels. In every partition, features are computed independently of the others on the same level. The union of the sequential feature levels propagates information up and down in the form of a pyramid, as shown in Fig. 5.

Considering L levels and an angle range of 0-360° quantized into eight discrete values, the following algorithm, introduced by Saidani et al. [30], is applied:

  1. Divide the image into smaller cells (connected components).

  2. For each cell, compute the histogram of oriented gradients.

  3. Discretize it into the pre-defined angular bins.

  4. Normalize the 8-bin histograms of adjacent cells.
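The steps above can be sketched as follows; we assume a regular 2^l x 2^l grid partitioning per level and L1 normalization of the concatenated descriptor, since the exact cell layout is not fully specified:

```python
import numpy as np

def phog(img, levels=3, bins=8):
    """Sketch of the PHoG steps: gradients, 8-bin orientation
    histograms per cell, concatenated over pyramid levels."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                        # step 2: gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 360.0)
    bin_idx = np.minimum((ang / (360.0 / bins)).astype(int), bins - 1)  # step 3

    feats = []
    for level in range(levels):
        n = 2 ** level                               # step 1: n x n cells
        rows = np.array_split(np.arange(img.shape[0]), n)
        cols = np.array_split(np.arange(img.shape[1]), n)
        for r in rows:
            for c in cols:
                cell_bins = bin_idx[np.ix_(r, c)].ravel()
                cell_mag = mag[np.ix_(r, c)].ravel()
                hist = np.bincount(cell_bins, weights=cell_mag,
                                   minlength=bins)
                feats.append(hist)
    vec = np.concatenate(feats)
    s = vec.sum()
    return vec / s if s > 0 else vec                 # step 4: normalization

rng = np.random.default_rng(0)
desc = phog(rng.random((32, 32)), levels=3)
print(desc.shape)  # (168,) = 8 bins x (1 + 4 + 16) cells
```

Each added level multiplies the number of cells by four, so the descriptor length grows as 8·(1 + 4 + 16 + …) with the level L.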

In the experiments, we applied PHoG at different levels L, as shown in Fig. 5a-c, to study the effect of varying the level on the final classifier performance.

Fig. 5.
figure 5

Pyramid histogram of gradients at different levels.

3.3.2 Kullback-Leibler Divergence Measure

The Kullback-Leibler divergence [17] measures the disparity between any two probability distribution functions. It is an information-based measure of divergence between given distributions p(x) and q(x) defined over a variable x. In our proposed approach, we treat the PHoGs as distribution functions over gradient values.

$$\begin{aligned} D_{KL} (p(x) \parallel q(x))=\sum _{x \in X} p(x)\ln {(\frac{p(x)}{ q(x)})} \end{aligned}$$
(2)

The divergence \( D_{KL} \) between the PHoG feature vector of each test image and the PHoGs of the training samples is defined by Eq. (2), where p(x) is the feature vector of the test sample, q(x) is the feature vector of a training sample, and the gradients are represented by the variable x. The smallest divergence measures vote for the most probable membership classes of the input test image. This measure can be seen as the difference between the cross-entropy of q(x) relative to p(x) and the self-entropy [31] of p(x). The cross-entropy is always larger than the self-entropy unless p(x) and q(x) are identical, in which case the two are equal and the difference is zero, representing the minimum divergence.

The classes of the same group are ranked from nearest to furthest according to the computed divergence measure. This ranking excludes a further part of the matching set corresponding to each test image by passing only a subset of the best nearest neighbors to the later classification.
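A compact sketch of Eq. (2) and the nearest-to-furthest sort (the epsilon smoothing of zero bins and the helper names are our assumptions):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) per Eq. (2); eps guards against zero bins
    (the paper does not state its smoothing scheme)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def rank_matching_set(test_vec, class_vecs, k=None):
    """Sort the matching set nearest-to-furthest by divergence from the
    test image's PHoG vector, keeping the k best nearest neighbours."""
    ranked = sorted(class_vecs,
                    key=lambda c: kl_divergence(test_vec, class_vecs[c]))
    return ranked[:k] if k else ranked

test = [0.5, 0.3, 0.2]
classes = {"A": [0.5, 0.3, 0.2],   # identical to test -> divergence 0
           "B": [0.4, 0.4, 0.2],
           "C": [0.1, 0.1, 0.8]}
print(rank_matching_set(test, classes, k=2))  # ['A', 'B']
```

Only the k nearest classes survive the ranking, which is exactly the subset passed on to the final classifier.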

4 Experiments

The proposed approach is applied to the IFN/ENIT database [26], a database of handwritten Arabic words written by 411 different writers. The database is divided into four sets, a, b, c, and d, which is the standard division stated on the database's official website [26]. We used sets a, b, and c for training and tested the approach on set d, as done previously in [3, 12] and [6]. Set d contains 859 different classes, each representing one city name; the number of samples varies from one class to another.

4.1 Decision Trees Sensitivity

Decision trees were applied first to match each test image with a set of matching classes. The matching sets' sizes differ per sample; the frequencies of the set sizes are represented by the solid line in Fig. 6. The sensitivity [20] of this process was measured as the frequency of correct matching between a test image and its matching set, where a match is correct when the class of the test image is included in the matched set. The sensitivity per set size is represented by the dotted line in Fig. 6. The difference between the areas under the curves (AUC) of the solid and dotted graphs in Fig. 6 is 13.4% of the whole AUC.

Fig. 6.
figure 6

Matching process sensitivity.

The IFN/ENIT database [26] has varying ratios of training to testing samples, which affects the sensitivity of the matching. The probability of matching the correct set versus the ratio of training to testing samples is shown in Fig. 7: the larger the ratio, the better the matching results, which positively affects system performance.

Fig. 7.
figure 7

Effect of number of training samples on sensitivity.

4.2 The Ranking Process

Figure 8 relates the ranks of the correct classes inside their matching sets after sorting to the success rates that can be achieved. After sorting, the correct class tends to appear in the early positions. Saturation of the graph means that the expected cumulative success rate approaches 100%. This ranking result allows the later classification stage to be applied to a smaller subset of the database: shrinking the matching set reduces the computational complexity by passing only a subset of the best nearest neighbors to the last classification stage.

Fig. 8.
figure 8

Effect of ranking on the expected success rate.

The expected success rate is measured from the distribution of the desired class ranks at different PHoG levels after applying the ranking approach. The labels 30, 20, and 5 in Fig. 8a-c represent the ratio of training to testing samples. Increasing the PHoG level affects the ranking positively, i.e., the rank of the correct class moves to earlier positions and the graph saturates faster. Studying the relation between the expected success rate and the correct classes' ranks indicates the minimum subset length that should be passed to the final classification stages. We applied different PHoG levels, and earlier saturation is achieved at the higher levels. A higher ratio of training to testing samples outperforms smaller ratios, as shown in Fig. 8c.

4.3 Complexity Analysis

The complexity of decision tree construction is O(h), where h is the height of the tree, which equals 5 in the proposed system according to the 5 features computed in this stage. During testing, the decision tree construction stage yields a number of sets grouping similar database classes together, so the database can be stored as sets of similar classes along with the computed features that distinguish them, instead of the raw data. This new data-mined database has two properties: first, many sets may share common classes, yet each set is a unique combination of database classes; second, some sets are subsets of other, bigger sets. These two properties enable the system to build a binary tree-like model relating smaller subsets to bigger supersets. The matching process for new test images can then use this tree-like model, with decision tree pruning, to decide on the matching set; this is done with complexity O(log(v)), where v is the number of constructed sets in the database. The biggest set size in Fig. 6 is 423, which is 49.2% of the whole database size, and the weighted average matching set size is 172.6 classes. The complexity of the KL divergence method is O(L), where L here denotes the number of classes in the chosen matching set. For the ranking algorithm applied in the second stage, the worst case is a correct-class rank of 100 inside its matching set, as shown in Fig. 8, when the \(5^{th}\)-level PHoG is applied. Accordingly, we can pass only the 100 best nearest neighbors to the third stage instead of the 859 classes of the database. Passing only 11% of the total database classes to the final classification again reduces the recognition complexity and increases the recognition accuracy.

Finally, we tested the effect of our approach on a support vector machine (SVM) classifier to check accuracy. The complexity of a linear SVM is \(O(z^3)\) [16], where z is the length of the final subset voted for in stage 2, which equals 100 in the worst case. The high SVM complexity is thus compensated by classifying each test image using a small set of best nearest neighbors instead of the whole dataset.

4.4 Comparison with Similar Systems

Table 1 compares our approach with similar systems. These systems propose holistic approaches, using the whole word for training and testing without segmentation into PAWs, as our approach does. Trained on sets a, b, and c and tested on set d, the system achieves average recognition rates of 96.4% for level-5 PHoG, 92.3% for level-3 PHoG, and 80% for level-1 PHoG using the linear SVM, without considering false set matching. Taking false set matching into account, the word error rate (WER) is 16.6% when the system is applied to the 859 classes of set d, which compares favorably with [2], who achieve a 15% WER on only 18 classes of the same database.

Table 1. A comparison with similar systems on IFN-ENIT database.

5 Conclusions

Automatic handwriting recognition systems suffer from various complexities and variations in character shapes and handwriting styles. A ranking algorithm is proposed as a solution to some of these complications; ranking is done by measuring the KL divergence of PHoG features. We conclude that a higher PHoG level L and a larger number of training samples relative to testing samples make the applied ranking method more effective.