CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation
Figure 1. Three sample pages of the registers belonging to three different districts. The page layout can change between districts.
Figure 2. The populated place (village or neighborhood) start symbol and individual objects. When a new populated place is registered, its name is written at the top of a new page (populated place start symbol). Then, all men in this place are recorded one by one (individual objects). Each of these objects includes the name, age, appearance, and job of the individual.
Figure 3. Examples of updates to the registers. Some updates connect two individuals and can cause clustering errors. Objects enclosed in green are individuals, those in red are populated place symbols, and those in blue are updates connecting two objects of the other types.
Figure 4. A sample register page and its labeled version. Different colors represent different object types. The background, which is the region between the objects and the document borders, is marked in blue. The start of a populated place object is colored red. The individual objects are marked in green.
Figure 5. Training metrics: learning rate (top left), loss function (top right), regularized loss (bottom left), and global steps per second (bottom right). The subfigures were created with TensorBoard. The horizontal axis shows the number of iterations.
Figure 6. Flowchart of our populated place assigning algorithm.
Figure 7. Examples of intertwined rows and columns. They are counted as one since there are no empty pixels in between.
Figure 8. A sample prediction made by our system: on the left, a binarized prediction image for counting individuals; in the middle, a binarized image for counting populated place starts; on the right, the objects enclosed in rectangular boxes. Green boxes mark individual registers and the red box marks the populated place start object.
Figure 9. A sample counting mistake. All three individual registers are counted as one. This results in two missing records in our automatic counting system.
Abstract
1. Introduction
2. Related Works
3. Structure of the Registers
4. Automatic Page Segmentation and Object Recognition System for Counting the Ottoman Population
4.1. Creating a Dataset
4.2. Training the CNN Architecture
4.3. Preparing the Dataset for Evaluation
4.4. Post-Processing
4.5. Assigning Individuals to the Populated Places
Algorithm 1. The algorithm for assigning individuals to populated places. Obj. stands for an object, which can be either an individual object or a populated place start object. CoG stands for the center of gravity of the object.
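The steps of Algorithm 1 are not reproduced in this text, so the following is only a minimal sketch of how a CoG-based assignment could work. It relies on the register structure described in the figure captions (a populated place start symbol at the top of a new page, followed by the individuals of that place). The `Obj` class, its `kind` labels, the function names, and the top-to-bottom ordering by CoG row are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    kind: str     # "place_start" or "individual" (hypothetical class labels)
    pixels: list  # (row, col) coordinates of the pixels in the object


def center_of_gravity(pixels):
    """Mean row and mean column of an object's pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def count_per_place(pages):
    """Return the number of individuals assigned to each populated place.

    pages: pages in register order, each a list of Obj.
    Assumption: on a page, an object with a smaller CoG row comes first.
    """
    counts = []
    for page in pages:
        for obj in sorted(page, key=lambda o: center_of_gravity(o.pixels)[0]):
            if obj.kind == "place_start":
                counts.append(0)   # a new populated place begins
            elif counts:
                counts[-1] += 1    # attach to the most recent place
    return counts


pages = [
    [Obj("individual", [(5, 5), (5, 6)]),
     Obj("place_start", [(0, 3), (0, 4)]),
     Obj("individual", [(6, 1)])],
    [Obj("individual", [(2, 2)])],                                # continues place 1
    [Obj("place_start", [(1, 4)]), Obj("individual", [(4, 4)])],  # place 2
]
print(count_per_place(pages))  # [3, 1]
```

In this sketch, individuals found before any populated place start are skipped, and a page without a start symbol simply continues the previous place.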
4.6. Baseline Heuristic Projection Profile Algorithm for Object Detection
Algorithm 2. Detecting and counting objects belonging to different classes with the heuristic vertical and horizontal projection profiles.
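The steps of Algorithm 2 are not reproduced in this text. As a rough illustration of the general projection profile idea, the sketch below sums ink pixels per row and per column of a binarized page and counts each maximal run of non-empty rows as one object. The function names and the toy page are hypothetical, and the authors' heuristic may differ in its details.

```python
import numpy as np


def projection_profiles(binary_page):
    """Return (horizontal, vertical) profiles: ink pixels per row and per column."""
    page = np.asarray(binary_page, dtype=bool)
    return page.sum(axis=1), page.sum(axis=0)


def count_runs(profile):
    """Count maximal runs of non-zero entries in a 1-D projection profile."""
    occupied = np.asarray(profile) > 0
    # A run starts where an occupied entry follows an empty one (or the array start).
    previous = np.concatenate(([False], occupied[:-1]))
    return int(np.sum(occupied & ~previous))


# Toy page with three blocks of ink; the last two touch vertically.
page = np.zeros((9, 8), dtype=int)
page[1:3, 1:7] = 1
page[4:6, 1:7] = 1
page[6:8, 2:5] = 1  # no empty row separates this block from the previous one

rows, cols = projection_profiles(page)
row_objects = count_runs(rows)
print(row_objects)  # 2
```

The toy page contains three blocks but only two are counted, which mirrors the failure mode noted in the caption about intertwined rows and columns: with no empty pixels in between, adjacent objects merge into a single run.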
5. Experimental Results and Discussion
5.1. Metrics
5.1.1. Pixel-Wise Classification Accuracy
5.1.2. Pixel-Wise Precision, Recall, and F-Measure
5.1.3. Intersection over Union
5.1.4. High-Level Counting Errors
5.2. Results and Discussion
5.2.1. Results from the Heuristic Baseline Projection Profile Technique
5.2.2. CNN-Based Page Segmentation and Object Detection
6. Conclusions and Future Works
Author Contributions
Funding
Conflicts of Interest
Tested with | PPSCE (%) | ICE (%) |
---|---|---|
Nicaea | 10 | 2.927 |
Svishtov | 3.703 | 2.412 |
Trained on | Tested with | Pixel-Wise Acc. (%) | IoU (%) | ICE (%) | PPSCE (%) |
---|---|---|---|---|---|
Nicaea | Nicaea | 91.53 | 80.91 | 1.65 | 0 |
Nicaea | Svishtov | 90.35 | 73.6 | 11.57 | 0 |
Svishtov | Svishtov | 93.39 | 72.29 | 0.76 | 0 |
Svishtov | Nicaea | 91.92 | 47.95 | 0.27 | 0 |
Mixed | Mixed | 92.54 | 48.54 | 2.26 | 0 |
Trained on | Tested with | TP | TN | FP | FN | Recall | Precision | F-Measure |
---|---|---|---|---|---|---|---|---|
Nicaea | Nicaea | 0.7463 | 0.169 | 0.1367 | 0.0702 | 0.913 | 0.981 | 0.9459 |
Nicaea | Svishtov | 0.7455 | 0.158 | 0.0125 | 0.0834 | 0.898 | 0.983 | 0.9386 |
Svishtov | Svishtov | 0.7769 | 0.157 | 0.0191 | 0.0469 | 0.942 | 0.975 | 0.9587 |
Svishtov | Nicaea | 0.7822 | 0.137 | 0.0596 | 0.0209 | 0.974 | 0.928 | 0.9499 |
Mixed | Mixed | 0.7824 | 0.143 | 0.0379 | 0.0366 | 0.955 | 0.953 | 0.9539 |
Trained on | Tested with | TP | TN | FP | FN | Recall | Precision | F-Measure |
---|---|---|---|---|---|---|---|---|
Nicaea | Nicaea | 0.79517 | 0.18316 | 0.02138 | 0.00028 | 0.88576 | 0.99862 | 0.9376 |
Nicaea | Svishtov | 0.80640 | 0.17583 | 0.01751 | 0.00024 | 0.90168 | 0.99830 | 0.9469 |
Svishtov | Svishtov | 0.80089 | 0.17602 | 0.02302 | 0.00005 | 0.87563 | 0.99952 | 0.9319 |
Svishtov | Nicaea | 0.79172 | 0.19341 | 0.01381 | 0.00104 | 0.91688 | 0.99518 | 0.9528 |
Mixed | Mixed | 0.79934 | 0.18020 | 0.0204 | 0.00005 | 0.88005 | 0.99969 | 0.9341 |
Trained on | Tested with | TP | TN | FP | FN | Recall | Precision | F-Measure |
---|---|---|---|---|---|---|---|---|
Nicaea | Nicaea | 0.79458 | 0.16838 | 0.01506 | 0.02199 | 0.97252 | 0.98093 | 0.9766 |
Nicaea | Svishtov | 0.81280 | 0.17809 | 0.0902 | 0.0500 | 0.94681 | 0.98859 | 0.9639 |
Svishtov | Svishtov | 0.80669 | 0.15349 | 0.02259 | 0.17215 | 0.97867 | 0.97236 | 0.9754 |
Svishtov | Nicaea | 0.80168 | 0.12277 | 0.07160 | 0.00385 | 0.99514 | 0.91634 | 0.9537 |
Mixed | Mixed | 0.80000 | 0.16613 | 0.01413 | 0.01970 | 0.97581 | 0.98306 | 0.9794 |
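The pixel-wise quantities reported in the tables above (TP, TN, FP, FN, recall, precision, together with accuracy and IoU) have standard definitions. The sketch below computes them for one class from a predicted mask and a ground-truth mask, with the confusion entries expressed as fractions of all pixels. It assumes those standard definitions; the exact formulas used to produce the tables are not given in this text, and the function name `pixel_metrics` and the toy masks are hypothetical.

```python
import numpy as np


def pixel_metrics(pred, truth):
    """Pixel-wise metrics for one class from two masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.size
    # Confusion-matrix entries as fractions of all pixels.
    tp = np.sum(pred & truth) / total
    tn = np.sum(~pred & ~truth) / total
    fp = np.sum(pred & ~truth) / total
    fn = np.sum(~pred & truth) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": tp + tn,
        "recall": recall,
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
        "iou": tp / (tp + fp + fn),
    }


truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0]])
m = pixel_metrics(pred, truth)
print(m)
```

The sketch does not guard against division by zero, which occurs when the class is absent from both masks; a production version would need to handle that case.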
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Can, Y.S.; Kabadayı, M.E. CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation. J. Imaging 2020, 6, 32. https://doi.org/10.3390/jimaging6050032