Abstract
Next-generation postal sorting machines extract the address of a mail piece once and reuse it in later sorting steps, using the mail piece image as the key. Because each mail piece is unique, characteristics derived from its image make it possible to assign the stored address to it. During the first sorting step, the mail piece characteristics are extracted and stored in a database together with the target address. In subsequent sorting steps, the address is retrieved by finding the corresponding mail piece characteristics in the database. Appropriate mail piece image characteristics and procedures for measuring the distance between them were presented in previous work.
Image-based mail piece identification is challenging for two reasons: the mail spectrum changes constantly and non-deterministically, and nearly identical bulk mail pieces must be told apart. In particular, rejecting unknown mail pieces requires carefully chosen rejection classes that depend on the current mail spectrum. In this paper we present an approach to distance-based mail piece identification that uses a two-stage classification process. Bulk and private mail are handled individually by an unsupervised learning process that clusters similar mail piece characteristics. Specific rejection classes can then be estimated within each cluster. The first step in the identification process is to determine the corresponding cluster for a given mail piece. Using the cluster-specific rejection classes, the mail piece is then either identified or rejected. Experimental results obtained on real-world data sets demonstrate the applicability of the proposed method.
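The two-stage process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, the Euclidean distance, the fixed centroids, and the per-cluster threshold values are all assumptions chosen for illustration, and the names `Cluster`, `enrol`, and `identify` are hypothetical. The clusters are taken as given here, whereas the paper obtains them by unsupervised learning.

```python
import math
from dataclasses import dataclass, field


def distance(a, b):
    """Euclidean distance, a stand-in for the paper's distance measure."""
    return math.dist(a, b)


@dataclass
class Cluster:
    centroid: tuple            # assumed to come from a prior clustering step
    reject_threshold: float    # assumed cluster-specific rejection distance
    entries: list = field(default_factory=list)  # (features, address) pairs


def nearest_cluster(clusters, features):
    """Stage 1: pick the cluster whose centroid is closest."""
    return min(clusters, key=lambda c: distance(c.centroid, features))


def enrol(clusters, features, address):
    """First sorting step: store the characteristics with the address."""
    nearest_cluster(clusters, features).entries.append((features, address))


def identify(clusters, features):
    """Later sorting step: return the stored address, or None if rejected."""
    cluster = nearest_cluster(clusters, features)
    if not cluster.entries:
        return None
    # Stage 2: nearest stored mail piece within the selected cluster.
    best_features, best_address = min(
        cluster.entries, key=lambda e: distance(e[0], features)
    )
    if distance(best_features, features) > cluster.reject_threshold:
        return None  # too far from anything known in this cluster
    return best_address


# Hypothetical data: a tight threshold for near-identical bulk mail,
# a looser one for more varied private mail.
clusters = [
    Cluster(centroid=(0.0, 0.0), reject_threshold=0.05),
    Cluster(centroid=(10.0, 10.0), reject_threshold=1.0),
]
enrol(clusters, (0.1, 0.0), "addr A")
enrol(clusters, (0.3, 0.0), "addr B")
enrol(clusters, (9.0, 11.0), "addr C")

print(identify(clusters, (0.11, 0.0)))  # close to a stored bulk piece
print(identify(clusters, (0.2, 0.0)))   # halfway between two bulk pieces
```

In this sketch the tight bulk-mail threshold rejects a query that lies between two similar stored pieces, while the looser private-mail threshold tolerates more variation. A single global threshold could not do both, which is the motivation for estimating rejection classes per cluster.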
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Worm, K., Meffert, B. (2009). Image Based Mail Piece Identification Using Unsupervised Learning. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_35
DOI: https://doi.org/10.1007/978-3-642-01044-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)