Department of Electronics Engineering
The Arabic Handwritten Digits Databases
ADBase & MADBase
Sherif Abdelazeem, Ezzat El-Sherif,
Electronics Engineering Dept., The
Download Databases
ADBase training set can be downloaded from here.
ADBase testing set can be downloaded from here.
MADBase training set can be downloaded from here.
MADBase testing set can be downloaded from here.
Introduction
This webpage introduces two large Arabic handwritten digits databases suitable for Arabic digit recognition research. The first is the Arabic Digits dataBase (ADBase), which is composed of 70,000 digit images in BMP format: 60,000 for training and 10,000 for testing. The second is the Modified ADBase (MADBase), a modified version of the ADBase.
The ADBase
The ADBase is composed of 70,000 digits written by 700 participants. Each participant wrote each digit (from 0 to 9) twenty times; only ten of these repetitions are used in our database, and the other ten may be used later in writer verification research. To ensure that different writing styles are included, the database was gathered from different institutions: Colleges of Engineering and Law,
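As a quick check on the counts above, the following Python sketch enumerates hypothetical (writer, digit, repetition) index triples for the 700 participants, 10 digit classes, and 10 kept repetitions. It confirms the 70,000 total and shows that every digit class contains 7,000 samples. The triple layout is an illustrative assumption, not the database's actual file organization, and the sketch does not model how the 60,000/10,000 training/testing split is drawn.

```python
from collections import Counter
from itertools import product

WRITERS = 700          # participants
DIGITS = range(10)     # digit classes 0..9
KEPT_REPETITIONS = 10  # each digit was written 20 times; 10 are used

# Hypothetical index triples: (writer, digit, repetition).
samples = list(product(range(WRITERS), DIGITS, range(KEPT_REPETITIONS)))

# Count how many samples fall into each digit class.
per_digit = Counter(digit for _, digit, _ in samples)

print(len(samples))                     # 70000
print(sorted(set(per_digit.values())))  # [7000]
```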
The MADBase
In our research, we aimed to establish benchmark results for the Arabic digit recognition problem using different classification techniques. Another objective was to compare the performance of different classification techniques on both Arabic and Latin digit recognition problems. For such a comparison to be valid, the Arabic and Latin digit databases should have the same format. Since we chose MNIST as the Latin digits database, we created a Modified version of the ADBase (MADBase) that has the same size and format as MNIST.
The MADBase is created from the ADBase as follows. For each digit of the ADBase, its height (h) and width (w) are calculated, and the digit is then size-normalized to a new height (h_new) and a new width (w_new). The values assigned to h_new and w_new depend on which of h and w is greater. If h > w, then h_new is set to 20 and w_new to floor(20×w/h). If w > h, then w_new is set to 20 and h_new to floor(20×h/w). (If h = w, either rule gives h_new = w_new = 20.) This procedure ensures that each digit of the MADBase is confined to a 20×20 box while its aspect ratio is preserved up to rounding. Note that the resulting size-normalized digits have gray levels as a result of the anti-aliasing filter used in the size-normalization procedure. The MADBase thus has the same size and format as MNIST. In fact, MNIST is itself a modified version of the NIST digits database, just as the MADBase is a modified version of the ADBase.
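The size-normalization step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The input is assumed to be a binary image given as a list of rows with 1 for ink. h and w are assumed to be the height and width of the ink's bounding box. Simple area averaging (box filtering) stands in for the anti-aliasing filter, which the text does not name. The function names `normalized_size`, `area_resize`, and `size_normalize` are hypothetical. The `max(1, …)` clamp, which keeps a very thin stroke from collapsing to zero width, is also an assumption not stated in the text. Placing the result in a larger canvas, as MNIST does, is not shown.

```python
import math


def normalized_size(h, w, box=20):
    """Return (h_new, w_new): the longer side becomes `box`, the shorter is scaled and floored."""
    if h >= w:  # h > w, or h == w (both sides become `box`)
        # Integer floor division equals floor(box * w / h) for positive ints.
        # max(1, ...) is an assumption so a thin stroke never gets zero width.
        return box, max(1, (box * w) // h)
    return max(1, (box * h) // w), box


def _overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def area_resize(img, out_h, out_w):
    """Resize a 0/1 image by area averaging; returns gray levels in 0..255."""
    h, w = len(img), len(img[0])
    sy, sx = h / out_h, w / out_w  # source pixels per output pixel
    out = []
    for i in range(out_h):
        y0, y1 = i * sy, (i + 1) * sy
        row = []
        for j in range(out_w):
            x0, x1 = j * sx, (j + 1) * sx
            ink = 0.0
            # Sum each source pixel's value weighted by its overlap area.
            for y in range(int(y0), min(h, math.ceil(y1))):
                wy = _overlap(y, y + 1, y0, y1)
                for x in range(int(x0), min(w, math.ceil(x1))):
                    ink += wy * _overlap(x, x + 1, x0, x1) * img[y][x]
            row.append(round(255 * ink / (sy * sx)))
        out.append(row)
    return out


def size_normalize(img, box=20):
    """Crop to the ink bounding box, then resize so the digit fits a box x box area."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not ys:
        raise ValueError("image contains no ink")
    crop = [row[xs[0]:xs[-1] + 1] for row in img[ys[0]:ys[-1] + 1]]
    h_new, w_new = normalized_size(len(crop), len(crop[0]), box)
    return area_resize(crop, h_new, w_new)


# Example: an L-shaped stroke, 40 px tall and 10 px wide, on a 50x50 canvas.
canvas = [[0] * 50 for _ in range(50)]
for y in range(2, 42):
    canvas[y][3] = 1   # vertical stroke
for x in range(3, 13):
    canvas[41][x] = 1  # horizontal stroke along the bottom
digit = size_normalize(canvas)
print(len(digit), len(digit[0]))  # 20 5
print(digit[0][0], digit[19][0])  # 128 191
```

Averaging over each output pixel's footprint is what produces intermediate gray values along stroke edges, such as the 128 and 191 entries in the example, even though the input image is purely binary.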
MNIST can be downloaded from here.
Acknowledgement
We are thankful to all who assisted in building this database: Hossam Hassan, Reem Ater, Nagwa Ibrahim, Mohammed Ismail, Mohammed Khairy, Karim Ater, Mohammed Ra'afat, and Mohammed Omran.
Contact
Send your comments, suggestions, or inquiries to Ezzat.