Character Recognition
Introduction
We implemented a simple optical character recognition (OCR) system. The system takes as
input a scanned bitmap containing multiple, separate characters. It segments the image,
performs feature extraction and pattern classification on the individual objects, and
outputs the recognized characters.
Our procedure consists of the following:
- Strings of handwritten digits (0-9) were scanned using the HP scanning equipment in
Mudd Lab to create a digital image representation.
- Our system then segments the binary image, isolating and boxing individual characters.
Care was taken to design a segmenter robust enough to tolerate reasonable breaks within
a character. Note that markers outperform pencil in producing continuous characters!
- The following features are calculated for each character: moments up to order six, the
y-median, and hole detection. When classification is ambiguous, a circularity test is
also run.
- The feature vector is passed to a neural network system composed of ten networks with
two neuron layers each. The networks were trained using 25 digits from each of the ten
classes (0-9). Note that this is a relatively small training set, a consequence of time
and memory constraints. "Noisy" handwritten digits were used in the training phase
because the system was designed to recognize handwriting.
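The segmentation and feature steps above can be illustrated with a minimal sketch. It is
written in Python rather than Matlab and is not the project's code. The function names
(label_components, count_holes, raw_moment) are hypothetical, 8-connectivity for ink pixels
is an assumption, and the sketch does not try to merge broken strokes the way the real
segmenter does. The y-median and circularity test are left out because the text does not
define them.

```python
from collections import deque

def label_components(img, foreground=1, connectivity=8):
    """Return bounding boxes (top, left, bottom, right) of connected regions
    whose pixels equal `foreground`, found by breadth-first flood fill."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != foreground or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == foreground):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def count_holes(img, box):
    """Count enclosed background regions inside a boxed character.
    Assumes the box contains ink from one character only."""
    top, left, bottom, right = box
    width = right - left + 3  # crop width plus a one-pixel background border
    padded = [[0] * width]
    for r in range(top, bottom + 1):
        padded.append([0] + list(img[r][left:right + 1]) + [0])
    padded.append([0] * width)
    # Every background region except the outer one is a hole.
    return len(label_components(padded, foreground=0, connectivity=4)) - 1

def raw_moment(img, box, p, q):
    """Raw moment m_pq = sum of x^p * y^q over ink pixels, with (x, y) measured
    from the top-left corner of the box."""
    top, left, bottom, right = box
    return sum((x - left) ** p * (y - top) ** q
               for y in range(top, bottom + 1)
               for x in range(left, right + 1)
               if img[y][x] == 1)

# A tiny binary image holding a ring (like "0") and a bar (like "1").
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = label_components(image)
for box in boxes:
    print(box, count_holes(image, box), raw_moment(image, box, 0, 0))
```

On this toy image the ring reports one hole and the bar reports none, which is the kind of
distinction a hole feature gives the classifier. Moments of higher order follow from the
same function by raising p and q.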
The final product is a demo-type program that displays the scanned characters, visually boxes
each character starting from the top left and moving left to right down the page, and
outputs the recognized character to the Matlab screen.
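One way to obtain the top-left, left-to-right ordering described above is to group the
character boxes into text rows by vertical overlap and then sort each row by its left edge.
The Python sketch below is illustrative only: the function name reading_order and the
overlap rule are assumptions, and the demo program may order its boxes differently.

```python
def reading_order(boxes):
    """Order (top, left, bottom, right) boxes row by row, left to right.
    A box joins an existing row when its vertical extent overlaps that row's."""
    rows = []  # each entry is a list of boxes judged to be on the same text row
    for box in sorted(boxes, key=lambda b: b[0]):
        for row in rows:
            row_top = min(b[0] for b in row)
            row_bottom = max(b[2] for b in row)
            if box[0] <= row_bottom and box[2] >= row_top:
                row.append(box)
                break
        else:
            rows.append([box])  # no overlapping row found, so start a new one
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[1]))
    return ordered

# Two rows of two characters each, given in scrambled order.
scrambled = [(10, 30, 20, 35), (1, 20, 8, 25), (0, 2, 9, 7), (11, 3, 19, 9)]
print(reading_order(scrambled))
```

Sorting on the top edge alone would misorder characters that sit slightly higher or lower
than their neighbours on the same row, which is common in handwriting, so the sketch groups
by overlap first.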
Postal Sporks (harton@rice.edu)