Optical Character Recognition (OCR)
What is OCR
OCR is the process of taking characters and words from
an image (such as a scanned newspaper page) and converting them to a text-based form
that can be edited in a word processor. This is currently one of the most developed
areas of computer vision. Several programs already exist that can convert images to text
with very little error. Many of the newer OCR programs can also recognize different font
types and sizes and can produce formatted text documents, such as Microsoft Word
documents.
Although OCR has made a great deal of progress, new implementations and applications are still being actively researched. I will discuss one such application shortly, under the topic of Content Based Image Recognition.
Character Recognition
The first step in OCR, and probably the most critical,
is recognizing individual characters. There are many possible algorithms for performing
character recognition, but all of them follow two basic steps. First, image processing is
applied to filter out the visual information that is not needed. Next, some process uses
the remaining visual information to determine the character. In almost all cases, this
involves a neural network.
A sample character recognition algorithm is the chaincode algorithm. In the chaincode algorithm the computer performs the following steps.
1) Take the original character graphic.
2) Apply image processing (such as binarization and edge detection) to get the outline of the character.
3) Break the character into a 4x4 grid.
4) For each cell, calculate the average slope and curvature of the outline.
5) Pass this data to a neural network, which will output the recognized character.
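Steps 2 through 4 can be sketched roughly as follows. This is only an illustration under several assumptions, not the chaincode algorithm itself: the image is a plain list of rows of grayscale values, the threshold of 128 is arbitrary, the function names are made up for this example, and the per-cell "slope" is approximated by the principal-axis angle of the outline pixels. Curvature is left out, and the share of outline pixels in each cell is reported instead.

```python
import math

def binarize(gray, threshold=128):
    """Dark pixels (ink) become 1, light pixels (paper) become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def outline(binary):
    """Keep ink pixels that touch background or the image border (4-neighborhood)."""
    h, w = len(binary), len(binary[0])

    def is_background(r, c):
        return r < 0 or r >= h or c < 0 or c >= w or binary[r][c] == 0

    edge = []
    for r in range(h):
        row = []
        for c in range(w):
            on_edge = binary[r][c] == 1 and (
                is_background(r - 1, c) or is_background(r + 1, c)
                or is_background(r, c - 1) or is_background(r, c + 1)
            )
            row.append(1 if on_edge else 0)
        edge.append(row)
    return edge

def orientation(points):
    """Principal-axis angle in radians (image coordinates) of (x, y) points."""
    if len(points) < 2:
        return 0.0
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mu20 = sum((x - mean_x) ** 2 for x, _ in points)
    mu02 = sum((y - mean_y) ** 2 for _, y in points)
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

def grid_features(edge, grid=4):
    """Return grid*grid (outline density, orientation) pairs in row-major order."""
    h, w = len(edge), len(edge[0])
    cells = [[[] for _ in range(grid)] for _ in range(grid)]
    sizes = [[0] * grid for _ in range(grid)]
    for r in range(h):
        for c in range(w):
            gr, gc = r * grid // h, c * grid // w
            sizes[gr][gc] += 1
            if edge[r][c]:
                cells[gr][gc].append((c, r))
    features = []
    for gr in range(grid):
        for gc in range(grid):
            size = sizes[gr][gc]
            density = len(cells[gr][gc]) / size if size else 0.0
            features.append((density, orientation(cells[gr][gc])))
    return features
```

Calling `grid_features(outline(binarize(gray)))` yields 16 pairs for the default 4x4 grid. A real system would feed values like these to the neural network in step 5.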
The output of the character recognition algorithm (step 5) depends on the specific implementation. In some implementations, the neural network may output just a single character. Other neural networks may output multiple possible characters, each with its own likelihood (for example, "85% sure the letter is an 'a', 40% sure the letter is an 'o'").
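As a small illustration of the second kind of output, the sketch below takes a set of per-character likelihoods and returns the plausible candidates, best first. The function name and the 0.25 cutoff are assumptions made for this example. The likelihoods are treated as independent scores, so they do not need to add up to 100%.

```python
def ranked_candidates(likelihoods, min_likelihood=0.25):
    """Return (character, likelihood) pairs at or above the cutoff, best first."""
    kept = [(ch, p) for ch, p in likelihoods.items() if p >= min_likelihood]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical network output for one handwritten letter.
scores = {"a": 0.85, "o": 0.40, "e": 0.10}
print(ranked_candidates(scores))
```

A later stage, such as a dictionary lookup, could then choose among the surviving candidates.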
There are many other possible character recognition algorithms. Some others are Gradient, Histogram, and Polynomial Classification.
Types of OCR
OCR can be performed on two basic kinds of input: preprinted text and
live handwriting.
Preprinted Text OCR
One type of OCR that can be performed is on preprinted
text. Basically this is text that has already been written or printed before the OCR
processing takes place. Some examples of this would be a newspaper, an encyclopedia, an
employment or loan application, or a handwritten letter to your mother. This type of
processing can be performed on documents that are handwritten, or it can be performed on
documents that were printed by a computer printer or a printing press. With this type of
OCR, you usually start with a paper copy of your document, and convert it to a text file
or word processing file that you can later edit.
Preprinted text OCR is usually performed in the following manner.
Applications of Preprinted Text OCR - U.S. Postal Service
One very good example of a preprinted text OCR system in real-world use
is the one operated by the United States Postal Service (USPS). Each year
the USPS processes more than 100 billion pieces of mail. Sorting this amount of mail by
hand would require a tremendous effort on the part of USPS employees. What the USPS needed
was a system that could automatically determine where a piece of mail is supposed to go
based on the writing on the envelope.
The technique in use by the USPS is as follows. Each time a piece of mail is handled by a machine, it is checked to see if a bar code is printed on the envelope. The bar code indicates the delivery point code (ZIP+4+2 code), or in other words, where that piece of mail is to be delivered. If the bar code exists, then it can be read and used to determine how to sort the envelope and where to ship it. If a bar code is not present, then the envelope gets sent through a special machine that performs OCR on the envelope, and prints the appropriate bar code on the envelope. Once the bar code is printed on an envelope, OCR never needs to be performed on that envelope again.
The process by which the delivery point code is determined is as follows:
1. First, the envelope is digitized (scanned).
2. Next, the image is converted to grayscale.
3. The image is processed to determine the location of the destination address block. The destination address block is then extracted from the image. Binarization is then applied to the destination address block.
4. The destination address block is broken into parts, such as name, street number, street name, city, state, and ZIP code.
5. OCR is performed on the street number, state, and ZIP code. A query is then performed against a national address database to find all street names that could possibly match the street number, state, and ZIP code. Using these possible street names as a basis, OCR is then performed on the street name and matched up to one of the street names returned by the query.
6. Using the above information, a unique delivery point code can then be determined, and a bar code can be printed on the envelope.
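The database lookup and street-name matching described above can be sketched as follows. Everything here is hypothetical: the in-memory `ADDRESS_DB` dictionary stands in for the national address database, the ZIP code and street names are made up, and fuzzy string matching with Python's `difflib` is swapped in for whatever matching the USPS system actually uses.

```python
import difflib

# Hypothetical stand-in for the national address database:
# (ZIP code, state, street number) -> street names that have that number.
ADDRESS_DB = {
    ("00000", "NY", "12"): ["MAPLE AVE", "MAIN ST", "OAK ST"],
}

def candidate_streets(street_number, state, zip_code):
    """Return the street names that could match the recognized fields."""
    return ADDRESS_DB.get((zip_code, state, street_number), [])

def match_street(ocr_street, candidates, cutoff=0.6):
    """Pick the candidate closest to the (possibly misread) street name."""
    matches = difflib.get_close_matches(
        ocr_street.upper(), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

candidates = candidate_streets("12", "NY", "00000")
print(match_street("Maple Avf", candidates))  # the OCR misread the final E as F
```

Restricting the match to a short list of candidates is what makes this step forgiving: the street name only has to be read well enough to tell it apart from the other streets in the list.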
This system has two main requirements. Since several hundred million envelopes are processed daily, it must operate in real time, and it must perform the OCR with a low error rate. The resulting system meets these requirements quite nicely. It can process up to 3 envelopes per second, although processing may be slower: if the delivery point code cannot be determined after the first pass, the envelope may be reprocessed in an enhanced mode to make a better attempt. After all reprocessing, the system is able to identify the destination 66% of the time with a 2% error rate. Further refinements to the system could improve these numbers considerably.
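To get a feel for these numbers, the rough calculation below converts the yearly volume into a daily one and estimates how many OCR machines would be needed at 3 envelopes per second. It is deliberately an upper bound resting on two assumptions that are not from the source: that every piece of mail needs OCR (in practice, mail that already carries a bar code skips it) and that each machine runs around the clock at full speed.

```python
import math

def machines_needed(pieces_per_year, envelopes_per_second):
    """Upper-bound machine count if all mail needed OCR and machines ran 24 hours a day."""
    daily_volume = pieces_per_year / 365
    per_machine_per_day = envelopes_per_second * 24 * 60 * 60
    return daily_volume, math.ceil(daily_volume / per_machine_per_day)

daily, machines = machines_needed(100e9, 3)
print(f"{daily:,.0f} pieces per day, about {machines} machines")
```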
Live Handwriting OCR
In live handwriting OCR, the text is being processed as it is written.
This type of technique is applicable to computer writing tablets or Personal Digital
Assistants (PDAs). This technique can be used for handwritten printing or cursive.
Processing live handwriting is very similar to processing preprinted text. The major difference is in the character recognition algorithm. Typically, in live handwriting recognition, the computer tracks the movement of the writing pen. It records timings, along with directions, angles, and curvature of lines. It then passes this data through a neural network to determine the character. Once the character is determined, the remainder of the process is very similar to preprinted text OCR.
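The kind of data recorded from the pen can be sketched as follows. This is a minimal illustration that assumes the tablet reports pen samples as `(x, y, t)` tuples; the function name is made up, and the change in direction between consecutive segments is used as a simple stand-in for curvature.

```python
import math

def stroke_features(samples):
    """Turn (x, y, t) pen samples into (direction, speed, turn) per segment.

    direction and turn are in radians; turn is the change in direction
    from the previous segment, wrapped into [-pi, pi).
    """
    features = []
    previous_direction = None
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        if dx == 0 and dy == 0:
            continue  # the pen did not move between these two samples
        direction = math.atan2(dy, dx)
        speed = math.hypot(dx, dy) / dt if dt > 0 else 0.0
        if previous_direction is None:
            turn = 0.0
        else:
            turn = (direction - previous_direction + math.pi) % (2 * math.pi) - math.pi
        features.append((direction, speed, turn))
        previous_direction = direction
    return features

# A stroke that moves right, then makes a right-angle turn.
print(stroke_features([(0, 0, 0.0), (1, 0, 0.1), (1, 1, 0.2)]))
```

A sequence of such tuples is the sort of input that could then be passed to the neural network.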
Applications of Live Handwriting OCR
At the end of this report, in the links section, there
is a link to an online live handwriting OCR system. With this system, you can write (in
either printing or cursive) the name of a German city, and the program can usually
determine what city's name you wrote.