Optical Character Recognition (OCR)

What is OCR
OCR is the process of extracting characters and words from an image (such as a scanned newspaper page) and converting them to a text-based form that can be edited in a word processor. This is currently one of the most developed areas of computer vision. Several programs already exist that can convert images to text with very little error. Many of the newer OCR programs can also recognize different font types and sizes and can produce formatted text documents, such as Microsoft Word documents.

Although a lot of progress has been made in the area of OCR, research into new implementations and applications continues. I will discuss one such application shortly, under the topic of Content Based Image Recognition.

Character Recognition
The first step in OCR, and probably the most critical, is recognizing individual characters. There are many possible algorithms for performing character recognition, but all of them follow 2 basic steps. First, apply image processing to filter out the visual information that is not needed. Second, use the remaining visual information to determine the character. In almost all cases, this involves a neural network.

A sample character recognition algorithm is the chaincode algorithm. In the chaincode algorithm the computer performs the following steps.

1) Take the original character graphic.
2) Apply image processing (such as binarization and edge detection) to get the outline of the character.
3) Break the character into a 4x4 grid.
4) For each cell, calculate the average slope and curvature of the outline.
5) Pass this data to a neural network, which will output the recognized character.
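
Step 4 can be sketched in Python as follows. This is a minimal illustration, not the exact chaincode implementation: it assumes steps 1 and 2 have already produced the outline as an ordered list of (x, y) points around a closed contour, it treats the direction of travel along the outline as the "slope", and it treats the change in direction between consecutive segments as the "curvature". The names wrap_angle and cell_features, and their parameters, are made up for this example.

```python
import math

def wrap_angle(a):
    """Map an angle in radians into the range [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def cell_features(outline, width, height, grid=4):
    """Return a flat list of (mean direction, mean turn) per grid cell.

    outline is an ordered list of (x, y) points along a closed contour
    (assumed to come from the binarization and edge detection step).
    """
    # Per-cell accumulators: sum of cos, sum of sin, sum of turns, point count.
    acc = [[0.0, 0.0, 0.0, 0] for _ in range(grid * grid)]
    n = len(outline)
    for i in range(n):
        x0, y0 = outline[i - 1]          # previous point (wraps around)
        x1, y1 = outline[i]              # current point
        x2, y2 = outline[(i + 1) % n]    # next point (wraps around)
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        col = min(grid - 1, int(x1 * grid // width))
        row = min(grid - 1, int(y1 * grid // height))
        cell = acc[row * grid + col]
        cell[0] += math.cos(a_out)
        cell[1] += math.sin(a_out)
        cell[2] += wrap_angle(a_out - a_in)
        cell[3] += 1
    features = []
    for cos_sum, sin_sum, turn_sum, count in acc:
        if count == 0:
            features.extend([0.0, 0.0])  # no outline point falls in this cell
        else:
            features.append(math.atan2(sin_sum, cos_sum))  # mean direction
            features.append(turn_sum / count)              # mean turn
    return features

# A square outline described by its four corners, in an 8x8 image.
square = [(0, 0), (7, 0), (7, 7), (0, 7)]
vector = cell_features(square, width=8, height=8)
```

The resulting list has 32 values (2 per cell for a 4x4 grid), which is the kind of fixed-length input a neural network in step 5 would expect.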

The output of the character recognition algorithm (step 5) depends on the specific implementation. In some implementations, the neural network may just output a single character. Other neural networks may output multiple possible characters along with a likelihood for each (for example, "85% sure the letter is an 'a', 40% sure the letter is an 'o'").

There are many other possible character recognition algorithms. Some others are Gradient, Histogram, and Polynomial Classification.

Types of OCR
There are 2 basic ways in which OCR can be performed: Preprinted Text and Live Handwriting.

Preprinted Text OCR
One type of OCR that can be performed is on preprinted text. Basically this is text that has already been written or printed before the OCR processing takes place. Some examples of this would be a newspaper, an encyclopedia, an employment or loan application, or a handwritten letter to your mother. This type of processing can be performed on documents that are handwritten, or it can be performed on documents that were printed by a computer printer or a printing press. With this type of OCR, you usually start with a paper copy of your document, and convert it to a text file or word processing file that you can later edit.

Preprinted text OCR is usually performed in the following manner.

  1. Individual characters are recognized. Each character may have 1 or more possibilities (for example, it may be an 'a' or it may be an 'o').
  2. Determine possible words that can be made from the individual characters. This may generate several possibilities. For example, if step 1 above determined the first letter of the word may be either an 'a' or an 'o', then you may have several possible words starting with 'a' and several possible words starting with 'o'.
  3. Some of the words generated in step 2 may be nonsense words. Thus, all possible words should be compared against a database to determine which of those words actually exist.
  4. Of all the words that exist (as determined by step 3), pick the most likely word. This might be determined based on some weighting value determined from the neural network.
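
The four steps above can be sketched in Python as follows. This is only an illustration under assumptions: the per-character candidates and their likelihoods are made-up values standing in for neural network output, the word set is a tiny stand-in for a real word database, and multiplying the likelihoods is just one possible weighting.

```python
from itertools import product

def best_word(char_candidates, dictionary):
    """Pick the most likely dictionary word.

    char_candidates holds, for each character position, a list of
    (character, likelihood) pairs (step 1).
    """
    best, best_score = None, 0.0
    # Step 2: build every word that the candidate characters allow.
    for combo in product(*char_candidates):
        word = "".join(ch for ch, _ in combo)
        # Step 3: discard nonsense words.
        if word not in dictionary:
            continue
        # Step 4: weight the word by the product of its likelihoods.
        score = 1.0
        for _, likelihood in combo:
            score *= likelihood
        if score > best_score:
            best, best_score = word, score
    return best, best_score

# Hypothetical recognizer output for a three-letter word.
candidates = [
    [("c", 0.90)],
    [("a", 0.85), ("o", 0.40)],
    [("t", 0.80), ("l", 0.30)],
]
words = {"cat", "cot", "dog"}
word, score = best_word(candidates, words)
print(word)  # cat
```

Here "cal" and "col" are rejected as nonsense words, and "cat" beats "cot" because the 'a' reading carries the higher likelihood.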

Applications of Preprinted Text OCR - U.S. Postal Service
One very good example of a preprinted text OCR system in real-world use is the one operated by the United States Postal Service (USPS). Each year the USPS processes more than 100 billion pieces of mail. Sorting this amount of mail would require a tremendous effort on the part of the USPS employees. What the USPS needed was a system that could automatically determine where a piece of mail is supposed to go based on the writing on the envelope.

The technique in use by the USPS is as follows. Each time a piece of mail is handled by a machine, it is checked to see if a bar code is printed on the envelope. The bar code indicates the delivery point code (ZIP+4+2 code), or in other words, where that piece of mail is to be delivered. If the bar code exists, then it can be read and used to determine how to sort the envelope and where to ship it. If a bar code is not present, then the envelope gets sent through a special machine that performs OCR on the envelope, and prints the appropriate bar code on the envelope. Once the bar code is printed on an envelope, OCR never needs to be performed on that envelope again.

The process by which the delivery point code is determined is as follows:

1. First, the envelope is digitized (scanned).

2. Next, the image is converted to grayscale.

3. The image is processed to determine the location of the destination address block. The destination address block is then extracted from the image. Binarization is then applied to the destination address block.

4. The destination address block is broken into parts, such as name, street number, street name, city, state, and ZIP code.

5. OCR is performed on the street number, state, and ZIP code. A query is then performed against a national address database to find all street names that could possibly match the street number, state, and ZIP code. Using these possible street names as a basis, OCR is then performed on the street name and matched up to one of the street names returned by the query.

6. Using the above information, a unique delivery point code can then be determined, and a bar code can be printed on the envelope.
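
The database lookup and street name matching described above can be sketched in Python as follows. Everything here is an assumption made for illustration: the in-memory ADDRESS_DB dictionary stands in for the national address database, the addresses are invented, and fuzzy string matching with the standard library's difflib module is a swapped-in technique, not the method the USPS system uses.

```python
import difflib

# Hypothetical stand-in for the national address database:
# (street number, state, ZIP code) -> street names that exist there.
ADDRESS_DB = {
    ("100", "NY", "14201"): ["MAIN STREET", "MAPLE AVENUE", "MARKET STREET"],
}

def match_street(number, state, zip_code, ocr_street, cutoff=0.6):
    """Return the known street name closest to the OCR result, or None."""
    possible = ADDRESS_DB.get((number, state, zip_code), [])
    # get_close_matches returns the best matches first, so take the top one.
    matches = difflib.get_close_matches(ocr_street, possible, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# The OCR step misread the "I" in "MAIN" as the digit "1".
print(match_street("100", "NY", "14201", "MA1N STREET"))  # MAIN STREET
```

Restricting the comparison to the street names returned by the query is what makes a noisy OCR result usable: the recognizer only has to choose among a few known names rather than read the street name from scratch.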

 

Because several hundred million envelopes are processed daily, the system must operate in real time. It must also perform the OCR with a low error rate.

The resulting system meets these requirements quite nicely. It can process up to 3 envelopes per second, although processing may be slower: if the delivery point code cannot be determined on the first pass, the envelope may be reprocessed in an enhanced mode to make a better attempt. After all reprocessing, the system is able to identify the destination 66% of the time with a 2% error rate. Further refinements to the system could improve these numbers.
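
To get a feel for the real-time requirement, the figures above can be combined in a rough calculation. This sketch assumes, purely for illustration, that every piece of mail needs OCR and that each machine runs around the clock at the full 3 envelopes per second. Neither assumption comes from the USPS, and mail that already carries a bar code skips OCR entirely, so the result is only an upper-end estimate.

```python
import math

PIECES_PER_YEAR = 100_000_000_000   # "more than 100 billion" from the text
ENVELOPES_PER_SECOND = 3            # peak rate of one OCR machine

pieces_per_day = PIECES_PER_YEAR / 365
per_machine_per_day = ENVELOPES_PER_SECOND * 24 * 60 * 60
machines_needed = math.ceil(pieces_per_day / per_machine_per_day)

print(round(pieces_per_day))   # 273972603
print(per_machine_per_day)     # 259200
print(machines_needed)         # 1057
```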

Live Handwriting OCR
In live handwriting OCR, the text is being processed as it is written. This type of technique is applicable to computer writing tablets or Personal Digital Assistants (PDAs). This technique can be used for handwritten printing or cursive.

Processing live handwriting is very similar to processing preprinted text. The major difference is in the character recognition algorithm. Typically, in live handwriting recognition, the computer tracks the movement of the writing pen. It records timings, along with directions, angles, and curvature of lines. It then passes this data through a neural network to determine the character. Once the character is determined, the remainder of the process is very similar to preprinted text OCR.
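
The pen tracking described above can be sketched in Python as follows. This is a minimal illustration that assumes the tablet reports pen positions as (time, x, y) samples. The function name stroke_features and the choice of features (elapsed time, speed, direction, and change in direction per segment) are assumptions for this example, not a description of any particular product.

```python
import math

def stroke_features(samples):
    """Turn time-ordered (t, x, y) pen samples into per-segment features.

    Each feature tuple is (elapsed time, speed, direction, turn), with
    angles in radians. The turn is the change in direction relative to
    the previous segment and serves as a simple curvature measure.
    """
    features = []
    prev_angle = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        angle = math.atan2(dy, dx)
        speed = math.hypot(dx, dy) / dt if dt > 0 else 0.0
        if prev_angle is None:
            turn = 0.0  # the first segment has nothing to turn from
        else:
            # Wrap the difference into [-pi, pi) so left and right turns
            # are comparable.
            turn = (angle - prev_angle + math.pi) % (2 * math.pi) - math.pi
        features.append((dt, speed, angle, turn))
        prev_angle = angle
    return features

# A pen moving right, then making a 90 degree turn.
feats = stroke_features([(0.0, 0, 0), (0.1, 1, 0), (0.2, 1, 1)])
```

A list like feats is what would then be passed to the neural network in place of the image-based features used for preprinted text.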

Applications of Live Handwriting OCR
At the end of this report, in the links section, there is a link to an online live handwriting OCR system. With this system, you can write (in either printing or cursive) the name of a German city, and the program can usually determine what city's name you wrote.

 
