Handwriting recognition

                                                                                                                                                                                                  Diana Negoescu ‘09

 

 

PROJECT DESCRIPTION:

           

            Since actual handwriting seemed a daunting task at first, my project started off trying to recognize road signs. This seemed like a good place to start because road signs contain typed letters and once I had the algorithm to trace each individual letter and determine to which letter of the alphabet it most likely corresponded, I could expand it to recognize handwriting.                          

 

The program  Img.java takes as input a picture of a road sign, converts it to black and white, and creates a matrix  of the same size as the original image containing 0’s for the pixels in the background and 1’s, 2’s and so on for the pixels that belong to the first, second and so on letters. It then separates the letters by copying each to a smaller matrix just fitting the letter, scales them to be the size of the reference letters of the alphabet and overlaps them, counting the pixels that don’t match.

 

The program Method2.java uses a more complicated approach: it first detects the edges of the letters using the Canny edge detection algorithm, then takes 100 random points on the edge of the letters to be determined and 100 points on the edge of the reference letters against which it is matched and builds up a fixed number of circles divided into sectors around each point. Using the Hungarian algorithm,  the program matches each point on the letter to a point on the reference such as to minimize a cost (which states how good the match is), and chooses the best match after trying all the letters in the alphabet. Despite the complicated approach (or perhaps due to it), the program is far less effective than the first, simple method, and the running time is extremely large.

 

Markovcalphai.java detects handwriting by using the first method for letter detection, together with a 2-letter hidden Markov model to help improve the results.  The Markov model is built using dynamic programming, by filling in a matrix whose elements store the probability of a certain sequence occurring (which is taken as the maximum of an array whose elements are the probability of a previous sequence occurring multiplied by the probability that the blob it is looking at is a certain letter and the transition probability from the last letter of the previous sequence to the current character) and a string consisting of the respective sequence. The program is also able to insert spaces between words, by creating an array of the minimum distance between consecutive letters and setting a threshold as twice the median of the array and does not get stumbled on the dots on the i’s because it ignores all objects whose area is less than a threshold (although it does get stumbled if the dots are bigger than the i’s themselves, as it often occures in my handwriting).

 

Markovhandwords.java is another implementation of the above method for handwriting except that it uses an obscure modification to the Markov model (it takes the eighth square root of the transition probabilities), which nonetheless gave slightly better results.

 

Markovcalphamod.java attempts to build a 3-letter Markov model, but due to time constraints it has not been improved to give satisfactory results.

 

 

RESULTS

 

 

>java Img south.jpg<database.txt

SoUTH

>java Method2 south.jpg

$4WTw

 

 

 

 

> java Markovcalphai aesopshort.txt princeton.jpg 1 10

Princeton is beauthThE

> java Markovhandwords aesopshort.txt princetoni.jpg 1 10

Markov method:

Princeton is beautifiE

Non-Markov method:

Prbncetoc is beautilaE

 

> java Markovcalphamod aesopshort.txt princetoni.jpg 1 10

plinceson is beautelde >

 

 

 

 

 

 

 

 

 

 

 

 

 

 

> java Markovcalpha aesopshort.txt weaseli.jpg 1 10

The WhedEt refthed l Fa lJ thaT ke thi dI netere the enemy of atE ddith The dat reDisthed lm thet he wai not a dird ut O Quthe and thai Hei het fiee