Thursday, August 6, 2009

Activity 10: Preprocessing Text


Objectives: To apply image processing methods to isolate handwritten text from an imaged document.

Tools: Scilab with SIP toolbox

Procedure: We begin with a scanned image containing both handwritten and typed text. Since the image is tilted, we make it upright using the mogrify() command in Scilab. For computational expediency, we also resize the image to more manageable dimensions.


We then crop a small portion of the image containing the handwritten text we want to isolate, and convert it to grayscale. Taking the FT, shown in the center image below, lets us see the components we do not need: in this case, the bright vertical line through the center, which corresponds to the horizontal lines in the image. Using a mask like the one on the right, we can cancel this component out.
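The masking step can be sketched in a few lines. This is only an illustration in Python with NumPy, not the Scilab code used for the activity; the synthetic patch, the mask width (half_width) and the size of the window kept around the zero frequency (dc_keep) are all assumed values chosen for the example.

```python
import numpy as np

# Synthetic stand-in for the cropped grayscale patch (assumption: the real
# patch would come from the scanned document instead).
rows, cols = 64, 64
patch = np.zeros((rows, cols))
patch[::8, :] = 1.0            # horizontal ruled lines every 8 pixels
patch[20:30, 25:28] = 1.0      # a short vertical stroke standing in for handwriting

# Centered 2-D spectrum: the zero frequency sits at (rows // 2, cols // 2).
spectrum = np.fft.fftshift(np.fft.fft2(patch))

# Mask that blocks the vertical line through the center of the spectrum,
# while keeping a small window around the zero frequency so that the
# average brightness survives.
cy, cx = rows // 2, cols // 2
half_width, dc_keep = 1, 2     # hypothetical values, normally tuned by eye
mask = np.ones((rows, cols))
mask[:, cx - half_width:cx + half_width + 1] = 0.0
mask[cy - dc_keep:cy + dc_keep + 1, cx - half_width:cx + half_width + 1] = 1.0

# Apply the mask and transform back to the image domain.
filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
```

Horizontal lines vary only along the vertical direction, so their energy falls on the vertical axis of the spectrum. Zeroing that axis flattens the lines to a constant background while the stroke, whose energy is spread over many horizontal frequencies, is mostly preserved.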


After eliminating the lines, we clean the image using morphological operators such as closing and opening, along with resizing. After cleaning, we use bwlabel() to label the text, as shown in the middle frame below. We then convert the result to binary to keep just the handwritten text.
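To make the cleaning and labeling steps concrete, here is a minimal sketch in Python with NumPy. It is not the SIP toolbox code: the dilate, erode and label helpers are hypothetical stand-ins written for this example (label plays the role of bwlabel()), the structuring element is assumed to be a 3x3 square, and the test image is synthetic.

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square structuring element."""
    padded = np.pad(img, 1, constant_values=False)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def erode(img):
    """Binary erosion as the dual of dilation.

    With this convention, pixels outside the image count as foreground.
    """
    return ~dilate(~img)

def label(img):
    """Label 8-connected blobs with 1, 2, ... (0 is background)."""
    labels = np.zeros(img.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(img)):
        if labels[seed]:
            continue                      # already part of an earlier blob
        count += 1
        labels[seed] = count
        stack = [seed]
        while stack:                      # flood fill from the seed pixel
            y, x = stack.pop()
            for ny in range(max(y - 1, 0), min(y + 2, img.shape[0])):
                for nx in range(max(x - 1, 0), min(x + 2, img.shape[1])):
                    if img[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        stack.append((ny, nx))
    return labels, count

# Synthetic binary image (assumption: stands in for the thresholded patch).
binary = np.zeros((14, 24), dtype=bool)
binary[2:7, 2:6] = True       # first blob
binary[4, 3] = False          # one-pixel hole inside it
binary[3:9, 11:16] = True     # second blob
binary[11, 20] = True         # isolated speck of noise

closed = erode(dilate(binary))    # closing fills the small hole
cleaned = dilate(erode(closed))   # opening removes the isolated speck
labels, count = label(cleaned)
text_only = labels > 0            # back to a binary image of the labeled blobs
```

Closing followed by opening is only one possible order. Which order works best depends on whether gaps inside the strokes or specks between them are the bigger problem.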


We also use image correlation to find the occurrences of a template word, in this case "DESCRIPTION". The images below show the template word and the results of the correlation. The results indicate very roughly where the word occurs, but the correlation peaks are not as well defined as expected.
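One common way to compute such a correlation is through the FT, by multiplying the spectrum of the image with the complex conjugate of the spectrum of the template. The sketch below assumes that approach (the post does not say exactly how the correlation was computed) and uses Python with NumPy on a synthetic page. The template shape, the noise level and the 0.8 threshold are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic page: faint noise plus two copies of a small template
# (assumption: stands in for the document and the template word).
page = rng.random((40, 60)) * 0.2
template = np.zeros((5, 7))
template[1:4, 1:6] = 1.0
template[2, 2:5] = 0.0                    # a small ring-like shape
for top, left in [(5, 10), (25, 40)]:
    page[top:top + 5, left:left + 7] += template

# Zero-mean template, padded to the page size, so that uniformly bright
# regions of the page do not produce large correlation values.
padded = np.zeros_like(page)
padded[:5, :7] = template - template.mean()

# Cross-correlation theorem: corr[y, x] is the sum of page[y + u, x + v]
# times padded[u, v] over all (u, v), with circular wrap-around.
corr = np.real(np.fft.ifft2(np.fft.fft2(page) * np.conj(np.fft.fft2(padded))))

# Keep locations whose score is close to the global maximum.
peaks = np.argwhere(corr > 0.8 * corr.max())
```

Each row of peaks is the (row, column) position of the top-left corner of a match. On a real scan the peaks are broader and less distinct, because repeated words differ slightly in size, tilt and ink density.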


Evaluation: Since the text is not cleanly isolated and the correlation results are only approximate, I give myself 7 out of a possible 10 points.

Acknowledgement: I would like to thank Mr. Cabello for constructive discussions.
