IMAGE SET-II

This set contains scanned document images; after scanning, we also applied degradations to some of them. We evaluated our results using two OCR (Optical Character Recognition) engines: the well-known ABBYY FineReader and free OCR. We measure the quality of each binarization algorithm on the OCR output by computing the Levenshtein distance from the ground-truth text.

Levenshtein Distance

The Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
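As an illustration of how such a score can be computed, here is a minimal sketch of the standard dynamic-programming (Wagner-Fischer) formulation. It is not necessarily the implementation used for the results on this page. It assumes plain Python strings compared character by character, with no case folding or whitespace normalization, which the page does not specify. The sample OCR output and ground-truth strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP)."""
    # The distance is symmetric, so keep the shorter string as `b`
    # to make the stored row as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b (row i-1 of the DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical OCR output compared with hypothetical ground truth.
ocr_output = "Tbe quick brwn fox"
ground_truth = "The quick brown fox"
print(levenshtein(ocr_output, ground_truth))  # prints 2
```

Here one substitution ("b" for "h") and one missing character ("o") give a distance of 2; a lower distance means the OCR output is closer to the ground truth.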

IMAGES


1) set2/image18

Image size (in pixels): 182×331

2) set2/image17

Image size (in pixels): 320×832

3) set2/image16

This image shows the significance of binarization for OCR: with our algorithm, a less powerful OCR engine (here, free OCR) performs better than a more powerful one (here, ABBYY).

Image size (in pixels): 640×920

4) set2/image15

This image shows the significance of binarization for OCR: with our algorithm, a less powerful OCR engine (here, free OCR) performs better than a more powerful one (here, ABBYY).

Image size (in pixels): 800×737

5) set2/image06b

Image size (in pixels): 646×1180

6) set2/image01

Image size (in pixels): 182×773