IMAGE SET-II
These are some of the document images from set 2. All of them are scanned documents; after scanning, we also applied degradations to some of the images. We evaluated our results using two OCR (Optical Character Recognition) engines: the well-known ABBYY FineReader and free OCR. We assess the quality of each binarization algorithm by calculating the Levenshtein distance between the OCR output and the ground-truth text.
Levenshtein Distance
The Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
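As an illustration of the definition above, here is a minimal Python sketch of the standard dynamic-programming computation. The function name `levenshtein` and the sample strings are hypothetical, chosen only for this example; this is not necessarily the implementation used to produce the results on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the stored row short

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Classic textbook example: kitten -> sitten -> sittin -> sitting
print(levenshtein("kitten", "sitting"))  # 3
```

For an evaluation of the kind described here, the two arguments would be the OCR output and the ground-truth text; a smaller distance means the OCR result is closer to the ground truth.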
IMAGES
* Click on a link to view the results of the binarization algorithms used.
1) set2/image18 (image size in pixels: 182×331)
2) set2/image17 (image size in pixels: 320×832)
3) set2/image16 (image size in pixels: 640×920). This image shows the significance of binarization for OCR: with our algorithm, a less powerful OCR engine (here, free OCR) performs better than a more powerful one (here, ABBYY).
4) set2/image15 (image size in pixels: 800×737). This image shows the significance of binarization for OCR: with our algorithm, a less powerful OCR engine (here, free OCR) performs better than a more powerful one (here, ABBYY).
5) set2/image06b (image size in pixels: 646×1180)
6) set2/image01 (image size in pixels: 182×773)