Searchable files with OCR

You cannot search an imaged file without optical character recognition (OCR) software. A computer cannot search for text inside an image directly, though a few search engines and search tools claim to be able to search images. When we talk about a searchable file, we generally mean a text file or an OCR’ed file. An image that contains text is not a text file: human eyes can recognize the text in it, but to a computer it is only pixels.
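To make the difference concrete, here is a minimal sketch in Python. The tiny "image" and the `contains` helper are made up for illustration and do not come from any real file format or tool. The same two letters are stored once as text and once as pixels, and a plain text search only finds them in the text version.

```python
# The word "HI" stored as text: each letter is a character code.
text_bytes = "HI".encode("ascii")

# The same word stored as a tiny black-and-white image.
# '#' is a dark pixel, '.' is a light pixel (illustrative layout only).
image_rows = [
    "#.#.###",
    "#.#..#.",
    "###..#.",
    "#.#..#.",
    "#.#.###",
]

# Pixel data: one byte per pixel, 255 for dark and 0 for light.
image_bytes = bytes(255 if ch == "#" else 0 for row in image_rows for ch in row)


def contains(data: bytes, query: str) -> bool:
    """Plain text search: look for the query's character codes in the data."""
    return query.encode("ascii") in data


print(contains(text_bytes, "HI"))   # True: the letters are stored as characters
print(contains(image_bytes, "HI"))  # False: the image holds only pixel values
```

A person looking at the pixel rows can read "HI", but the search sees nothing except 0s and 255s.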

How does OCR help in creating searchable files?

OCR converts text contained in image documents into text that a computer can search. The software scans the image file and recognizes the text in it, creating a searchable file from an otherwise non-searchable one.

How does OCR do that?

OCR software contains hundreds of thousands of bitmap images of characters in Latin script, which it compares with segments of the scanned image document. When it finds a resemblance, it files that part of the document away as a string of text. In this way, it can recognize and extract text from a complete image document. But this process requires storing every script and style the software might encounter, which is hardly possible for handwritten text.
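The comparison step described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular OCR product works: the 3x5 letter bitmaps, the fixed one-pixel gap between letters, the 0.8 threshold, and the names `TEMPLATES`, `similarity`, `render` and `recognize` are all assumptions made for the example.

```python
# Stored bitmaps for a handful of letters ('#' dark pixel, '.' light pixel).
TEMPLATES = {
    "E": ("###", "#..", "###", "#..", "###"),
    "H": ("#.#", "#.#", "###", "#.#", "#.#"),
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}
GLYPH_WIDTH = 3   # every stored letter is 3 pixels wide
GLYPH_HEIGHT = 5  # and 5 pixels tall
GAP = 1           # assumed blank column between letters


def similarity(segment, template):
    """Fraction of pixels that agree between a segment and a stored bitmap."""
    same = sum(
        p == q
        for seg_row, tpl_row in zip(segment, template)
        for p, q in zip(seg_row, tpl_row)
    )
    return same / (GLYPH_WIDTH * GLYPH_HEIGHT)


def render(text):
    """Draw text as an image (a list of pixel rows) using the stored bitmaps."""
    return [
        ("." * GAP).join(TEMPLATES[ch][row] for ch in text)
        for row in range(GLYPH_HEIGHT)
    ]


def recognize(image, threshold=0.8):
    """Cut the image into letter-sized segments and match each to a bitmap."""
    result = []
    for x in range(0, len(image[0]), GLYPH_WIDTH + GAP):
        segment = tuple(row[x:x + GLYPH_WIDTH] for row in image)
        best_char, best_score = max(
            ((ch, similarity(segment, tpl)) for ch, tpl in TEMPLATES.items()),
            key=lambda pair: pair[1],
        )
        # File the segment away as text only if it resembles a stored bitmap.
        result.append(best_char if best_score >= threshold else "?")
    return "".join(result)


print(recognize(render("HELLO")))  # HELLO
```

A segment that resembles none of the stored bitmaps comes back as "?", which is the same limitation the paragraph above points out for handwritten text: the comparison can only find shapes it already has on file.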
