How to Scan Documents Using Optical Character Recognition (OCR)
In this section you will learn how to use a flatbed scanner and OmniPage to digitize a printed document using optical character recognition (OCR) and save it in an editable format for later use.
To scan a document, make sure you have the following:
- Flatbed scanner
- OCR software such as OmniPage
- Document to be scanned
Steps to Resolution
This guide focuses on how to use a scanner and OCR software; it does not address how to set up and install a new scanner. Please refer to the product setup/installation guides that came with the device for further details on those steps.
Most OCR software has a multitude of different features that can be used to digitize printed documents into editable text. These directions will present one method of achieving this goal.
- Launch OmniPage.
- From the Start Page click Scan Document.
- After the scanner warms up, a preview image of the document will be displayed, as shown in the image below.
- Select the Black and white picture or text option. Then adjust the region in the window so that all the desired text is enclosed. Click Scan.
- After the page is scanned, you will be asked whether to Stop Loading Pages or Add More Pages. Select the option that fits your needs. To scan additional pages, select Add More Pages and repeat the previous step and this one until all pages have been scanned, then click Stop Loading Pages.
- We will now have OmniPage perform optical character recognition. Click the Automatic button on the toolbar.
- The OCR Proofreader will display each word that it does not recognize as a Suspect Word. If the word/object is valid, click Ignore; otherwise, select the correct word in the Suggestions box and click Change.
- After the OCR has been completed, click the Save to Files button and select Save to Files.
- From the Save to File dialog window, select the destination where you want to save your file (Desktop, flash drive, etc.) and do the following:
  - In the File name box, enter the name of your file.
  - Under Save as, select Text.
  - In the Files of type drop-down list, select Microsoft Word 2007/2010 (*.docx).
  - Under Formatting level, select Flowing Page.
  - Under File options, select Create one file for all pages.
  - Under Page range, select All pages.
  - Click OK.
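The proofreading step above can be illustrated with a small sketch. This is not how OmniPage works internally. It only assumes that a proofreader flags words missing from a known word list and offers near matches as suggestions. The word list, the function name `find_suspect_words`, and the sample text are all hypothetical; the matching relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import re

# Hypothetical, tiny word list; a real proofreader would use a full dictionary.
KNOWN_WORDS = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]


def find_suspect_words(text, known_words=KNOWN_WORDS, max_suggestions=3):
    """Return (word, suggestions) pairs for words that are not in known_words."""
    known = set(known_words)
    suspects = []
    # Treat runs of letters and digits as words, since OCR often swaps the two.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        lowered = word.lower()
        if lowered in known:
            continue
        # Rank dictionary words by similarity to the unrecognized word.
        suggestions = difflib.get_close_matches(
            lowered, known_words, n=max_suggestions, cutoff=0.6
        )
        suspects.append((word, suggestions))
    return suspects


if __name__ == "__main__":
    # Assumed OCR output in which digits were mistaken for letters.
    ocr_text = "The qu1ck brown f0x jumps over the lazy dog"
    for word, suggestions in find_suspect_words(ocr_text):
        print(word, "->", suggestions or "no suggestion")
```

Choosing a suggestion corresponds to clicking Change, and leaving a flagged word as it is corresponds to clicking Ignore.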
Recognition of Latin-script, typewritten text is still not 100% accurate even where clear imaging is available. Some studies find that the accuracy of commercial OCR software ranges from 71% to 98%. All OCR output should therefore be reviewed for both accuracy (the correct words) and formatting. Recognition of hand printing, cursive handwriting, and printed text in other scripts (especially East Asian characters, which can have many strokes per character) is still under active development by OCR software publishers.
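One way to make the accuracy review concrete is to compare the OCR output against a hand-corrected reference and compute a character-level accuracy. The sketch below is only an illustration: the function names and sample strings are hypothetical, and defining accuracy as one minus the edit distance divided by the reference length is one common convention, not necessarily the one used by the studies mentioned above.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: fewest single-character edits turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from the OCR text
                current[j - 1] + 1,      # extra character in the OCR text
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Fraction of reference characters recognized correctly, clamped to [0, 1]."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical hand-corrected text and OCR output for the same line.
    corrected = "The quick brown fox"
    ocr_output = "The qu1ck brown f0x"
    print(f"Character accuracy: {character_accuracy(corrected, ocr_output):.1%}")
```

A score like this only covers the words themselves. Formatting still has to be checked by eye in the saved document.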