Forms Processing
Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format. This can be done manually or automatically, but the general process is that hard copy data is filled out by humans and then "captured" from their respective fields and entered into a database or other electronic format.
Overview
In the broadest sense, forms processing systems range from handling small application forms to large-scale, multi-page survey forms. Manual forms processing raises several common issues: it takes a great deal of tedious human effort, data keyed in by operators may contain typos, and the lengthy process consumes many hours of labor. Processing forms with software-driven applications can resolve these issues or greatly reduce them. Most methods for forms processing address the following areas.
Manual data entry
This method of data processing involves human operators keying in the data found on the form. Manual data entry has many disadvantages in speed, accuracy and cost. Based on average professional typist speeds of 50 to 80 wpm, one could generously estimate about two hundred pages per hour for forms with fifteen one-word fields (not counting the time for reading and sorting pages). In contrast, modern commercial scanners can scan and digitize 200 pages per minute. A second major disadvantage of manual data entry is the likelihood of typographical errors. When the cost of labor and working space is factored in, manual data entry is a very inefficient process.
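The estimate above follows from simple arithmetic, treating fifteen one-word fields as fifteen words per page and ignoring reading and sorting time. The scanner figure covers image capture only, not recognition.

```latex
\begin{aligned}
\text{typist: } & \frac{50~\text{words/min} \times 60~\text{min/h}}{15~\text{words/page}} = 200~\text{pages/h}
  \quad (320~\text{pages/h at } 80~\text{wpm}) \\
\text{scanner: } & 200~\text{pages/min} \times 60~\text{min/h} = 12\,000~\text{pages/h} \\
\text{ratio: } & \frac{12\,000}{320} = 37.5 \quad\text{to}\quad \frac{12\,000}{200} = 60
\end{aligned}
```

So a single scanner captures page images roughly 37.5 to 60 times faster than one typist can key the same pages.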
Automated forms processing
This method automates data processing by using predefined templates and configurations. A template, in this case, is a map of the document detailing where the data fields are located within the form or document. Compared with manual data entry, automatic form input systems are preferable because they reduce the problems of manual data processing. Automatic form input systems use different types of recognition methods, such as optical character recognition (OCR) for machine print, optical mark recognition (OMR) for check or mark-sense boxes, barcode recognition (BCR) for barcodes, and intelligent character recognition (ICR) for hand print.
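A template, as described above, can be pictured as a mapping from field names to regions of the page, together with the recognition method to apply to each region. The following is a minimal sketch under assumed conventions: the `FieldSpec` class, the field names, the pixel coordinates and the representation of a page as a list of pixel rows are all hypothetical, not the format of any particular product.

```python
from dataclasses import dataclass

# Hypothetical template format: each field records where it sits on the page
# (pixel box: left, top, right, bottom) and which recognizer should read it.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    box: tuple[int, int, int, int]
    method: str  # "OCR", "ICR", "OMR", "BCR" or "MICR"

# A made-up template for a one-page application form.
TEMPLATE = [
    FieldSpec("last_name", (100, 50, 400, 80), "ICR"),
    FieldSpec("member_id", (450, 50, 700, 80), "BCR"),
    FieldSpec("agree", (100, 300, 130, 330), "OMR"),
]

def crop(page, box):
    """Cut a field region out of a page stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]

def field_regions(page, template):
    """Map each field name to (recognition method, cropped image region)."""
    return {f.name: (f.method, crop(page, f.box)) for f in template}

# Usage: a blank 800 x 400 page (255 = white pixel).
page = [[255] * 800 for _ in range(400)]
regions = field_regions(page, TEMPLATE)
method, pixels = regions["agree"]
print(method, len(pixels), len(pixels[0]))
```

Because the template only stores coordinates and a method name, the same recognition components can be reused across different forms by swapping templates.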
With automated forms processing technology, users can convert documents from their scanned images into a computer-readable format such as ANSI, XML, CSV or PDF, or input the data directly into a database.
Forms processing has developed beyond basic data capture. It not only encompasses recognition but also helps manage the complete life cycle of documents, from scanning through data extraction and often delivery into a back-end system. In some cases it may also include processing the data or generating well-formatted results through calculations and analysis. An automated forms processing system can be valuable when hundreds or thousands of images need to be processed every day.
Components
Components of data processing with an automatic form input system include:
- OCR – Optical character recognition
- OMR – Optical mark recognition
- ICR – Intelligent character recognition
- BCR – Barcode recognition
- MICR – Magnetic ink character recognition
OCR recognizes machine-printed uppercase and lowercase alphabetic characters, digits, accented characters, many currency symbols, arithmetic symbols, expanded punctuation characters and more.
ICR recognizes hand-printed American and European English characters using predefined character sets: uppercase, lowercase and mixed-case alphabetic characters, digits, currency symbols (including $ (dollar), ¢ (cent), € (euro), £ (pound) and ¥ (yen)), and arithmetic and punctuation characters (including period, comma, single quote, double quote, ! & ? @ { } \ # % * + – / : ; < = >).
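Restricting a field to a predefined character set lets the recognizer discard implausible readings, such as a letter O in a digits-only field. The sketch below assumes a hypothetical recognizer output: for each character position, a list of (candidate, confidence) pairs. The set names, the `constrain` function and the confidence values are illustrative only.

```python
import string

# Illustrative predefined character sets, one per field type.
CHARSETS = {
    "digits": set(string.digits),
    "upper": set(string.ascii_uppercase),
    "currency": set(string.digits) | set("$¢€£¥.,"),
}

def constrain(candidates_per_position, charset_name):
    """Pick, for each position, the highest-confidence candidate in the charset.

    candidates_per_position: list of lists of (char, confidence) pairs, a
    hypothetical recognizer output format. Returns (text, lowest confidence
    used), or None if some position has no allowed candidate at all.
    """
    allowed = CHARSETS[charset_name]
    text, min_conf = [], 1.0
    for candidates in candidates_per_position:
        permitted = [(c, p) for c, p in candidates if c in allowed]
        if not permitted:
            return None  # nothing legal at this position; route to a human
        char, conf = max(permitted, key=lambda cp: cp[1])
        text.append(char)
        min_conf = min(min_conf, conf)
    return "".join(text), min_conf

# Usage: the first character looks more like "O" than "0" to the recognizer.
reading = [[("O", 0.6), ("0", 0.4)], [("7", 0.9), ("T", 0.1)]]
print(constrain(reading, "digits"))
```

In a digits-only field the "O" reading is rejected and "0" is chosen instead; the low resulting confidence (0.4) is a natural signal for human verification.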
MICR is a recognition technology that reads the magnetic ink character (MICR) fonts printed on cheques, which makes cheque processing easier and reduces the chance of errors in cheque clearing. It also allows funds to be transferred more easily and quickly. MICR provides a secure, high-speed method of scanning and processing information.
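Recognized MICR data is commonly validated before use. As one concrete illustration (not something the text above specifies), the nine-digit ABA routing number in the MICR line of US cheques carries a check digit: multiplying the digits by the weights 3, 7, 1, 3, 7, 1, 3, 7, 1 must give a sum divisible by 10. The function name below is hypothetical.

```python
# Weights applied to the nine routing-number digits, left to right.
ABA_WEIGHTS = (3, 7, 1, 3, 7, 1, 3, 7, 1)

def valid_routing_number(field: str) -> bool:
    """Return True if a recognized 9-digit ABA routing number passes its checksum."""
    # Reject wrong lengths and any non-ASCII-digit character (a likely misread).
    if len(field) != 9 or not set(field) <= set("0123456789"):
        return False
    total = sum(w * int(d) for w, d in zip(ABA_WEIGHTS, field))
    return total % 10 == 0

print(valid_routing_number("021000021"), valid_routing_number("021000022"))
```

A failed check suggests that at least one character was misread, so the field is a candidate for human verification rather than automatic acceptance.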
Optical Mark Recognition (OMR) identifies hand-filled bubbles and check boxes on printed forms. OMR usually supports both single and multiple mark recognition. The fields to be recognized can be specified as grids (rows by columns) or as single bubbles.
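A bubble grid can be read by measuring how much of each bubble is filled and comparing that with a threshold. The sketch assumes the fill ratios (fraction of dark pixels per bubble, 0.0 to 1.0) have already been measured; the 0.5 threshold, the "MULTI" marker and the `read_grid` name are illustrative choices, not a standard.

```python
def read_grid(fill, threshold=0.5, multiple=False):
    """Interpret an OMR grid given per-bubble fill ratios (rows x columns).

    Single-mark mode: each row yields the marked column index, None for a
    blank row, or "MULTI" when more than one bubble is marked (an ambiguous
    answer to flag for review). Multiple-mark mode: each row yields the list
    of all marked column indexes.
    """
    results = []
    for row in fill:
        marked = [col for col, ratio in enumerate(row) if ratio >= threshold]
        if multiple:
            results.append(marked)
        elif not marked:
            results.append(None)
        elif len(marked) == 1:
            results.append(marked[0])
        else:
            results.append("MULTI")
    return results

# Usage: three questions with four answer bubbles each.
grid = [
    [0.05, 0.82, 0.10, 0.07],  # one clear mark in column 1
    [0.03, 0.04, 0.02, 0.06],  # left blank
    [0.71, 0.09, 0.66, 0.08],  # two marks
]
print(read_grid(grid))
print(read_grid(grid, multiple=True))
```

The same grid data serves both modes; only the interpretation of rows with several marks differs.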
Barcode recognition can read more than 20 industry-standard 1D and 2D barcode types, including Code 39, Codabar, Interleaved 2 of 5, Code 93 and more. It automatically detects all barcodes in an image or in a specified area within the image.
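Interleaved 2 of 5, one of the symbologies named above, shows how a 1D barcode carries data in element widths. Each digit uses five elements, exactly two of them wide; digits are taken in pairs, the first encoded in five bars and the second in the five spaces interleaved with them, so the symbol always holds an even number of digits. The sketch below handles the data region only. It assumes the start and stop patterns are already stripped and each bar or space is already classified as narrow (N) or wide (W); a real decoder also has to measure widths and handle check digits. The function names are hypothetical.

```python
# Standard 2 of 5 digit patterns: exactly two of the five elements are wide.
PATTERNS = {
    "0": "NNWWN", "1": "WNNNW", "2": "NWNNW", "3": "WWNNN", "4": "NNWNW",
    "5": "WNWNN", "6": "NWWNN", "7": "NNNWW", "8": "WNNWN", "9": "NWNWN",
}
DIGIT_FOR = {pattern: digit for digit, pattern in PATTERNS.items()}

def encode_pairs(digits: str) -> str:
    """Interleave digit pairs: bars carry the first digit, spaces the second."""
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    out = []
    for i in range(0, len(digits), 2):
        bars, spaces = PATTERNS[digits[i]], PATTERNS[digits[i + 1]]
        out.extend(b + s for b, s in zip(bars, spaces))
    return "".join(out)

def decode_pairs(elements: str) -> str:
    """Decode a bar/space width string (data region only) back to digits."""
    if len(elements) % 10:
        raise ValueError("data region must hold whole 10-element digit pairs")
    digits = []
    for i in range(0, len(elements), 10):
        chunk = elements[i:i + 10]
        bars, spaces = chunk[0::2], chunk[1::2]  # even positions are bars
        try:
            digits.append(DIGIT_FOR[bars] + DIGIT_FOR[spaces])
        except KeyError:
            raise ValueError(f"invalid pattern in pair at element {i}") from None
    return "".join(digits)

# Usage: round trip, including padding of an odd-length number.
print(encode_pairs("12"))
print(decode_pairs(encode_pairs("123")))
```

Rejecting any five-element group that is not one of the ten valid patterns is what lets a decoder detect many misreads without a separate check digit.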
Process
The process of automated forms processing typically includes the following steps:
- A batch of completed forms is scanned using a high-speed scanner
- Images are cleaned with document image processing algorithms to improve accuracy
- Forms are classified based on the original template forms, and the fields are extracted using the appropriate recognition components
- Fields that the system flagged with low confidence are queued for verification by a human operator
- Verified data is saved into a database or exported to a searchable text format such as CSV, XML or PDF
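The verification and export steps above can be sketched as a simple routing rule: fields whose recognition confidence falls below a threshold go to a human verification queue, and the rest are exported. The sketch assumes a hypothetical record format of (form ID, field name, value, confidence) and an illustrative threshold of 0.90; it writes CSV with Python's standard `csv` module.

```python
import csv
import io

CONFIDENCE_THRESHOLD = 0.90  # illustrative; real systems tune this per field

def route_fields(records, threshold=CONFIDENCE_THRESHOLD):
    """Split recognized fields into accepted ones and a human verification queue.

    records: iterable of (form_id, field, value, confidence) tuples.
    """
    accepted, review_queue = [], []
    for record in records:
        if record[3] >= threshold:
            accepted.append(record)
        else:
            review_queue.append(record)
    return accepted, review_queue

def export_csv(records) -> str:
    """Write verified records to CSV text, one row per field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["form_id", "field", "value"])
    for form_id, field, value, _confidence in records:
        writer.writerow([form_id, field, value])
    return buffer.getvalue()

# Usage: the ZIP code contains a likely misread letter "O" and low confidence.
records = [("F001", "last_name", "SMITH", 0.97), ("F001", "zip", "9021O", 0.42)]
accepted, queue = route_fields(records)
print(export_csv(accepted), end="")
print("needs review:", queue)
```

Once an operator corrects a queued field, it can be passed to `export_csv` along with the automatically accepted ones.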
Prerequisites
Although automated forms processing has many advantages over manual data entry, it still has limitations. To achieve the best accuracy, the following prerequisites should be met:
- Scan format: the file format of the scanned image, its resolution (DPI) and its color mode
- Configuration: the layout of the scanned image must be configured for automated processing
- Recognition: the predefined output formats
- Result/analysis: any specific format required for presenting the captured data
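The scan-format prerequisite can be enforced with a simple pre-flight check before a batch enters recognition. The limits below (a 300 DPI minimum, the allowed file formats and color modes) and the `ScanSettings`/`scan_problems` names are assumptions chosen for illustration, not requirements stated here; suitable values depend on the recognition engine and the forms.

```python
from dataclasses import dataclass

# Illustrative limits; adjust to the recognition engine actually in use.
MIN_DPI = 300
ALLOWED_FORMATS = {"TIFF", "PNG", "PDF"}
ALLOWED_COLOR_MODES = {"bitonal", "grayscale", "color"}

@dataclass
class ScanSettings:
    file_format: str
    dpi: int
    color_mode: str

def scan_problems(settings: ScanSettings) -> list[str]:
    """Return human-readable problems with the scan settings; empty means OK."""
    problems = []
    if settings.file_format.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported file format: {settings.file_format}")
    if settings.dpi < MIN_DPI:
        problems.append(f"resolution {settings.dpi} DPI is below {MIN_DPI} DPI")
    if settings.color_mode.lower() not in ALLOWED_COLOR_MODES:
        problems.append(f"unknown color mode: {settings.color_mode}")
    return problems

# Usage: a low-resolution JPEG batch is rejected before recognition starts.
print(scan_problems(ScanSettings("JPEG", 200, "bitonal")))
```

Catching these issues at scan time is cheaper than discovering them later as a flood of low-confidence fields in the verification queue.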