Document capture software
Document capture software refers to applications that provide the features needed to automate the process of scanning paper documents. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to a number of image file formats, including PDF, TIFF, JPG, and BMP. Document capture software augments this basic functionality and can add efficiency and standardization to the process.
Typical features
Typical features of document capture software include:
- Barcode recognition
- Patch code recognition
- Separation
- Optical character recognition (OCR)
- Optical mark recognition (OMR)
- Quality assurance
- Indexing
- Migration
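Barcode and patch code recognition are commonly used to drive separation: a sheet carrying a code is placed between documents so the software can tell where one document ends and the next begins. The following is a minimal Python sketch of that step, assuming recognition has already tagged each page with any separator code it found. The `Page` and `Document` types, the `separate_batch` function, and the `"PATCH_T"` value are hypothetical placeholders, not the API of any capture product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """One scanned page plus any separator code found on it (hypothetical model)."""
    image_name: str
    separator: Optional[str] = None  # e.g. "PATCH_T" or a barcode value; None if no code


@dataclass
class Document:
    """A group of consecutive pages that belong to one logical document."""
    pages: List[Page] = field(default_factory=list)


def separate_batch(pages: List[Page], keep_separator_pages: bool = False) -> List[Document]:
    """Split a scanned batch into documents, starting a new one at every separator sheet."""
    documents: List[Document] = []
    current = Document()
    for page in pages:
        if page.separator is not None:
            # A separator closes the current document, unless it is still empty
            # (e.g. two separator sheets in a row, or one at the start of the batch).
            if current.pages:
                documents.append(current)
            current = Document()
            if not keep_separator_pages:
                continue  # discard the separator sheet itself
        current.pages.append(page)
    # Whatever follows the last separator forms the final document.
    if current.pages:
        documents.append(current)
    return documents
```

The sketch drops separator sheets by default; passing `keep_separator_pages=True` keeps each one as the first page of the document that follows it.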
Goal of Implementing a Document Capture Solution
The goal of implementing a document capture solution is to reduce the time spent on the scanning and capture process and to produce metadata along with an image file and/or OCR text. This information is then migrated to a document management or enterprise content management system such as Microsoft SharePoint, Marex FileBound, Cabinet NG, or FileNet. These systems often provide a search function that lets users find assets by the captured metadata and then view them using document imaging software.
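To make the capture output concrete, the sketch below models one captured item as an image file plus index metadata and OCR text, and shows the kind of metadata search a repository performs over such records. It is a minimal in-memory illustration in Python; `CaptureRecord`, `search_by_metadata`, and the field names in the example are assumptions, not the schema of SharePoint, FileBound, or FileNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaptureRecord:
    """One captured item: the image file, its index metadata, and its OCR text."""
    image_file: str
    metadata: Dict[str, str] = field(default_factory=dict)
    ocr_text: str = ""


def search_by_metadata(records: List[CaptureRecord], **criteria: str) -> List[CaptureRecord]:
    """Return the records whose metadata matches every given field (case-insensitive)."""
    return [
        record
        for record in records
        if all(
            record.metadata.get(name, "").lower() == value.lower()
            for name, value in criteria.items()
        )
    ]
```

For example, `search_by_metadata(records, DocType="Invoice", Supplier="Acme")` returns only the records indexed as invoices from that supplier. Matching here is exact apart from case; a real repository may also offer wildcard or range queries.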
Microsoft SharePoint
Microsoft SharePoint is being adopted by many organisations as a corporate document management system for Microsoft Office documents and other electronic files. However, much of the information held by organisations is on paper, and this needs to be integrated into the same document repository.

By scanning paper documents, companies can convert them into image formats such as TIFF and JPEG and can also use OCR technology to extract valuable index information or business data from them. Digital documents and their associated metadata can then be stored in SharePoint in a variety of formats. The most popular of these is PDF, which not only provides an accurate representation of the document but also allows all of the document's OCR text to be stored behind the PDF image. This format is known as PDF with hidden text, or text-searchable PDF. It allows users to find documents in SharePoint either by using keywords in the metadata fields or by searching the content of PDF files across the SharePoint repository.
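Extracting index information from OCR text, and searching that text, can be illustrated with regular expressions. The Python sketch below assumes the OCR engine has already produced plain text for an invoice; the field names and patterns in `FIELD_PATTERNS` are hypothetical examples, and production capture software often relies on configured templates or zones rather than hard-coded expressions.

```python
import re
from typing import Dict, Optional

# Hypothetical index fields for an invoice and the patterns used to find them.
FIELD_PATTERNS = {
    "InvoiceNumber": re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "InvoiceDate": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "Total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_index_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull index values out of OCR text; fields that are not found map to None."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def contains_keywords(ocr_text: str, *keywords: str) -> bool:
    """Simple full-text check: True if every keyword occurs in the OCR text (case-insensitive)."""
    text = ocr_text.lower()
    return all(keyword.lower() in text for keyword in keywords)
```

The extracted values would become the metadata fields stored alongside the text-searchable PDF, while `contains_keywords` stands in for the repository's content search.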
Advantages of scanning documents into SharePoint
Information held on paper is usually just as valuable to organisations as the electronic documents they generate internally. Often this information represents a large proportion of day-to-day correspondence with suppliers and customers. Being able to manage and share it internally through a document management system such as SharePoint can improve collaboration between departments or employees, and can also reduce the risk of losing it to disasters such as floods or fire.

Organisations adopting SharePoint often implement electronic workflow, which allows information held on paper to be included in an electronic business process and incorporated into a customer record file along with associated office documents and emails.
For business-critical documents such as purchase orders and supplier invoices, digitising can help speed up business transactions and reduce the manual effort of keying data into business systems such as CRM, ERP, and accounting software. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow.
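Once index values such as the invoice total have been captured, routing an invoice for approval can be a simple rule lookup. The Python sketch below assumes a hypothetical approval policy; the limits, the role names, and the `route_invoice` function are invented for illustration. In practice the rule would live in the workflow engine, which would then notify the chosen approver by email or a task assignment.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical approval limits: (maximum amount, approver role), checked in ascending order.
APPROVAL_LIMITS: List[Tuple[Decimal, str]] = [
    (Decimal("1000.00"), "team_manager"),
    (Decimal("10000.00"), "department_head"),
]
FALLBACK_APPROVER = "finance_director"  # hypothetical role for amounts above every limit


def route_invoice(total: str) -> str:
    """Choose who must approve a scanned invoice, based on its captured total."""
    # Captured totals often carry thousands separators, e.g. "1,250.00".
    amount = Decimal(total.replace(",", ""))
    for limit, approver in APPROVAL_LIMITS:
        if amount <= limit:
            return approver
    return FALLBACK_APPROVER
```

`Decimal` is used instead of `float` so that amounts exactly at a limit, such as `"1000.00"`, compare reliably.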
Document Capture Software for Microsoft SharePoint
Many document capture software providers offer integration with SharePoint, at varying levels. Some offer a batch interface that simply drops images and index data into a directory and relies on a batch upload utility to transfer the documents into SharePoint. Others offer direct integration with SharePoint, which allows documents and metadata to be exported into specific folders within SharePoint. A few capture providers offer a tightly integrated, bi-directional interface with SharePoint. More information can be found in the additional sources below.

Additional Sources for Information on SharePoint Capture Solutions
- http://www.aiim.org/sharepoint/Scan-Capture-Documents-to-SharePoint.aspx
- http://www.aiim.org/Resources/eBooks
- http://scanningwithsharepoint.wordpress.com/
- http://aiimcommunities.org/capture/blog/document-imaging-sharepoint-yes-you-can
- http://www.datafinity.co.uk/psi_capture.html
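The simplest integration style described above, a batch interface, amounts to writing the images and an index file into a drop directory that a separate upload utility later reads. The following Python sketch assumes a CSV index with one row per document; the `export_batch` function, the `FileName` column, and the CSV layout are hypothetical conventions rather than a format required by SharePoint or any particular upload tool.

```python
import csv
import shutil
from pathlib import Path
from typing import Dict, List


def export_batch(documents: List[Dict[str, str]], drop_dir: Path,
                 index_name: str = "index.csv") -> Path:
    """Copy each document's image into drop_dir and write one index row per document.

    Each entry in `documents` holds a "source" file path plus its index fields.
    The CSV layout is a hypothetical convention for an upload utility to read.
    """
    drop_dir.mkdir(parents=True, exist_ok=True)
    # Collect every index field used in the batch; missing values are written as "".
    field_names = sorted({key for doc in documents for key in doc if key != "source"})
    index_path = drop_dir / index_name
    with index_path.open("w", newline="", encoding="utf-8") as index_file:
        writer = csv.DictWriter(index_file, fieldnames=["FileName"] + field_names)
        writer.writeheader()
        for doc in documents:
            source = Path(doc["source"])
            shutil.copy2(source, drop_dir / source.name)  # place the image next to the index
            row = {key: value for key, value in doc.items() if key != "source"}
            row["FileName"] = source.name
            writer.writerow(row)
    return index_path
```

A direct integration would skip the drop directory and push each file and its metadata to the repository as it is exported.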
Distributed Capture Solutions
Distributed document capture is a technology that allows documents to be scanned into a central server through individual capture stations. A variation of distributed capture is thin-client document capture, in which documents are scanned into a central server through a web browser. AIIM reviewed one such web-based product, describing it as "a thin-client distributed capture system that streamlines the process of acquiring and creating documents." The streamlining results from several factors, including the fact that no software needs to be installed at every scanning station and the variety of input sources from which documents can be captured, such as email, fax, or a watched folder.

Jeff Shuey, Director of Business Development at Kodak, distinguishes between distributed capture and what he calls "remote" capture. In an article published by AIIM, he said that the key difference between the two is whether or not the information captured from scanning needs to be sent to a centralized server. As he points out in the article, if the document only needs to be scanned and committed to a SharePoint system and does not need to be sent to some other centralized server, it is simply a remote capture situation.
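A watched folder, one of the input sources mentioned above, can be as simple as periodically listing a directory and handing any new files to the capture pipeline. This is a minimal polling sketch using only the Python standard library; the `poll_watched_folder` function, its extension list, and the polling approach are illustrative assumptions rather than how any particular product works.

```python
import os
from pathlib import Path
from typing import Iterable, List, Set


def poll_watched_folder(
    folder: Path,
    seen: Set[str],
    extensions: Iterable[str] = (".tif", ".tiff", ".jpg", ".pdf"),
) -> List[Path]:
    """Return capture files that have appeared in `folder` since the previous poll.

    `seen` holds the file names already handed on and is updated in place, so
    calling this function on a timer reports each new file name only once.
    """
    wanted = {ext.lower() for ext in extensions}
    with os.scandir(folder) as entries:
        candidates = sorted(entries, key=lambda entry: entry.name)
    new_files: List[Path] = []
    for entry in candidates:
        if not entry.is_file():
            continue  # ignore subdirectories
        if Path(entry.name).suffix.lower() not in wanted or entry.name in seen:
            continue  # unsupported type, or already picked up on an earlier poll
        seen.add(entry.name)
        new_files.append(Path(entry.path))
    return new_files
```

A production watched folder would also need to confirm that a file has finished being written, for example by waiting until its size stops changing, before picking it up; that check is omitted here.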