hOCR
hOCR is an open standard that defines a data format for representing the output of optical character recognition (OCR). It aims to embed layout, recognition confidence, style and other information in the recognized text itself, and does so by encoding that data in standard HTML.
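To make the idea of embedding concrete, the following is a minimal sketch of reading word text, bounding boxes and confidence values back out of an hOCR fragment using only the Python standard library. The sample markup is hand-written for illustration rather than taken from a real scan, and it assumes the conventions of the public hOCR specification: element classes such as `ocr_page`, `ocr_line` and `ocrx_word`, with properties such as `bbox` and `x_wconf` stored in the `title` attribute and separated by semicolons. The names `parse_title`, `WordExtractor` and `extract_words` are hypothetical helpers, and the sketch assumes that word elements contain plain text with no nested `span` elements.

```python
from html.parser import HTMLParser

# Hand-written illustrative fragment in hOCR style; not real OCR output.
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 640 480">
  <span class="ocr_line" title="bbox 10 20 300 50">
    <span class="ocrx_word" title="bbox 10 20 120 50; x_wconf 96">Hello</span>
    <span class="ocrx_word" title="bbox 130 20 300 50; x_wconf 88">world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split a title attribute into a {property_name: [values]} dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collects text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._props = None  # properties of the word currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._props = parse_title(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._props is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested spans inside a word, so the first closing
        # span after a word start closes that word.
        if tag == "span" and self._props is not None:
            conf = self._props.get("x_wconf")
            self.words.append({
                "text": "".join(self._text).strip(),
                "bbox": tuple(int(v) for v in self._props.get("bbox", [])),
                "confidence": float(conf[0]) if conf else None,
            })
            self._props = None


def extract_words(hocr):
    """Return a list of word records found in an hOCR string."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word)
```

Because the layout data lives in ordinary HTML attributes, the same file can be displayed in a browser as plain text and also processed by tools that need coordinates or confidence scores.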
See also
- Software that uses this format:
  - OCRopus: free OCR software for Linux
  - Tesseract: OCR engine used by OCRopus (as of 3.0)
  - CuneiForm: free OCR software
  - ExactImage: free image processing software
- OCRopus
External links
- Public Specification for the hOCR Format
- hocr-tools on Google Code
- hOCR discussion group
- moz-hocr-edit hOCR document editor