hOCR
hOCR is an open standard that defines a data format for representing the output of optical character recognition (OCR). The standard aims to embed layout, recognition confidence, style and other information in the recognized text itself. It does so by encoding this data in standard HTML.
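As a rough illustration of how this embedding can be read back, the sketch below pulls word text, bounding boxes and confidence values out of a small hOCR fragment. The fragment is hand-written for this example, and the helper names (SAMPLE, parse_title, WordExtractor, extract_words) are hypothetical. The class names (ocr_page, ocr_line, ocrx_word) and the bbox and x_wconf properties come from the hOCR specification rather than from this article. It assumes words contain no nested tags and uses only the Python standard library.

```python
from html.parser import HTMLParser

# Hand-written hOCR fragment for illustration only (not real OCR output).
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 81'>world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(props["x_wconf"][0]) if "x_wconf" in props else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumption: a word has no nested tags, so the first end tag closes it.
        if self._current is not None:
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE):
        print(word["text"], word["bbox"], word["conf"])
```

Because the layout data lives in ordinary HTML attributes, the same file still displays as readable text in a web browser, while a parser like this one can recover the extra information.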

See also

  • Software that utilizes this format:
    • OCRopus: free OCR software for Linux
    • Tesseract: OCR engine used by OCRopus (as of 3.0)
    • CuneiForm: free OCR software
    • ExactImage: free image processing software

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 