GOCR
Overview
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...
optical character recognition
Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping...
program, initially written by Jörg Schulenburg. It can be used to convert or scan image
Image scanner
In computing, an image scanner—often abbreviated to just scanner—is a device that optically scans images, printed text, handwriting, or an object, and converts it to a digital image. Common examples found in offices are variations of the desktop scanner where the document is placed on a glass...
files (portable pixmap
Portable pixmap
The phrase Netpbm format commonly refers to any or all of the members of a set of closely related graphics formats used and defined by the Netpbm project....
or PCX
PCX
PCX is an image file format developed by the now-defunct ZSoft Corporation of Marietta, Georgia. It was the native file format for PC Paintbrush and became one of the first widely accepted DOS imaging standards, although it has since been succeeded by more sophisticated image formats, such as GIF,...
) into text file
Text file
A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists within a computer file system...
s.
GOCR claims it can handle single-column sans-serif fonts of 20–60 pixels in height. It reports trouble with serif fonts, overlapping characters, handwritten text, heterogeneous fonts, noisy images, large angles of skew, and text in anything other than a Latin alphabet
Latin alphabet
The Latin alphabet, also called the Roman alphabet, is the most recognized alphabet used in the world today. It evolved from a western variety of the Greek alphabet called the Cumaean alphabet, which was adopted and modified by the Etruscans who ruled early Rome...
.
GOCR can also translate barcode
Barcode
A barcode is an optical machine-readable representation of data, which shows data about the object to which it attaches. Originally barcodes represented data by varying the widths and spacings of parallel lines, and may be referred to as linear or 1 dimensional . Later they evolved into rectangles,...
s.
GOCR can be used as a stand-alone command-line application, or as a back-end to other programs. It comes with a gocr.tcl graphic interface.
Discussions