Ocrad
Ocrad is an optical character recognition (OCR) program, developed as part of the GNU Project. Like all GNU software, it is free software, and is licensed under the GNU GPL.
Based on a feature extraction method, it reads images in portable pixmap formats known collectively as PNM (PBM, PGM and PPM; each of these formats comes in two versions: "plain" or "raw") and produces text in byte (8-bit) or UTF-8 formats. Also included is a layout analyzer, able to separate the columns or blocks of text normally found on printed pages.
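To make the "plain" variant of the input formats concrete, the following is a minimal Python sketch that serializes a small bitmap as plain PBM text: the magic number P1, the width and height, then one character per pixel, with 1 meaning black. The function name plain_pbm and the sample bitmap are hypothetical, chosen only for illustration, and the image is far too small for real recognition.

```python
def plain_pbm(rows):
    """Serialize a bitmap (rows of 0/1 ints, 1 = black) as plain PBM text."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # Header: magic number "P1", then width and height in pixels.
    lines = ["P1", f"{width} {height}"]
    # Body: one line of whitespace-separated 0/1 values per pixel row.
    lines += [" ".join(str(pixel) for pixel in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical 3x5 bitmap of the letter "I", purely illustrative.
letter_i = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
pbm_text = plain_pbm(letter_i)
```

The "raw" variants carry the same header but store the pixel data in binary, which makes files smaller at the cost of being unreadable in a text editor.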
The source code is 10,000 lines of C++.
User interface
Ocrad can be used as a stand-alone command-line application, or as a back-end to other programs. Kooka, which was the KDE environment's default scanning application until KDE 4, can use Ocrad as its OCR engine. Because development of Kooka ceased in 2007, current versions of the KDE desktop environment no longer contain it. Ocrad can also be used as an OCR engine in OCRFeeder.
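As a rough illustration of the back-end use described above, the following Python sketch shows how a front-end might call the ocrad command-line program and capture the recognized text. It assumes that an ocrad executable is on the PATH, that it writes its result to standard output, and that the --format=utf8 option selects UTF-8 output; check these against the manual of the installed version. The helper names build_ocrad_command and recognize are hypothetical.

```python
import shutil
import subprocess


def build_ocrad_command(image_path):
    """Build the argument list for one OCR run (assumed option: --format=utf8)."""
    return ["ocrad", "--format=utf8", image_path]


def recognize(image_path):
    """Return the text recognized in a PNM image, or None if ocrad is not installed."""
    if shutil.which("ocrad") is None:
        return None
    result = subprocess.run(
        build_ocrad_command(image_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    # Assumption: the recognized text arrives on stdout, encoded as UTF-8.
    return result.stdout.decode("utf-8", errors="replace")
```

A front-end would typically convert the scanned page to a PNM file first and then call recognize("page.pbm"), treating a None result as "OCR engine not available".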
History
Ocrad has been developed by Antonio Diaz Diaz since 2003. Version 0.7 was released in February 2004, 0.14 in February 2006 and 0.18 in May 2009. Archives of the bug-ocrad mailing list go back to October 2003.