CuneiForm (software)
Encyclopedia
In computer software
Computer software
Computer software, or just software, is a collection of computer programs and related data that provide the instructions for telling a computer what to do and how to do it....

, CuneiForm is an OCR
Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping...

 tool. It was originally developed at Cognitive Technologies and, after a few years with no development, released as freeware
Freeware
Freeware is computer software that is available for use at no cost or for an optional fee, but usually with one or more restricted usage rights. Freeware is in contrast to commercial software, which is typically sold for profit, but might be distributed for a business or commercial purpose in the...

 on December 12, 2007. The kernel of OCR engine was released under the open source
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...

 BSD license license at the beginning of April 2008.

Features

CuneiForm uses the OmniFont system. Algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s used in CuneiForm come from the rules for writing letters
Letter (alphabet)
A letter is a grapheme in an alphabetic system of writing, such as the Greek alphabet and its descendants. Letters compose phonemes and each phoneme represents a phone in the spoken form of the language....

, from their topology, and do not require pattern recognition learning. CuneiForm recognizes any print font (scanned from book
Book
A book is a set or collection of written, printed, illustrated, or blank sheets, made of hot lava, paper, parchment, or other materials, usually fastened together to hinge at one side. A single sheet within a book is called a leaf or leaflet, and each side of a leaf is called a page...

s, newspaper
Newspaper
A newspaper is a scheduled publication containing news of current events, informative articles, diverse features and advertising. It usually is printed on relatively inexpensive, low-grade paper such as newsprint. By 2007, there were 6580 daily newspapers in the world selling 395 million copies a...

s, magazine
Magazine
Magazines, periodicals, glossies or serials are publications, generally published on a regular schedule, containing a variety of articles. They are generally financed by advertising, by a purchase price, by pre-paid magazine subscriptions, or all three...

s, laser
Laser printer
A laser printer is a common type of computer printer that rapidly produces high quality text and graphics on plain paper. As with digital photocopiers and multifunction printers , laser printers employ a xerographic printing process, but differ from analog photocopiers in that the image is produced...

 printer output, dot-matrix printer
Dot matrix printer
A dot matrix printer or impact matrix printer is a type of computer printer with a print head that runs back and forth, or in an up and down motion, on the page and prints by impact, striking an ink-soaked cloth ribbon against the paper, much like the print mechanism on a typewriter...

 output, typewriter
Typewriter
A typewriter is a mechanical or electromechanical device with keys that, when pressed, cause characters to be printed on a medium, usually paper. Typically one character is printed per keypress, and the machine prints the characters by making ink impressions of type elements similar to the pieces...

 text, etc.). It does not recognize handwritten or pseudo-handwritten text nor does it recognize decorative fonts (e.g. Gothic). There are special settings in CuneiForm for recognition of text from dot-matrix printer and 200x100 DPI
Dots per inch
Dots per inch is a measure of spatial printing or video dot density, in particular the number of individual dots that can be placed in a line within the span of 1 inch . The DPI value tends to correlate with image resolution, but is related only indirectly.- DPI measurement in monitor...

 resolution faxes.

CuneiForm can save text formatting, and also recognizes complicated tables (of any structure).

It recognizes Bulgarian
Bulgarian language
Bulgarian is an Indo-European language, a member of the Slavic linguistic group.Bulgarian, along with the closely related Macedonian language, demonstrates several linguistic characteristics that set it apart from all other Slavic languages such as the elimination of case declension, the...

, Croatian
Croatian language
Croatian is the collective name for the standard language and dialects spoken by Croats, principally in Croatia, Bosnia and Herzegovina, the Serbian province of Vojvodina and other neighbouring countries...

, Czech
Czech language
Czech is a West Slavic language with about 12 million native speakers; it is the majority language in the Czech Republic and spoken by Czechs worldwide. The language was known as Bohemian in English until the late 19th century...

, Danish
Danish language
Danish is a North Germanic language spoken by around six million people, principally in the country of Denmark. It is also spoken by 50,000 Germans of Danish ethnicity in the northern parts of Schleswig-Holstein, Germany, where it holds the status of minority language...

, Dutch
Dutch language
Dutch is a West Germanic language and the native language of the majority of the population of the Netherlands, Belgium, and Suriname, the three member states of the Dutch Language Union. Most speakers live in the European Union, where it is a first language for about 23 million and a second...

, English
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

, Estonian
Estonian language
Estonian is the official language of Estonia, spoken by about 1.1 million people in Estonia and tens of thousands in various émigré communities...

, French
French language
French is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...

, German
German language
German is a West Germanic language, related to and classified alongside English and Dutch. With an estimated 90 – 98 million native speakers, German is one of the world's major languages and is the most widely-spoken first language in the European Union....

, Hungarian
Hungarian language
Hungarian is a Uralic language, part of the Ugric group. With some 14 million speakers, it is one of the most widely spoken non-Indo-European languages in Europe....

, Italian
Italian language
Italian is a Romance language spoken mainly in Europe: Italy, Switzerland, San Marino, Vatican City, by minorities in Malta, Monaco, Croatia, Slovenia, France, Libya, Eritrea, and Somalia, and by immigrant communities in the Americas and Australia...

, Latvian
Latvian language
Latvian is the official state language of Latvia. It is also sometimes referred to as Lettish. There are about 1.4 million native Latvian speakers in Latvia and about 150,000 abroad. The Latvian language has a relatively large number of non-native speakers, atypical for a small language...

, Lithuanian
Lithuanian language
Lithuanian is the official state language of Lithuania and is recognized as one of the official languages of the European Union. There are about 2.96 million native Lithuanian speakers in Lithuania and about 170,000 abroad. Lithuanian is a Baltic language, closely related to Latvian, although they...

, Polish
Polish language
Polish is a language of the Lechitic subgroup of West Slavic languages, used throughout Poland and by Polish minorities in other countries...

, Portuguese
Portuguese language
Portuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...

, Romanian
Romanian language
Romanian Romanian Romanian (or Daco-Romanian; obsolete spellings Rumanian, Roumanian; self-designation: română, limba română ("the Romanian language") or românește (lit. "in Romanian") is a Romance language spoken by around 24 to 28 million people, primarily in Romania and Moldova...

, Russian
Russian language
Russian is a Slavic language used primarily in Russia, Belarus, Uzbekistan, Kazakhstan, Tajikistan and Kyrgyzstan. It is an unofficial but widely spoken language in Ukraine, Moldova, Latvia, Turkmenistan and Estonia and, to a lesser extent, the other countries that were once constituent republics...

, Russian-English bilingual, Serbian
Serbian language
Serbian is a form of Serbo-Croatian, a South Slavic language, spoken by Serbs in Serbia, Bosnia and Herzegovina, Montenegro, Croatia and neighbouring countries....

, Slovene, Spanish
Spanish language
Spanish , also known as Castilian , is a Romance language in the Ibero-Romance group that evolved from several languages and dialects in central-northern Iberia around the 9th century and gradually spread with the expansion of the Kingdom of Castile into central and southern Iberia during the...

, Swedish
Swedish language
Swedish is a North Germanic language, spoken by approximately 10 million people, predominantly in Sweden and parts of Finland, especially along its coast and on the Åland islands. It is largely mutually intelligible with Norwegian and Danish...

, Turkish
Turkish language
Turkish is a language spoken as a native language by over 83 million people worldwide, making it the most commonly spoken of the Turkic languages. Its speakers are located predominantly in Turkey and Northern Cyprus with smaller groups in Iraq, Greece, Bulgaria, the Republic of Macedonia, Kosovo,...

, and Ukrainian
Ukrainian language
Ukrainian is a language of the East Slavic subgroup of the Slavic languages. It is the official state language of Ukraine. Written Ukrainian uses a variant of the Cyrillic alphabet....

 text.

CuneiForm can save recognized text in RTF
Rich Text Format
The Rich Text Format is a proprietary document file format with published specification developed by Microsoft Corporation since 1987 for Microsoft products and for cross-platform document interchange....

, HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

, or plain text
Plain text
In computing, plain text is the contents of an ordinary sequential file readable as textual material without much processing, usually opposed to formatted text....

 format. It can also pass text to Microsoft Word
Microsoft Word
Microsoft Word is a word processor designed by Microsoft. It was first released in 1983 under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platforms including IBM PCs running DOS , the Apple Macintosh , the AT&T Unix PC , Atari ST , SCO UNIX,...

 or Microsoft Excel
Microsoft Excel
Microsoft Excel is a proprietary commercial spreadsheet application written and distributed by Microsoft for Microsoft Windows and Mac OS X. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications...

.

User interface

CuneiForm can be used as a stand-alone command-line application, or as a back-end to other programs. It comes with its own graphic interface. CuneiForm can be also used as an OCR engine in OCRFeeder
OCRFeeder
OCRFeeder is a free software desktop OCR suite for GNOME. It converts paper documents to digital document files or makes them accessible to visually impaired users....

.

History

Once a leader of OCR software in Russia
Russia
Russia or , officially known as both Russia and the Russian Federation , is a country in northern Eurasia. It is a federal semi-presidential republic, comprising 83 federal subjects...

, CuneiForm was in competition with ABBYY FineReader.

In 1993, Cognitive Technologies signed an OEM
OEM
OEM means the original manufacturer of a component for a product, which may be resold by another company.OEM may also refer to:-Computing:* OEM font, or OEM-US, the original character set of the IBM PC, circa 1981...

 contract with Corel Corporation, which allowed the Cognitive recognition library to be built into the popular publishing package Corel Draw 3.0
CorelDRAW
CorelDRAW is a vector graphics editor developed and marketed by Corel Corporation of Ottawa, Canada. It is also the name of Corel's Graphics Suite...

 (and subsequent versions).

In 1996, OCR CuneiForm'96 was released, which was the first OCR package to include the adaptive recognition method of character recognition. This method is based on a combination of two types of printed characters recognition algorithms: multifont and omnifont. This self-learning system is capable of recognizing poorly printed symbols by creating an internal font generated by those symbols which were printed well enough to be recognized. Thus dynamic adjustment (adaptation) for specific input characters is used.

In June, 2008 Cognitive Technologies launched a free on-line recognition service on OpenOCR.org.

Opening sources

Cognitive Technologies has started a program to make OCR
Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping...

 available for all users. Its first step was releasing CuneiForm as freeware
Freeware
Freeware is computer software that is available for use at no cost or for an optional fee, but usually with one or more restricted usage rights. Freeware is in contrast to commercial software, which is typically sold for profit, but might be distributed for a business or commercial purpose in the...

.

Cognitive Technologies plans to start developing a new version of the software as an investor and coordinator of the project. Developers decided on the BSD license
BSD licenses
BSD licenses are a family of permissive free software licenses. The original license was used for the Berkeley Software Distribution , a Unix-like operating system after which it is named....

 for the release to take into account all legal and technical nuances, but the whole program or its separate modules may be released later licensed under the GPL
GNU General Public License
The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....

.

In September 2008, part of Cuneiform was released as open source software. One of the missing parts is table analysis, However, Cognitive has promised to release this component in the future.

Cuneiform is being ported to Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

, BSD and Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

. This branch of code will finally be merged with Cognitive codebase.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK