Natural language processing toolkits
The following natural language processing (NLP) toolkits are popular collections of NLP software: suites of libraries, frameworks, and applications for symbolic and statistical natural language and speech processing. NLP tools typically perform sentence detection, tokenization, part-of-speech (POS) tagging, text chunking, lemmatisation, coreference analysis and resolution, and named-entity detection, among other tasks.
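To make the first two tasks in that list concrete, here is a minimal Python sketch of sentence detection followed by tokenization. It uses only regular expressions from the standard library and is not taken from any toolkit in this list; the function names (split_sentences, tokenize, analyze) are hypothetical, and the boundary rule (end punctuation followed by whitespace and a capital letter) is a simplifying assumption that real toolkits replace with trained models or richer rule sets.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' when followed by
# whitespace and an uppercase letter. Abbreviations such as "Dr. Smith"
# will be split incorrectly; this is only an illustration.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# A token is either a run of word characters or one punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Split raw text into sentences using the naive boundary rule."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Break one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


def analyze(text):
    """Run sentence detection, then tokenization, on each sentence."""
    return [tokenize(s) for s in split_sentences(text)]


if __name__ == "__main__":
    for tokens in analyze("NLP tools split text. They also tokenize it!"):
        print(tokens)
```

Later stages such as POS-tagging, chunking, and lemmatisation would consume these token lists, which is why most toolkits run the steps as a pipeline in roughly this order.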
Name | Language | License | Creators | Website
AlchemyAPI | C, C++, C#, Java, Python, Perl, Ruby | Free or Commercial | Orchestr8 | http://www.alchemyapi.com/
Antelope framework | C#, VB.net | Free for research | Proxem | http://www.proxem.com/
Apertium | C++, Java | GPL | (various) | http://wiki.apertium.org/
Cogito | | Commercial | Expert System S.p.A. | http://www.expertsystem.net/page.asp?id=1521
Carabao Language Kit | Any COM+ compliant language. Customization is via data entry | Commercial with free development tools | Digital Sonata Pty Ltd | http://www.digitalsonata.com/default.aspx
DELPH-IN | LISP, C++ | LGPL, MIT, ... | Deep Linguistic Processing with HPSG Initiative | http://www.delph-in.net/
Distinguo | C++ | Commercial | Ultralingua Inc. | http://ultralingua.com/en/semantic-search.htm
Ellogon | C / C++ | LGPL | Georgios Petasis | http://www.ellogon.org/
FreeLing | C++ | GPL | Universitat Politècnica de Catalunya | http://nlp.lsi.upc.edu/freeling/
General Architecture for Text Engineering | Java | LGPL | GATE open source community | http://gate.ac.uk/
Graph Expression | Java | Apache License | Startup huti.ru | http://code.google.com/p/graph-expression/
Learning Based Java | Java | BSD | Cognitive Computation Group at the University of Illinois | http://cogcomp.cs.illinois.edu/page/software_view/11
LingPipe | Java | royalty free or commercial | Alias-i | http://alias-i.com/lingpipe/index.html
LinguaStream | Java | Free for research | University of Caen, France | http://www.linguastream.org/
Mallet | Java | Common Public License | University of Massachusetts Amherst | http://mallet.cs.umass.edu/
MII nlp toolkit | Java | LGPL | UCLA Medical Imaging Informatics (MII) Group | http://www.mii.ucla.edu/nlp/
Modular Audio Recognition Framework | Java | BSD | The MARF Research and Development Group, Concordia University | http://marf.sf.net
MontyLingua | Python, Java | Free for research | MIT | http://web.media.mit.edu/~hugo/montylingua/
Natural Language Toolkit (NLTK) | Python | Apache 2.0 | | http://www.nltk.org/Home
NooJ (based on INTEX) | .NET Framework-based | Free for research | University of Franche-Comté, France | http://www.nooj4nlp.net/
OpenNLP | Java | Apache License 2.0 | Online community | http://incubator.apache.org/opennlp/index.html
Rosette | C, C++, Java, .NET | Commercial | Basis Technology | http://rosette.basistech.com
ScalaNLP | Scala | Apache License | David Hall and Daniel Ramage | http://www.scalanlp.org/
Stanford NLP | Java | GPL | The Stanford Natural Language Processing Group | http://nlp.stanford.edu/software/index.shtml
Rasp | C++ | LGPL | University of Cambridge, University of Sussex | http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/index.html
Natural | Javascript, NodeJs | GPL | Chris Umbel | https://github.com/NaturalNode/natural
Text Engineering Software Laboratory (Tesla) | Java | Eclipse Public License | University of Cologne | http://tesla.spinfo.uni-koeln.de/index.html
Thinktelligence Delegator | Java | Commercial | Thinktelligence Corporation | http://www.thinktelligence.com
UIMA | Java / C++ | Apache 2.0 | Apache | http://incubator.apache.org/uima/index.html
WebLab-project | Java | LGPL | OW2 | http://weblab-project.org/
UniteX | Java & C++ | LGPL | Laboratoire d'Automatique Documentaire et Linguistique | http://www-igm.univ-mlv.fr/~unitex/
The Dragon Toolkit | Java | GPL | Drexel University | http://dragon.ischool.drexel.edu/
Factorie | Java | Apache License | University of Massachusetts Amherst | http://code.google.com/p/factorie/
Silpa Indic Language Processing Toolkit | Python | AGPL | Silpa opensource community developers | http://smc.org.in/silpa

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 