MontyLingua
MontyLingua is a popular natural language processing toolkit. It is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for both the Python and Java programming languages. It is enriched with common sense knowledge about the everyday world from Open Mind Common Sense. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. It does not require training. It was written by Hugo Liu at MIT in 2003.

Because it is enriched with common sense knowledge, it can avoid many tagging and chunking mistakes. For example, compare a reading in which "bit" is mis-tagged as a noun with the correct reading, in which it is tagged as a past-tense verb:
  • "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)"


vs.
  • "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
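The difference between the two readings becomes concrete once the bracketed notation is parsed. The following is a minimal Python sketch, not MontyLingua's own code: the function names `parse_chunks` and `naive_svo` are hypothetical, and the rule "the first NX VX NX run is subject, verb, object" is an assumed simplification of subject/verb/object extraction.

```python
import re

# One chunk looks like "(NX the/DT boy/NN NX)": label, tokens, same label again.
CHUNK_RE = re.compile(r"\((?P<label>[A-Z]+) (?P<body>.*?) (?P=label)\)")


def parse_chunks(chunked):
    """Return a list of (label, [(word, tag), ...]) pairs from chunked text."""
    chunks = []
    for match in CHUNK_RE.finditer(chunked):
        # rsplit keeps any "/" inside the word itself attached to the word.
        tokens = [tuple(tok.rsplit("/", 1)) for tok in match.group("body").split()]
        chunks.append((match.group("label"), tokens))
    return chunks


def naive_svo(chunks):
    """Assumed heuristic: read the first NX VX NX run as (subject, verb, object)."""
    labels = [label for label, _ in chunks]
    for i in range(len(labels) - 2):
        if labels[i:i + 3] == ["NX", "VX", "NX"]:
            return tuple(" ".join(w for w, _ in toks) for _, toks in chunks[i:i + 3])
    return None  # no verb chunk between two noun chunks


mistagged = "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)"
correct = "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
print(naive_svo(parse_chunks(mistagged)))
print(naive_svo(parse_chunks(correct)))
```

With the mis-tagged reading there is no verb chunk, so this heuristic finds no tuple at all. With the correct reading it can recover the mosquito as subject, "bit" as verb, and the boy as object.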


Non-commercial use is free. If you intend to use this software for non-commercial, non-proprietary purposes, such as academic research, it is free and is covered under the GNU GPL. However, it has also been forked into a version released fully under the GPL.

Abilities

  • MontyTokenizer: normalizes punctuation, spacing and contractions, with sensitivity to abbreviations.
  • MontyTagger: part-of-speech tagging using the Penn Treebank tagset, enriched with "Common Sense" from the Open Mind Common Sense project. Exceeds the accuracy of the Brill94 TBL tagger using default training files
  • MontyREChunker: chunks tagged text into verb, noun, and adjective chunks (VX, NX, and AX, respectively)
  • MontyExtractor: extracts verb-argument structures, phrases, and other semantically valuable information from sentences and returns sentences as "digests"
  • MontyLemmatiser: part-of-speech sensitive lemmatisation. Strips plurals (geese-->goose) and tense (were-->be, had-->have). Includes regexps from Humphreys and Carroll's morph.lex, and UPENN's XTAG corpus
  • MontyNLGenerator: generates summaries, generates surface form sentences, determines and numbers NPs and tenses verbs, accounts for sentence_type
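
The chunking step in the list above can be illustrated with a small regular-expression chunker over Penn Treebank tags. This is only a sketch under stated assumptions: the patterns in `CHUNK_RULES` and the function name `chunk` are hypothetical and deliberately simplified (pronouns, for instance, are not handled), and they are not the rules MontyREChunker actually uses.

```python
import re

# Assumed, simplified chunk grammar. Each pattern is matched against a string
# of tags in which every tag is followed by one space, e.g. "DT NN VBD ".
CHUNK_RULES = [
    ("NX", r"(?:DT )?(?:JJ[RS]? )*(?:NNP?S? )+"),      # optional determiner, adjectives, nouns
    ("VX", r"(?:MD )?(?:RB )*(?:VB[DGNPZ]? )+"),       # optional modal, adverbs, verbs
    ("AX", r"(?:RB )*(?:JJ[RS]? )+"),                  # adverbs followed by adjectives
]


def chunk(tagged):
    """Group (word, tag) pairs into labelled chunks; unmatched tokens pass through."""
    out = []
    i = 0
    while i < len(tagged):
        rest = "".join(tag + " " for _, tag in tagged[i:])
        for label, pattern in CHUNK_RULES:
            m = re.match(pattern, rest)
            if m:
                n = m.group(0).count(" ")  # one trailing space per matched tag
                body = " ".join(f"{w}/{t}" for w, t in tagged[i:i + n])
                out.append(f"({label} {body} {label})")
                i += n
                break
        else:
            # No rule applies at this position: emit the token unchunked.
            w, t = tagged[i]
            out.append(f"{w}/{t}")
            i += 1
    return " ".join(out)


good_tags = [("the", "DT"), ("mosquito", "NN"), ("bit", "VBD"), ("the", "DT"), ("boy", "NN")]
bad_tags = [("the", "DT"), ("mosquito", "NN"), ("bit", "NN"), ("the", "DT"), ("boy", "NN")]
print(chunk(good_tags))
print(chunk(bad_tags))
```

Because the chunker only sees tags, a single tagging error changes the chunk boundaries: tagging "bit" as NN merges it into the preceding noun chunk and leaves the sentence without a verb chunk. This is why tagger accuracy matters for every later stage.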
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 