LanguageWare
LanguageWare is a natural language processing (NLP) technology developed by IBM that allows applications to process natural language text. It comprises a set of Java libraries that provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses a finite-state machine approach at multiple levels, which aids performance while keeping the footprint reasonably small.
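To make the finite-state idea concrete, here is a minimal sketch of dictionary lookup driven by a trie-shaped deterministic automaton. It is only an illustration of the general technique, not LanguageWare's actual engine or API; the names `State`, `build_lexicon`, and `longest_matches`, and the sample labels, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class State:
    """One state of a deterministic finite-state acceptor (a trie node)."""

    def __init__(self) -> None:
        self.transitions: Dict[str, "State"] = {}
        self.label: Optional[str] = None  # set only on accepting states


def build_lexicon(entries: Dict[str, str]) -> State:
    """Compile {surface form: label} pairs into a trie-shaped automaton."""
    start = State()
    for surface, label in entries.items():
        state = start
        for ch in surface.lower():
            state = state.transitions.setdefault(ch, State())
        state.label = label
    return start


def longest_matches(text: str, start: State) -> List[Tuple[int, int, str]]:
    """Scan left to right and keep the longest lexicon match at each position."""
    matches = []
    i = 0
    while i < len(text):
        state, best, j = start, None, i
        # Follow transitions one character at a time (case-insensitively).
        while j < len(text) and text[j].lower() in state.transitions:
            state = state.transitions[text[j].lower()]
            j += 1
            if state.label is not None:
                best = (i, j, state.label)  # remember the longest hit so far
        if best is not None:
            matches.append(best)
            i = best[1]  # resume after the matched span
        else:
            i += 1
    return matches


if __name__ == "__main__":
    lexicon = build_lexicon({
        "IBM": "ORG",
        "LanguageWare": "PRODUCT",
        "finite": "ADJ",
        "finite state machine": "TERM",
    })
    sentence = "IBM built LanguageWare on finite state machines."
    for begin, end, label in longest_matches(sentence, lexicon):
        print(sentence[begin:end], label)
```

Each character costs one dictionary lookup, so scanning time grows with the length of the text rather than the size of the lexicon, which is one reason finite-state methods are attractive for this kind of work. The sketch ignores word boundaries and morphology, both of which a real analyzer would have to handle.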
The behaviour of the system is driven by a set of configurable lexico-semantic resources which describe the characteristics and domain of the processed language. A default set of resources comes as part of LanguageWare and these describe the native language characteristics, such as morphology, and the basic vocabulary for the language. Supplemental resources have been created which capture additional vocabularies, terminologies, rules and grammars, which may be generic to the language or specific to one or more domains.
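The layering of default and supplemental resources can be pictured as stacked lookup tables, with more specific resources consulted first. The sketch below assumes a deliberately simplified model in which a resource is a plain dictionary; the lexicon contents and the `load_resources` helper are hypothetical and do not reflect LanguageWare's real resource format.

```python
from collections import ChainMap

# Hypothetical default resource: basic vocabulary for the language.
DEFAULT_LEXICON = {"cell": "NOUN", "culture": "NOUN", "bank": "NOUN"}

# Hypothetical supplemental resource: terminology for one domain.
BIOLOGY_LEXICON = {"cell": "BIO_TERM", "cell culture": "BIO_TERM"}


def load_resources(*domain_lexicons):
    """Stack lexicons so that later, more specific ones take precedence.

    ChainMap searches its maps left to right, so the domain lexicons are
    reversed to come first and the default lexicon is searched last.
    """
    return ChainMap(*reversed(domain_lexicons), DEFAULT_LEXICON)


if __name__ == "__main__":
    general = load_resources()
    biology = load_resources(BIOLOGY_LEXICON)
    print(general["cell"])   # default reading
    print(biology["cell"])   # domain reading overrides the default
    print(biology["bank"])   # falls through to the default vocabulary
```

Because the layers stay separate, a domain resource can be added or removed without touching the default vocabulary.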
A set of Eclipse-based customization tools, the LanguageWare Resource Workbench, is available on IBM's alphaWorks site. It allows domain knowledge to be compiled into these resources and thereby incorporated into the analysis process.
LanguageWare can be deployed as a set of annotators compliant with UIMA (Unstructured Information Management Architecture), as Eclipse plug-ins, or as Web Services.
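For readers unfamiliar with the annotator model, the following sketch shows the general pattern: independent components run in sequence over a shared document, each adding standoff annotations (typed spans of character offsets). It is a plain-Python illustration and does not use the UIMA API; `Annotation`, `Document`, `token_annotator`, `acronym_annotator`, and `run_pipeline` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)
    type: str   # annotation type name, e.g. "Token"


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


Annotator = Callable[[Document], None]


def token_annotator(doc: Document) -> None:
    """Add a Token annotation for every run of word characters."""
    for match in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation(match.start(), match.end(), "Token"))


def acronym_annotator(doc: Document) -> None:
    """Mark all-uppercase tokens; relies on Token annotations added earlier."""
    tokens = [a for a in doc.annotations if a.type == "Token"]
    for token in tokens:
        word = doc.covered_text(token)
        if len(word) > 1 and word.isupper():
            doc.annotations.append(Annotation(token.begin, token.end, "Acronym"))


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Run each annotator in order over one shared document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline("IBM ships UIMA annotators",
                          [token_annotator, acronym_annotator])
    for a in result.annotations:
        print(a.type, result.covered_text(a))
```

Keeping annotations separate from the text lets later components build on earlier results without modifying the original document.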
See also
- UIMA
- Linguistics
- Semantics
- Semantic Web
- Web services
- Service-oriented architecture
- Formal language
- Finite state machine
- IBM OmniFind
- Data Discovery and Query Builder
External links
- IBM LanguageWare Resource Workbench on alphaWorks
- IBM LanguageWare Miner for Multidimensional Socio-Semantic Networks on alphaWorks
- JumpStart Infocenter for IBM LanguageWare on IBM.com
- UIMA Homepage at the Apache Software Foundation
- UIMA Framework on SourceForge
- IBM OmniFind Yahoo! Edition (FREE enterprise search engine)
- Semantic Information Systems and Language Engineering Group
- SemanticDesktop.org
Related Papers
- Branimir K. Boguraev, "Annotation-Based Finite State Processing in a Large-Scale NLP Architecture", IBM Research Report, 2004
- Alexander Troussov, Mikhail Sogrin, "IBM LanguageWare Ontological Network Miner"
- Sheila Kinsella, Andreas Harth, Alexander Troussov, Mikhail Sogrin, John Judge, Conor Hayes, John G. Breslin, "Navigating and Annotating Semantically-Enabled Networks of People and Associated Objects"
- Mikhail Kotelnikov, Alexander Polonsky, Malte Kiesel, Max Völkel, Heiko Haller, Mikhail Sogrin, Pär Lannerö, Brian Davis, "Interactive Semantic Wikis"
- Sebastian Trüg, Jos van den Oever, Stéphane Laurière, "The Social Semantic Desktop: Nepomuk"
- Séamus Lawless, Vincent Wade, "Dynamic Content Discovery, Harvesting and Delivery"
- R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, and L. V. Subramaniam, "Text analytics for life science using the Unstructured Information Management Architecture"
- Alex Nevidomsky, "UIMA Framework and Knowledge Discovery at IBM", 4th Text Mining Symposium, Fraunhofer SCAI, 2006