UIMA
UIMA stands for Unstructured Information Management Architecture. An OASIS standard as of March 2009, UIMA is to date the only industry standard for content analytics.

UIMA, developed by IBM, is a component software architecture for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and its integration with search technologies. The source code for a reference implementation of this framework was first made available on SourceForge, and later on the website of the Apache Software Foundation.

One example is a logistics analysis software system that converts unstructured data, such as repair logs and service notes, into relational tables. Automated tools can then use these tables to detect maintenance or manufacturing problems.
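
The logistics example can be sketched in a few lines of Python. This is only an illustration and does not use UIMA itself: the log format, the part vocabulary, the table layout, and every function name are assumptions made for the sketch. It pulls (date, vehicle, part) rows out of free-text repair notes, loads them into a relational table, and queries the table for parts that keep failing on the same vehicle.

```python
import re
import sqlite3

# Hypothetical free-text repair notes; the format is an assumption for illustration.
REPAIR_LOGS = [
    "2011-01-04 truck T-17: replaced brake pads, driver reported squealing",
    "2011-01-09 truck T-22: oil leak found near gasket",
    "2011-01-15 truck T-17: brake pads worn again, replaced",
    "2011-02-02 truck T-17: brake pads replaced, rotor scored",
]

# Assumed line pattern: ISO date, vehicle id, then a free-text note.
LINE = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) truck (?P<vehicle>[A-Z]-\d+): (?P<note>.+)")
# Assumed vocabulary of parts worth tracking.
PARTS = ["brake pads", "gasket", "rotor"]


def extract_rows(lines):
    """Yield a (date, vehicle, part) tuple for each known part mentioned in a note."""
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        note = match.group("note").lower()
        for part in PARTS:
            if part in note:
                yield match.group("date"), match.group("vehicle"), part


def load(rows):
    """Store the extracted rows in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repairs (date TEXT, vehicle TEXT, part TEXT)")
    conn.executemany("INSERT INTO repairs VALUES (?, ?, ?)", rows)
    return conn


def recurring_problems(conn, min_count=3):
    """Return (vehicle, part, count) for parts repaired at least min_count times."""
    query = (
        "SELECT vehicle, part, COUNT(*) FROM repairs "
        "GROUP BY vehicle, part HAVING COUNT(*) >= ? "
        "ORDER BY vehicle, part"
    )
    return conn.execute(query, (min_count,)).fetchall()


conn = load(extract_rows(REPAIR_LOGS))
print(recurring_problems(conn))
```

Keeping extraction separate from the query means the automated detection step works only on structured rows and never has to parse the original notes.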

Other examples are systems that are used in medical environments to analyze clinical notes.

Structure of UIMA

The UIMA architecture can be thought of in four dimensions:
  1. It specifies component interfaces in an analytics pipeline.
  2. It describes a set of design patterns.
  3. It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services.
  4. It suggests development roles allowing tools to be used by users with diverse skills.
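
The first and third dimensions can be illustrated with a minimal Python sketch. It is not the Apache UIMA API: the class names `Annotation`, `Document`, `Annotator`, and `RegexAnnotator`, the `process` method, and the XML layout are all hypothetical, chosen only to show a shared component interface, an in-memory annotation store, and an XML form of the same annotations.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the document text (the in-memory representation)."""
    type_name: str
    begin: int
    end: int


@dataclass
class Document:
    """Hypothetical container holding the text and the annotations added so far."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


class Annotator:
    """Assumed component interface: every pipeline stage implements process()."""

    def process(self, doc):
        raise NotImplementedError


class RegexAnnotator(Annotator):
    """Example component that annotates every match of a regular expression."""

    def __init__(self, type_name, pattern):
        self.type_name = type_name
        self.pattern = re.compile(pattern)

    def process(self, doc):
        for m in self.pattern.finditer(doc.text):
            doc.annotations.append(Annotation(self.type_name, m.start(), m.end()))


def run_pipeline(doc, annotators):
    """Pass the same document through each component in order."""
    for annotator in annotators:
        annotator.process(doc)
    return doc


def to_xml(doc):
    """Serialize the text and annotations to XML, e.g. for a remote service."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = doc.text
    anns = ET.SubElement(root, "annotations")
    for a in doc.annotations:
        ET.SubElement(anns, "annotation", type=a.type_name,
                      begin=str(a.begin), end=str(a.end))
    return ET.tostring(root, encoding="unicode")


doc = run_pipeline(
    Document("Pump P-7 failed on 2011-02-14."),
    [RegexAnnotator("Date", r"\d{4}-\d{2}-\d{2}"),
     RegexAnnotator("PartId", r"\b[A-Z]-\d+\b")],
)
print(to_xml(doc))
```

Because every component sees the same `Document`, later stages can build on earlier annotations without copying data, while `to_xml` is needed only when the result has to leave the process.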

IBM Watson - The Jeopardy Challenge

In February 2011, Watson, a computer from IBM Research, won a competition on Jeopardy! against Jeopardy! star Ken Jennings and undefeated Jeopardy! champion Brad Rutter. Watson uses UIMA for real-time content analytics.

See also

  • Data Discovery and Query Builder
  • Entity extraction
  • General Architecture for Text Engineering (GATE)
  • IBM Omnifind
  • LanguageWare
  • OpenNLP
  • OpenPipeline
  • Darmstadt Knowledge Processing Software Repository (DKPro)


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 