Semantic gap
The semantic gap characterizes the difference between two descriptions of an object in different linguistic representations, for instance languages or symbols. In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.

More precisely, the gap is the difference between the ambiguous formulation of contextual knowledge in a powerful language (e.g. natural language) and its sound, reproducible, computational representation in a formal language (e.g. a programming language). The semantics of an object depend on the context in which it is regarded. In practice this means that any formal representation of real-world tasks requires translating the contextual expert knowledge of an application (high-level) into the elementary and reproducible operations of a computing machine (low-level). Since natural language allows the expression of tasks that are impossible to compute in a formal language, there is no way to automate this translation in general. Moreover, the examination of languages within the Chomsky hierarchy indicates that above a certain level of expressive power there is no formal, and consequently no automated, way of translating from one language into another.

Theoretical background

The unproven but commonly accepted Church-Turing thesis states that a Turing machine, and all equivalent formal systems such as the lambda calculus, can perform and represent, respectively, all formal operations that a computing human can apply. However, the selection of adequate operations for a correct computation is itself not formally deducible; it also depends on the computability of the underlying problem. Tasks such as the halting problem may be formulated comprehensively in natural language, but a computational representation will either not terminate or not provide a usable result. Rice's theorem generalizes this to every non-trivial property of the function a program computes. The general limitation of rule-based deduction expressed by Gödel's incompleteness theorems indicates that the semantic gap can never be fully closed. These are general statements about the limits of computation at the highest level of abstraction, where the semantic gap manifests itself. There are, however, many subsets of problems that can be translated automatically, especially in the higher-numbered (more restricted) levels of the Chomsky hierarchy.
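The argument behind the halting problem can be illustrated with a short sketch. It assumes a hypothetical decider, here called pessimistic_oracle, that claims to predict whether a program halts; the names make_paradox and pessimistic_oracle are illustrative only. The constructed program does the opposite of whatever the decider predicts for it, so the prediction is wrong.

```python
from typing import Callable

Program = Callable[[], object]


def make_paradox(halts: Callable[[Program], bool]) -> Program:
    """Build a program that does the opposite of what `halts` predicts for it."""

    def paradox() -> str:
        if halts(paradox):
            while True:  # predicted to halt, so loop forever
                pass
        return "halted"  # predicted to loop, so halt immediately

    return paradox


def pessimistic_oracle(program: Program) -> bool:
    """Hypothetical decider (an assumption) claiming that no program halts."""
    return False


paradox = make_paradox(pessimistic_oracle)
prediction = pessimistic_oracle(paradox)   # the oracle says "does not halt"
actually_halted = paradox() == "halted"    # yet the program halts
print(prediction, actually_halted)
```

A decider that answered True instead would be refuted in the same way, because the constructed program would then loop forever. The sketch only runs the case that terminates.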

Formal languages

Real-world tasks are formalized in programming languages, which are executed on computers based on the von Neumann architecture. Since programming languages are only convenient representations of the Turing machine, any program on a von Neumann computer has the same properties and limitations as the Turing machine or an equivalent representation. Consequently every programming language, whether CPU-level machine code, assembly language, or a high-level programming language, has the same expressive power: exactly what the underlying Turing machine can compute. There is no semantic gap between these languages, since a program is translated from the high-level language into machine code by another program, e.g. a compiler, which itself runs on a Turing machine without any user interaction. The semantic gap actually opens between the selection of the rules and the representation of the task.
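The mechanical nature of this translation can be shown with a minimal sketch using Python's built-in compile function and the standard dis module. Note the assumption: CPython translates source text into bytecode for its virtual machine rather than into CPU machine code, so this only stands in for the general principle. The source string and variable names are illustrative.

```python
import dis

# A high-level statement; its meaning is fixed entirely by the language rules.
source = "total = sum(n * n for n in range(1, 4))"

# Translation to a lower-level form happens without any human decision.
code_object = compile(source, "<example>", "exec")

# Executing the translated form yields the result defined by the source.
namespace = {}
exec(code_object, namespace)
result = namespace["total"]

# The lower-level instructions can be listed for inspection.
instructions = [ins.opname for ins in dis.get_instructions(code_object)]

print(result)             # 1 + 4 + 9
print(len(instructions))  # number of bytecode instructions (version dependent)
```

Nothing in this pipeline required knowledge of what the sum is for; that knowledge was needed earlier, when the statement was written.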

Practical Consequences

Selecting rules for the formal representation of real-world applications corresponds to writing a program. Writing programs is independent of the actual programming language and essentially requires translating the domain-specific knowledge of the user into the formal rules that operate a Turing machine. It is this transfer from contextual knowledge into formal representation that cannot be automated, given the theoretical limitations of computation. Consequently any mapping from real-world applications into computer applications requires a certain amount of technical background knowledge from the user, and this is where the semantic gap manifests itself.

It is a fundamental task of software engineering to close the gap between application-specific knowledge and technically feasible formalization. For this purpose domain-specific (high-level) knowledge must be transferred into an algorithm and its parameters (low-level). This requires a dialogue between user and developer. The aim is always software that allows users to represent their knowledge as parameters of an algorithm without knowing the details of the implementation, and to interpret the outcome of the algorithm without the aid of the developer. For this purpose user interfaces play the key role in software design, while developers are supported by frameworks that help organize the integration of contextual information.

Document Retrieval

A simple example can be formulated as a series of increasingly difficult natural language queries to locate a target document that may or may not exist locally on a known computer system.

Example queries:
  • 1) Locate any file in the known directory "/usr/local/funny".
  • 2) Locate any file where the word "funny" appears in the filename.
  • 3) Locate any text file where the word "funny" or the substring "humor" appears in the text.
  • 4) Locate any mp3 file where either "funny", "comic" or "humor" appears in the metadata.
  • 5) Locate any file of any type related to humor.
  • 6) Locate any image that is likely to make my grandmother laugh.


The progressive difficulty of these queries is represented by the increasing degree of abstraction from the types and semantics defined by the system architecture (directories and files on a known computer) to the types and semantics that occupy the realm of ordinary human discourse (subjects such as "humor" and entities such as "my grandmother"). Moreover, this disparity of realms is further complicated by leaky abstractions, as is common in the case of query 4), where the target document may exist but may not encapsulate the "metadata" in the manner expected by either the user or the designer of the query processing system.
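Queries 1) to 3) map directly onto file system operations, as the following minimal sketch suggests. The function names, the temporary directory layout, and the decision to treat "text file" as "file with a .txt extension" are all assumptions made for illustration; the last one is itself a small instance of the gap. Queries 5) and 6) have no comparable formalization.

```python
import tempfile
from pathlib import Path


def files_in_directory(directory: Path) -> list:
    """Query 1: any file directly inside a known directory."""
    return sorted(p for p in directory.iterdir() if p.is_file())


def files_named_like(root: Path, word: str) -> list:
    """Query 2: any file whose filename contains the given word."""
    return sorted(p for p in root.rglob("*") if p.is_file() and word in p.name)


def text_files_containing(root: Path, needles: tuple) -> list:
    """Query 3: any text file (assumed: *.txt) containing one of the substrings."""
    hits = []
    for path in root.rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(needle in text for needle in needles):
            hits.append(path)
    return sorted(hits)


# Build a small throwaway directory tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    funny_dir = root / "funny"
    funny_dir.mkdir()
    (funny_dir / "a.txt").write_text("a joke with humor", encoding="utf-8")
    (root / "funny_cat.jpg").write_bytes(b"\xff\xd8")
    (root / "notes.txt").write_text("quarterly report", encoding="utf-8")

    q1 = [p.name for p in files_in_directory(funny_dir)]
    q2 = [p.name for p in files_named_like(root, "funny")]
    q3 = [p.name for p in text_files_containing(root, ("funny", "humor"))]

print(q1, q2, q3)
```

Each function relies only on names, paths, and byte content, which are the types the system itself defines. Whether the image funny_cat.jpg is in fact funny cannot be read from any of them.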

Image Analysis

Image analysis is a typical domain in which a high degree of abstraction from low-level methods is required, and where the semantic gap immediately affects the user. If image content is to be identified in order to understand the meaning of an image, the only available independent information is the low-level pixel data. Textual annotations always depend on the knowledge, capability of expression, and specific language of the annotator, and are therefore unreliable. To recognize the displayed scenes from the raw data of an image, the algorithms for selection and manipulation of pixels must be combined and parameterized in an adequate manner and finally linked with the natural description. Even simple linguistic descriptions of shape or color, such as round or yellow, require entirely different mathematical formalization methods, which are neither intuitive nor unique and sound.
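The lack of a unique formalization can be made concrete with a small sketch. It gives two plausible definitions of "yellow", one in RGB and one in HSV using the standard colorsys module, and one common measure of "round", the circularity 4πA/P². All thresholds and function names are arbitrary assumptions chosen for illustration, not established standards.

```python
import colorsys
import math


def is_yellow_rgb(r: int, g: int, b: int) -> bool:
    """Rule A: strong red and green, weak blue (thresholds are assumptions)."""
    return r >= 200 and g >= 200 and b <= 100


def is_yellow_hsv(r: int, g: int, b: int) -> bool:
    """Rule B: hue between 45 and 70 degrees, sufficiently saturated and bright."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return 45 <= h * 360 <= 70 and s >= 0.4 and v >= 0.4


def circularity(area: float, perimeter: float) -> float:
    """One formalization of 'round': 4*pi*A / P**2, which is 1 for a circle."""
    return 4 * math.pi * area / perimeter ** 2


pure_yellow = (255, 255, 0)
dark_mustard = (150, 140, 20)

# Both rules accept pure yellow, but they disagree on the darker tone.
agree_on_pure = is_yellow_rgb(*pure_yellow) and is_yellow_hsv(*pure_yellow)
rgb_says = is_yellow_rgb(*dark_mustard)
hsv_says = is_yellow_hsv(*dark_mustard)

circle_score = circularity(math.pi * 10 ** 2, 2 * math.pi * 10)
square_score = circularity(10 * 10, 4 * 10)

print(agree_on_pure, rgb_says, hsv_says)
print(round(circle_score, 3), round(square_score, 3))
```

Neither color rule is wrong; they simply encode different readings of the same word. Where to draw the line between "round" and "not round" on the circularity scale is a further choice that the word itself does not settle.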

Layered Systems

In many layered systems, conflicts arise when concepts at a high level of abstraction need to be translated into lower, more concrete artifacts. This mismatch is often called a semantic gap.

Databases

Advocates of object-oriented database management systems (OODBMSs) sometimes claim that these databases help to reduce the semantic gap between the application domain (miniworld) and traditional relational database management systems (RDBMSs). http://www.findarticles.com/p/articles/mi_m0ISJ/is_n2_v33/ai_15519487/pg_4. Relational proponents, however, would posit the exact opposite, because by definition object databases fix the data being recorded into a single binding abstraction.

See also

  • Leaky abstraction
  • Text simplification
  • Semantic differential

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 