Gellish - AbsoluteAstronomy.com

Gellish is a controlled natural language, also called a formal language, in which information and knowledge can be expressed in such a way that it is computer-interpretable, as well as system-independent. Gellish is a structured subset of natural language that is suitable for information modelling and knowledge representation and as a successor of electronic data interchange. From a data modeling perspective, it is a generic conceptual data model that also includes domain-specific knowledge and semantics. Therefore, it can also be called a semantic data model. The accompanying Gellish modelling method thus belongs to the family of semantic modelling methods.

Etymologically, "Gellish" is derived from "Generic Engineering Language". However, it has been further developed into a language that is also applicable outside the engineering discipline (Van Renssen 2006).

Overview

Gellish is intended for the expression of complete and unambiguous specifications of products, facilities and processes; for information about their purchasing, fabrication, installation, operation and maintenance; and for the exchange of such information between systems in a system-independent and computer-interpretable way. It is also intended for the expression of knowledge and requirements about such things.

The definition of Gellish English is provided in the Gellish English Dictionary-Taxonomy, which is a large 'smart dictionary' of concepts with relations between those concepts (earlier called STEPlib). The Dictionary-Taxonomy is called a 'smart dictionary' because the concepts are arranged in a subtype-supertype hierarchy, making it a taxonomy that supports inheritance of properties from supertype concepts to subtype concepts. Furthermore, together with other relations between the concepts, the smart dictionary is extended into an ontology. Gellish basically has an extended object-relation-object structure to express facts by relations, where each fact may be accompanied by a number of auxiliary facts about the main fact. Examples of auxiliary facts are author, dates, status, etc. To enable an unambiguous interpretation, Gellish includes the definition of a large number (more than 650) of standard relation types that determine the rich semantic expression capability of the language.

In principle, for every natural language there is a Gellish variant that is specific to that language; for example, Gellish Dutch (Gellish Nederlands), Gellish Italian, Gellish English, Gellish Russian, etc. Gellish does not invent its own terminology, as Esperanto does, but uses the terms from natural languages. Thus, the Gellish English dictionary-taxonomy is like an ordinary (electronic) dictionary that is extended with additional concepts and with relations between the concepts.

For example, the Gellish dictionary-taxonomy contains definitions of many concepts that also appear in ordinary dictionaries, such as kinds of physical objects like building, airplane, car, pump, pipe, properties such as mass and color, scales such as kg and bar, as well as activities and processes, such as repairing and heating, etc. In addition, the dictionary contains concepts with composed names, such as 'hairpin heat exchanger', which will not appear in ordinary dictionaries. The main difference with ordinary dictionaries is that the Gellish dictionary also includes definitions of standard kinds of relations (relation types), which are denoted by standard Gellish English phrases. For example, it defines relation types such as 'is a subtype of', 'is classified as a', 'has as aspect', 'is qualified as' and 'is located in'. Such standard relation types and concept definitions enable Gellish-powered software to correctly and unambiguously interpret Gellish expressions.

Gellish expressions may be expressed in any suitable format, such as SQL, RDF, OWL or even spreadsheet tables, provided that their content is equivalent to the tabular form of Gellish Naming Tables (which define the vocabulary) and Fact Tables (together defining the content of a Gellish Database), or equivalent to Gellish Message Tables (for data exchange). An example of the core of a Message Table is the following:

Left-hand term	Relation type	Right-hand term
centrifugal pump	is a subtype of	pump
P-123	is classified as a	centrifugal pump
P-123	has as aspect	the mass of P-123
the mass of P-123	is classified as a	mass
the mass of P-123	is qualified as	50 kg
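
As a minimal sketch of how software might read such a three-column table, the Python below parses the five rows above into (left-hand term, relation type, right-hand term) triples and selects the facts about one object. The names `Triple`, `parse` and `facts_about` are illustrative assumptions, not part of any Gellish tooling.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    left: str       # left-hand term
    relation: str   # relation type phrase
    right: str      # right-hand term


# The five rows of the example table, as tab-separated text.
TABLE = """\
centrifugal pump\tis a subtype of\tpump
P-123\tis classified as a\tcentrifugal pump
P-123\thas as aspect\tthe mass of P-123
the mass of P-123\tis classified as a\tmass
the mass of P-123\tis qualified as\t50 kg
"""


def parse(text: str) -> List[Triple]:
    """Split each non-empty line into its three tab-separated columns."""
    return [Triple(*line.split("\t")) for line in text.splitlines() if line.strip()]


def facts_about(triples: List[Triple], term: str) -> List[Triple]:
    """Return every triple whose left-hand term equals `term`."""
    return [t for t in triples if t.left == term]


triples = parse(TABLE)
for t in facts_about(triples, "P-123"):
    print(t.left, "|", t.relation, "|", t.right)
```

Matching on plain terms, as this sketch does, only works while every term is unambiguous.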

A full Gellish Message Table requires additional columns for unique identifiers, the intention of the expression, the language of the expression, cardinalities, unit of measure, the validity context, status, creation date, author, references, and various other columns. Gellish Light requires only the three columns above, but it then lacks capabilities such as distinguishing homonyms, automated translation and version management. Those capabilities and several others are supported by Full Gellish. The following example illustrates the use of some additional columns in a Gellish Message Table.

Fact UID	Intention	Left UID	Left term	Relat. UID	Relation type	Right UID	Right term	UID of UoM	UoM	Status
201	statement	130058	centrifugal pump	1146	is a subtype of	130206	pump			accepted
202	statement	102	P-123	1225	is classified as a	130058	centrifugal pump			proposed
203	statement	102	P-123	1727	has as aspect	103	mass of P-123			proposed
204	statement	103	mass of P-123	1225	is classified as a	550020	mass			proposed
205	statement	103	mass of P-123	5020	is qualified as	920303	50	570039	kg	proposed
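
The rows above could be held in memory roughly as follows. This is a sketch under assumptions: the `MessageRow` class and its field names are invented here to mirror the column headings, and only the columns shown in the example are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageRow:
    fact_uid: int
    intention: str
    left_uid: int
    left_term: str
    relation_uid: int
    relation_type: str
    right_uid: int
    right_term: str
    uom_uid: Optional[int] = None   # unit of measure is optional
    uom: Optional[str] = None
    status: str = "proposed"


# The five rows of the example Message Table.
ROWS = [
    MessageRow(201, "statement", 130058, "centrifugal pump", 1146, "is a subtype of",
               130206, "pump", status="accepted"),
    MessageRow(202, "statement", 102, "P-123", 1225, "is classified as a",
               130058, "centrifugal pump"),
    MessageRow(203, "statement", 102, "P-123", 1727, "has as aspect",
               103, "mass of P-123"),
    MessageRow(204, "statement", 103, "mass of P-123", 1225, "is classified as a",
               550020, "mass"),
    MessageRow(205, "statement", 103, "mass of P-123", 5020, "is qualified as",
               920303, "50", 570039, "kg"),
]


def uid_triple(row: MessageRow) -> tuple:
    """The language-independent core of a fact: three UIDs."""
    return (row.left_uid, row.relation_uid, row.right_uid)


accepted = [r.fact_uid for r in ROWS if r.status == "accepted"]
print(accepted)
print(uid_triple(ROWS[1]))
```

Keeping the UID triple separate from the terms is what lets the same fact be rendered with different names.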

The collection of standard relation types defines the kinds of facts that can be expressed in Gellish, although anybody can create their own proprietary extension of the dictionary and thus add concepts and relation types as and when required.

A knowledge base with basic engineering knowledge is integrated with the Gellish Dictionary. That knowledge is itself expressed in Gellish. Typically the Gellish dictionary is used to select classes for classification, as standard terminology (metadata), to harmonize data in various computer systems, or as a tool in a search engine.

Gellish enables automatic translation, and enables the use of synonyms, abbreviations and codes as well as homonyms, because every concept has a unique identifier (UID) that is independent of any natural language; for example, 130206 (pump) and 1225 (is classified as a). Various Gellish Dictionaries therefore use the same UIDs for the same concepts, which means that those dictionaries provide translations of the names of the objects as well as of the standard relation types. The UIDs allow information and knowledge that is expressed in one language variant of Gellish to be automatically translated and presented by Gellish-powered software in any other language variant for which a Gellish dictionary is available. For example, the English phrase 'is classified as a' and the German phrase 'ist klassifiziert als ein' are denotations of the same UID 1225.

For example, a computer can automatically express the second line in the above example in German as follows:

Left-hand term	Relation type	Right-hand term
P-123	ist klassifiziert als ein	Zentrifugalpumpe
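
The translation step can be sketched as a lookup of the same UIDs in per-language name dictionaries. The structure below is an assumption for illustration; it contains only the terms that appear in the examples above, and `render` is a hypothetical helper name.

```python
# Per-language names for the UIDs used in fact 202 of the example above.
NAMES = {
    "English": {102: "P-123", 1225: "is classified as a", 130058: "centrifugal pump"},
    "German": {102: "P-123", 1225: "ist klassifiziert als ein", 130058: "Zentrifugalpumpe"},
}

# Fact 202 stored in its language-independent form: (left UID, relation UID, right UID).
FACT_202 = (102, 1225, 130058)


def render(fact, language):
    """Replace each UID in the fact by its name in the requested language."""
    names = NAMES[language]
    return tuple(names[uid] for uid in fact)


print(render(FACT_202, "English"))
print(render(FACT_202, "German"))
```

Because the stored fact contains no words at all, adding a language means adding a name dictionary, not touching the facts.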

Questions (queries) can be expressed as well. Queries are facilitated through standardized terms such as what, which, where and when. They can be used in combination with reserved UIDs for unknowns in the range 1-100. This enables Gellish expressions for queries, such as:

- query: what is located in Paris

Gellish-powered software should be able to provide the correct answer to this query by comparing the expression with the facts in the database, and should respond with:

- answer: The Eiffel Tower

Note that the automatic translation capability implies that a query that is expressed in a particular language, say English, can be used to search in a Gellish database in another language (say Chinese), whereas the answer can be presented in English.
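
One way to picture query answering is as pattern matching over UID triples, where a UID in the reserved range 1-100 acts as a wildcard. The sketch below uses the Eiffel Tower facts and UIDs from the Message Table example later in this article; the matching strategy and the function names are assumptions, not a description of any actual Gellish software.

```python
# Facts as (left UID, relation UID, right UID), taken from the Eiffel Tower example.
FACTS = [
    (101, 5138, 102),     # The Eiffel tower  is located in       Paris
    (101, 1225, 40903),   # The Eiffel tower  is classified as a  tower
    (102, 1225, 700008),  # Paris             is classified as a  city
]

NAMES = {101: "The Eiffel tower", 102: "Paris", 5138: "is located in",
         1225: "is classified as a", 40903: "tower", 700008: "city"}


def is_unknown(uid: int) -> bool:
    """UIDs 1-100 are reserved for unknowns in queries."""
    return 1 <= uid <= 100


def answer(query, facts):
    """Return the facts that agree with the query on every known position."""
    return [f for f in facts
            if all(is_unknown(q) or q == v for q, v in zip(query, f))]


# "what is located in Paris": unknown 1 stands for "what".
query = (1, 5138, 102)
matches = answer(query, FACTS)
print([NAMES[f[0]] for f in matches])  # ['The Eiffel tower']
```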

Information models in Gellish

Information models can be distinguished in two main categories:

Models about individual things. These models may be about individual physical objects as well as about activities, processes and events, or a combination of them. An information model about an individual physical object, and possibly also about its operation and maintenance, such as a process plant, a ship, an airplane, an infrastructural facility or a typical design (e.g. of a car or of a component), is called a Facility Information Model or a Product Model, whereas for a building it is called a Building Information Model (BIM). These models about individual things are characterized by their composition hierarchy, which specifies (all) their parts, and by the fact that the assemblies as well as the parts are classified by kinds or types of things.
Models about kinds of things. These models are expressed as collections of relations of particular kinds between kinds of things. They can be further subdivided into the following sub-categories:

- Knowledge models, which are collections of expressions of facts about what can be the case (modeled knowledge).

- Requirements models, which are collections of expressions of facts about what shall be the case in a particular validity context (modeled requirements). This may include modeled versions of the content of requirements documents, such as standard specifications and standard types of components (e.g. as in component and equipment catalogs).

- Definition models, each of which consists of a semantic frame. A definition model is a collection of expressions about what is by definition the case for all things of a kind. The Gellish electronic smart dictionary-taxonomy or ontology is an example of a collection of definition models.

- Models that are collections that include a combination of expressions of the above kinds.

All these categories of models can include drawings and other documents as well as 3D shape information (the core of 3D models). They all can be expressed and integrated in Gellish.

The classification relation between individual things and kinds of things makes the definitions, knowledge and requirements about kinds of things available for the individual things. Furthermore, the subtype-supertype hierarchy in a Gellish Dictionary-Taxonomy implies that the knowledge and requirements that are specified for a kind of thing are inherited by all its subtypes. As a consequence, when somebody designs an individual item and classifies it by a particular kind, all the knowledge and requirements that are known for that kind and for its supertypes will also be recognized and can be made available automatically.
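
The inheritance mechanism described above can be sketched as a walk up the supertype chain. The hierarchy and classification below come from the pump example earlier in the article; the statements attached to each kind are hypothetical placeholders, and the sketch assumes each kind has at most one direct supertype, which a real taxonomy need not satisfy.

```python
# From the earlier example: centrifugal pump is a subtype of pump,
# and P-123 is classified as a centrifugal pump.
SUPERTYPE = {"centrifugal pump": "pump"}
CLASSIFIED_AS = {"P-123": "centrifugal pump"}

# Hypothetical placeholder statements attached to kinds of things.
STATEMENTS = {
    "pump": ["K1: placeholder knowledge about pumps"],
    "centrifugal pump": ["R1: placeholder requirement for centrifugal pumps"],
}


def kinds_of(individual):
    """Return the classifying kind followed by all of its supertypes."""
    chain = []
    kind = CLASSIFIED_AS.get(individual)
    while kind is not None:
        chain.append(kind)
        kind = SUPERTYPE.get(kind)
    return chain


def applicable_statements(individual):
    """Collect statements for the kind and every supertype, most specific first."""
    return [s for kind in kinds_of(individual) for s in STATEMENTS.get(kind, [])]


print(kinds_of("P-123"))
print(applicable_statements("P-123"))
```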

Each category of information model requires its own semantics, because the expression of the individual fact that something real is the case requires other kinds of relations than the expression of the general fact that something can be the case, which again differs from a fact that expresses that something shall be the case in a particular context or that something is by definition always the case. Because of these semantic differences, the various categories of information models require their own subsets of standard relation types.
Therefore Gellish makes a distinction between the following categories of relation types:

Relation types for relations between kinds of things (classes). They are intended for the expression of knowledge, requirements and definitions. The sub-categories knowledge, requirements and definitions are modeled by using different kinds of relations: relation types for things that can be the case, things that shall be the case and things that are by definition the case, all three within applicable cardinality constraints. For example, the specialization relation on the first line in the example above is used for defining a concept (centrifugal pump). Separate relation types are used to specify knowledge (what can be the case) and requirements (what shall be the case).
Relation types for relations between individual things. They are intended for the expression of information about individual things. An example is the possession-of-an-aspect relation on the third line of the example above.
Relation types for relations between individual things and kinds of things. They are intended for links between individual things and general concepts in the dictionary (or in private extensions of that dictionary). Examples are the classification and qualification relations above.
Relation types for relations between collections and for relations between a collection and an element in the collection or a common aspect of all elements.

Gellish databases and data exchange messages

Gellish is typically expressed in the form of Gellish Data Tables. There are three categories of Data Tables:

Naming Tables, which contain the vocabulary of the dictionary and the proprietary terms that are used in the expressions.
Fact Tables, which contain the expressions of facts in the form of relations between UIDs, together with a number of auxiliary facts.

A Gellish Database typically consists of one or more Naming Tables and one or more Fact Tables together. Naming Tables and Fact Tables together are one-to-one equivalent to Message Tables.

Message Tables, which combine the content of Naming Tables and Fact Tables in a single table. Message Tables are intended for the exchange of data between systems and parties. A Message Table is a single standard table for the expression of any facts, including the unique identifiers (UIDs) for the facts, the relation types and the related objects, but also including their names (terms) and a number of auxiliary facts, all combined in one table. Multiple Message Tables at different locations can be combined into one distributed database.

All table columns are standardised, so that each Gellish data table of a category contains the same standard columns, or a subset of them. This provides standard interfaces for exchange of data between application systems. The content of data tables may also include constraints and requirements (data models) that specify the kind of data that should and may be provided for particular applications. Such requirements models make dedicated database designs superfluous. The Gellish Data Tables can be used as part of a central database or can form distributed databases, but tables can also be exchanged in data exchange files or as the body of Gellish Messages.

A Naming Table relates terms in a language and language community ('speech community') to a unique identifier. This enables the unambiguous use of synonyms, abbreviations and codes as well as homonyms in multiple languages. The following table is an example of a Naming Table:

UID of language	UID of language community	UID for term	Term (name)	Comment
910036	193263	130206	pump	English engineering term for concept 130206
910038	193263	130206	Pumpe	German
910037	193263	130206	pomp	Dutch
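
A Naming Table of this shape supports lookups in both directions: from a term in a given language and language community to a UID, and from a UID to all of its terms. The sketch below uses the three rows above; the function names are assumptions made for illustration.

```python
# Rows of the example Naming Table:
# (UID of language, UID of language community, UID for term, term)
NAMING_TABLE = [
    (910036, 193263, 130206, "pump"),   # English
    (910038, 193263, 130206, "Pumpe"),  # German
    (910037, 193263, 130206, "pomp"),   # Dutch
]


def uid_of(term, language_uid, community_uid):
    """Resolve a term to a UID within one language and language community.

    Including the community in the key is what allows homonyms to be
    told apart; returns None when the term is not known in that context.
    """
    for lang, community, uid, name in NAMING_TABLE:
        if (lang, community, name) == (language_uid, community_uid, term):
            return uid
    return None


def terms_for(uid):
    """Return every term, in any language, that denotes the given UID."""
    return [name for _, _, concept, name in NAMING_TABLE if concept == uid]


print(uid_of("Pumpe", 910038, 193263))
print(terms_for(130206))
```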

The inverse indicator is only relevant when phrases are used to denote relation types, because each standard relation type is denoted by at least one standard phrase as well as at least one standard inverse phrase. A base phrase and its inverse phrase both denote the same kind of relation (for example, a composition relation). However, when the inverse phrase is used to express a fact, the left-hand and right-hand objects in the expression swap positions. Thus the following expressions will be recognized as two equally valid expressions of the same fact (with the same Fact UID):
- A (base phrase) B
- B (inverse phrase) A
So, the inverse indicator indicates for relation types whether a phrase is a base phrase (1) or an inverse phrase (2).
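
A minimal sketch of how the inverse indicator could be used to normalize expressions follows. The two phrases and the UID 9001 are hypothetical stand-ins chosen for a composition relation; they are not taken from the Gellish dictionary.

```python
# Hypothetical phrase table: phrase -> (relation type UID, inverse indicator).
# Indicator 1 marks a base phrase, 2 marks an inverse phrase.
PHRASES = {
    "is a part of": (9001, 1),
    "has as part": (9001, 2),
}


def normalize(left, phrase, right):
    """Return (left, relation UID, right) in base-phrase orientation."""
    uid, indicator = PHRASES[phrase]
    if indicator == 2:
        # An inverse phrase means the two objects are in swapped positions.
        left, right = right, left
    return (left, uid, right)


# Both expressions normalize to the same fact.
print(normalize("A", "is a part of", "B"))
print(normalize("B", "has as part", "A"))
```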

A Fact Table contains expressions of any facts, each of which is accompanied by a number of auxiliary facts that provide additional information relevant for the main facts. Examples of auxiliary facts are: the intention, status, author, creation date, etc.

A Gellish Fact Table consists of columns for the main fact and a number of columns for auxiliary facts. The auxiliary facts make it possible to specify things such as roles, cardinalities, validity contexts, units of measure, date of latest change, author, references, etc.

The columns for the main fact in a Fact Table are:
- a UID of the fact that is expressed on this row in the table

- a UID of the intention with which the fact is communicated or stored (e.g. as a statement, a query, etc.)

- a UID of a left-hand object

- a UID of a relation type

- a UID of a right-hand object

- a UID of a unit of measure (optional)

- a string that forms a description (textual definition) of the left hand object.

These columns also appear in a Message Table as shown below.

A full Gellish Message table is in fact a combination of a Naming Table and a Fact Table. It contains not only columns for the expression of facts, but also columns for the names of the related objects and the additional columns to express auxiliary facts. This enables the use of a single table, also for the specification and use of synonyms and homonyms, multiple languages, etcetera.
The core of a Message Table is illustrated in the following table:

Language	UID of left-hand object	Name of left-hand object	UID of fact	UID of relation type	Name of relation type	UID of right-hand object	Name of right-hand object	Status
English	101	The Eiffel tower	201	5138	is located in	102	Paris	accepted
English	101	The Eiffel tower	202	1225	is classified as a	40903	tower	accepted
English	102	Paris	203	1225	is classified as a	700008	city	accepted

In the above example, the named concepts as well as the (standard) relation types are selected by their UIDs from the Gellish English Dictionary.
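
Since a Message Table combines naming and fact information, it can be split back into the two. The sketch below does this for the three rows above; the tuple layouts chosen for the resulting naming and fact entries are assumptions for illustration only.

```python
# Rows of the example Message Table, in the column order shown above.
MESSAGE_ROWS = [
    ("English", 101, "The Eiffel tower", 201, 5138, "is located in", 102, "Paris", "accepted"),
    ("English", 101, "The Eiffel tower", 202, 1225, "is classified as a", 40903, "tower", "accepted"),
    ("English", 102, "Paris", 203, 1225, "is classified as a", 700008, "city", "accepted"),
]


def split(rows):
    """Separate (language, UID, name) naming entries from UID-only facts."""
    naming = set()
    facts = []
    for lang, l_uid, l_name, f_uid, rel_uid, rel_name, r_uid, r_name, status in rows:
        naming.update({(lang, l_uid, l_name),
                       (lang, rel_uid, rel_name),
                       (lang, r_uid, r_name)})
        facts.append((f_uid, l_uid, rel_uid, r_uid, status))
    return sorted(naming, key=lambda entry: entry[1]), facts


naming, facts = split(MESSAGE_ROWS)
for entry in naming:
    print(entry)
for fact in facts:
    print(fact)
```

Each name is stored once in the naming entries even when its UID occurs in several facts.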

A Gellish Database table can be implemented in any tabular format. For example, it can be implemented as a SQL-based database, as a STEP file (according to ISO 10303-21), or as a simple spreadsheet table, as in Excel, such as the Gellish Dictionary itself.

Gellish database tables can also be described in an equivalent form using RDF/Notation3 or XML. A representation of “Gellish in XML” is defined in a standard XML Schema. An XML file with data according to that XML Schema is recommended to have the file extension GML, whereas GMZ stands for “Gellish in XML zipped”.

One of the differences between Gellish and RDF, XML or OWL is that Gellish English includes an extensive English Dictionary of concepts, including also a large (and extendable) set of standard relation types to make computer-interpretable expressions (in a form that is also readable for non-IT professionals). On the other hand, 'languages' such as RDF, XML and OWL only define a few basic concepts, which leaves much freedom for their users to define their own 'domain language' concepts.

This attractive freedom has the disadvantage that users of 'languages' such as RDF, XML or OWL still do not use a common language and still cannot integrate data that stem from different sources. Gellish is designed to provide a real common language, at least to a much larger extent, and therefore provides much more standardisation and commonality in terminology and expressions.

Gellish compared with OWL

OWL (Web Ontology Language) and Gellish are both meant for use on the semantic web. Gellish can be used in combination with OWL, or on its own. Nevertheless there are important differences between the two languages. The main differences are as follows:

1. Target audience and meta level

OWL is a metalanguage, including a basic grammar, but without a dictionary. OWL is meant to be used by computer system developers and ontology developers to create ontologies. Gellish is a language that includes a grammar as well as a dictionary-taxonomy and ontology. Gellish is meant to be used by computer system developers as well as by end-users and can also be used by ontology developers when they want to extend the Gellish ontology or build their own domain ontology. Gellish does not make a distinction between a meta-language and a user language; the concepts from both 'worlds' are integrated in one language. So, the Gellish English dictionary contains concepts that are equivalent to the OWL concepts, but also contains the concepts from an ordinary English dictionary.

2. Vocabularies and ontologies

OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. In other words: it can be used for the definition of taxonomies or ontologies. The terms in such a vocabulary do not become part of the OWL language. So OWL does not include definitions of the terms in a natural language, such as road, car, bolt or length. However, it can be used to define them and to build an ontology.

The upper ontology part of Gellish can also be used to define terms and the relations between them. However, many such natural language terms are already defined in the lower part of the Gellish dictionary-taxonomy itself. So in Gellish, terms such as road, car, bolt or length are part of the Gellish language. Therefore, Gellish English is a subset of natural English.

3. Synonyms and multi-language capabilities

Gellish makes a distinction between concepts and the various terms that are used as names (synonyms, abbreviations and translations) to refer to those concepts in different contexts and languages. Every concept is identified by a unique identifier that is natural-language-independent and can have many different terms in different languages to denote the concept. This enables automatic translation between different natural language versions of Gellish.
In OWL the various terms in different languages and the synonyms are in principle different concepts that need to be declared to be the same by explicit equivalence relations. The OWL approach is conceptually simpler, but it makes expressions ambiguous and makes data integration and automated translation significantly more complicated.

4. Upper ontology

OWL can be regarded as an upper ontology that consists of 54 'language constructs' (constructors or concepts).
The upper ontology part of Gellish currently consists of more than 1500 concepts of which about 650 are standard relation types. In addition to that the Gellish Dictionary-Taxonomy contains more than 40,000 concepts. This indicates the large semantic richness and expression capabilities of Gellish. Furthermore, Gellish contains definitions of many facts about the defined concepts that are expressed as relationships between those concepts.

5. Extensibility

OWL has a fixed set of concepts (terms) that are only extended when the OWL standard is extended. Gellish is extensible by any user, under Open Source conditions.

History

Gellish is a further development of ISO 10303-221 (AP221) and ISO 15926. Gellish is an integration and extension of the concepts that are defined in both standards. The main difference with both ISO standards is that Gellish is easier to implement, has more (precise) semantic expression capabilities, and is suitable to express queries and answers as well. The specific philosophy of spatio-temporal parts that is used in ISO 15926 to represent time as discrete time periods can also be used in Gellish; however, the recommended representation of time in Gellish is the more intuitive method in which facts have a specified validity duration. For example, each property can have multiple numeric values on a scale, which is expressed as multiple facts, and for each of those facts an (optional) specification can be added of the moment or time period during which that fact is valid.

A subset of the Gellish Dictionary (Taxonomy) is used to create ISO 15926-4. Gellish in RDF is being standardized as ISO 15926-11.

External links

Documentation project at SourceForge
An implementation of Gellish in 'The Brain' presents among others activities and Geographical objects and other parts of the Gellish smart Dictionary. The 'Brain' implementation also includes a multi-language knowledge base about wines.
A simple pseudo Gellish implementation
Gellish, an extensible ontological language

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.