Anchor Modeling
Anchor Modeling is an agile database modeling technique suited for information that changes over time in both structure and content. It provides a graphical notation for conceptual modeling similar to that of Entity-Relationship modeling, with extensions for working with temporal data. The modeling technique is built around four modeling constructs: the anchor, attribute, tie, and knot, each capturing different aspects of the domain being modeled.
The resulting models can be translated to physical database designs using formalized rules. When such a translation is done, the tables in the relational database will mostly be in sixth normal form.
Philosophy and history
Anchor Modeling was created in order to take advantage of the benefits of a high degree of normalization while avoiding its drawbacks. The benefits gained include being able to evolve the model non-destructively, avoid null values, and keep the information free from redundancies. Performance issues due to extra joins are largely avoided thanks to a feature in modern database engines called 'table elimination'. In order to handle changes in the information content, Anchor Modeling emulates aspects of a temporal database in the resulting relational database schema.
The earliest installations using Anchor Modeling were made in Sweden, with the first dating back to 2004, when a data warehouse for an insurance company was built using the technique. In 2007 the technique was in use in a few data warehouses and one OLTP system, and it was presented internationally by Lars Rönnbäck at the TDWI (The Data Warehousing Institute) conference in Amsterdam. This stirred enough interest for the technique to warrant a more formal description. Since then, research on Anchor Modeling has been carried out in a collaboration between the creators, Olle Regardt and Lars Rönnbäck, and a team at the Department of Computer and Systems Sciences, Stockholm University. The first paper, in which Anchor Modeling is formalized, was presented at the 28th International Conference on Conceptual Modeling and won the best paper award.
The research can be followed at www.anchormodeling.com, where material on Anchor Modeling is made public and free to use under a Creative Commons license. An online modeling tool is also available, which is free to use and open source.
Basic notions
Anchor Modeling has four basic modeling concepts: anchors, attributes, ties, and knots. Anchors are used to model entities and events, attributes model properties of anchors, ties model the relationships between anchors, and knots model shared properties, such as states. Attributes and ties can be historized when changes in the information they model need to be kept.
An example model showing the different graphical symbols for all the concepts can be seen below. The symbols resemble those used in Entity-Relationship modeling, with a couple of extensions. A double outline on an attribute or tie indicates that a history of changes is kept, and the knot symbol (an outlined square with rounded edges) is also available.
Temporal aspects
Anchor Modeling handles two types of informational evolution: structural changes and content changes. Changes to the structure of information are represented through extensions. The high degree of normalization makes it possible to non-destructively add the modeling concepts needed to capture a change, in such a way that every previous schema always remains a subset of the current schema. Since the existing schema is not touched, the database can be evolved in a highly iterative manner and without causing any downtime.
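The subset property can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table and column names (`person`, `person_name`, `person_birth_date`) are hypothetical and chosen only for this illustration; they do not follow any official Anchor Modeling naming convention. A newly required property is added as a new table, and the earlier tables and their rows are left untouched.

```python
import sqlite3


def table_names(conn):
    """Return the set of table names currently present in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")

# Version 1 of the schema: one anchor and one static attribute (hypothetical names).
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE person_name (
        person_id INTEGER PRIMARY KEY REFERENCES person (person_id),
        name      TEXT NOT NULL
    );
    INSERT INTO person VALUES (42);
    INSERT INTO person_name VALUES (42, 'Huey');
""")
schema_v1 = table_names(conn)

# Version 2: the new property becomes a new table; no existing table is altered.
conn.executescript("""
    CREATE TABLE person_birth_date (
        person_id  INTEGER PRIMARY KEY REFERENCES person (person_id),
        birth_date TEXT NOT NULL
    );
""")
schema_v2 = table_names(conn)

added = schema_v2 - schema_v1
old_rows = conn.execute("SELECT person_id, name FROM person_name").fetchall()
```

In this sketch an entity with no known birth date simply has no row in `person_birth_date`, so no null value is needed, and queries written against version 1 keep working unchanged.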
Changes in the content of information are handled by emulating features of a temporal database in a relational database. In Anchor Modeling, pieces of information can be tied to points in time or to intervals of time (both open and closed). The time points when events occur are modeled using attributes, e.g. the birth dates of persons or the time of a purchase. The intervals of time in which a value is valid are captured through the historization of attributes and ties, e.g. the changes of hair color of a person or the period of time during which a person was married. In a relational database this is achieved by adding a single column, with a data type granular enough to capture the speed of the changes, to the table corresponding to the historized attribute or tie. This adds a slight complexity, as more than one row in the table has to be examined in order to know whether an interval is closed or not.
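Because only the starting point of each interval is stored, the end of an interval has to be derived from the next row for the same entity. The following is a minimal sketch of that derivation in plain Python; the function name `to_intervals` and the hair color dates are made up for illustration.

```python
from datetime import date


def to_intervals(rows):
    """Turn (value, valid_from) rows for one entity into (value, valid_from, valid_to).

    valid_to is None for the open interval, i.e. the value that still holds.
    """
    ordered = sorted(rows, key=lambda row: row[1])
    intervals = []
    for index, (value, valid_from) in enumerate(ordered):
        # An interval is closed by the start of the next row, if there is one.
        has_next = index + 1 < len(ordered)
        valid_to = ordered[index + 1][1] if has_next else None
        intervals.append((value, valid_from, valid_to))
    return intervals


# Hypothetical history of one person's hair color, stored as start points only.
hair_color_rows = [
    ("Grey", date(2015, 3, 1)),
    ("Brown", date(1980, 6, 15)),
]
intervals = to_intervals(hair_color_rows)
```

Here the "Brown" interval is closed by the start of "Grey", while "Grey" remains open because no later row exists. Storing only the start point means that recording a change is a single insert, with no update of the previous row.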
Points or intervals of time not directly related to the domain being modeled, such as the points in time at which information entered the database, are handled through the use of metadata rather than any of the above-mentioned constructs.
Relational representation
In Anchor Modeling there is a one-to-one mapping between the symbols used in the conceptual model and tables in the relational database. Every anchor, attribute, tie, and knot has a corresponding table in the database with an unambiguously defined structure. A conceptual model can thereby be translated to a relational database schema using simple automated rules, and vice versa. This is different from many other modeling techniques, in which there are complex and sometimes subjective translation steps between the conceptual, logical, and physical levels.
Anchor tables contain a single column in which identities are stored. An identity is assumed to be the only property of an entity that is always present and immutable. As identities are rarely available from the domain being modeled, they are instead technically generated, e.g. from an incrementing number sequence.
An example of an anchor for the identities of the nephews of Donald Duck is a set of 1-tuples:
{⟨#42⟩, ⟨#43⟩, ⟨#44⟩}
Knots can be thought of as the combination of an anchor and a single attribute. Knot tables contain two columns, one for an identity and one for a value. Because identities and values are stored together, knots cannot be historized. Their usefulness comes from being able to reduce storage requirements and improve performance, since tables referencing knots can store a short value rather than a long string.
An example of a knot for genders is a set of 2-tuples:
{⟨#1, 'Male'⟩, ⟨#2, 'Female'⟩}
Static attribute tables contain two columns, one for the identity of the entity to which the value belongs and one for the actual property value. Historized attribute tables have an extra column for storing the starting point of a time interval. Knotted attribute tables switch the value column for one containing a second identity that references a value in a knot.
An example of a static attribute for their names is a set of 2-tuples:
{⟨#42, 'Huey'⟩, ⟨#43, 'Dewey'⟩, ⟨#44, 'Louie'⟩}
An example of a knotted static attribute for their genders is a set of 2-tuples:
{⟨#42, #1⟩, ⟨#43, #1⟩, ⟨#44, #1⟩}
An example of a historized attribute for the (changing) colors of their outfits is a set of 3-tuples:
{⟨#44, 'Orange', 1938-04-15⟩, ⟨#44, 'Green', 1939-04-28⟩, ⟨#44, 'Blue', 1940-12-13⟩}
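A common way to read such a historized attribute is to ask which value was in effect at a given time, which amounts to finding the latest row whose starting point is not after that time. The sketch below does this with Python's built-in `sqlite3` module; the table name `outfit_color`, its column names, and the helper `color_at` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outfit_color (
        nephew_id  INTEGER NOT NULL,
        color      TEXT NOT NULL,
        changed_at TEXT NOT NULL,   -- ISO 8601 dates sort correctly as text
        PRIMARY KEY (nephew_id, changed_at)
    )
""")
conn.executemany(
    "INSERT INTO outfit_color VALUES (?, ?, ?)",
    [
        (44, "Orange", "1938-04-15"),
        (44, "Green", "1939-04-28"),
        (44, "Blue", "1940-12-13"),
    ],
)


def color_at(conn, nephew_id, day):
    """Return the color in effect on `day`, or None if nothing was recorded yet."""
    row = conn.execute(
        """
        SELECT color FROM outfit_color
        WHERE nephew_id = ? AND changed_at <= ?
        ORDER BY changed_at DESC
        LIMIT 1
        """,
        (nephew_id, day),
    ).fetchone()
    return row[0] if row else None
```

For example, `color_at(conn, 44, "1939-12-31")` falls between the second and third rows and so resolves to the second row's value. The identity and the starting point together form the primary key in this sketch, so the same entity cannot have two values starting at the same instant.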
Static tie tables relate two or more anchors to each other, and contain two or more columns for storing the identities. Historized tie tables have an extra column for storing the starting point of a time interval. Knotted tie tables have an additional column for each referenced knot.
An example of a static tie for the sibling relationship is a set of 2-tuples:
{⟨#42, #43⟩, ⟨#42, #44⟩, ⟨#43, #42⟩, ⟨#43, #44⟩, ⟨#44, #42⟩, ⟨#44, #43⟩}
The resulting tables will all be in sixth normal form, except for ties in which not all columns are part of the primary key.
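To make the relational representation concrete, the example sets above can be loaded into tables and joined back together. This is only a sketch using Python's built-in `sqlite3` module: the table and column names are hypothetical, and the schema is one plausible reading of the structures described in this section rather than the output of the official modeling tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchor: identities only.
    CREATE TABLE nephew (nephew_id INTEGER PRIMARY KEY);
    -- Knot: a small shared set of values, each with its own identity.
    CREATE TABLE gender (gender_id INTEGER PRIMARY KEY, gender TEXT NOT NULL UNIQUE);
    -- Static attribute: identity plus value.
    CREATE TABLE nephew_name (
        nephew_id INTEGER PRIMARY KEY REFERENCES nephew (nephew_id),
        name      TEXT NOT NULL
    );
    -- Knotted static attribute: identity plus a reference into the knot.
    CREATE TABLE nephew_gender (
        nephew_id INTEGER PRIMARY KEY REFERENCES nephew (nephew_id),
        gender_id INTEGER NOT NULL REFERENCES gender (gender_id)
    );
    -- Static tie: every column is part of the primary key.
    CREATE TABLE sibling (
        nephew_id  INTEGER NOT NULL REFERENCES nephew (nephew_id),
        sibling_id INTEGER NOT NULL REFERENCES nephew (nephew_id),
        PRIMARY KEY (nephew_id, sibling_id)
    );

    INSERT INTO nephew VALUES (42), (43), (44);
    INSERT INTO gender VALUES (1, 'Male'), (2, 'Female');
    INSERT INTO nephew_name VALUES (42, 'Huey'), (43, 'Dewey'), (44, 'Louie');
    INSERT INTO nephew_gender VALUES (42, 1), (43, 1), (44, 1);
    INSERT INTO sibling VALUES (42, 43), (42, 44), (43, 42), (43, 44), (44, 42), (44, 43);
""")

# Reassemble a conventional "wide" view by joining on the anchor identity.
profiles = conn.execute("""
    SELECT n.name, g.gender
    FROM nephew AS a
    JOIN nephew_name AS n ON n.nephew_id = a.nephew_id
    JOIN nephew_gender AS ng ON ng.nephew_id = a.nephew_id
    JOIN gender AS g ON g.gender_id = ng.gender_id
    ORDER BY a.nephew_id
""").fetchall()

# Siblings of Huey (#42), resolved to names through the tie.
huey_siblings = [name for (name,) in conn.execute("""
    SELECT n.name
    FROM sibling AS s
    JOIN nephew_name AS n ON n.nephew_id = s.sibling_id
    WHERE s.nephew_id = 42
    ORDER BY s.sibling_id
""")]
```

Each table holds a key plus at most one other column, which is why reading several properties at once requires one join per attribute. A query that asks only for names never needs to touch `nephew_gender` or `gender`.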