Data integrity
Data Integrity in its broadest meaning refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness is composed of four essential qualities or core attributes: completeness, currency/timeliness, accuracy/correctness and validity/authorization." The concept of business rules is already widely used and is subdivided into six categories, one of which is data rules. Data rules are further subdivided into Data Integrity Rules, data sourcing rules, data extraction rules, data transformation rules and data deployment rules.

Data Integrity is very important in database operations in particular and in Data Warehousing and Business Intelligence in general. Because Data Integrity ensures that data is of high quality, correct, consistent and accessible, it is important to follow the rules governing Data Integrity.

A Data Value Rule or Conditional Data Value Rule specifies a data domain. The difference between the two is that the former specifies the domain of allowable values for a data attribute in all situations, while the latter applies only when certain conditions or exceptions hold.
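
As a rough illustration, both kinds of value rule can be written as CHECK constraints. The sketch below uses Python's built-in sqlite3 module; the account table, its columns and the allowed status values are hypothetical and chosen only for the example.

```python
import sqlite3


def make_account_table():
    """Create an in-memory table with one value rule and one conditional value rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE account (
            id        INTEGER PRIMARY KEY,
            status    TEXT NOT NULL,
            closed_on TEXT,
            -- Data value rule: holds in every situation.
            CHECK (status IN ('open', 'suspended', 'closed')),
            -- Conditional data value rule: only bites when status is 'closed'.
            CHECK (status <> 'closed' OR closed_on IS NOT NULL)
        )
        """
    )
    return conn


def try_insert(conn, status, closed_on):
    """Return True if the row is accepted, False if a rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO account (status, closed_on) VALUES (?, ?)",
            (status, closed_on),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With these assumed rules, a status outside the listed domain is always rejected, while a missing closing date is rejected only for closed accounts.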

A Data Structure Rule defines the cardinality of data for a data relation in cases where no conditions or exceptions apply. This rule makes the data structure easy to understand. A Conditional Data Structure Rule is slightly different in that it governs the data cardinality for a data relation when conditions or exceptions do apply.
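
One possible way to state an unconditional cardinality rule is through column constraints. In this sketch, which assumes hypothetical employee and badge tables, NOT NULL says that a badge belongs to exactly one employee and UNIQUE says that an employee holds at most one badge.

```python
import sqlite3


def make_badge_schema():
    """Create two in-memory tables linked by a one-to-at-most-one relation."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE badge (
            id          INTEGER PRIMARY KEY,
            -- NOT NULL: every badge belongs to exactly one employee.
            -- UNIQUE:   an employee holds at most one badge.
            employee_id INTEGER NOT NULL UNIQUE REFERENCES employee(id)
        );
        """
    )
    return conn


def issue_badge(conn, employee_id):
    """Return True if the badge row is accepted, False if the rule rejects it."""
    try:
        conn.execute("INSERT INTO badge (employee_id) VALUES (?)", (employee_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

A conditional variant, such as allowing a second badge only for contractors, would generally need a trigger or application logic rather than a plain column constraint.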

A Data Derivation Rule specifies how a data value is derived, based on an algorithm, contributors and conditions. It also specifies the conditions under which the data value can be re-derived.
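
The parts of a derivation rule can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the OrderLine record, the choice of quantity and unit_price as contributors, the cancelled flag as the condition, and the policy of re-deriving whenever the stored value no longer matches its contributors.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class OrderLine:
    quantity: int                          # contributor
    unit_price: Decimal                    # contributor
    cancelled: bool = False                # condition
    line_total: Optional[Decimal] = None   # derived value


def derive_line_total(line: OrderLine) -> Decimal:
    """Algorithm: quantity times unit price, or zero when the line is cancelled."""
    if line.cancelled:
        return Decimal("0")
    return line.quantity * line.unit_price


def rederive_if_stale(line: OrderLine) -> bool:
    """Re-derive when the stored value disagrees with its contributors.

    Returns True if the stored value was updated.
    """
    expected = derive_line_total(line)
    if line.line_total != expected:
        line.line_total = expected
        return True
    return False
```

Keeping the algorithm in one function means the initial derivation and every later re-derivation cannot drift apart.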

A Data Retention Rule specifies the length of time data values can be retained in a particular database. It also specifies what can be done with data values when their use for a database expires. A data occurrence retention rule specifies the length of time the data occurrence is retained and what can be done with the data when it is no longer useful. A data attribute retention rule is similar, but it applies only to specific data values rather than to the entire data occurrence.
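
The difference between the two retention rules can be sketched as follows. The payment table, its columns and both retention periods are hypothetical, and deleting or blanking expired data is only one possible disposition; archiving would be another.

```python
import sqlite3
from datetime import date, timedelta

OCCURRENCE_RETENTION_DAYS = 365 * 7   # hypothetical: keep whole rows about seven years
ATTRIBUTE_RETENTION_DAYS = 90         # hypothetical: keep the card number 90 days


def make_payment_table():
    """Create an in-memory table; paid_on holds ISO dates, which sort as text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payment (
            id           INTEGER PRIMARY KEY,
            paid_on      TEXT NOT NULL,
            card_number  TEXT,
            amount_cents INTEGER NOT NULL
        )
        """
    )
    return conn


def apply_retention(conn, today):
    """Apply both retention rules as of `today` (a datetime.date)."""
    row_cutoff = (today - timedelta(days=OCCURRENCE_RETENTION_DAYS)).isoformat()
    attr_cutoff = (today - timedelta(days=ATTRIBUTE_RETENTION_DAYS)).isoformat()
    # Data occurrence retention rule: remove the entire row once it expires.
    conn.execute("DELETE FROM payment WHERE paid_on < ?", (row_cutoff,))
    # Data attribute retention rule: blank one value, keep the rest of the row.
    conn.execute(
        "UPDATE payment SET card_number = NULL WHERE paid_on < ?", (attr_cutoff,)
    )
```

Passing the reference date in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test.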

These Data Integrity Rules, like any other rules, are totally without meaning when they are not implemented and enforced.

In order to achieve Data Integrity, these rules should be consistently and routinely applied to all data entering the Data Warehouse or any other Data Resource. There should be no waivers or exceptions to the enforcement of these rules, because even a slight relaxation of enforcement could result in serious errors.

As much as possible, these Data Integrity Rules must be implemented as close as possible to the initial capture of data, so that potential breaches of integrity can be detected and corrected early. This helps keep errors and inconsistencies from entering the database.

With strict implementation and enforcement of these Data Integrity Rules, data error rates can be much lower, so less time is spent troubleshooting and tracing faulty computing results. This translates into savings in manpower expense.

A low error rate means higher-quality data, which gives better support to the statistical analysis, trend and pattern spotting, and decision-making tasks of a company. In today's digital age, information is a major key to success, and having the right information means having an edge over competitors.

Most narrowly, data with integrity has a complete or whole structure. All characteristics of the data, including business rules, rules for how pieces of data relate, dates, definitions and lineage, must be correct for data to be complete.

Per the discipline of data architecture, when functions are performed on the data the functions must ensure integrity. Examples of functions are transforming the data, storing the history, storing the definitions (Metadata) and storing the lineage of the data as it moves from one place to another. The most important aspect of data integrity per the data architecture discipline is to expose the data, the functions and the data's characteristics.

Data that has integrity is identically maintained during any operation (such as transfer, storage or retrieval). Put simply in business terms, data integrity is the assurance that data is consistent, certified and can be reconciled.

In terms of a database, data integrity refers to the process of ensuring that a database remains an accurate reflection of the universe of discourse it is modelling or representing. In other words, there is a close correspondence between the facts stored in the database and the real world it models.

Types of integrity constraints

Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.

Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.
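
A minimal sketch of entity integrity, using Python's built-in sqlite3 module and a hypothetical customer table keyed by email address:

```python
import sqlite3


def make_customer_table():
    """Create an in-memory table whose primary key is unique and not null."""
    conn = sqlite3.connect(":memory:")
    # NOT NULL is spelled out because SQLite, unlike the SQL standard, does not
    # imply it for a non-INTEGER primary key column in an ordinary table.
    conn.execute(
        "CREATE TABLE customer (email TEXT PRIMARY KEY NOT NULL, name TEXT)"
    )
    return conn


def add_customer(conn, email, name):
    """Return True if the row is accepted, False if entity integrity rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer (email, name) VALUES (?, ?)", (email, name)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second row with an email that is already present is rejected as a duplicate, and a row with no email at all is rejected as null.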

Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign key value can only be in one of two states. The usual state of affairs is that the foreign key value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of the business, a foreign key value can be null. In this case we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.
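
The two permitted states of a foreign key can be sketched with sqlite3. The department and employee tables are hypothetical; the foreign key column is deliberately left nullable so that an unknown or absent relationship can be recorded.

```python
import sqlite3


def make_department_schema():
    """Create in-memory tables where employee.department_id is a nullable foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE employee (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            -- Nullable on purpose: NULL means no department, or an unknown one.
            department_id INTEGER REFERENCES department(id)
        );
        """
    )
    return conn


def hire(conn, name, department_id):
    """Return True if the row is accepted, False if referential integrity rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee (name, department_id) VALUES (?, ?)",
            (name, department_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A department_id that matches an existing department is accepted, None is accepted, and any other value is rejected.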

Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.

If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for the data storage and retrieval. If a database does not support these features, it is the responsibility of the application to ensure data integrity while the database supports the consistency model for the data storage and retrieval.

Having a single, well-controlled and well-defined data integrity system increases stability (one centralized system performs all data integrity operations), performance (all data integrity operations are performed in the same tier as the consistency model), re-usability (all applications benefit from a single centralized data integrity system), and maintainability (one centralized system for all data integrity administration).

Today, since all modern databases support these features (see Comparison of relational database management systems), it has become the de facto responsibility of the database to ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets, ISAM, flat files, etc.) for their consistency model lack any kind of data integrity model. This requires companies to invest a large amount of time, money and personnel in creating data integrity systems on a per-application basis that effectively just duplicate the existing data integrity systems found in modern databases. Many companies, and indeed many database systems themselves, offer products and services to migrate outdated and legacy systems to modern databases to provide these data integrity features. This offers companies substantial savings in time, money and resources, because they do not have to develop per-application data integrity systems that must be refactored each time business requirements change.

Examples

An example of a data integrity mechanism is the parent and child relationship of related records. If a parent record owns one or more related child records, all of the referential integrity processes are handled by the database itself, which automatically ensures the accuracy and integrity of the data so that no child record can exist without a parent (also called being orphaned) and no parent loses its child records. It also ensures that no parent record can be deleted while it owns any child records. All of this is handled at the database level and does not require coding integrity checks into each application.
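
The parent and child behaviour described above can be sketched with sqlite3. The purchase_order and order_item tables are hypothetical, and ON DELETE RESTRICT is one of several actions a database may offer for this case.

```python
import sqlite3


def make_order_schema():
    """Create a parent table and a child table that cannot outlive its parent."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE purchase_order (id INTEGER PRIMARY KEY);
        CREATE TABLE order_item (
            id          INTEGER PRIMARY KEY,
            -- NOT NULL plus the reference: no child without a parent.
            -- ON DELETE RESTRICT: no deleting a parent that still has children.
            order_id    INTEGER NOT NULL
                        REFERENCES purchase_order(id) ON DELETE RESTRICT,
            description TEXT NOT NULL
        );
        """
    )
    return conn


def attempt(conn, sql, params=()):
    """Run one statement; return True if it succeeds, False if integrity blocks it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

The application code only issues inserts and deletes. Rejecting an orphaned item, or the deletion of an order that still has items, is left to the database.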
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 