Database design
Encyclopedia
Database design is the process of producing a detailed data model
of a database
. This logical data model
contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language
, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.
The term database design can be used to describe many different parts of the design of an overall database system
. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model
these are the tables and views. In an object database
the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system
(DBMS).
The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:
) diagrams. An ER diagram is a diagram that helps to design databases in an efficient way.
Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute.
Within the relational model
the final step can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects. This step is not necessary with an Object database
.
Bold text
This process is one which is generally considered part of requirements analysis
, and requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be stored can be determined by Requirement Specification.
design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity
.
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization
can subsequently be performed, but only for performance
reasons. However, some modeling disciplines, such as the dimensional modeling
approach to data warehouse
design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.
Normalization consists of normal forms that are 1NF,2NF,3NF,BOYCE-CODD NF (3.5NF),4NF and 5NF
(NOTE: A common misconception is that the relational model
is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations
.)
. In the case of relational databases the storage objects are tables which store data in rows and columns.
Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent.
In an Object database
the storage objects correspond directly to the objects used by the Object-oriented programming language
used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.
s, data type
s, indexing
options and other parameters residing in the DBMS data dictionary
. It is the detailed design of a system that includes modules & the database's hardware & software specifications of the system.
Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....
of a database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
. This logical data model
Logical data model
A logical data model in systems engineering is a representation of an organization's data, organized in terms of entities and relationships and is independent of any particular data management technology.- Overview :...
contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language
Data Definition Language
A data definition language or data description language is a syntax similar to a computer programming language for defining data structures, especially database schemas.-History:...
, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.
The term database design can be used to describe many different parts of the design of an overall database system
Database system
A database system is a term that is typically used to encapsulate the constructs of a data model, database Management system and database....
. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
these are the tables and views. In an object database
Object database
An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming...
the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...
(DBMS).
The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:
- Determine the relationships between the different data elements.
- Superimpose a logical structure upon the data on the basis of these relationships.
ER Diagram (Entity-relationship model)
Database designs also include ER(Entity-relationship modelEntity-relationship model
In software engineering, an entity-relationship model is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements...
) diagrams. An ER diagram is a diagram that helps to design databases in an efficient way.
Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute.
Within the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
the final step can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects. This step is not necessary with an Object database
Object database
An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming...
.
Bold text
The Design Process
The design process consists of the following steps:- Determine the purpose of your database - This helps prepare you for the remaining steps.
- Find and organize the information required - Gather all of the types of information you might want to record in the database, such as product name and order number.
- Divide the information into tables - Divide your information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
- Turn information items into columns - Decide what information you want to store in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
- Specify primary keys - Choose each table’s primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.
- Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
- Refine your design - Analyze your design for errors. Create the tables and add a few records of sample data. See if you can get the results you want from your tables. Make adjustments to the design, as needed.
- Apply the normalization rules - Apply the data normalization rules to see if your tables are structured correctly. Make adjustments to the tables
Determining data to be stored
In a majority of cases, a person who is doing the design of a database is a person with expertise in the area of database design, rather than expertise in the domain from which the data to be stored is drawn e.g. financial information, biological information etc. Therefore the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain, and who is aware of what data must be stored within the system.This process is one which is generally considered part of requirements analysis
Requirements analysis
Requirements analysis in systems engineering and software engineering, encompasses those tasks that go into determining the needs or conditions to meet for a new or altered product, taking account of the possibly conflicting requirements of the various stakeholders, such as beneficiaries or users...
, and requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be stored can be determined by Requirement Specification.
Normalization
In the field of relational databaseRelational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity
Data integrity
Data Integrity in its broadest meaning refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness...
.
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization
Denormalization
In computing, denormalization is the process of attempting to optimise the read performance of a database by adding redundant data or by grouping data. In some cases, denormalisation helps cover up the inefficiencies inherent in relational database software...
can subsequently be performed, but only for performance
Computer performance
Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.Depending on the context, good computer performance may involve one or more of the following:...
reasons. However, some modeling disciplines, such as the dimensional modeling
Dimensional modeling
Dimensional modeling is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship modeling . Dimensional Modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be...
approach to data warehouse
Data warehouse
In computing, a data warehouse is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.A data warehouse...
design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.
Normalization consists of normal forms that are 1NF,2NF,3NF,BOYCE-CODD NF (3.5NF),4NF and 5NF
Conceptual schema
Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependancy is within the data. Sometimes when data is changed you can be changing other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address, but one person cannot have more than one address, the name is dependent upon the address, because if the address is different, then the associated name is different too. However, the other way around is different. One attribute can change and not another.(NOTE: A common misconception is that the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations
Relation (mathematics)
In set theory and logic, a relation is a property that assigns truth values to k-tuples of individuals. Typically, the property describes a possible connection between the components of a k-tuple...
.)
Logically structuring data
Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management systemDatabase management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...
. In the case of relational databases the storage objects are tables which store data in rows and columns.
Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent.
In an Object database
Object database
An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming...
the storage objects correspond directly to the objects used by the Object-oriented programming language
Object-oriented programming language
This is a list of object-oriented programming programming languages.-Languages with object-oriented features:*ABAP*Ada 95*AmigaE*BETA*Blue*Boo*C++*C#*COBOL*Cobra*ColdFusion*Common Lisp*COOL*CorbaScript*Clarion*CLU*Curl*D*Dylan*E*Eiffel...
used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.
Physical database design
The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elementData element
In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:# An identification such as a data element name# A clear data element definition# One or more representation terms...
s, data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...
s, indexing
Index (database)
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space...
options and other parameters residing in the DBMS data dictionary
Data dictionary
A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to...
. It is the detailed design of a system that includes modules & the database's hardware & software specifications of the system.
See also
- Database normalizationDatabase normalizationIn the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...
- Relational databaseRelational databaseA relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
- Relational modelRelational modelThe relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
- POODPoodPood , is a unit of mass equal to 40 funt . It is approximately 16.38 kilograms . It was used in Russia, Belarus and Ukraine. Pood was first mentioned in a number of documents of the 12th century....
(Principle of Orthogonal DesignPrinciple of Orthogonal DesignThe Principle of Orthogonal Design was developed by database researchers David McGoveran and Christopher J. Date in the early 1990s, and first published "A New Database Design Principle" in the July 1994 issue of Database Programming and Design and reprinted several times...
) - POFN (Principle of Full Normalization)
- The Third ManifestoThe Third ManifestoThe Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'object-relational impedance mismatch' between object-oriented programming...
- Concept mapping
- Data modelingData modelingData modeling in software engineering is the process of creating a data model for an information system by applying formal data modeling techniques.- Overview :...
- Entity-relationship modelEntity-relationship modelIn software engineering, an entity-relationship model is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements...
- Object-relationship modelling
- Object role modelingObject role modelingObject Role Modeling is a method for conceptual modeling, and can be used as a tool for information and rules analysis, ontological analysis, and data modeling in the field of software engineering.- Overview :...
- Knowledge representationKnowledge representationKnowledge representation is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge...
- Logical data modelLogical data modelA logical data model in systems engineering is a representation of an organization's data, organized in terms of entities and relationships and is independent of any particular data management technology.- Overview :...
- Mindmap
- Physical data modelPhysical data modelA physical data model is a representation of a data design which takes into account the facilities and constraints of a given database management system. In the lifecycle of a project it is typically derived from a logical data model, though it may be reverse-engineered from a given database...
- Semantic WebSemantic WebThe Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...
- Three schema approachThree schema approachThe three-schema approach, or the Three Schema Concept, in software engineering is an approach to building information systems and systems information management from the 1970s...
Further reading
- S. Lightstone, T. Teorey, T. Nadeau, “Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more”, Morgan Kaufmann Press, 2007. ISBN 0-12369389-6
External links
- http://www.sqlteam.com/article/database-design-and-modeling-fundamentals
- http://office.microsoft.com/en-us/access/HA012242471033.aspx
- Database Normalization Basics by Mike Chapple (About.com)
- Database Normalization Intro, Part 2
- An Introduction to Database Normalization by Mike Hillyer.
- Normalization by ITS, University of Texas.
- Efficient Database Design
- Data Modelers Community