Data mart
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse which is usually oriented to a specific business line or team.

Terminology

There can be multiple data marts inside a single corporation, each one relevant to one or more business units for which it was designed. Data marts may or may not be dependent on or related to other data marts in a single corporation. If the data marts are designed using conformed facts and dimensions, then they will be related. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data. This enables each department to use, manipulate and develop its data any way it sees fit, without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions such as customer and product.
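
As a rough illustration of how conformed dimensions relate two data marts, the following Python sketch uses the standard sqlite3 module. The table names (dim_customer, sales_fact, support_fact) and the sample rows are hypothetical and exist only for this example. Each mart's facts are aggregated separately and then joined on the shared customer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conformed dimension shared by both marts (hypothetical schema).
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact table of a hypothetical sales mart.
    CREATE TABLE sales_fact (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        amount REAL
    );
    -- Fact table of a hypothetical support mart.
    CREATE TABLE support_fact (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        tickets INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO support_fact VALUES (1, 3), (2, 1), (2, 4);
""")


def drill_across(connection):
    """Aggregate each mart on its own, then join on the shared dimension key."""
    query = """
        SELECT c.name, s.total_sales, t.total_tickets
        FROM dim_customer AS c
        JOIN (SELECT customer_id, SUM(amount) AS total_sales
              FROM sales_fact GROUP BY customer_id) AS s
          ON s.customer_id = c.customer_id
        JOIN (SELECT customer_id, SUM(tickets) AS total_tickets
              FROM support_fact GROUP BY customer_id) AS t
          ON t.customer_id = c.customer_id
        ORDER BY c.name
    """
    return connection.execute(query).fetchall()


rows = drill_across(conn)
print(rows)
```

Because both fact tables use the same customer key, their results can be combined without any mapping between the marts. Without a conformed dimension, each mart would need its own customer table and the join would not be reliable.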

The related term spreadmart describes the situation that occurs when one or more business analysts develop a system of linked spreadsheets to perform a business analysis, then grow it to a size and degree of complexity that makes it nearly impossible to maintain.

Design schemas

  • Star schema or dimensional model is a fairly popular design choice, as it enables a relational database to emulate the analytical functionality of a multidimensional database.
  • Snowflake schema
  • Datamart Architecture Pattern (EA Reference Architecture)
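
The star schema item in the list above can be sketched in a relational database as one fact table that references several dimension tables. The example below is a minimal sketch using Python's standard sqlite3 module; the tables (fact_sales, dim_product, dim_date) and their rows are assumptions made for illustration. Grouping the facts by dimension attributes gives the kind of "slice and dice" result a multidimensional database would provide.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema (hypothetical tables and data)."""
    connection = sqlite3.connect(":memory:")
    connection.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds the measurements and points at each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            revenue REAL
        );
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO dim_date VALUES (10, 2010), (11, 2011);
        INSERT INTO fact_sales VALUES
            (1, 10, 20.0), (1, 11, 30.0),
            (2, 10, 5.0), (2, 11, 15.0), (2, 11, 10.0);
    """)
    return connection


def revenue_by_category_and_year(connection):
    """Join the fact table to its dimensions and aggregate per category and year."""
    return connection.execute("""
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


star = build_star_schema()
cube = revenue_by_category_and_year(star)
for category, year, revenue in cube:
    print(category, year, revenue)
```

A snowflake schema would differ only in that the dimension tables are themselves normalized into further tables, for example by moving the category into a separate table referenced by dim_product.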

Reasons for creating a data mart

  • Easy access to frequently needed data
  • Creates collective view by a group of users
  • Improves end-user response time
  • Ease of creation
  • Lower cost than implementing a full data warehouse
  • Potential users are more clearly defined than in a full data warehouse

Dependent data mart

According to the Inmon school of data warehousing, a dependent data mart is a logical subset (view) or a physical subset (extract) of a larger data warehouse, isolated for one of the following reasons:
  • A need for a special data model or schema, e.g., to restructure for OLAP
  • Performance: to offload the data mart to a separate computer for greater efficiency or to obviate the need to manage that workload on the centralized data warehouse.
  • Security: to separate an authorized data subset selectively
  • Expediency: to bypass the data governance and authorizations required to incorporate a new application on the Enterprise Data Warehouse
  • Proving Ground: to demonstrate the viability and ROI (return on investment) potential of an application prior to migrating it to the Enterprise Data Warehouse
  • Politics: a coping strategy for IT (Information Technology) in situations where a user group has more influence than funding or is not a good citizen on the centralized data warehouse.
  • Politics: a coping strategy for consumers of data in situations where a data warehouse team is unable to create a usable data warehouse.
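
The difference between a logical subset (view) and a physical subset (extract) can be shown with a small sketch in Python's standard sqlite3 module. The warehouse table, the region filter and the mart names below are hypothetical. The view is evaluated against the warehouse at query time, while the extract is a copy that stays unchanged until it is refreshed.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
wh.executescript("""
    -- Hypothetical warehouse table.
    CREATE TABLE warehouse_orders (
        order_id INTEGER PRIMARY KEY, region TEXT, amount REAL
    );
    INSERT INTO warehouse_orders VALUES
        (1, 'EU', 10.0), (2, 'US', 20.0), (3, 'EU', 30.0);

    -- Logical subset: a view, evaluated against the warehouse at query time.
    CREATE VIEW mart_eu_view AS
        SELECT order_id, amount FROM warehouse_orders WHERE region = 'EU';

    -- Physical subset: an extract, copied once and refreshed separately.
    CREATE TABLE mart_eu_extract AS
        SELECT order_id, amount FROM warehouse_orders WHERE region = 'EU';
""")

# A row added to the warehouse after the extract was taken.
wh.execute("INSERT INTO warehouse_orders VALUES (4, 'EU', 40.0)")

view_total = wh.execute("SELECT SUM(amount) FROM mart_eu_view").fetchone()[0]
extract_total = wh.execute("SELECT SUM(amount) FROM mart_eu_extract").fetchone()[0]
print(view_total, extract_total)
```

The view reflects the new warehouse row immediately, whereas the extract still holds the earlier snapshot. In practice an extract could also live on a separate machine, which is what the performance reason in the list above relies on.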


According to the Inmon school of data warehousing, tradeoffs inherent with data marts include limited scalability, duplication of data, data inconsistency with other silos of information, and inability to leverage enterprise sources of data.

See also

  • Data warehouse
  • Enterprise architecture
  • OLAP cube

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 