What is the definition of the data warehouse?
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
From the Characteristics of Data Warehousing: what is subject oriented?
data are organized by detailed subject, containing information relevant for decision support.
This ability to define a data warehouse by subject matter makes the data warehouse subject oriented.
e.g. sales, products, or customers. For example, to learn more about your company’s sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like “Who was our best customer for this item last year?”
From the Characteristics of Data Warehousing: what is integrated?
data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
From the Characteristics of Data Warehousing: what is time variant?
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse’s focus on change over time is what is meant by the term time variant.
From the Characteristics of Data Warehousing: what is nonvolatile?
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred
What in short are the 4 defining characteristics of the DWH?
4 Defining Characteristics:
What are three main types of data warehouses?
What is a data mart?
What is a dependent data mart?
What is an independent data mart?
What are OPERATIONAL DATA STORES (ODS)?
What are OPER MARTS?
What is an ENTERPRISE DATA WAREHOUSE
(EDW)?
What is Metadata?
What are the Two Data Warehousing Strategies?
What is the Data Mart Strategy (bottom up, Kimball)?
What is the Enterprise-Wide Strategy (top down, Inmon)?
Note: if organization has ERP and that is the only system available, it is easy to implement the enterprise-wide strategy
What is the difference between OLTP and DWH in terms of indexes?
What are Indexes?
Indexes are data structures which hold field values from the indexed column(s) and pointers to the related record(s). This data structure is then sorted and binary searches are performed to quickly find the record.
What is the difference between OLTP and DWH in terms of joins?
What is the difference between OLTP and DWH in terms of duplicated data?
Normalization = minimize the redundancy
What is the difference between OLTP and DWH in terms of derived data and aggregates?
What is dimensional modelling?
What are the objectives of dimensional modelling?
What are facts?
Facts are the important entity: a sale
Facts have measures that can be aggregated: sales price