SESSION 7 - Recent developments in data warehousing Flashcards

Question 1

Q

Whats the article about?

Answer

A

data warehouses are created to provide dedicated source of data to support decision-making applications  data repositories play important role in understanding customer behaviour, connecting trading partners along supply chain, supporting performance management systems etc
* aim of paper: tutorial on data warehousing that introduces newest concepts in field

Question 2

Q

What are the 4 data warehouse characteristics?

Answer

A

Data warehouse: collection of data created to support decision making; provides data infrastructure & thus eliminates failures that are due to lack of quality
–> ‘single version of truth’
4 DATA WAREHOUSE CHARACTERISTICS
1) Subject oriented: organised around specific subjects (sales, customer, products)
2) Integrated: data collected from multiple systems (internal/external) and integrated around subjected (eg in the end data around one customer identifier)
3) Time variant: maintains historical data
4) Non-volatile: users cannot change or update data; updates happen through IT controlled load processes

Question 3

Q

What are data marts?

Answer

A

Data mart: similar to data warehouse but stores data only for limited number of subject areas (smaller in scope than data warehouses)

Independent data mart: built directly from source systems; often ‘point solution’ that solves immediate problem but creates new ones in long run when firm tries to implement enterprise-wide data infrastructure (eg data warehouse)

Dependent data mart: created with data drawn from data warehouse; provides copy of data extracted from data warehouse to give users customised view of specific data; are preferred over independent ones as they come from organisation-wide source - ‘single version of truth’ is maintained

Question 4

Q

What are operational data stores?

Answer

A

Operational data store (ODS): consolidates data from multiple source systems & provides near real-time, integrated view of volatile, current (max 60 day old) data; aim to provide integrated data for organisational purposes; often used to avoid ERP implementation

Question 5

Q

Whats the architecture for data warehousing?

Answer

A

Architecture for data warehousing: includes the component parts and the relationships among the parts –>Data sources, ETL software, data stores, data analysis tools & applications, users

Question 6

Q

Whats the data mart strategy?

Answer

A

Data mart strategy: ‘start small, think big approach’  begins with specific business need for data and grows after time
* initial data mart contains data for only limited number of subject areas, draws from limited sources
* recognising that data mart will grow: needs to be planned and exercised beforehand as mart will grow if it is successful
–>if not: difficult to integrate data logically & physically
* advantages: due to limited scope initially developed quickly, at low cost, with less financial risk
* disadvantages: difficult to successfully grow a data mart while integrating new subject areas, data, users, applications along the way

Question 7

Q

Whats the enterprise data warehouse approach?

Answer

A

Data warehouse approach: traditional approach t warehousing; does not preclude creation of data marts but mart are created after warehouse is built & pull data from warehouse, not from source systems (dependent data)
* advantages: faster system response time as marts + simpler data view as marts are dependent; results in integrated data warehouse containing many subject areas for multiple users
* disadvantages: risk that it is never completed or fails to meet end users needs

Question 8

Q

What is ETL?

Answer

A

ETL: data extraction, transformation, loading
–> takes fata from source systems, prepares it for decision-support purposes & places it in target data base; ‘plumbing’ – dirty, complex, time-consuming, expensive

Question 9

Q

What are the two types of data extraction methods?

Answer

A

Custom-write data extraction programs: firm writes own ETL software when: wanting to avoid costs of purchasing, knows source systems well, understand complexities
Purchasing commercial ETL software: available from major database vendors & firms specialising in ETL software; allows firms to specify source systems, indicate tables and columns to be used, move data to specified targets & automate process

Question 10

Q

What is meant by data transformation?

Answer

A

Data transformation: several ways - cleansing the data, integrating the data, other transformations
DATA CLEANSING
Dirty data: result of poor data quality practice which little can be done about the data that is already there
* sources of dirty data: dummy values - inappropriate values entered; absence of data; cryptic data; multipurpose fields; contradicting data; violations of business rules; non-unique identifiers
* solutions to data cleansing: 1) rely on basic cleaning capabilities of ETL software 2) custom-write cleansing routines 3) use special-purpose data cleansing software

Question 11

Q

What is meant by data loading?

Answer

A

first loading provides the initial data for warehouse; second loadings can be handled differently
Bulk loading the warehouse: all of the data (old and new) is loaded each time; simple processing logic but impractical as volume increases
Change data capture/refreshing: refreshing the warehouse with only newly generated data; more complicated
when refreshing, must be decided how many times to load
–> based on business need for data & business cycle that provides the data = trend: continuous updating (trickle loading)

Question 12

Q

What is meta data?

Answer

A

Metadata: data about the data; needed by both IT personnel & users who access data
* IT personnel: data sources and targets, database, table, column names; refresh schedules etc
* data users: attribute definitions, report tools available, report distribution info, held desk info
* meta data has not received much attention yet: uncertainty about what metadata should be stored & lack of methods to share it across vendors’ products
* developments improving ability to create metadata: 1) attempts to create standards for metadata models 2) use of application program interfaces APIs

Question 13

Q

What are the three different types of information users?

Answer

A

Information producers: like analysts, create info for own and for others use
Information consumers: like executives, consume info that is prepared for them
1) Tourists: do not know what they want; look at overview of things with little deep data analysis; design of database is not issue as they do not do much with it; need metadata
2) Explorers: have idea what they are looking for but do not know how to find ‘priceless gem’ in data; biggest problem is vast amount of data; require metadata to get started
3) Farmers: know what they want, where and how to get it; find ‘flakes of gold’ in data rather than ‘gem’; use the same data regularly and thus do not rely on metdata

Question 14

Q

Whats the conclusion?

Answer

A

future for data warehousing is bright
–>necessity in many companies
REASONS
1) organisations become more information intensive = need for reliable decision support
2) through use of operational data stores, it is used with operational applications and leads to better understanding of market segments, buying behaviour and other changes
3) data warehousing research is on rise
4) industry is starting to provide resources to help in teaching about data warehousing

SESSION 7 - Recent developments in data warehousing Flashcards

(14 cards)