Data Warehouse
Collection of processes and data to support the business with its analysis and decision making
Current, Historical Data
Data warehouse store ___ _ and _____ and are used for creating analytical reports for knowledge workers
subject-oriented databases
Data warehousing is a collection of integrated _________
Decision Support System
Designed to support the ________ function, where each unit of data is relevant to some moment in time
Atomic data and lightly summarized data
Data warehouse contains______ ****__ and _____________
Non-Volatile
Subject-Oriented
Integrated
Time Variant
Characteristics of Data Warehouse
1960
General Mills and Darthmouth College, in a joint research project, develop the terms dimensions and facts.
1970
ACNielsen and IRI provide dimensional data marts for retail sales.
1970
Bill Inmon beings to define and discuss the term: Data Warehouse
1975
Sperry Univac Introduces MAPPER (Maintain, Prepare, and Produce Executive Reports) is a database management and reporting system that includes world’s first 4GL, First platform designed for building information Centers (forerunner of contemporary Enterprise Data Warehousing platforms)
1996
Ralph Kimball publishes the book the Data Warehouse Toolkit
2000
Daniel Linstedt releases the Data Vault, enabling real-time auditable data warehouses
2012
Bill Inmon developed and made public technology known as “textual disambiguation”. Textual disambiguation applies context to the raw text and reformats the raw text and context into a standard database format.
Dimension and Facts
What did General Mills and Darthmouth College developed?
MAPPER
a database management and reporting system that includes the world’s first 4GL, first platform designed for building information Centers (forerunner of contemporary enterpise data warehousing platforms).
Data Vault
enables real-time auditable data warehouses warehouse.
Textual Disambiguation
applies context to the raw text and reformats the raw text and context into a standard database format.
Source System
Where data comes from
Data created or collected by operational application systems that run the business
Order processing
Production Scheduling
Financial Trading Systems
Policy Administration
Claims Handling
Accounts Payable/Receivable
Employee Payroll
Examples of Large Applications that have existed for a long time
Data Mart
Database in which the data is organized to support the business
Business Intelligence
Collection of reports and analyses
Data Model
Dictates what is the correct graph, decision, and format to process the data
Abstraction of how individual data elements relate to each other.
Visually depicts how the data will be organized and stored in a database.
Provides the mechanism for documenting and understanding how data is organized.
Business Requirements
Data Sources
Data Modelling
ETL Design
Front-End Design
Steps In Designing a Data
To understand what is really happening in the business
To identify historical trends
To predict future opportunities
To measure performance
Why Build a Data Warehouse?
Descriptive Analytics
answers what is happening in the business; current state
Predictive Analytics
historical trends to dictate what will happen in the future; trends indicate baseline patterns
Diagnostic
performance, comparison between a and b; applicable for SM
Direct Data Access
Data Usefulness
Poor Data Quality
Facilitating Exception Reporting
Timeliness of Data
Flexibility
Data Integration
Silo Reporting Environments
Unclear Definitions of Data
Why build a data warehouse? - Basic Issues
Tracking and trending key performance indicators
Measuring business performance
Reporting and Understanding financial results
Understanding customers and their Behavior
Identifying High-Value Customers
Attracting and Retaining High-Value Customers
Better Selection or Development of New Products
Understanding which Products should be Scaled Back or Eliminated
Understanding the impact of highly qualified professionals
The value and benefits of Data Warehouse
Single Version of Truth
Access all data whenever it is needed
Promises of Data Warehouse
Strong Partnership between the Business & IT Communities
Ensuring that the DW is driven by True Business Requirements
Shifting to a Global Perspective
Overcoming Unrealistic Expectations
Providing Clear Communication
Treating Data as a Corporate Asset
Effectively Leveraging Technology
Key Success Factors
Survival Mode
Staying ahead of the game
Changing with the times
Back to Basics
Global Innovator
Impact of Data Warehouse in Business
65 Rows
Problem:
TBLA.COL_A (LEFT) = TBLB.COL_A (RIGHT)
TBLA: 100 Rows
TBLB: 75 Rows
Matches: 65
Using Inner Join, how many rows are to be displayed?
100 rows
Problem:
TBLA.COL_A (LEFT) = TBLB.COL_A (RIGHT)
TBLA: 100 rows
TBLB: 75 rows
Matches: 65
Using Left Outer Join, how many rows are to be displayed?
75 rows
Sample Problem:
TBLA.COL_A (LEFT) = TBLB.COL_A (RIGHT)
TBLA: 100 rows
TBLB: 75 rows
Matches: 65
Using Right Outer Join, how many rows are to be displayed?
110 Rows
Problem:
TBLA.COL_A (LEFT) = TBLB.COL_A (RIGHT)
TBLA: 100 rows
TBLB: 75 rows
Matches: 65
Using Full Outer Join, how many rows are to be displayed?
Aggregate Functions
Summarizes data
Single Column, Single Row Result
COUNT()
SUM()
AVG()
MIN()
MAX()
Keywords Used in Aggregate Functions
Nested - Single Value
Single Row - Single Column
Nested - Multiple Values
Multiple Values
Multiple Rows - Single Column
IN Keyword
Analogous to If-else Operator rather than equal operator
EXISTS Keyword
Analogous to the Equal Operator
ANY Keyword
Analogous to the OR Operator
ALL Keyword
Analogous to the AND Operator