# A glimpse of Dimensional Modelling

## Why Data Warehouse?

Datawarehouse is important in modern technology world for several reasons as below:

• To ensure the consistency and quality of data.
• To ensure the governance of data for a large enterprises.
• To analyze data from different sources(homo and hetero)
• To separate data from transactional to analytical model
• To improve the speed and efficiency of accessing data
• To help in arriving decision making system

## Dimensional Modelling

“Dimensional Modelling is a design technique for databases intended to support end-user queries in a data warehouse.”

— Ralph Kimball

Dimensional modelling features as best to answer the questions the business would like to understand; not on the operational system. For eg. business might want to get the sales order for last year or growth over a period of times etc. Dimensional modelling ensures an easy and efficient way of data retrieval.

Dimensional Modelling primarily consists of two types of entities: Dimension and Fact.

### Fact:

A measurable matrix that is normally used to calculate the number of sales, quantity or sales order etc. It contains the Primary key of Dimension associated with. A Fact can contain facts and pointer to dimension.

### Dimension:

A descriptive information of a measurement in Fact. There will be more number of Dimension tables than Fact. A dimension can contain many dimension attributes.

### How do we understand the Dimension & Fact

One of the widely appreciated way is based on 7Ws of Data Warehouse design on a data story. The question words are:

How, What, When , Who, Where and Why – determines Dimensions.

How many/much – determines Fact.

## Star Schema:

A pictorial representation of star schema as below. Few points to be noted are:

• Fact is pointing to Dimension.
• Dimensions are descriptive and denormalized form, not pointing to another dimension.
• Dimension Keys are normally surrogate keys, not generated by source, but load process. This is to ensure the change in the source is not going to impact the data warehouse system.
• Dimension attributes determines the granularity called grains of Facts.

## Snowflake Schema:

• Fact is pointing to a Dimension.
• Dimension can point to another Dimension as shown in the below picture.
• Good for Transaction Database design, however not so efficient one for Data warehouse (unless its proven).
• It can incur performance issues due to several join requirement to get the required data that business asked in reporting purposes.

## Bus Matrix:

Bus matrix is a key item in the requirement phase of Data warehouse systems. It provides information and relation of dimensions and subject areas in a matrix form. The matrix is generated from various sources of truth by asking questions like user interviews/surveys etc. It provides a clear picture of shared/common Dimensions used by various subject areas and its dependencies easily. This is important for any DW project to keep this ready and up-to-date.

The below picture is a representation of bus matrix on AdventureWorks DW prepared in an Excel.

## Changing the Dimension values & its significance

Occasionally there could be some changes happen even for Dimension, though its rare. So, its important to understand how the system should behave to the modification and these are called slowly changing dimensions.

3 Different Types of SCD (Slowly Changing Dimensions)

• Type 1

This is the most simplest form of SCD. Its is easier to implement. In principle, it just updates the current value with new value. The disadvantage is that we will lose the ability of understanding the previous data, history data, as the update is in-place.

• Type 2

This type tracks the historical data as a snapshot. For eg: This can be implemented with versioning concepts with start date and end date, hence at any point of times, the reports can roll up the data accordingly based on the versioning. The best way to get historical information, however, more complex to be implemented.

• Type 3

This type is an easier one than Type2, while keeping the historical data. As an example, keeping the old and new values by adding an extra column for the changed dimension value.

If you enjoyed this blog post, feel free to share it with your friends!

# Data Modelling & Key Concepts

## Definition of Data Modelling

Data Modelling is a process of identifying the business requirements of an organization and modelling the organization of data to meet. Data modelling evaluates and measures the flow of data in and out to the database management system. It describes the treatment of data and its connecting objects. It ensures a great customer experience by modelling the business data in a most appropriate manner.

## Data modelling Types

There are 3 types of Data modelling

• Conceptual
• Logical
• Physical

### Conceptual Data Modelling

Conceptual modelling is a method of describing information in high level. It describes the semantics of an organization and its assertions. It usually hides the internal or low level of information about data structures.

### Logical Data Modelling

Logical modelling is a representation of logical design of an information system, a representation of the abstract structure of the information domain that defines all the logical constraints applied to the data stored.

### Physical Data Modelling

Physical data modelling is a representation of an implementation design of business requirement. A complete physical schema includes artifacts required to achieve performance goals or create relationships between data, such as indexes, linking tables, and constraint definitions. Analysts can use a physical schema to calculate storage estimates, and this may include specific storage allocation details for an information system.

In a simpler way, An entity (in logical modelling) is called as Table (in physical modelling) and Attribute (in logical modelling) is called as Columns (in physical modelling).

If you enjoyed this blog post, feel free to share it with your friends!