Springwater 02 | Creating a snowflake data schema

Springwater 02 | Creating a snowflake data schema

Introduction

In the first post in this series I described the business model underlying Fultec’s network service agreement with its customer Springwater.

In this post I’m going to introduce the final data model used in Power BI to support business intelligence reporting for the contract. Here’s a reminder of the structure of the OLTP database Fultec uses to manage service provision to Springwater:

oltp data model in microsoft sql server

I’m going to be using Power BI Desktop to create a data model that connects to this OLTP database. The data model will then drive a set of reports and dashboards that Fultec’s business managers can use to help manage the contract effectively.

OLTP vs. OLAP

An unmodified OLTP database such as the one illustrated above is almost always unsuitable for use in a business intelligence environment:

  • it’s highly normalised, and designed to support day-to-day business processes, rather than decision support
  • ordinary business users find OLTP databases difficult to understand
  • it’s very difficult for a designer to create effective and meaningful business measures because of the complexity of the relationships in the model
  • running a business intelligence solution directly from an operational database may adversely impact the performance of day-to-day business processes

Generally speaking, the objective of a business intelligence data model design should be to turn the OLTP data model into one that implements either a star schema or a snowflake schema. These two patterns are examples of how to structure an OLAP database. They are generally conceptually easier for business users to understand. More importantly, they make the job of the developer much easier because they simplify and help eliminate ambiguities from the resulting measures used to report business metrics.

Star and snowflake schemata typically restructure a database into:

  • a series of fact tables that contain the numeric data (such as cost or revenue) on which the business wishes to report, and
  • a number of dimension tables to which the fact tables are related. Dimension tables contain related attributes which help to define the context of the data in the fact tables, such as customer, supplier, customer order and product.

Reports built on the star or snowflake schema slice and dice the numeric data in the fact tables in the model using the attributes from the dimension tables.

The Springwater snowflake schema

The diagram below shows the snowflake schema I designed based on the Springwater OLTP database. There is no loss of information in this model. All the data from the OLTP database can also be found in the snowflake schema, but structured in a way that is far easier to report on.

snowflake schema in microsoft power bi desktop

Springwater fact tables

This model comprises six fact tables:

  • Service Package Inventory – the monthly inventory of active (live) service packages ordered by Springwater and deployed to the network by Fultec
  • Commercial Inventory – the monthly inventory of active infrastructure sets representing all of the live service packages deployed to the network. Fultec bills Springwater for the total bandwidth on each service package at the specified resilience level
  • Revenue – the individual revenue components billed by Fultec for each infrastructure set in the commercial inventory. Typically, there will be up to three revenue components for each infrastructure set each month
  • Circuit Inventory – the monthly inventory of active physical access circuits deployed by Fultec to support the service packages deployed in each commercial inventory infrastructure set
  • Cost – the individual cost components incurred by Fultec for the access circuits, CPE and ports in the physical network. Similarly to revenue, there are multiple cost components for each circuit in the circuit inventory

Springwater dimension tables

As well as the fact tables, the model also includes seven dimension tables:

  • Service Package – the list of valid service packages that Springwater can order from Fultec for its customers
  • Site – the list of Springwater customer sites. Each site has a customer name associated with it as well as a site name and location
  • Set – the list of all infrastructure sets ever recorded at each customer site. Each site may have one or more infrastructure sets. The change in infrastructure sets at a site over time is recorded in the Commercial Inventory fact table
  • Revenue Type – the types of revenue recorded against the commercial inventory in the Revenue fact table
  • Circuit – the list of all of the physical access circuits that have ever been provisioned for an infrastructure set. The change in physical access circuits for a set over time is recorded in the Circuit Inventory fact table
  • Cost Type – the types of cost recorded against the circuit inventory in the Cost table
  • Date – all Power BI data models should contain a date dimension table which is used to enable time intelligence reporting capabilities. At this stage in development of the model, I have included only a simplified Date table containing a date key. In future posts I will explain how this table can be enhanced to provide a richer reporting experience for users

In the next post I’ll dive in to how I built the Power BI data model, starting with the Service Package dimension table.