Springwater 03 | Setting up service packages

Springwater 03 | Setting up service packages

Introduction

In earlier posts I introduced the Springwater case study and explained the structure of the snowflake data model I’m planning to implement in Power BI. In this post I’ll walk through the approach I’ve taken to one component of the data model – the Service Package dimension table.

Service Packages in the OLTP database

In the OLTP database, service package data is part of the sp schema:

service package OLTP data model

The lower three tables in the diagram above store data about specific service package orders and the inventory of service packages at each customer site. We’re not concerned with this data in this post, because the purpose here is to create a dimension table that contains only those fixed attributes of a service package that do not change over time. This data is stored in the Service Package Type and Service Package tables in the OLTP database. You can see that the Service Package Type table contains only a single attribute, representing the type group that an individual service package belongs to. There’s no point in having a single attribute stored in a dimension table in Power BI, so I’ll combine the type attribute with the other attributes of a service package in a single Service Package dimension table.

In summary, I’m going to take the two tables Service Package Type and Service Package from the OLTP database, and extract, transform and load (ETL) them into a single Service Package dimension table in Power BI. To do this, I’ll use the Power Query tool in Power BI Desktop.

Organising Power Query

In a complex Power BI data model it’s important to adopt a consistent structure for queries, otherwise it will be difficult for you (or someone else) to keep the model up-to-date. I adopt a relatively straightforward hierarchical structure and naming convention in my Power BI files, sticking to the following principles:

  • only include spaces in names of queries if they are going to be exposed as tables in Power BI reports
  • keep fact and dimension queries separate from one another
  • keep queries only used in Power Query (support queries) separate from those that will be loaded in the data model (published queries)
  • maintain a separate set of queries that access external data sources. Don’t put any other ETL code in these queries
  • avoid intermediate queries if they only support a single query in Power Query. If this is the case, fold the code for the supporting query into the code for the dependent query so that it forms a single piece of code

Based on these principles, this is the query structure I’ve adopted for the Springwater case study:

power query query structure

As you can see, I’ve used the group structure in Power Query to separate queries into those that will be loaded into Desktop, dimension and fact queries, and those that are just used for support. The support queries comprise those used to extract data from the OLTP database. They’re all named using the schema name from SQL Server, followed by an underscore (_) and the name of the underlying table in SQL Server.

I’ve set a couple of other default options in Power Query because I want to control the way that data gets loaded from the OLTP database:

  • I’ve disabled auto date/time for new files. I’m going to define my own date table for the time intelligence capabilities in Power BI, so I don’t want Power BI making any assumptions about dates and times in the model
  • I’ve also enabled import relationships from data sources on first load. I know that the relationships in the underlying OLTP database are correct and won’t be changing, so I want to detect them and reproduce them in my data model wherever relevant

The Service Package dimension query

The Service Package query depends on the sp_ServicePackage database access query:

// sp_ServicePackage (data source query)

let
    q = Sql.Databases ( "SWITCH" )
        { [ Name = "Springwater" ] } [ Data ]
        { [ Schema = "sp", Item = "Service Package" ] } [ Data ],
    Result = if buffer then Table.Buffer ( q ) else q
in
    Result

I generally try to adapt and adopt sqlbi‘s excellent DAX code formatting rules for Power Query code as well (the two languages are actually surprisingly similar). 

Buffering queries

My SQL Server database is called ‘Springwater’ and hosted on a server called ‘SWITCH’. If you look carefully at the code, I’ve wrapped the query result in an if statement that is conditional on the value of the buffer parameter. If the value of buffer is true then I use the Power Query Table.Buffer function to load the Service Package table, otherwise I load the table directly, without buffering. Buffering a table can sometimes lead to improved performance, but this can be unpredictable and depends on a number of factors. In fact, in my production version of the Springwater model I’ve found that query refresh performance is better without buffering, so I’ve set the value of the buffer parameter to false:

// buffer (parameter)

// Controls whether or not SQL Server DB queries are buffered
false meta [
    IsParameterQuery = true, 
    Type = "Logical", 
    IsParameterQueryRequired = true
]

The diagram below shows an extract containing the first few rows of the sp_ServicePackage query:

power query connection to oltp database

The query returns all of the columns of the Service Package table in the underlying OLTP database. Because I’e set Power Query to import existing relationships from the source database, it also includes two further columns, one for the data in the related Service Package Type table, and one for the related Service Package Order Table. Because Service Package is on the many side of the one-to-many relationship with the Service Package Type table, there is only a single value in the sp.Service Package Type column. However, Service Package is on the one side or the one-to-many relationship with sp.Service Package Order, so the sp.Service Package Order column contains a full table of orders for each Service Package.

The final query

As I’m only building the Service Package dimension table here, I’m not interested in the contents of the sp.Service Package Order column, so I’ll discard that in the query code. The only other transformation to make is to ensure that any empty string values in the Notes and Service Package Description columns are converted to null values in the query result.

Here’s the final code listing for the Service Package dimension table query:

// Service Package (dimension query)

let
    Source = sp_ServicePackage,

    // Expand all required columns currently represented as related values
    ExpandColumns =
    Table.ExpandRecordColumn (
        Source, 
        "sp.Service Package Type", 
        { "Service Package Type" }, 
        { "Service Package Type" }
    ),
    
    // Remove columns not required in the final query
    RemoveColumns = 
    Table.RemoveColumns (
        ExpandColumns,
        { "ServicePackageTypeID", "sp.Service Package Order" }
    ),

    // Replace blank text with nulls
    Result = 
    Table.ReplaceValue (
        RemoveColumns,
        "", null,
        Replacer.ReplaceValue,
        { "Notes", "Service Package Description" }
    )
in
    Result

And here’s the first few rows of the final Service Package dimension query:

power query final version of dimension table

As you can see, this now includes a column containing the name of the Service Package Type for each Service Package.

In the next post I’ll explain the Revenue Type and Cost Type dimension queries and how they are built.

Leave a Reply