What The Hack - This Old Data Warehouse

Introduction

Modern Data Warehouse is a key upgrade motion for organizations to scale out their on-premise analytical workloads to the cloud. This hack will help data engineers and administrators upgrade their skills to migrate to Azure Synapse Analytics. The hack will be the sequential migration steps required to migrate from on-premise to Synapse Analytics and re-platform attached workloads like ETL and reporting.

The solution will require us to migrate the on-premise data warehouse to Azure Synapse Analytics. In the first challenge, you will want to source data from the WWI OLTP sample database to Azure Synapse Analytics. Data will be loaded from source to target thru the DailyETL packages (SSIS) found in the WWI importers sample database. Secondly, a data lake (Azure Data Lake Store) will be built out to be your staging area for future challenges to stage data prior to loading it into Azure Synapse. Third, the SSIS packages in Challenge 1 will be refactored into Azure Data Factory jobs to optimize the data loads and leverage the Data Lake as a staging area. Next, clickstream data will be streamed into the Data Lake using Azure Event Hub. This data will be stored in the lake and interactively queried via Azure Databricks. Lastly, Power BI data model will be built out with a set of reports to streamline the performance. You will work on how to optimize them via report design, data model tuning and Azure Synapse optimizaitons.

Below is a diagram of the solution architecture you will build in this hack. Please study this carefully, so you understand the whole of the solution as you are working on the various components.

The Solution diagram is described in the text following this diagram.

Learning Objectives

In this hack, data engineers will learn how to migrate their platform to the cloud (data, schema and code). Additionally, they need to build out Data Warehouse architectures that can scale for large data volumes, different data structures and real-time ingestions. Azure Synapse Analytics provides all these capabilities as an integrated platform and we’ll help you better understand how to refactor your existing data warehouse to Azure.

  1. Modern Data Warehouse Architecture
  2. Azure Synapse Decision Tree
  3. Refactor T-SQL code to be compatible with Synapse
  4. ETL/ELT design patterns and how to build them with ADF + ADLS
  5. Setup a streaming data pipeline with Kafka
  6. Tune Synapse for analytical workloads and design report for best performance
  7. Setup Data Governance in Syanpase
  8. Build Enterprise Security into your data warehouse

Challenges

Prerequisites

Technologies

Azure services and related products

Learning Path for Modern Data Warehouse

Contributors