Challenge 02 - Data Lake Integration

< Previous Challenge - Home - Next Challenge >

Introduction

WWI importers realize they need to further modernize their data warehouse and wants to proceed to the second stage. They are starting to reach capacity constraints on their data warehouse and need to offload data files from the relational database. Likewise, they are receiving more data in json and csv file formats. They’ve been discussing re-engineering their data warehouse to accommodate larger data sets, semi-structured data and real-time ingestion of data. They would like to conduct a POC on the Data Lake and see how to best to design it for integration into the Data Warehouse.

For this challenge, WWI wants us to build out the data lake and show how to load data into the lake from an on-premise data source.

Description

The objective of this challenge is to build a Data Lake with Synapse Analytics or Azure Data Lake Store (ADLS) Gen 2. The Data Lake will be a staging area where all our source system data files reside. We need to ensure this Data Lake is well organized and doesn’t turn into a swamp. This challenge will help us organize the folder structure and setup security to prevent unauthorized access. Lastly, we will extract data from the WWI OLTP platform and store it in the Data Lake.

The OLTP platform is on-premise so you will need to build a hybrid architecture to integrate it into Azure. Keep in mind that the pipeline that you build will become the EXTRACT portion of the new E-L-T process. The first requirement is to build a functional POC that is able to move a single dataset to the new ADLS Gen 2 data lake. Ideally, it would be nice to make the process table driven so that new pipelines do not need to be created for each additional table that needs to be copied. (Optional, sharing to give insights on end-state.)

The Solution diagram is described in the text following this diagram.

Setup Source Database for Incremental Load

Prior to starting this challenge, you should ensure that there are changes in the City data captured from Wide World Importers OLTP Database. Execute the scripts below to insert/change data in the source, and update necessary configuration values.

Success Criteria

Learning Resources

Tips

Advanced Challenges (Optional)

Too comfortable? Eager to do more? Try these additional challenges!