Course Code: azuredata1
Duration: 35 hours
Course Outline:

Day 1

Azure Active Directory

  • Azure Active Directory overview
  • Create Azure users and groups in Azure Active Directory
  • Manage access to an Azure subscription by using Azure role-based access control (Azure RBAC)
  • Secure your Azure resources with Azure role-based access control (Azure RBAC)
  • Control and organize Azure resources with Azure Resource Manager

Large-Scale Data Processing with Azure Data Lake Storage Gen2

  • Getting Started with Azure Data Lake Storage Gen2
  • The Road to Azure Data Lake Storage Gen2
  • Architecture and Features of ADLS Gen2
  • Lab: Creating an Azure Data Lake Storage Gen2 Account with the Azure Portal
  • Lab: Creating and Deleting an Azure Data Lake Storage Gen2 Account with PowerShell

Managing Data with Azure Data Lake Storage Gen2

  • Ingesting Data and Securing It
  • Ingesting Data to ADLS Gen2 from ADLS Gen1 Using ADF
  • Using the Azure Data Lake Store REST API
  • Moving Data from Blobs Using DistCp with ABFS
  • Copying or Moving Data to Azure Data Lake Storage Gen2 with AzCopy
  • Secure your Azure Storage account
  • Azure Storage security
  • Azure Data Lake Storage security

Azure Data Explorer

  • Getting Started with Azure Data Explorer: Overview and Architecture
  • What Is Azure Data Explorer and Why Should I Use It?
  • ADX Key Characteristics and Use Cases
  • ADX Architecture, Components, and Scalability
  • ADX Security
  • An Azure Data Explorer Lab

Understanding and Creating Azure Data Explorer Infrastructure

  • Creating a Cluster
  • Managing Cluster Scaling
  • Creating a Database
  • Managing Database Permissions
  • The Azure Data Explorer Web UI

Ingesting Data in Azure Data Explorer

  • Ingesting Data in Azure Data Explorer
  • Ingesting Sample Data
  • Loading Data Using One-click Ingestion
  • Ingesting Data from a Folder or Blob Container with LightIngest
  • Data Ingestion with Azure Data Factory
  • Ingesting Data Using the Python SDK
  • Ingesting JSON Formatted Data

Querying Data in Azure Data Explorer

  • Getting to Know the Kusto Query Language (KQL)
  • Querying Azure Data Explorer and the Sample Database
  • Getting Started with Kusto Control Commands
  • The Basics of KQL: Most Commonly Used Operators
  • Advanced KQL
  • Querying External Tables
  • Exporting Data

Visualizing Data in Azure Data Explorer

  • Visualizing the Results of a Query with the Render Operator
  • Data Visualization Using the Azure Data Explorer Dashboard
  • Visualizing Data Using Power BI

Monitoring in Azure Data Explorer

  • Using Metrics to Monitor Cluster Health
  • Using Resource Health to Monitor Cluster Health
  • Troubleshooting

Day 2

Azure Data Factory (Part 1)

Understanding Azure Data Factory and Its Interface

  • What Is Azure Data Factory?
  • Data Factory within the Microsoft Ecosystem
  • Main Data Factory Elements
  • Preparing the Environment
  • Installing Azure Data Factory

Using Azure Data Factory for ETL Operations

  • Integration Runtimes
  • Additional ADF Elements
  • Runtimes, Activities, and Triggers
  • Parameters and Variables
  • Working with Data Flows

Using Azure Data Factory for Orchestration

  • Monitoring in ADF
  • Lab: ADF Monitoring
  • ADF Orchestration
  • Lab: Orchestration in ADF

Defining Mapping Data Flows

  • What Are Mapping Data Flows?
  • The Adventure Works Case Study
  • Setting up the Course Prerequisites
  • The Source Transformation
  • Source Transformation Settings
  • Working with the Source Transformation
  • The Sink Transformation
  • Data Flow Debugging
  • Working with Sinks

Simple Mapping Data Flow Operations

  • The Sort Transformation
  • Working with Sorts
  • The Filter Transformation
  • Working with Filters
  • Derived Columns
  • Working with Derived Columns
  • The Select Transformation
  • Working with Selects

Working with Multiple Data Streams

  • The Lookup Transformation
  • Integration Runtimes and Data Flows
  • Working with Lookups
  • The Conditional Split Transformation
  • Working with Conditional Splits
  • The Exists Transformation
  • The Union Transformation
  • The Join Transformation
  • The New Branch Transformation

Additional Data Flow Operations

  • The Aggregate Transformation
  • The Rank Transformation
  • The Surrogate Key Transformation
  • The Alter Row Transformation
  • Working with Alter Row
  • The Window Transformation
  • The Parse Transformation
  • The Flatten Transformation
  • The Pivot Transformation
  • Working with Pivots
  • The Unpivot Transformation

Day 3

Azure Data Factory (Part 2)

Migrating SSIS Packages to Azure Data Factory

  • Introduction
  • Why Migrate SSIS Packages to Data Factory?
  • Does Azure Data Factory Replace SSIS?
  • Prerequisites
  • What Is Azure Data Factory?
  • How Does Azure Data Factory Work?
  • Workflow Changes for a Developer
  • SSIS Migration Levels
  • Lab: Setting up Azure Data Factory
  • Integration Runtime
  • Lab: Creating Integration Runtime
  • Lab: Deploying SSIS Packages to Azure SQL Database
  • Azure SQL Database vs. Managed Instance
  • Common Concerns for Migrating to Data Factory

Running SSIS Packages in Azure Data Factory

  • Introduction
  • Lab: Run SSIS Packages with Stored Procedure Activity
  • Lab: Execute SSIS Package Activity
  • Lab: Customize the Azure-SSIS Integration Runtime
  • Lab: Using Parameters in Data Factory
  • Lab: Using System Variables in Data Factory
  • Executing Packages with On-premises Data
  • Lab: Execute Packages with On-premises Data
  • Lab: Override SSIS Package Properties

Securing Data in Azure Data Factory

  • Introduction
  • Lab: Using SQL DB with Virtual Network Service Endpoints
  • Lab: Effect of Removing 'Allow Azure Services to Access Server'
  • Lab: Using Service Principal Authentication
  • Lab: Storing Passwords in Key Vault
  • Lab: Using Managed Identity

Scheduling SSIS Packages in Azure Data Factory

  • Introduction
  • Triggers in Data Factory
  • Lab: Schedule Trigger
  • Lab: Event-Based Trigger
  • Schedule Trigger vs. Tumbling Window Trigger
  • Lab: Tumbling Window Trigger
  • Lab: Schedule Packages in SSMS
  • Lab: Web Activity to Schedule Integration Runtime
  • Lab: Azure Automation to Schedule Integration Runtime

Monitoring Azure Data Factory Pipelines

  • Introduction
  • Lab: Overview of Data Factory Monitoring
  • Lab: Creating Alerts in Data Factory
  • Lab: Configuring User Properties
  • Lab: Restrict Data Logged to Monitoring

Day 4

Azure Databricks (Part 1)

Implementing an Azure Databricks Environment

  • Introduction to Azure Databricks
  • Fundamentals of Azure Databricks
  • Creating an Azure Databricks Workspace
  • Getting Started with the Databricks CLI
  • Azure Spark Clusters
  • Working with Notebooks
  • Azure Databricks Tables
  • Apache Spark Jobs
  • Configuring Security

Performing ETL (Extract, Transform, Load) Operations with Azure Databricks

  • Overview
  • Basics of Extract, Transform, and Load (ETL) Process
  • Scenario: Working with Audience Information
  • Lab: Ingesting and Extracting Data in Azure Databricks
  • Lab: Transforming Data in Azure Databricks
  • Lab: Loading Data in Azure Databricks

Streaming HDInsight Kafka Data into Azure Databricks

  • Overview
  • Apache Kafka on Azure HDInsight
  • Lab: Building an HDInsight Kafka Cluster
  • Lab: Configuring Kafka for IP Advertising
  • Lab: Creating a Kafka Topic
  • Lab: Building and Configuring an Azure Databricks Cluster
  • Virtual Network Peering
  • Azure Databricks and Streaming Data
  • Producing Events and Consuming Data with Azure Databricks Notebooks

Extracting Data from Multiple Sources

  • Overview
  • Extracting from Azure Storage Services
  • Reading Multiple File Formats
  • Applying Schemas

Transforming and Cleaning Data

  • Overview
  • Understanding Common Transformations
  • Analyzing and Cleaning Data
  • Applying Transformations
  • Working with Spark SQL
  • Handling Corrupt Data

Loading Data

  • Overview
  • Loading to Files
  • Working with Databricks Tables

Orchestrating ETL Pipelines

  • Overview
  • Setting up Workflows
  • Scheduling with Databricks Jobs
  • Orchestrating with Azure Data Factory

Building Better Pipelines on Databricks

  • Module Overview
  • Using Databricks APIs
  • Understanding Delta Lake

Day 5

Azure Databricks (Part 2)

Handling Streaming Data with Azure Databricks Using Spark Structured Streaming

  • Quick Recap: Spark Structured Streaming
  • Configuring Azure Event Hubs as a Source
  • Setting up a Sample App to Send NYC Taxi Events

Building a Streaming Pipeline

  • Extracting and Processing Source Data
  • Applying Transformations
  • Loading to Files
  • Understanding Checkpointing and Delivery Guarantees
  • Loading to Azure Event Hubs
  • Loading to Azure SQL Database

Handling Stateful Operations

  • Understanding State Management
  • Handling Late Data Using Watermarking
  • Deduplicating Streaming Data

Working with Multiple Streams and Datasets

  • Joining a Stream with Static Data
  • Combining Multiple Streams
  • Handling State in Stream-Stream Joins

Running a Streaming Pipeline in Production

  • Parameterizing the Streaming Pipeline
  • Scheduling with Databricks Jobs
  • Managing the Environment Using the Databricks CLI

Azure Synapse Analytics

Understanding Microsoft Azure Synapse Analytics

  • Introduction
  • Understanding Azure Synapse Analytics
  • Knowing When to Use Synapse Analytics
  • Understanding Massively Parallel Processing
  • Implementing Data Distribution for an SQL Data Warehouse
  • Implementing Partitions for an SQL Data Warehouse

Deploying a Data Warehouse in Microsoft Azure Synapse Analytics

  • Introduction
  • Deploying an SQL Pool in Azure Synapse Analytics with the Azure Portal
  • Setting a Firewall Rule and Connecting to an SQL Data Warehouse
  • Preparing an SQL Pool in Azure Synapse Analytics to Load Data
  • Loading NYC Taxi Data into an SQL Pool in Azure Synapse Analytics
  • Examining Configuration Options for an SQL Pool in Azure Synapse Analytics
  • Performing Common Tasks with Azure Synapse Analytics in the Portal

Tuning and Optimizing a Data Warehouse in Microsoft Azure Synapse Analytics

  • Introduction
  • Performing a Backup in Azure Synapse Analytics
  • Performing a Restore in Azure Synapse Analytics
  • Managing Costs in Azure Synapse Analytics
  • Managing Workloads in Azure Synapse Analytics
  • Securing an Azure SQL Data Warehouse
  • Implementing Azure Synapse Analytics Monitoring
  • Deleting an Azure Synapse Analytics SQL Pool

Deploying the Modern Data Warehouse Environment

  • Introduction
  • Modern Data Warehouses
  • Lab: Creating an Azure SQL Database
  • Lab: Creating an Azure SQL Data Warehouse
  • Lab: Creating an Azure Data Factory
  • Lab: Loading Data into Azure SQL Database

Implementing a Data Warehouse Build and Release Pipeline using Azure DevOps

  • Introduction
  • What Is Continuous Integration and Deployment?
  • Lab: Creating ARM and Database Templates
  • Lab: Creating an Azure DevOps Pipeline

Managing Hybrid Azure SQL Data Warehouse Solutions

  • Introduction
  • Lab: Exploring the Azure Database Migration Guide
  • Lab: Creating a New Azure Migrate Project
  • Lab: Using the Data Migration Assistant
  • Lab: Setting up Azure SQL Data Sync

Securing a Data Warehouse in Azure Synapse Analytics

  • Understand network security options for Azure Synapse Analytics
  • Configure conditional access
  • Configure authentication
  • Manage authorization through column-level and row-level security
  • Lab: Manage authorization through column-level and row-level security
  • Manage sensitive data with Dynamic Data Masking
  • Implement encryption in Azure Synapse Analytics