Data Engineering Integration for Developers

Course Code: datainteg

Duration: 21 hours

Prerequisites:

Developer Tool for Big Data Developers

Overview:

This course applies to version 10.5 of the software. Learn how to accelerate Data Engineering Integration with mass ingestion, incremental loading, transformations, complex file processing, dynamic mappings, and Python. Examine how to reuse application logic for Data Engineering use cases, and explore monitoring, troubleshooting, and best practices.

Objectives

After successfully completing this course, students should be able to:

Mass ingest data to Hive and HDFS
Perform incremental loads in Mass Ingestion
Perform initial and incremental loads
Integrate with relational databases using SQOOP
Perform transformations across various engines
Execute a mapping using JDBC in Spark mode
Perform stateful computing and windowing
Process complex files
Parse hierarchical data on Spark engine
Run profiles and choose sampling options on Spark engine
Execute dynamic mappings
Create audits on mappings
Monitor logs using the REST Operations Hub
Monitor logs using log aggregation and troubleshoot issues
Run mappings in a Databricks environment
Create mappings to access Delta Lake tables
Tune the performance of Spark and Databricks jobs

Course Outline:

Module 1: Informatica Data Engineering Management Overview

Data Engineering concepts
Data Engineering Management features
Benefits of Data Engineering Management
Data Engineering Management architecture
Data Engineering Management developer tasks
Data Engineering Integration 10.4 new features

Module 2: Ingestion and Extraction in Hadoop

Integrating Data Engineering Integration (DEI) with a Hadoop cluster
Hadoop file systems
Data Ingestion to HDFS and Hive using SQOOP
Mass Ingestion to HDFS and Hive – Initial load
Mass Ingestion to HDFS and Hive – Incremental load
Lab: Configure SQOOP for processing data between an Oracle database and HDFS
Lab: Configure SQOOP for processing data between an Oracle database and Hive
Lab: Creating Mapping Specifications using the Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy

Data Engineering Integration engine strategy
Hive Engine architecture
MapReduce
Tez
Spark architecture
Blaze architecture
Lab: Executing a mapping in Spark mode
Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process

Advanced transformations in Data Engineering Integration: Python and Update Strategy
Hive ACID Use Case
Stateful Computing and Windowing
Lab: Creating a Reusable Python Transformation
Lab: Creating an Active Python Transformation
Lab: Performing Hive Upserts
Lab: Using Windowing Function LEAD
Lab: Using Windowing Function LAG
Lab: Creating a Macro Transformation