Course Code: gnplmdb
Duration: 28 hours
Prerequisites:
  • An understanding of RDBMS (Relational Database Management Systems)
Overview:

Greenplum Database is database software for business intelligence and data warehousing. Users can run Greenplum Database for massively parallel data processing.

This instructor-led, live training (online or onsite) is aimed at administrators who wish to set up Greenplum Database for business intelligence and data warehousing solutions.

By the end of this training, participants will be able to:

  • Address processing needs with Greenplum.
  • Perform ETL operations for data processing.
  • Leverage existing query processing infrastructures.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.
Course Outline:

Module 1: Greenplum Overview

Topics: Greenplum History

Greenplum Database Architecture

Greenplum Components

Module 2: Distributed Data and Query Processing - Database Size 1 TB and Hardware

Topics: Distributed Table architecture

Parallel Query plans and execution

Module 3: Roles, Access, Privileges and Resource Queues

Topics: Overview

Creating database

Connecting to a database

Creating Schema

Creating database users

Creating database groups

Granting database privileges and managing access control

Configuring Client Authentication

Resource queues and workload management

Module 4: Working with Tables

Topics: Table partitioning

How to partition a table, Append-only (AO) tables

Module 5: Data Loading

Topics: External Tables

gpfdist and psql command line

Data Loading Performance

Loading and Unloading of data

Executing scripts

Module 6: PostgreSQL psql basics

Topics: Inserts, Updates, Deletes, Select, Grouping Data

Transactions

Data locking overview

Module 7: Performance tuning

Topics: Performance tuning considerations

Common causes of performance issues

Hardware issues

Database statistics

Data Distribution

Explain Plans

Module 8: Database administration

Topics: Stopping and Starting a Database

Monitoring system state

Checking for data skew

Managing Database Objects

Checking disk space usage

Log Files

Vacuum

Analyze

Module 9: Database deployment approaches in ETL projects

Module 10: Backup and Recovery

Topics: Backing up data

Restoring data

Automating backups

Crash Recovery

Module 11: Database internals

Topics: System Catalog Tables

Database Processes

Building Greenplum version control tool with system catalog tables

Module 12: Database Analytics

Topics: Setting up Apache Zeppelin

Aggregating data

Assembling results

Using Apache MADlib