- An understanding of RDBMS (Relational Database Management Systems)
Greenplum Database is database software for business intelligence and data warehousing. Users can run Greenplum Database for massively parallel data processing.
This instructor-led, live training (online or onsite) is aimed at administrators who wish to set up Greenplum Database for business intelligence and data warehousing solutions.
By the end of this training, participants will be able to:
- Address processing needs with Greenplum.
- Perform ETL operations for data processing.
- Leverage existing query processing infrastructures.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Module 1: Greenplum Overview
Topics: Greenplum History
Greenplum Database Architecture
Greenplum Components
Module 2: Distributed Data and Query Processing - Database Size 1 TB and Hardware
Topics: Distributed Table architecture
Parallel Query plans and execution
Module 3: Roles, Access, Privileges and Resource Queues
Topics: Overview
Creating database
Connecting to a database
Creating Schema
Creating database users
Creating database groups
Granting database privileges and managing access control
Configuring Client Authentication
Resource queues and workload management
Module 4: Working with Tables
Topics: Table partitioning
How to partition a table
Append-only (AO) tables
Module 5: Data Loading
Topics: External Tables
gpfdist and the psql command line
Data Loading Performance
Loading and Unloading of data
Executing scripts
Module 6: PostgreSQL psql Basics
Topics: Inserts, Updates, Deletes, Select, Grouping Data
Transactions
Data locking overview
Module 7: Performance tuning
Topics: Performance tuning considerations
Common causes of performance problems
Hardware issues
Database statistics
Data Distribution
Explain Plans
Module 8: Database administration
Topics: Stopping and Starting a Database
Monitoring system state
Checking for data skew
Managing Database Objects
Checking disk space usage
Log Files
Vacuum
Analyze
Module 9: Database deployment approaches in ETL projects
Module 10: Backup and Recovery
Topics: Backing up data
Restoring data
Automating backups
Crash Recovery
Module 11: Database internals
Topics: System Catalog Tables
Database Processes
Building a Greenplum version control tool with system catalog tables
Module 12: Database Analytics
Topics: Setting up Apache Zeppelin
Aggregating data
Assembling results
Using Apache MADlib