Course Code: bspdatapyth
Duration: 14 hours
Prerequisites:
  • Basic Python and data analysis skills

Audience

  • Python developers
  • Data analysts

Overview:

Python is a versatile programming language known for its simplicity and readability. Pandas is a Python package that provides data structures for working with structured (tabular, multidimensional, potentially heterogeneous) and time series data. NumPy provides fundamental support for numerical computing through its array operations. Together, they form a robust ecosystem for efficient data handling and analysis in Python.

This instructor-led, live training (online or onsite) is aimed at intermediate-level Python developers and data analysts who wish to enhance their skills in data analysis and manipulation using Pandas and NumPy.

By the end of this training, participants will be able to:

  • Set up a development environment that includes Python, Pandas, and NumPy.
  • Create a data analysis application using Pandas and NumPy.
  • Perform advanced data wrangling, sorting, and filtering operations.
  • Conduct aggregate operations and analyze time series data.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us to arrange it.

Course Outline:

Day 1:

Basic Python and Data Analysis Skills Review

Introduction to NumPy

  • Creating NumPy arrays
  • Common operations on matrices
  • Using ufuncs
  • Views and broadcasting on NumPy arrays
  • Optimizing performance by avoiding loops
  • Optimizing performance with cProfile
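
As a rough illustration of the NumPy topics listed above, the sketch below touches on ufuncs, broadcasting, views, and loop-free aggregation. The array contents and variable names are made up for this example and are not taken from the course materials.

```python
import numpy as np

# Hypothetical sample data: 3 rows (samples) by 4 columns (features).
data = np.arange(12, dtype=float).reshape(3, 4)

# ufunc: np.sqrt is applied element-wise without a Python loop.
roots = np.sqrt(data)

# Broadcasting: the (4,) vector of column means is stretched across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# View: basic slicing returns a view, so writing to it changes the original array.
first_row = data[0]
first_row[0] = 100.0

# Loop-free aggregation, next to the equivalent (slower) explicit loop.
vectorized_total = centered.sum()
loop_total = sum(float(x) for row in centered for x in row)
```

Note that `centered` was computed before the write through `first_row`, so it is unaffected by that change; only `data` is modified.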

Data Analysis with Pandas

  • Using vectorized data in pandas
  • Data wrangling
  • Sorting and filtering data
  • Aggregate operations
  • Analyzing time series
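
The pandas topics above can be sketched in a few lines. This is a minimal example under assumed data: the `region` and `sales` columns and the dates are hypothetical and serve only to show filtering, sorting, a grouped aggregate, and time series resampling.

```python
import pandas as pd

# Hypothetical daily sales data; column names are made up for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north", "south"],
        "sales": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=dates,
)

# Filtering with a boolean mask, then sorting by a column.
big = df[df["sales"] >= 30.0].sort_values("sales", ascending=False)

# Aggregate operation: total sales per region.
totals = df.groupby("region")["sales"].sum()

# Time series: resample the daily index into 2-day bins and sum each bin.
two_day = df["sales"].resample("2D").sum()
```

Each step operates on whole columns at once, which is generally the idiomatic alternative to iterating over rows.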

Visualizing a MySQL Database with Tableau

Day 2:

Other Python Libraries for Data Analysis

  • scikit-learn
  • SciPy
  • statsmodels
  • RPy2

Summary and Next Steps