Course Code: aiinfrasarchbes
Duration: 14 hours
Prerequisites:
  • Students should have intermediate SQL and Python programming skills.  
Overview:

Data Engineering and Architecture complement Machine Learning (ML) by providing the infrastructure to store and pre-process data and to deploy ML models. Although they fall within Data Science, Data Engineering and Architecture are often overlooked as critical ingredients in the successful development of predictive systems and platforms within companies. This course focuses on how companies can employ on-premises and cloud-based stacks to implement data pipelines and deployment architectures for the predictive or prescriptive models they have developed. Along the way, participants will practice these skills through hands-on exercises and learn from case studies of successful implementations of these technologies.

By the end of this training, participants will be able to:

  • Understand the full end-to-end development of ML models
  • Apply the latest Deep Learning models for Time Series, Image Recognition, and NLP
  • Work with different types of data - unstructured, semi-structured, and structured
  • Understand how different types of data can be stored on-premises - Hadoop, InfluxDB, Elasticsearch, Neo4j, and Cassandra
  • Build and interact with a cloud-based data lake
  • Automate and monitor data pipelines
  • Develop proficiency in Spark, Airflow, and AWS/GCP tools
  • Gain additional knowledge of Big Data and Hadoop
  • Deploy and version ML models with TensorFlow Serving
Course Outline:

Day 1    

Module 1: Machine Learning and Deep Learning (90 mins)  

● Working with TensorFlow and PyTorch

● Applications of ML - Image recognition, NLP, Time Series, and more  

● Toolset - Jupyter, pandas, NumPy, TensorFlow, PyTorch, scikit-learn, etc.

Module 2: Spark SQL, DataFrames, and Datasets (90 mins)

● Spark SQL

● Executing SQL queries on a DataFrame

● Using DataFrames instead of RDDs

● Spark MLlib

Module 3: Data Lakes with Hadoop and Spark (90 mins)  

● Introduction to Data Lakes  

● The Power of Spark  

● Data Wrangling with Spark    

Module 4: Hands-on Exercises (90 mins)

● Spark SQL

● Spark MLlib

● Deep Learning with TensorFlow 2.0 - CNNs, LSTMs, Transformers, and Autoencoders

Module 5: Automate Data Pipelines (90 mins)  

● Data Pipelines  

● Create data pipelines with Apache Airflow and Apache NiFi  

● Data Quality  

● Track Data Lineage  

● Production Data Pipelines  
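To make the pipeline topics above concrete, here is a minimal sketch in plain Python (3.9+, standard library only) of what an orchestrator such as Airflow does conceptually: tasks are declared with their upstream dependencies and run in topological order, a data quality task stops the run before bad data is loaded, and each task's inputs are recorded as simple lineage. This is not Airflow's API; the task names (`extract`, `validate`, `transform`, `load`), the sample records, and the quality rules are hypothetical illustrations.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; in Airflow these would be operators wired into a DAG.
def extract(_inputs):
    # Pretend these raw records were pulled from a source system.
    return [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": "7.25"},
        {"id": 3, "amount": "3.0"},
    ]

def validate(inputs):
    rows = inputs["extract"]
    # Data quality gate: fail the run early instead of loading bad data.
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids")
    if any(r.get("amount") in (None, "") for r in rows):
        raise ValueError("missing amount")
    return rows

def transform(inputs):
    # Cast the string amounts to floats.
    return [{**r, "amount": float(r["amount"])} for r in inputs["validate"]]

def load(inputs):
    # Stand-in for writing to a warehouse: just summarize what would be loaded.
    rows = inputs["transform"]
    return {"rows": len(rows), "total": sum(r["amount"] for r in rows)}

# Each task maps to the set of tasks it depends on.
DAG = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
}
TASKS = {"extract": extract, "validate": validate, "transform": transform, "load": load}

def run_pipeline(dag, tasks):
    results, lineage = {}, []
    for name in TopologicalSorter(dag).static_order():
        upstream = sorted(dag[name])
        results[name] = tasks[name]({u: results[u] for u in upstream})
        # Record which upstream outputs each task consumed (simple lineage).
        lineage.append((name, upstream))
    return results, lineage

results, lineage = run_pipeline(DAG, TASKS)
print(results["load"])  # {'rows': 3, 'total': 20.75}
print(lineage)
```

In Airflow the same dependency graph would be declared between operators, and the scheduler, rather than a simple loop, decides when each task runs.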

Module 6: Unstructured Binary Data with Hadoop (90 mins)  

● Veracity, variability, visualization, and value (four of the extended V’s of Big Data)

● HDFS and MapReduce in Hadoop  

● Unstructured, semi-structured, and structured data    
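The MapReduce model listed above can be sketched locally in a few lines of Python. The sketch below simulates the map, shuffle, and reduce phases of the classic word-count job in a single process; it does not use Hadoop's Java API or Hadoop Streaming and only illustrates how data flows between the phases. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for each word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle/sort: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    # Reducer: combine all values for one key.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["the quick brown fox", "The lazy dog", "the end"]))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster, the mappers and reducers run in parallel on different nodes, and the shuffle moves data across the network; the logic of each phase stays the same.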

Module 7: Hands-on Exercises (90 mins)

● Apache Airflow and NiFi - creating real-time and batch data pipelines

● Hadoop - HDFS and MapReduce
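For the HDFS part of the exercise, a quick worked calculation helps show how files are stored. The sketch assumes the common Hadoop defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a given cluster may be configured differently, and the helper `hdfs_footprint` is hypothetical.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed dfs.blocksize (128 MiB default)
REPLICATION = 3         # assumed dfs.replication default

def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (list of block sizes, raw bytes stored across the cluster)."""
    full, rest = divmod(file_size_bytes, block_size)
    blocks = [block_size] * full + ([rest] if rest else [])
    # The last block is not padded, so raw usage is simply size * replication.
    return blocks, file_size_bytes * replication

blocks, raw = hdfs_footprint(300 * MIB)
print([b // MIB for b in blocks], raw // MIB)  # [128, 128, 44] 900
```

A 300 MiB file therefore occupies three blocks, the last one only partially filled, and about 900 MiB of raw cluster storage once replicated.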
 


Day 2
 
Module 8: Structured Big Data with Cassandra (90 mins)

● Introduction to Cassandra

● Cassandra and CQL

● Data Modeling with NoSQL

Module 9: Time Series Data with InfluxDB (90 mins)

● The TICK stack

● Data modeling and querying using InfluxDB - InfluxQL and Flux

● Visualizing time series data

Module 10: Graph-based Data with Neo4j (90 mins)

● Introduction to Neo4j

● Introduction to data modeling and querying with graph databases - Cypher for Neo4j

● EDA, Recommendations, and Predictions with Neo4j

● Similarity metrics

Module 11: ML Deployment Infrastructure (90 mins)

● On-premises deployment with Flask, Pickle, and TensorFlow/PyTorch

● Cloud deployment with Heroku, AWS, and GCP (TensorFlow Serving)

● Versioning and logging ML models in production

Module 12: Case Studies (90 mins)

● Time series prediction with stock market data - using LSTMs

● Image recognition on unstructured data - using CNNs

● Anomaly detection - using Autoencoders

● Text classification and captioning - using CNNs and RNNs

Module 13: Hands-on Exercises (90 mins)

● Cassandra CQL  

● InfluxDB - InfluxQL and Flux

● Neo4j Cypher

● TensorFlow Serving
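As a sketch of the TensorFlow Serving exercise, the snippet below builds a prediction request for TensorFlow Serving's REST API (`POST /v1/models/<name>[/versions/<n>]:predict` with a JSON body containing `instances`) using only the Python standard library. It assumes a server with the REST API enabled on port 8501, the port conventionally used with the official Docker image; the model name `half_plus_two` and the input values are placeholders, and `build_predict_request` / `predict` are hypothetical helpers.

```python
import json
import urllib.request

def build_predict_request(instances, model_name, host="localhost", port=8501, version=None):
    # Assumes TensorFlow Serving's REST API is listening on `port`.
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def predict(request, timeout=10.0):
    # Requires a running server; the response JSON holds a "predictions" list.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]

req = build_predict_request([1.0, 2.0, 5.0], model_name="half_plus_two", version=1)
print(req.full_url)  # http://localhost:8501/v1/models/half_plus_two/versions/1:predict
```

A successful response is a JSON object whose `predictions` field holds one result per instance. Calling `predict(req)` needs a running server, so it is not executed here; omitting `version` lets the server choose which loaded version to use.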