- Students should have intermediate SQL and Python programming skills.
Data Engineering and Architecture complement Machine Learning (ML) by providing the infrastructure to store and pre-process data and to deploy ML models. Although they fall within Data Science, Data Engineering and Architecture are often overlooked as critical ingredients in successfully developing predictive systems and platforms within companies. This course focuses on how companies can employ on-premises and cloud-based stacks to implement data pipelines and deployment architectures for the predictive and prescriptive models they develop. Along the way, participants will practice these skills in hands-on exercises and learn from case studies of successful implementations of these technologies.
By the end of this training, participants will be able to:
- Understand full end-to-end development of ML models
- Apply the latest Deep Learning models for Time Series, Image Recognition, and NLP
- Work with different types of data - unstructured, semi-structured, and structured
- Understand how different types of data can be stored on-premises - Hadoop, InfluxDB, Elasticsearch, Neo4j, and Cassandra
- Build and interact with a cloud-based data lake
- Automate and monitor data pipelines
- Develop proficiency in Spark, Airflow, and AWS/GCP tools
- Gain additional knowledge of Big Data and Hadoop
- Deploy and version ML models with TensorFlow Serving
Day 1
Module 1: Machine Learning and Deep Learning (90 mins)
● Working with TensorFlow and PyTorch
● Applications of ML - Image Recognition, NLP, Time Series, and more
● Toolset - Jupyter, Pandas, NumPy, TensorFlow, PyTorch, Scikit-learn, etc.
Module 2: SparkSQL, DataFrames, and Datasets (90 mins)
● SparkSQL
● Executing SQL commands on a DataFrame
● Using DataFrames instead of RDDs
● Spark MLlib
Module 3: Data Lakes with Hadoop and Spark (90 mins)
● Introduction to Data Lakes
● The Power of Spark
● Data Wrangling with Spark
Module 4: Hands-on Exercises (90 mins)
● SparkSQL
● Spark MLlib
● Deep Learning with TensorFlow 2.0 - CNNs, LSTMs, Transformers, and Autoencoders
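A minimal sketch of a CNN in TensorFlow 2.x with the Keras API, assuming `tensorflow` is installed; the input shape, layer sizes, and random training data are illustrative only, not a real image task.

```python
# Sketch: a tiny convolutional classifier trained for one epoch on random data.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),           # e.g. 28x28 grayscale images
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One tiny training step on random data, just to show the API shape.
x = np.random.rand(4, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(4,))
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)  # (4, 10): one probability row per image
```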
Module 5: Automate Data Pipelines (90 mins)
● Data Pipelines
● Create data pipelines with Apache Airflow and Apache NiFi
● Data Quality
● Track Data Lineage
● Production Data Pipelines
Module 6: Unstructured Binary Data with Hadoop (90 mins)
● The 4 V’s of Big Data - volume, velocity, variety, and veracity
● HDFS and MapReduce in Hadoop
● Unstructured, semi-structured, and structured data
Module 7: Hands-on Exercises (90 mins)
● Apache Airflow and NiFi - creating real-time and batch data pipelines
● Hadoop - HDFS and Map-Reduce
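The MapReduce model from the Hadoop exercise can be sketched in pure Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. Real Hadoop distributes these same phases across HDFS blocks and cluster nodes.

```python
# Sketch: word count, the canonical MapReduce example, in-process.
from collections import defaultdict

lines = ["big data big ideas", "data pipelines"]

# Map phase: emit a (word, 1) pair for every word in every input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate each group (here, sum the counts).
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'pipelines': 1}
```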
Day 2
Module 8: Structured Big Data with Cassandra (90 mins)
● Introduction to Cassandra
● Cassandra and CQL
● Data Modeling with NoSQL
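A sketch of Cassandra's query-first data modeling, using a hypothetical sensor-readings table; the CQL here is held as plain strings, which in practice would be run through a driver (e.g. `session.execute()` in `cassandra-driver`).

```python
# Sketch: a CQL table designed around the query it must answer.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS readings_by_sensor (
    sensor_id text,
    reading_time timestamp,
    value double,
    PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
"""

# The partition key (sensor_id) routes each sensor's rows to one partition;
# the clustering column (reading_time) sorts rows within it, so "latest N
# readings for a sensor" is an efficient single-partition query.
LATEST_READINGS = """
SELECT reading_time, value FROM readings_by_sensor
WHERE sensor_id = ? LIMIT 10;
"""

print(LATEST_READINGS.strip())
```

Unlike relational modeling, the table is shaped by the read pattern first; a second query pattern would typically get its own denormalized table.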
Module 9: Time Series Data with InfluxDB (90 mins)
● The TICK stack
● Data modeling and querying with InfluxDB - InfluxQL and Flux
● Visualizing time series data
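A pure-Python sketch of InfluxDB's line protocol, the text format the TICK stack ingests on writes; the measurement, tag, and field names are illustrative, and a real client library would handle escaping and batching.

```python
# Sketch: build a line-protocol point "measurement,tags fields timestamp".
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one time series point; tags index the series, fields hold values."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol("cpu", {"host": "web01"}, {"usage": 0.64}, 1700000000000000000)
print(line)  # cpu,host=web01 usage=0.64 1700000000000000000
```

Tags are indexed and used for grouping in InfluxQL/Flux queries, while fields carry the actual measured values, so the tag/field split is itself a data-modeling decision.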
Module 10: Graph-Based Data with Neo4j (90 mins)
● Introduction to Neo4j
● Data modeling and querying with graph databases - Cypher for Neo4j
● EDA, recommendations, and predictions with Neo4j
● Similarity metrics
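A pure-Python sketch of Jaccard similarity, one of the set-based similarity metrics used in graph recommendations (Neo4j exposes comparable functions in its Graph Data Science library); here it compares two users by the sets of items they interacted with.

```python
# Sketch: Jaccard similarity = |intersection| / |union| of two sets.
def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are treated as dissimilar
    return len(a & b) / len(a | b)

# Two users sharing 2 of 4 distinct films score 2/4 = 0.5.
print(jaccard({"film1", "film2", "film3"}, {"film2", "film3", "film4"}))  # 0.5
```

In a recommendation setting, high pairwise similarity between users is then used to suggest items one user has seen and the other has not.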
Module 11: ML Deployment Infrastructure (90 mins)
● On-premises deployment with Flask, Pickle, and TensorFlow/PyTorch
● Cloud deployment with Heroku, AWS, and GCP (TensorFlow Serving)
● Versioning and logging of ML models in production
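A stdlib-only sketch of the pickle half of on-premises deployment: serialize a trained model artifact with an explicit version tag, then reload it at serving time. The `SimpleModel` class is a hypothetical stand-in for a fitted scikit-learn or Keras model, and the Flask/TensorFlow Serving layer is omitted.

```python
# Sketch: version-tagged model serialization with pickle.
import os
import pickle
import tempfile

class SimpleModel:
    """Placeholder for a trained model; predict() is a trivial linear rule."""
    def __init__(self, coef):
        self.coef = coef
    def predict(self, x):
        return self.coef * x

model, version = SimpleModel(coef=2.0), "v1"
path = os.path.join(tempfile.gettempdir(), f"model-{version}.pkl")

with open(path, "wb") as f:
    pickle.dump(model, f)          # persist the fitted model, version in the filename

with open(path, "rb") as f:
    restored = pickle.load(f)      # a serving process reloads it by version

print(restored.predict(3.0))  # 6.0
```

Keeping the version in the artifact name (or a registry) is what lets a Flask endpoint or TensorFlow Serving roll back to a known-good model.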
Module 12: Case Studies (90 mins)
● Time series prediction with stock market data - using LSTMs
● Image recognition on unstructured data - using CNNs
● Anomaly detection - using Autoencoders
● Text classification and captioning - using CNNs and RNNs
Module 13: Hands-on Exercises (90 mins)
● Cassandra CQL
● InfluxDB InfluxQL and Flux
● Neo4j Cypher
● TensorFlow Serving