- A general understanding of math
- A general understanding of programming
- A general understanding of databases
Participants who complete this course will gain a solid understanding of Big Data and its related technologies, methodologies and tools.
Participants will have the opportunity to put their newly gained knowledge into practice through exercises and tests. Group interaction and instructor feedback are an important component of the class.
The course starts with an introduction to the fundamental concepts of Big Data, then progresses to the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools that enable Big Data storage, Distributed Processing, and scalability.
Audience
- Developers / programmers
- IT consultants
Format of the course
- Part lecture, part discussion, hands-on practice
DAY 01
Data Analysis and Big Data concepts - a brief review
- Definition of the four Vs (Velocity, Volume, Variety, Veracity)
- Limits to traditional data processing capacity
- Distributed Processing (e.g. MapReduce)
- Statistical Analysis
- Types of Machine Learning analysis
- Data Visualization
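As an illustration of the map-reduce idea listed under Distributed Processing, here is a minimal single-machine sketch in Python. It is not part of the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this example, and a real framework would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a document on a separate node.
documents = ["big data big tools", "data tools"]
counts = word_count(documents)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle be spread across machines without changing the result.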
Languages used for Data Analysis
- R language (intermediate-to-advanced)
DAY 02
Languages used for Data Analysis
- Python (crash course)
DAY 03
Approaches to Data Analysis
- Statistical Analysis
- Time Series analysis
- Forecasting with Correlation and Regression models
- Inferential Statistics (estimation)
- Descriptive Statistics in Big Data sets (e.g. calculating mean)
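To illustrate calculating a mean over a data set that is too large for one machine, the following sketch reduces each chunk to a (sum, count) pair and combines the pairs. This is one possible approach, not necessarily the one taught in the course; the names `partial_stats`, `combine`, `distributed_mean` and the sample chunks are assumptions for this example.

```python
def partial_stats(chunk):
    """Summarise one chunk as (sum, count); only this pair has to leave the node."""
    return sum(chunk), len(chunk)


def combine(a, b):
    """Merge two (sum, count) summaries into one."""
    return a[0] + b[0], a[1] + b[1]


def distributed_mean(chunks):
    """Compute the overall mean from per-chunk summaries."""
    total, count = 0.0, 0
    for chunk in chunks:
        total, count = combine((total, count), partial_stats(chunk))
    if count == 0:
        raise ValueError("mean of an empty data set is undefined")
    return total / count


# Hypothetical chunks of unequal size, standing in for data on three nodes.
chunks = [[1, 2, 3], [4, 5], [6]]
mean = distributed_mean(chunks)
print(mean)
```

Carrying the count alongside the sum matters: simply averaging the per-chunk means would give the wrong answer whenever the chunks differ in size.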
DAY 04
Approaches to Data Analysis
- Machine Learning
- Supervised vs unsupervised learning
- Classification and clustering
- Estimating the cost of specific methods
- Filter
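As a small illustration of clustering, an unsupervised method, here is a sketch of k-means on one-dimensional data. It assumes the starting centroids are supplied by the caller, and the function name `kmeans_1d` and the sample points are made up for this example; production libraries handle initialisation, multiple dimensions and convergence more carefully.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Assignments are stable, so the algorithm has converged.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two visibly separate groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No labels are supplied here, which is what distinguishes clustering from classification: the groups emerge from the distances between the points alone.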
DAY 05
Approaches to Data Analysis
- Natural Language Processing
- Processing text
- Understanding the meaning of the text
- Automatic text generation
- Sentiment/Topic Analysis
- Computer Vision
DAY 06
Big Data tooling
- Data storage solutions (SQL, NoSQL, hierarchical, object-oriented, document-oriented)
- MySQL, Cassandra, MongoDB, Elasticsearch, HDFS, etc.
- Choosing the right solution for the problem
DAY 07
Big Data tooling
- Distributed Processing
- Spark
- Machine Learning with Spark (MLlib)
- Spark SQL
DAY 08
Big Data tooling
- Scalability
- Public cloud (AWS, Google, etc.)
- Private cloud (OpenStack, Cloud Foundry)
- Auto-scaling
Closing remarks