Big data refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include data capture, storage, analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Introduction to Data Science for Big Data Analytics
- Data Science Overview
- Big Data Overview
- Data Structures
- Drivers and complexities of Big Data
- Big Data ecosystem and a new approach to analytics
- Key technologies in Big Data
- Data Mining process and problems
- Association Pattern Mining
- Data Clustering
- Outlier Detection
- Data Classification
Introduction to the Data Analytics Lifecycle
- Discovery
- Data preparation
- Model planning
- Model building
- Presentation/Communication of results
- Operationalization
- Exercise: Case study
From this point on, most of the training time (80%) is spent on examples and exercises in R and related big data technologies.
Getting started with R
- Installing R and RStudio
- Features of R language
- Objects in R
- Data in R
- Data manipulation
- Big data issues
- Exercises
Getting started with Hadoop
- Installing Hadoop
- Understanding Hadoop modes
- HDFS
- MapReduce architecture
- Overview of Hadoop-related projects
- Writing programs in Hadoop MapReduce
- Exercises
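The MapReduce architecture listed above runs a job in three parts: a map step that emits key/value pairs, a framework-managed shuffle that groups those pairs by key, and a reduce step that aggregates each group. The sketch below imitates the classic word-count job in a single Python process, assuming the input fits in memory. The course itself uses Hadoop and R. The function names are hypothetical, and this is not Hadoop's Java API or Hadoop Streaming.

```python
# Single-process imitation of MapReduce word count (hypothetical names, not Hadoop's API).
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big", "data science"]))
    # {'big': 2, 'data': 2, 'science': 1}
```

On a real cluster, the map and reduce functions run in parallel on different nodes, and the shuffle moves data over the network. That network transfer is why MapReduce jobs try to keep intermediate output small.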
Integrating R and Hadoop with RHadoop
- Components of RHadoop
- Installing RHadoop and connecting with Hadoop
- The architecture of RHadoop
- Hadoop streaming with R
- Solving data analytics problems with RHadoop
- Exercises
Pre-processing and preparing data
- Data preparation steps
- Feature extraction
- Data cleaning
- Data integration and transformation
- Data reduction – sampling, feature subset selection
- Dimensionality reduction
- Discretization and binning
- Exercises and Case study
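Discretization and binning replace a numeric attribute with a small number of intervals. One common scheme is equal-width binning, which splits the observed range [min, max] into k intervals of the same width. The minimal Python sketch below assumes a non-empty list of numbers; the course itself works in R. The function name is hypothetical. Equal-frequency (quantile) binning is a common alternative that is not shown here.

```python
# Equal-width binning sketch (hypothetical function name).
def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Return the bin index (0 to k-1) of each value under equal-width binning."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no range to split: everything goes in bin 0.
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value in the last bin instead of a nonexistent bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


if __name__ == "__main__":
    print(equal_width_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], k=3))
    # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```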
Exploratory data analysis methods in R
- Descriptive statistics
- Exploratory data analysis
- Visualization – preliminary steps
- Visualizing a single variable
- Examining multiple variables
- Statistical methods for evaluation
- Hypothesis testing
- Exercises and Case study
Data Visualizations
- Basic visualizations in R
- Packages for data visualization: ggplot2, lattice, plotly
- Formatting plots in R
- Advanced graphs
- Exercises
Regression (Estimating future values)
- Linear regression
- Use cases
- Model description
- Diagnostics
- Problems with linear regression
- Shrinkage methods, ridge regression, the lasso
- Generalizations and nonlinearity
- Regression splines
- Local polynomial regression
- Generalized additive models
- Regression with RHadoop
- Exercises and Case study
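For a single predictor, the least-squares line has a closed form. The slope is S_xy / S_xx and the intercept is mean(y) - slope * mean(x), where S_xy and S_xx are the centered cross-product and the centered sum of squares of x. Adding a ridge penalty lambda on the squared slope, with the intercept left unpenalized, changes the slope to S_xy / (S_xx + lambda). That small change shows the shrinkage idea behind ridge regression. The minimal Python sketch below illustrates this. The function name and the `ridge_lambda` parameter are hypothetical, and R users would normally fit such models with `lm()` or a dedicated package.

```python
# One-predictor least squares with an optional ridge penalty (hypothetical names).
def fit_line(x: list[float], y: list[float], ridge_lambda: float = 0.0) -> tuple[float, float]:
    """Return (intercept, slope) for y ~ x.

    ridge_lambda > 0 penalizes the squared slope (ridge regression with an
    unpenalized intercept), which shrinks the slope toward zero.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if ridge_lambda < 0:
        raise ValueError("ridge_lambda must be non-negative")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx + ridge_lambda == 0:
        raise ValueError("x has zero variance; the slope is not identifiable")
    slope = s_xy / (s_xx + ridge_lambda)
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [3, 5, 7, 9]
    print(fit_line(x, y))                  # (1.0, 2.0)
    print(fit_line(x, y, ridge_lambda=5))  # (3.5, 1.0)
```

With lambda = 5, the slope drops from 2.0 to 1.0 on the same data. The intercept then adjusts so the line still passes through (mean(x), mean(y)).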
Classification
- Classification-related problems
- Bayesian refresher
- Naïve Bayes
- Logistic regression
- K-nearest neighbors
- Decision tree algorithms
- Neural networks
- Support vector machines
- Diagnostics of classifiers
- Comparison of classification methods
- Scalable classification algorithms
- Exercises and Case study
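Of the classifiers listed above, k-nearest neighbors is the simplest to state: a new point receives the majority label among the k training points closest to it. The minimal Python sketch below assumes numeric features and Euclidean distance with no feature scaling. The function name and data are hypothetical. In practice, features are usually standardized first because the distance is sensitive to scale.

```python
# k-nearest-neighbors classification sketch (hypothetical names and data).
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Predict the label of `query` by majority vote of its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query (stable sort).
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    # most_common orders equal counts by first occurrence, so ties go to the nearer label.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(X, y, (0.5, 0.5)))  # a
    print(knn_predict(X, y, (5.5, 5.2)))  # b
```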
Assessing model performance and selection
- Bias, variance, and model complexity
- Accuracy vs. interpretability
- Evaluating classifiers
- Measures of model/algorithm performance
- Hold-out method of validation
- Cross-validation
- Tuning machine learning algorithms with the caret package
- Visualizing model performance with profit, ROC, and lift curves
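Hold-out validation scores a model on a single held-back subset. k-fold cross-validation repeats this k times so that every row is used for testing exactly once, then averages the scores. The minimal Python sketch below uses a majority-class baseline as a stand-in for a real model. All names are hypothetical, and this is not the caret package's interface. In R, caret's `trainControl(method = "cv")` configures the same idea.

```python
# k-fold cross-validation sketch with a baseline model (hypothetical names).
import random
from collections import Counter
from collections.abc import Callable, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle row indices 0..n-1 and return one (train, test) split per fold."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def cross_validate(X: Sequence, y: Sequence, fit_predict: Callable, k: int = 5, seed: int = 0) -> float:
    """Average test-fold accuracy of fit_predict(train_X, train_y, test_X) over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(train_X, train_y, test_X):
    """Baseline model: predict the most frequent training label for every test row."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


if __name__ == "__main__":
    X = list(range(10))
    y = ["yes"] * 7 + ["no"] * 3
    print(cross_validate(X, y, majority_class, k=5))
```

Any model can be plugged in through `fit_predict`. Comparing its score with the majority-class baseline shows whether the model learns anything beyond the class balance.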
Ensemble Methods
- Bagging
- Random Forests
- Boosting
- Gradient boosting
- Exercises and Case study
Support vector machines for classification and regression
- Maximal Margin classifiers
- Support vector classifiers
- Support vector machines
- SVMs for classification problems
- SVMs for regression problems
- Exercises and Case study
Identifying unknown groupings within a data set
- Feature Selection for Clustering
- Representative-based algorithms: k-means, k-medoids
- Hierarchical algorithms: agglomerative and divisive methods
- Probabilistic algorithms: EM
- Density-based algorithms: DBSCAN, DENCLUE
- Cluster validation
- Advanced clustering concepts
- Clustering with RHadoop
- Exercises and Case study
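Among the representative-based algorithms, k-means alternates two steps until the assignments stop changing. First, each point is assigned to its nearest centroid. Then each centroid moves to the mean of its assigned points. The minimal Python sketch below implements this Lloyd-style iteration, assuming numeric tuples and random initialization from the data points. The names are hypothetical. The result depends on the starting centroids, so real use typically runs several restarts; R's `kmeans()` exposes this through its `nstart` argument.

```python
# Lloyd-style k-means sketch (hypothetical names).
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0) -> tuple[list[Point], list[int]]:
    """Cluster points into k groups; return (centroids, cluster label of each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = random.Random(seed).sample(points, k)  # initial centroids drawn from the data
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids, labels)
```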
Discovering connections with Link Analysis
- Link analysis concepts
- Metrics for analyzing networks
- The PageRank algorithm
- Hyperlink-Induced Topic Search (HITS)
- Link Prediction
- Exercises and Case study
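PageRank scores a page by the long-run probability that a "random surfer" is on it. With probability d, the surfer follows a random out-link; otherwise it jumps to a uniformly random page. The Python sketch below computes these scores by power iteration. It assumes the graph is a dict of adjacency lists and that pages with no out-links spread their score evenly over all pages, which is one common convention. The names are hypothetical. d = 0.85 is the widely used default, not a value taken from the course.

```python
# PageRank by power iteration (hypothetical names; dangling nodes spread their score uniformly).
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> dict[str, float]:
    """Return the PageRank score of every node; the scores sum to 1."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes (no out-links) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["b"]}
    for node, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

In this small graph, b receives links from both a and c, so it scores highest. c has no in-links, so it receives only the random-jump share (1 - d) / n.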
Association Pattern Mining
- Frequent Pattern Mining Model
- Scalability issues in frequent pattern mining
- Brute-force algorithms
- Apriori algorithm
- The FP-growth approach
- Evaluation of candidate rules
- Applications of Association Rules
- Validation and Testing
- Diagnostics
- Association rules with R and Hadoop
- Exercises and Case study
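The Apriori algorithm finds frequent itemsets level by level. It relies on the fact that every subset of a frequent itemset must itself be frequent. Candidates of size k are built from frequent (k-1)-itemsets, pruned if any of their (k-1)-subsets is infrequent, and only then counted against the data. The confidence of a rule A -> B is support(A ∪ B) / support(A). The minimal in-memory Python sketch below uses made-up basket data. The names are hypothetical, and this is not the interface of R's arules package.

```python
# Level-wise Apriori sketch with rule confidence (hypothetical names and data).
from itertools import combinations


def apriori(transactions: list[set[str]], min_support: float) -> dict[frozenset[str], float]:
    """Return every itemset whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)

    def keep_frequent(itemsets: set[frozenset[str]]) -> dict[frozenset[str], float]:
        scored = {s: sum(s <= t for t in transactions) / n for s in itemsets}
        return {s: sup for s, sup in scored.items() if sup >= min_support}

    current = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent: dict[frozenset[str], float] = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


def confidence(frequent: dict[frozenset[str], float], antecedent: frozenset[str], consequent: frozenset[str]) -> float:
    """Confidence of antecedent -> consequent; their union must be a frequent itemset."""
    return frequent[antecedent | consequent] / frequent[antecedent]


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    freq = apriori(baskets, min_support=0.5)
    print(sorted(map(sorted, freq)))
    print(confidence(freq, frozenset({"butter"}), frozenset({"bread"})))  # 1.0
```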
Constructing recommendation engines
- Understanding recommender systems
- Data mining techniques used in recommender systems
- Recommender systems with the recommenderlab package
- Evaluating the recommender systems
- Recommendations with RHadoop
- Exercise: Building a recommendation engine
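One of the data mining techniques behind recommender systems is item-based collaborative filtering. Items are compared by the cosine similarity of their rating vectors. An item the user has not rated is then scored as a similarity-weighted average of that user's own ratings. The minimal Python sketch below assumes ratings are held in a nested dict (user -> item -> rating) and uses raw, not mean-centered, ratings as a simplification. The names and data are hypothetical, and this is not the recommenderlab API; recommenderlab's comparable method is called "IBCF".

```python
# Item-based collaborative filtering sketch (hypothetical names and data).
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings: dict[str, dict[str, float]], user: str, top_n: int = 3) -> list[str]:
    """Rank items `user` has not rated by a similarity-weighted average of the user's ratings."""
    # Transpose user -> item -> rating into item -> user -> rating vectors.
    item_vectors: dict[str, dict[str, float]] = {}
    for who, items in ratings.items():
        for item, rating in items.items():
            item_vectors.setdefault(item, {})[who] = rating
    seen = ratings[user]
    scores: dict[str, float] = {}
    for candidate, vector in item_vectors.items():
        if candidate in seen:
            continue
        sims = [(cosine(vector, item_vectors[item]), rating) for item, rating in seen.items()]
        total_sim = sum(abs(s) for s, _ in sims)
        if total_sim > 0:  # skip items unrelated to anything the user rated
            scores[candidate] = sum(s * r for s, r in sims) / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    ratings = {"ann": {"a": 5}, "bob": {"a": 5, "c": 5}, "cat": {"b": 5, "d": 5}}
    print(recommend(ratings, "ann"))  # ['c']
```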
Text analysis
- Text analysis steps
- Collecting raw text
- Bag of words
- Term frequency–inverse document frequency (TF-IDF)
- Determining sentiment
- Exercises and Case study
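A bag-of-words representation counts how often each term occurs in a document. TF-IDF then reweights each count by how rare the term is across the collection. The Python sketch below uses one common variant: tf = count / document length and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries differ (smoothed idf, log-scaled tf, normalization), so exact values will not match every tool. Tokenization here is a simple lowercase whitespace split, an assumption made for brevity, and the function name is hypothetical.

```python
# TF-IDF sketch: tf = count / length, idf = ln(N / df) (hypothetical function name).
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per document."""
    # Bag of words: lowercase whitespace tokenization (a simplifying assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights: list[dict[str, float]] = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for w in tf_idf(["big data big", "data science"]):
        print({t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document, such as "data" here, gets an idf of ln(1) = 0 and therefore a weight of zero.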