Knowledge of one of the following:
- Java
- Scala
- Python
- R (SparkR)
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.
MLlib is divided into two packages:
- spark.mllib contains the original API built on top of RDDs.
- spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.
Audience
This course is aimed at engineers and developers who want to use MLlib, the built-in machine learning library for Apache Spark.
spark.mllib: data types, algorithms, and utilities
- Data types
- Basic statistics
  - summary statistics
  - correlations
  - stratified sampling
  - hypothesis testing
  - streaming significance testing
  - random data generation
- Classification and regression
  - linear models (SVMs, logistic regression, linear regression)
  - naive Bayes
  - decision trees
  - ensembles of trees (Random Forests and Gradient-Boosted Trees)
  - isotonic regression
- Collaborative filtering
  - alternating least squares (ALS)
- Clustering
  - k-means
  - Gaussian mixture
  - power iteration clustering (PIC)
  - latent Dirichlet allocation (LDA)
  - bisecting k-means
  - streaming k-means
- Dimensionality reduction
  - singular value decomposition (SVD)
  - principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining
  - FP-growth
  - association rules
  - PrefixSpan
- Evaluation metrics
- PMML model export
- Optimization (developer)
  - stochastic gradient descent
  - limited-memory BFGS (L-BFGS)
spark.ml: high-level APIs for ML pipelines
- Overview: estimators, transformers and pipelines
- Extracting, transforming and selecting features
- Classification and regression
- Clustering
- Advanced topics
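The estimator, transformer and pipeline concepts listed in the spark.ml overview can be illustrated with a small plain-Python sketch. This is not Spark code: every class name below (Lowercase, MeanLengthThreshold, LengthFlagger, Pipeline, PipelineModel) is hypothetical and written only for illustration, and rows are plain dictionaries rather than DataFrames. It assumes the usual convention that a transformer exposes transform(), an estimator exposes fit() and returns a fitted transformer, and a pipeline fits its stages in order.

```python
# Plain-Python illustration of the estimator / transformer / pipeline pattern.
# All names are hypothetical; none of these are Spark's own classes.


class Lowercase:
    """Transformer: maps rows to rows without learning anything."""

    def transform(self, rows):
        return [dict(r, text=r["text"].lower()) for r in rows]


class MeanLengthThreshold:
    """Estimator: fit() learns a parameter and returns a fitted transformer."""

    def fit(self, rows):
        mean_len = sum(len(r["text"]) for r in rows) / len(rows)
        return LengthFlagger(mean_len)


class LengthFlagger:
    """Fitted model: a transformer that applies the learned threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [dict(r, is_long=len(r["text"]) > self.threshold) for r in rows]


class Pipeline:
    """Estimator that chains stages; fitting it yields a PipelineModel."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the data seen so far;
            # plain transformers are passed through unchanged.
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


training = [{"text": "Spark"}, {"text": "Machine Learning Library"}]
model = Pipeline([Lowercase(), MeanLengthThreshold()]).fit(training)
result = model.transform([{"text": "MLlib"}, {"text": "Gradient-Boosted Trees"}])
print(result)
```

The point of the split is that fitting happens once on training data, while the resulting model can be applied repeatedly to new data with the same sequence of steps.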