Course Code: spmllib
Duration: 35 hours
Prerequisites:

Knowledge of one of the following:

  • Java
  • Scala
  • Python
  • SparkR

Overview:

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It is divided into two packages:

  • spark.mllib contains the original API built on top of RDDs.

  • spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

Audience:

This course is aimed at engineers and developers who want to use the built-in machine learning library for Apache Spark.

Course Outline:

spark.mllib: data types, algorithms, and utilities

  • Data types
  • Basic statistics
    • summary statistics
    • correlations
    • stratified sampling
    • hypothesis testing
    • streaming significance testing
    • random data generation
  • Classification and regression
    • linear models (SVMs, logistic regression, linear regression)
    • naive Bayes
    • decision trees
    • ensembles of trees (random forests and gradient-boosted trees)
    • isotonic regression
  • Collaborative filtering
    • alternating least squares (ALS)
  • Clustering
    • k-means
    • Gaussian mixture
    • power iteration clustering (PIC)
    • latent Dirichlet allocation (LDA)
    • bisecting k-means
    • streaming k-means
  • Dimensionality reduction
    • singular value decomposition (SVD)
    • principal component analysis (PCA)
  • Feature extraction and transformation
  • Frequent pattern mining
    • FP-growth
    • association rules
    • PrefixSpan
  • Evaluation metrics
  • PMML model export
  • Optimization (developer)
    • stochastic gradient descent
    • limited-memory BFGS (L-BFGS)

spark.ml: high-level APIs for ML pipelines

  • Overview: estimators, transformers and pipelines
  • Extracting, transforming and selecting features
  • Classification and regression
  • Clustering
  • Advanced topics
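
The estimator, transformer and pipeline vocabulary used in the spark.ml part of the outline can be illustrated with a toy sketch in plain Python. This is not Spark code and does not use the spark.ml API: rows are ordinary dictionaries rather than DataFrames, and the stage names AddSquare, MeanCenter and MeanCenterModel are hypothetical, invented for this illustration. The sketch only assumes the general pattern that a transformer maps data to data, an estimator is fitted on data to produce a transformer, and a pipeline chains such stages.

```python
from typing import Dict, List

Row = Dict[str, float]


class Transformer:
    """A stage that maps rows to new rows without learning anything."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """A stage that learns from rows and returns a fitted Transformer."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class AddSquare(Transformer):
    """Hypothetical feature transformer: adds x_sq = x * x to every row."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "x_sq": r["x"] * r["x"]} for r in rows]


class MeanCenterModel(Transformer):
    """Fitted model: subtracts a previously learned mean from one column."""

    def __init__(self, column: str, mean: float) -> None:
        self.column = column
        self.mean = mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.column: r[self.column] - self.mean} for r in rows]


class MeanCenter(Estimator):
    """Hypothetical estimator: learns the mean of one column."""

    def __init__(self, column: str) -> None:
        self.column = column

    def fit(self, rows: List[Row]) -> Transformer:
        mean = sum(r[self.column] for r in rows) / len(rows)
        return MeanCenterModel(self.column, mean)


class PipelineModel(Transformer):
    """A fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: List[Transformer]) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fitting it fits each estimator on the data so far."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted: List[Transformer] = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # estimator -> fitted transformer
            fitted.append(stage)
            rows = stage.transform(rows)  # feed the output to the next stage
        return PipelineModel(fitted)


# Fit on training rows, then reuse the fitted pipeline on new rows.
train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([AddSquare(), MeanCenter("x_sq")]).fit(train)
result = model.transform([{"x": 2.0}])
```

The point of the design is that fitting happens once, on training data, and the resulting pipeline model applies the same learned parameters (here, the mean of x_sq) to any later data.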