Spark has a sophisticated machine learning offering in the form of MLlib. Here, we’ll use MLlib to do binary classification with a linear SVM.

Steps

  1. Add MLlib to build.sbt:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.4.0",
      "org.apache.spark" %% "spark-sql" % "1.4.0",
      "org.apache.spark" %% "spark-mllib" % "1.4.0"
    )
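
Spark 1.4.0 publishes artifacts for both Scala 2.10 and 2.11, and the %% operator picks the one matching your project’s scalaVersion, so make sure it is set to one of those. The 2.10.4 below is just an example:

    scalaVersion := "2.10.4"  // any Scala 2.10.x or 2.11.x works with Spark 1.4.0
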
  2. Add the imports:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.util.MLUtils
  3. Main skeleton. The code from the following steps goes inside main, between the SparkContext creation and sc.stop():

    object Main {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[*]", "hello-spark")
        // code from the following steps goes here
        sc.stop()
      }
    }
  4. Load the data as an RDD of labeled points:

    val data = MLUtils.loadLibSVMFile(sc, "./data/svm/sample_libsvm_data.txt")
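
For reference, loadLibSVMFile expects the standard LIBSVM text format: one example per line, a numeric label followed by space-separated index:value pairs with 1-based, ascending indices. The lines below are made-up values for illustration only:

    0 1:0.25 3:0.5 10:1.0
    1 2:0.75 3:0.1
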
  5. Split the data into training (60%) and test (40%) sets, caching the training set since SGD makes repeated passes over it:
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)    
  6. Build the model by training an SVM with stochastic gradient descent (SGD):

    val numIterations = 100
    val model = SVMWithSGD.train(training, numIterations)

Easy.
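
If the defaults underfit, train also has an overload that exposes the SGD knobs directly. The values below are the library defaults, shown for illustration rather than tuned for this dataset:

    val tunedModel = SVMWithSGD.train(
      training,
      numIterations,
      1.0,   // stepSize: initial step size for SGD (illustrative)
      0.01,  // regParam: L2 regularization parameter (illustrative)
      1.0    // miniBatchFraction: fraction of data sampled per iteration
    )
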

  7. Run the model on the test set. Calling clearThreshold() makes predict return raw scores rather than 0/1 labels, which is what the metrics class in the next step expects:
    model.clearThreshold()
    
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }    
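
If you want hard 0/1 predictions again, say for a quick accuracy check, restore a threshold. This is a minimal sketch assuming the usual margin cutoff of 0.0:

    model.setThreshold(0.0)  // scores above the cutoff map to label 1.0
    val accuracy = test.filter(p => model.predict(p.features) == p.label).count.toDouble / test.count
    model.clearThreshold()   // back to raw scores for the metrics step
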
  8. Get the result metrics, here the area under the ROC curve:
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val auROC = metrics.areaUnderROC()

    println("Area under ROC = " + auROC)

You can save the model to disk and reuse it later:

    model.save(sc, "./data/svm/model")
    val sameModel = SVMModel.load(sc, "./data/svm/model")
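
As a quick sanity check, a sketch assuming the test set from step 5 is still in scope, the reloaded model can score a point straight away:

    val point = test.first()
    println("Reloaded model score: " + sameModel.predict(point.features))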