I'm relatively new to using Spark Streaming. I've been searching for the best way to write unit tests for my Spark application and came across the TestSuiteBase trait.
However, I'm unable to extend this trait in my test suite.
Here's the code snippet relevant to this issue:
...
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming._
import org.apache.spark.streaming.TestSuiteBase
...
...
class UnitTest extends BaseTest with TestSuiteBase
...
However, I hit this error when running sbt test:
.... object TestSuiteBase is not a member of package org.apache.spark.streaming
[error] import org.apache.spark.streaming.TestSuiteBase
Also, are there any better approaches to writing unit tests for Spark Streaming programs?
Any help would be appreciated.
Thanks in advance.
I modified my "build.sbt" to contain:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.4.0" classifier "tests"
This pulls in the test JARs for Spark Streaming, which contain TestSuiteBase.
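For reference, here is a minimal sketch of what the relevant build.sbt lines could look like. The spark-core test-jar and scalatest lines are my assumptions (in my experience TestSuiteBase builds on helpers from spark-core's test artifacts), and the versions are only illustrative:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.4.0",
  // test jars that contain TestSuiteBase; the core test jar and scalatest
  // lines are assumptions about what it needs, not verified against your setup
  "org.apache.spark" %% "spark-streaming" % "2.4.0" % Test classifier "tests",
  "org.apache.spark" %% "spark-core"      % "2.4.0" % Test classifier "tests",
  "org.scalatest"    %% "scalatest"       % "3.0.5" % Test
)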
Related
I'm trying to write a test for my project in Scala 3. I added the ScalaTest library as:
libraryDependencies ++= Seq(
....
"org.scalatest" %% "scalatest" % "3.2.9" % Test
)
I know my project structure is right, but it gives me this error:
value FunSuite is not a member of org.scalatest - did you mean scalatest.funsuite?
However, I used the same in another project and it works fine.
Thanks to @Johney.
The correct usage is as below (at least in Scala 3.0.2):
import org.scalatest.funsuite.*
class TestParser extends AnyFunSuite {
}
Of course, tutorials such as Getting started with FunSuite are still based on import org.scalatest.FunSuite, but the right examples are here, which is also what the first mention of FunSuite in Getting started with FunSuite refers to.
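For reference, a complete minimal example with an actual test body (the class and test names are just placeholders):
import org.scalatest.funsuite.AnyFunSuite

class ExampleSuite extends AnyFunSuite {
  // trivial test, just to show the new-style import and test syntax
  test("addition works") {
    assert(1 + 1 == 2)
  }
}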
In the build.sbt of an Example Assignment in a Scala course, the test library used is junit 4.10. There is no mention of Scalatest.
libraryDependencies += "junit" % "junit" % "4.10" % Test
And yet, in a test class, ScalaTest can be referenced and the tests can be written using actual ScalaTest syntax:
import org.scalatest.FunSuite
import org.scalatest.Matchers
import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner
@RunWith(classOf[JUnitRunner])
class ListsSuite extends FunSuite with Matchers {
...etc...
}
Question: I suppose the Scala compiler accesses scalatest via the junit library. If so, what is the reason to embed scalatest in junit?
I checked the assignment project from the referred course, and it actually has a quite complicated structure, using the deprecated way of defining a build in project/*.scala files by extending Build.
The answer to your question is simple, though: the scalatest dependency is defined in project/CommonBuild.scala and added to the build in project/StudentBuildLike.scala. So there is no magic; you're using normal ScalaTest and a custom test runner (see project/ScalaTestRunner.scala).
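To illustrate what "defined in project/*.scala" means, here is a simplified sketch of that deprecated Build-file style; the file, object, and value names below are made up for illustration, not copied from the course project:
// project/CommonDeps.scala -- hypothetical file and names
import sbt._
import Keys._

object CommonDeps {
  // the course build defines something equivalent to this and mixes it
  // into every student project, which is why ScalaTest is on the classpath
  val scalaTest = "org.scalatest" %% "scalatest" % "2.2.4" % "test"
}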
I'm trying to integrate Spark 2.3.0 running on my Mac with S3. I can read/write to S3 without any problem using spark-shell. But when I try to do the same using a little Scala program that I run via sbt, I get
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/GlobalStorageStatistics$StorageStatisticsProvider.
I have installed hadoop-aws 3.0.0-beta1.
I have also set s3 access information in spark-2.3.0/conf/spark-defaults.conf:
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key XXXX
spark.hadoop.fs.s3a.secret.key YYYY
spark.hadoop.com.amazonaws.services.s3.enableV4 true
spark.hadoop.fs.s3a.endpoint s3.us-east-2.amazonaws.com
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.encryption.enabled true
spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256
The program compiles fine using sbt version 0.13.
name := "S3Test"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.0.0-beta1"
The Scala code is:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.amazonaws._
import com.amazonaws.auth._
import com.amazonaws.services.s3._
import com.amazonaws.services.s3.model._
import java.io._
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.s3a.S3AFileSystem
object S3Test {
def main(args: Array[String]) = {
val spark = SparkSession.builder().master("local").appName("Spark AWS S3 example").getOrCreate()
import spark.implicits._
val df = spark.read.text("test.txt")
df.take(5)
df.write.save(<s3 bucket>)
}
}
I have set environment variables for JAVA_HOME, HADOOP_HOME, SPARK_HOME, CLASSPATH, SPARK_DIST_CLASSPATH, etc. But nothing lets me get past this error message.
You can't mix hadoop-* JARs; they all need to be in perfect sync. That means: cut all the Hadoop 2.7 artifacts and replace them.
FWIW, there isn't a significant difference between Hadoop 2.8 and Hadoop 3.0-beta-1 in terms of AWS support, other than the S3Guard DDB directory service (performance and listing through DynamoDB), so unless you need that feature, Hadoop 2.8 is going to be adequate.
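To illustrate "keeping the hadoop-* JARs in sync", this is roughly what a consistent build.sbt could look like, assuming the Spark 2.3.0 binaries in use were built against Hadoop 2.7.x (the default download); the exact versions are assumptions, not something verified against this setup:
name := "S3Test"
scalaVersion := "2.11.8"
// keep Spark and Hadoop artifacts from a single Hadoop lineage -- no 2.7 + 3.0 mix
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"  % "2.3.0",
  "org.apache.spark"  %% "spark-sql"   % "2.3.0",
  "org.apache.hadoop" %  "hadoop-aws"  % "2.7.3"  // matches a Hadoop 2.7.x classpath
)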
I'm writing a Scala project managed by sbt, and I want to use the TestKit package (http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-testkit.html#TestKit).
In order to do that, I need the following imports:
import system.dispatcher
import akka.pattern.pipe
The problem is that I have to update the build.sbt file in order to "bring in" the needed dependencies. I tried to add the following dependency:
"com.typesafe.akka" %% "akka-http-testkit" % 2.4.2
and I still got an error for the lines:
import system.dispatcher
and
val probe = TestProbe()
Can you give me a hint on how to find the needed dependency given an import line?
Try to add:
"com.typesafe.akka" %% "akka-http-testkit" % 2.4.2 % Test
akka-http-testkit is a test library, so you should be adding it to the Test scope.
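For context, a sketch of the dependency block I would try; the versions just mirror the one above, and the extra akka-testkit line is my assumption, since TestProbe lives in the core akka-testkit module rather than in akka-http-testkit:
libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-actor"        % "2.4.2",
  "com.typesafe.akka" %% "akka-testkit"      % "2.4.2" % Test,  // provides akka.testkit.TestProbe
  "com.typesafe.akka" %% "akka-http-testkit" % "2.4.2" % Test
)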
I'd like to play around with reflection in Scala (2.10.2) by following the example in this tutorial. Things work fine when I start sbt (version 0.13) and import scala.reflect.runtime.universe._:
scala> import scala.reflect.runtime.universe._
import scala.reflect.runtime.universe._
but when I try to put the sample code into an object like
object ReflectExample {
import scala.reflect.runtime.universe._
/*
the rest of the Example
*/
}
and compile the code with sbt compile, I see the following error message:
[error] object runtime is not a member of package reflect
[error] import scala.reflect.runtime.universe._
As explained in sbt's documentation, you need to add this line to the libraryDependencies field of your project in build.sbt:
"org.scala-lang" % "scala-reflect" % scalaVersion.value
You may want to try adding a dependency on http://mvnrepository.com/artifact/org.scala-lang/scala-reflect
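Putting it together, a minimal build.sbt for the snippet above might look like this (the project name is a placeholder):
name := "reflect-example"  // placeholder
scalaVersion := "2.10.2"
// scala-reflect is not added to the classpath automatically, so declare it explicitly
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value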