Run Scala Spark with SBT - scala

The code below causes Spark to become unresponsive:
System.setProperty("hadoop.home.dir", "H:\\winutils");
val sparkConf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
val sc = new SparkContext(sparkConf)
def main(args: Array[String]) {
val text_file = sc.textFile("h:\\data\\details.txt")
val counts = text_file
.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
println(counts);
}
I'm setting hadoop.home.dir in order to avoid the error mentioned here: Failed to locate the winutils binary in the hadoop binary path
This is what my build.sbt file looks like:
lazy val root = (project in file(".")).
settings(
name := "hello",
version := "1.0",
scalaVersion := "2.11.0"
)
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "1.6.0"
)
Should Scala Spark be compilable/runnable using the sbt code in the file?
I think the code is fine, since it was taken verbatim from http://spark.apache.org/examples.html, but I am not sure whether the Hadoop WinUtils path is required.
Update: "The solution was to use fork := true in the main build.sbt"
Here is the reference: Spark: ClassNotFoundException when running hello world example in scala 2.11
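For illustration, here is a minimal sketch of how the question's build.sbt might look with that setting applied. It assumes the same project name and versions as above. The only functional change is fork := true, which makes sbt start a separate JVM for run. The Spark dependency is also written with %% so that the Scala suffix follows scalaVersion.

```scala
// build.sbt (sketch, based on the build definition in the question)
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0",
    // Run the application in a forked JVM instead of inside sbt's own JVM
    fork := true,
    // %% appends the Scala binary version, resolving to spark-core_2.11 here
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
  )
```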

This is the content of my build.sbt. Note that if your internet connection is slow, resolving the dependencies may take some time.
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1",
"org.apache.spark" %% "spark-mllib" % "1.6.1",
"org.apache.spark" %% "spark-sql" % "1.6.1",
"org.slf4j" % "slf4j-api" % "1.7.12"
)
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
In main I added the following line. The path depends on where you placed the winutil folder.
System.setProperty("hadoop.home.dir", "c:\\winutil")
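The <<= operator used in the run setting above was deprecated in sbt 0.13.13 and removed in sbt 1.0. On a newer sbt, the same setting can be written with := and .evaluated. This is a sketch using the slash syntax of sbt 1.1 and later; it was not part of the original answer.

```scala
// build.sbt fragment for sbt 1.x (sketch)
// Run with the full Compile classpath, the same intent as the <<= line above
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated
```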

Related

Caliban federation with scala 3

There is no caliban-federation artifact published for Scala 3 yet.
My question is: what is the correct way to use it alongside Scala 3 libraries?
For now I have these dependencies in my build.sbt:
lazy val `bookings` =
project
.in(file("."))
.settings(
scalaVersion := "3.0.1",
name := "bookings"
)
.settings(commonSettings)
.settings(dependencies)
lazy val dependencies = Seq(
libraryDependencies ++= Seq(
"com.github.ghostdogpr" %% "caliban-zio-http" % "1.1.0"
),
libraryDependencies ++= Seq(
org.scalatest.scalatest,
org.scalatestplus.`scalacheck-1-15`,
).map(_ % Test),
libraryDependencies +=
("com.github.ghostdogpr" %% "caliban-federation" % "1.1.0")
.cross(CrossVersion.for3Use2_13)
)
But when I try to build it, the build fails with:
[error] (update) Conflicting cross-version suffixes in:
dev.zio:zio-query,
org.scala-lang.modules:scala-collection-compat,
dev.zio:zio-stacktracer,
dev.zio:izumi-reflect,
com.github.ghostdogpr:caliban-macros,
dev.zio:izumi-reflect-thirdparty-boopickle-shaded,
dev.zio:zio,
com.github.ghostdogpr:caliban,
dev.zio:zio-streams

Error importing scala.tools.reflect.ToolBox in SBT

I am trying to compile the following code in SBT as part of a subproject.
package bitstream.compiler
package eval
import scala.reflect.runtime.universe._
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox
// Based on code from:
// https://gist.github.com/xuwei-k/9ba39fe22f120cb098f4
object Eval {
def apply[A](tree: Tree): A = {
val toolbox = currentMirror.mkToolBox()
toolbox.eval(tree).asInstanceOf[A]
}
}
Here is my build.sbt:
lazy val commonSettings = Seq(
organization := "com.bitbucket.example-project",
scalaVersion := "2.12.6"
)
lazy val root = (project in file("."))
.settings(
commonSettings,
version := "0.1.0-SNAPSHOT",
name := "example-project"
)
lazy val plugin = (project in file("plugin"))
.settings(
commonSettings,
scalacOptions += "-J-Xss256m",
name := "plugin",
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
)
.dependsOn(root)
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % Test
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
I try to compile the plugin subproject using plugin/package, and I get the error object tools is not a member of package scala. As far as I know, scala.tools should be provided by the scala-compiler dependency. Is there something I am missing?
scala.tools.reflect.ToolBox is in scala-compiler.jar. Try:
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
sbt does not assume that you will use classes in scala-compiler.jar directly, so the dependency has to be declared explicitly. This is documented in https://www.scala-sbt.org/1.0/docs/Configuring-Scala.html
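One thing that may be worth checking in the build above: the bare libraryDependencies lines at the top level of build.sbt apply to the root project, not to every subproject. A hedged sketch is to declare both compiler artifacts in commonSettings, so that whichever subproject compiles Eval has them on its classpath. This is an assumption about the cause, not a confirmed diagnosis.

```scala
// build.sbt fragment (sketch): share the reflection dependencies across subprojects
lazy val commonSettings = Seq(
  organization := "com.bitbucket.example-project",
  scalaVersion := "2.12.6",
  // Declared here so every project that includes commonSettings gets both jars
  libraryDependencies ++= Seq(
    "org.scala-lang" % "scala-reflect"  % scalaVersion.value,
    "org.scala-lang" % "scala-compiler" % scalaVersion.value
  )
)
```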

SBT, how to add unmanaged JARs to IntelliJ?

I have build.sbt file:
import sbt.Keys.libraryDependencies
lazy val scalatestVersion = "3.0.4"
lazy val scalaMockTestSupportVersion = "3.6.0"
lazy val typeSafeConfVersion = "1.3.2"
lazy val scalaLoggingVersion = "3.7.2"
lazy val logbackClassicVersion = "1.2.3"
lazy val commonSettings = Seq(
organization := "com.stulsoft",
version := "0.0.1",
scalaVersion := "2.12.4",
scalacOptions ++= Seq(
"-feature",
"-language:implicitConversions",
"-language:postfixOps"),
libraryDependencies ++= Seq(
"com.typesafe.scala-logging" %% "scala-logging" % scalaLoggingVersion,
"ch.qos.logback" % "logback-classic" % logbackClassicVersion,
"com.typesafe" % "config" % typeSafeConfVersion,
"org.scalatest" %% "scalatest" % scalatestVersion % "test",
"org.scalamock" %% "scalamock-scalatest-support" % scalaMockTestSupportVersion % "test"
)
)
unmanagedJars in Compile += file("lib/opencv-331.jar")
lazy val pimage = project.in(file("."))
.settings(commonSettings)
.settings(
name := "pimage"
)
parallelExecution in Test := true
It works fine with sbt run, but I cannot run it from IntelliJ.
I receive this error:
java.lang.UnsatisfiedLinkError: no opencv_java331 in java.library.path
I can add manually (File->Project Structure->Libraries->+ necessary dir).
My question is: is it possible to configure build.sbt so that the IntelliJ project is created automatically with the specified library?
I would suggest placing the dependency in the /lib directory at the root of your project. If the directory does not exist, create it.
Run commands:
sbt reload
sbt update
Lastly you could try something like:
File -> Project Structure -> Modules, then select all the modules (usually 1 to 3) and delete them (this will not delete your files). Click the green plus sign, choose Import Module, and select the root directory of your project. IntelliJ should then refresh it.
If none of these help, I'm out of ideas.

Can't import from CrossType.Pure sbt project in Scala

I'm trying to build a Play Framework project with Scala.js on the frontend and one shared project. My sbt configuration is:
import sbt.Project.projectToRef
lazy val scalaV = "2.11.8"
lazy val shared = (crossProject.crossType(CrossType.Pure) in file("shared"))
.settings(
scalaVersion := scalaV,
libraryDependencies ++= Seq(
"com.mediamath" %%% "scala-json" % "1.0"
),
resolvers += "mmreleases" at "https://artifactory.mediamath.com/artifactory/libs-release-global",
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
)
// set up settings specific to the JS project
.jsConfigure(_ enablePlugins ScalaJSPlay)
lazy val sharedJVM = shared.jvm.settings(name := "sharedJVM")
lazy val sharedJS = shared.js.settings(name := "sharedJS")
lazy val root = (project in file(".")).settings(
scalaVersion := scalaV,
scalaJSProjects := jsProjects,
pipelineStages := Seq(scalaJSProd, gzip),
routesGenerator := InjectedRoutesGenerator,
scalikejdbcSettings,
libraryDependencies ++= Seq(
jdbc,
cache,
ws,
evolutions,
"org.scalatestplus.play" %% "scalatestplus-play" % "1.5.1" % Test,
"mysql" % "mysql-connector-java" % "5.1.39",
"com.vmunier" % "play-scalajs-scripts_2.11" % "0.5.0"
),
resolvers += "scalaz-bintray" at "http://dl.bintray.com/scalaz/releases"
).
enablePlugins(PlayScala).
aggregate(jsProjects.map(projectToRef): _*)
lazy val jsProjects = Seq(js)
lazy val js = (project in file("client")).settings(
scalaVersion := scalaV,
persistLauncher := true,
persistLauncher in Test := false,
autoCompilerPlugins := true,
scalacOptions ++= Seq("-unchecked", "-deprecation", "-feature"),
libraryDependencies ++= Seq(
"org.scala-js" %%% "scalajs-dom" % "0.9.0",
"com.mediamath" %%% "scala-json" % "1.0"
),
resolvers += "mmreleases" at "https://artifactory.mediamath.com/artifactory/libs-release-global",
resolvers += Resolver.sonatypeRepo("releases"),
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
).enablePlugins(ScalaJSPlugin, ScalaJSPlay)
Everything works, except that I can't import anything from the shared project in either the Scala.js project or the Play Framework project. Here is what my shared project structure looks like:
And here is how I'm trying to import it:
import services.Encryptor
At compile time I get this error:
not found: object services [error] import services.Encryptor
How can this issue be fixed?
First of all, never ever (!) do this:
lazy val sharedJVM = shared.jvm.settings(name := "sharedJVM")
lazy val sharedJS = shared.js.settings(name := "sharedJS")
This creates new projects that are picked up by sbt, so the cross project does not hold the right projects anymore. See docs for details.
Instead, use jsSettings and jvmSettings:
lazy val shared = (crossProject.crossType(CrossType.Pure) in file("shared"))
// snip
.jsSettings(name := "sharedJS")
.jvmSettings(name := "sharedJVM")
lazy val sharedJVM = shared.jvm
lazy val sharedJS = shared.js
In your build, it seems that your js project does not depend on the shared project, so of course the shared project's contents are not available.
You need to add the dependency:
lazy val js = (project in file("client"))
// snip
.dependsOn(shared.js)
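Putting both points together, the relevant parts of the build might look like the sketch below. Settings not related to the dependency wiring are omitted. The dependsOn(sharedJVM) on the Play project is an assumption: the answer above only mentions the js project, but the question reports the same import failure on the Play side.

```scala
// build.sbt fragment (sketch, Scala.js 0.6.x crossProject API as in the question)
lazy val scalaV = "2.11.8"

lazy val shared = (crossProject.crossType(CrossType.Pure) in file("shared"))
  .settings(scalaVersion := scalaV)
  // Platform-specific names go through jsSettings/jvmSettings
  .jsSettings(name := "sharedJS")
  .jvmSettings(name := "sharedJVM")
  .jsConfigure(_ enablePlugins ScalaJSPlay)

// Plain references only: no extra .settings(...) here
lazy val sharedJVM = shared.jvm
lazy val sharedJS  = shared.js

// Scala.js client depends on the JS half of the cross project
lazy val js = (project in file("client"))
  .settings(scalaVersion := scalaV)
  .enablePlugins(ScalaJSPlugin, ScalaJSPlay)
  .dependsOn(sharedJS)

// Play server depends on the JVM half (assumption, see the note above)
lazy val root = (project in file("."))
  .settings(scalaVersion := scalaV)
  .enablePlugins(PlayScala)
  .dependsOn(sharedJVM)
```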

How to add input files in sbt Scala

I am new to Scala. I am using sbt assembly to create a fat jar. My program reads input files, which I keep under the src/main/resources folder, but I am getting java.io.FileNotFoundException.
I don't know how to specify the path. I will be deploying the jar on the server.
Here is my sbt build file:
lazy val commonSettings = Seq(
organization := "com.insnapinc",
version := "0.1.0",
scalaVersion := "2.11.4"
)
lazy val root = (project in file(".")).
settings(commonSettings: _*).
settings(
name := "memcache-client"
)
libraryDependencies ++= Seq (
"org.scalaj" %% "scalaj-http" % "1.1.4"
,"org.json4s" %% "json4s-native" % "3.2.10"
,"org.scalatest" % "scalatest_2.11" % "2.2.4" % "test"
)
/* assembly plugin */
mainClass in AssemblyKeys.assembly := Some("com.insnap.memcache.MemcacheTest")
assemblySettings
test in AssemblyKeys.assembly := {}
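Files under src/main/resources are packaged inside the jar, so they are not reachable as ordinary filesystem paths once the fat jar is deployed. One common approach is to load them from the classpath instead. The sketch below is not taken from the thread. ResourceReader and the file name input.txt are hypothetical names used only for illustration.

```scala
import scala.io.{Codec, Source}

object ResourceReader {
  // Reads a classpath resource (for example a file from src/main/resources)
  // and returns its lines. A leading "/" makes the path absolute within the jar.
  def readLines(resourcePath: String): List[String] = {
    val stream = getClass.getResourceAsStream(resourcePath)
    // getResourceAsStream returns null when the resource does not exist
    require(stream != null, s"Resource not found on classpath: $resourcePath")
    val source = Source.fromInputStream(stream)(Codec.UTF8)
    try source.getLines().toList
    finally source.close()
  }
}
```

For a file stored at src/main/resources/input.txt (hypothetical), the call would be ResourceReader.readLines("/input.txt"), both under sbt run and from the assembled jar.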