Hi, I am trying to set up a small Spark application in SBT.
My build.sbt is
import Dependencies._
name := "hello"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)
libraryDependencies += scalaTest % Test
Everything works fine and all dependencies are resolved by SBT, but when I try to use spark in my hello.scala project file I get this error:
not found: value spark
My hello.scala file is:
package example
import org.apache.spark._
import org.apache.spark.SparkContext._
object Hello extends fileImport with App {
  println(greeting)
  anime.select("*").orderBy($"rating".desc).limit(10).show()
}

trait fileImport {
  lazy val greeting: String = "hello"
  var anime = spark.read.option("header", true).csv("C:/anime.csv")
  var ratings = spark.read.option("header", true).csv("C:/rating.csv")
}
Here is the error output I get:
[info] Compiling 1 Scala source to C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\target\scala-2.11\classes...
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:12: not found: value spark
[error] var anime = spark.read.option("header", true).csv("C:/anime.csv")
[error] ^
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:13: not found: value spark
[error] var ratings = spark.read.option("header", true).csv("C:/rating.csv")
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Sep 10, 2017 1:44:47 PM
spark is initialized automatically only in the spark-shell; in your own code you need to initialize the spark variable yourself:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("testings").master("local").getOrCreate
You can change the "testings" name to whatever you want. The .master option is optional if you want to run the code using spark-submit.
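Putting that together with your example, the trait might look roughly like this. This is only a sketch: note that SparkSession lives in the spark-sql artifact and requires Spark 2.x, so the build.sbt above (Spark 1.6.1, spark-core only) would also need updating, and the $"rating" column syntax additionally needs import spark.implicits._:

import org.apache.spark.sql.SparkSession

trait fileImport {
  // build the session yourself instead of relying on the spark value from spark-shell
  lazy val spark = SparkSession.builder()
    .appName("testings")
    .master("local")
    .getOrCreate()

  import spark.implicits._ // needed for the $"rating" column syntax

  lazy val greeting: String = "hello"
  lazy val anime = spark.read.option("header", true).csv("C:/anime.csv")
  lazy val ratings = spark.read.option("header", true).csv("C:/rating.csv")
}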
I am trying to compile and run some simple H2O Scala code, but when I do sbt package I get errors.
Am I missing something in the sbt file?
This is my H2O Scala code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import ai.h2o.automl.AutoML
import ai.h2o.automl.AutoMLBuildSpec
import org.apache.spark.h2o._
object H2oScalaEg1 {
  def main(args: Array[String]): Unit = {
    val sparkConf1 = new SparkConf().setMaster("local[2]").setAppName("H2oScalaEg1App")
    val sparkSession1 = SparkSession.builder.config(conf = sparkConf1).getOrCreate()
    val h2oContext = H2OContext.getOrCreate(sparkSession1.sparkContext)
    import h2oContext._
    import java.io.File
    import h2oContext.implicits._
    import water.Key
  }
}
And this is my sbt file.
name := "H2oScalaEg1Name"
version := "1.0"
scalaVersion := "2.11.12"
scalaSource in Compile := baseDirectory.value / ""
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3" % "runtime" pomOnly()
When I do sbt package I get these errors
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:7:8: not found: object ai
[error] import ai.h2o.automl.AutoML
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:8:8: not found: object ai
[error] import ai.h2o.automl.AutoMLBuildSpec
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:10:25: object h2o is not a member of package org.apache.spark
[error] import org.apache.spark.h2o._
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:20:20: not found: value H2OContext
[error] val h2oContext = H2OContext.getOrCreate(sparkSession1.sparkContext)
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:28:10: not found: value water
[error] import water.Key
[error] ^
[error] 5 errors found
How can I fix this problem?
My Spark version is spark-2.2.3-bin-hadoop2.7.
Thanks,
marrel
pomOnly() in build.sbt tells the dependency management handlers that the jar libraries/artifacts for this dependency should not be loaded, and that only the metadata should be used.
Try to use libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3" instead.
Edit 1: Additionally I think you are missing (at least) one library dependency:
libraryDependencies += "ai.h2o" % "h2o-automl" % "3.22.1.3"
see: https://search.maven.org/artifact/ai.h2o/h2o-automl/3.22.1.5/pom
Edit 2:
The last dependency you are missing is sparkling-water-core:
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.11" % "2.4.6" should do the trick.
Here is the GitHub source of sparkling-water/core/src/main/scala/org/apache/spark/h2o.
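Putting the three changes together, the dependency section of the build.sbt would look roughly like this (a sketch based on the versions suggested above; you may need to pick the sparkling-water release line that matches your Spark version):

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
// no pomOnly(), so the jars are actually fetched
libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3"
libraryDependencies += "ai.h2o" % "h2o-automl" % "3.22.1.3"
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.11" % "2.4.6"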
I'm trying to use the Scala Finch library to build an API.
I have the following simple code:
package example
import io.finch._
import com.twitter.finagle.Http
object HelloWorld extends App {
val api: Endpoint[String] = get("hello") { Ok("Hello, World!") }
Http.serve(":8080", api.toService)
}
And a build.sbt file that looks like this:
name := "hello-finch"
version := "1.0"
scalaVersion := "2.10.6"
mainClass in (Compile, run) := Some("example.HelloWorld")
libraryDependencies ++= Seq(
"com.github.finagle" %% "finch-core" % "0.10.0"
)
// found here: https://github.com/finagle/finch/issues/604
addCompilerPlugin(
"org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full
)
When I compile and run the code I get this error message:
object Http is not a member of package com.twitter.finagle
[error] import com.twitter.finagle.Http
[error] ^
[error] /Users/jamesk/Code/hello_finch/hello-finch/src/main/scala/example/Hello.scala:8: wrong number of type arguments for io.finch.Endpoint, should be 2
[error] val api: Endpoint[String] = get("hello") { Ok("Hello, World!") }
[error] ^
[error] /Users/jamesk/Code/hello_finch/hello-finch/src/main/scala/example/Hello.scala:8: not found: value get
[error] val api: Endpoint[String] = get("hello") { Ok("Hello, World!") }
[error] ^
[error] /Users/jamesk/Code/hello_finch/hello-finch/src/main/scala/example/Hello.scala:10: not found: value Http
[error] Http.serve(":8080", api.toService)
[error] ^
[error] four errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 1 s, completed Aug 15, 2017 12:56:01 PM
At this point I'm running out of ideas; it looks like a good library, but it's a pain to get working. Any help would be very much appreciated.
I have updated your example to work with the latest version of Finch, "com.github.finagle" %% "finch-core" % "0.15.1", and also Scala 2.12.
The build.sbt file:
name := "hello-finch"
version := "1.0"
scalaVersion := "2.12.2"
mainClass in (Compile, run) := Some("example.HelloWorld")
libraryDependencies ++= Seq(
"com.github.finagle" %% "finch-core" % "0.15.1"
)
then, the src/main/scala/example/HelloWorld.scala file:
package example
import io.finch._
import com.twitter.finagle.Http
import com.twitter.util.Await
object HelloWorld extends App {
val api: Endpoint[String] = get("hello") { Ok("Hello, World!") }
Await.ready(Http.server.serve(":8080", api.toServiceAs[Text.Plain]))
}
Notice also that the Await.ready() call is mandatory; otherwise your program would exit right away.
I have a small project where I have the following problem:
scalaTest needs to be added to the dependencies of all three projects (client, server, shared); otherwise the scalatest library is not accessible from all of them.
In other words, if I write
val jvmDependencies = Def.setting(Seq(
"org.scalaz" %% "scalaz-core" % "7.2.8"
)++scalaTest)
then things work fine.
But if I don't add ++scalaTest to each of the three dependency sequences, then it fails like this:
> test
[info] Compiling 1 Scala source to /Users/joco/tmp3/server/target/scala-2.11/test-classes...
[error] /Users/joco/tmp3/server/src/test/scala/Test.scala:1: object specs2 is not a member of package org
[error] import org.specs2.mutable.Specification
[error] ^
[error] /Users/joco/tmp3/server/src/test/scala/Test.scala:3: not found: type Specification
[error] class Test extends Specification {
[error] ^
[error] /Users/joco/tmp3/server/src/test/scala/Test.scala:5: value should is not a member of String
[error] "Test" should {
[error] ^
[error] /Users/joco/tmp3/server/src/test/scala/Test.scala:6: value in is not a member of String
[error] "one is one" in {
[error] ^
[error] /Users/joco/tmp3/server/src/test/scala/Test.scala:8: value === is not a member of Int
[error] 1===one
[error] ^
[error] 5 errors found
[error] (server/test:compileIncremental) Compilation failed
[error] Total time: 4 s, completed Mar 18, 2017 1:56:54 PM
However, for production (not test) code everything works just fine: if I want to use a library (in this example autowire) in all three projects, I don't have to add the same dependency three times; it is enough to add it to the shared project only, and then I can use that library from all three projects.
For test code, however, as I mentioned above, currently I have to add the same library dependency (scalaTest - below) to all three projects.
Question: Is there a way to avoid this?
Settings.scala:
import org.scalajs.sbtplugin.ScalaJSPlugin.autoImport._
import sbt.Keys._
import sbt._
object Settings {
  val scalacOptions = Seq(
    "-Xlint",
    "-unchecked",
    "-deprecation",
    "-feature",
    "-Yrangepos"
  )

  object versions {
    val scala = "2.11.8"
  }

  val scalaTest = Seq(
    "org.scalatest" %% "scalatest" % "3.0.1" % "test",
    "org.specs2" %% "specs2" % "3.7" % "test")

  val sharedDependencies = Def.setting(Seq(
    "com.lihaoyi" %%% "autowire" % "0.2.6"
  ) ++ scalaTest)

  val jvmDependencies = Def.setting(Seq(
    "org.scalaz" %% "scalaz-core" % "7.2.8"
  ))

  /** Dependencies only used by the JS project (note the use of %%% instead of %%) */
  val scalajsDependencies = Def.setting(Seq(
    "org.scala-js" %%% "scalajs-dom" % "0.9.1"
  ) ++ scalaTest)
}
build.sbt:
import sbt.Keys._
import sbt.Project.projectToRef
import webscalajs.SourceMappings
lazy val shared = (crossProject.crossType(CrossType.Pure) in file("shared"))
  .settings(
    scalaVersion := Settings.versions.scala,
    libraryDependencies ++= Settings.sharedDependencies.value,
    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
  )
  .jsConfigure(_ enablePlugins ScalaJSWeb)

lazy val sharedJVM = shared.jvm.settings(name := "sharedJVM")
lazy val sharedJS = shared.js.settings(name := "sharedJS")

lazy val elideOptions = settingKey[Seq[String]]("Set limit for elidable functions")

lazy val client: Project = (project in file("client"))
  .settings(
    scalaVersion := Settings.versions.scala,
    scalacOptions ++= Settings.scalacOptions,
    libraryDependencies ++= Settings.scalajsDependencies.value,
    testFrameworks += new TestFramework("utest.runner.Framework")
  )
  .enablePlugins(ScalaJSPlugin)
  .disablePlugins(RevolverPlugin)
  .dependsOn(sharedJS)

lazy val clients = Seq(client)

lazy val server = (project in file("server"))
  .settings(
    scalaVersion := Settings.versions.scala,
    scalacOptions ++= Settings.scalacOptions,
    libraryDependencies ++= Settings.jvmDependencies.value
  )
  .enablePlugins(SbtLess, SbtWeb)
  .aggregate(clients.map(projectToRef): _*)
  .dependsOn(sharedJVM)
onLoad in Global := (Command.process("project server", _: State)) compose (onLoad in Global).value
fork in run := true
cancelable in Global := true
For test code, however, as I mentioned above, currently I have to add the same library dependency (scalaTest - below) to all three projects.
That is expected. Test dependencies are not inherited along dependency chains. That makes sense, because you don't want to depend on JUnit just because you depend on a library that happens to be tested using JUnit.
Although yes, that calls for a bit of duplication when you have several projects in the same build, all using the same testing framework. This is why we often find some commonSettings that are added to all projects of an sbt build. This is also where we typically put things like organization, scalaVersion, and many other settings that usually apply to all projects inside one build.
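A minimal sketch of that pattern against the build above (organization is just a placeholder; once the shared test dependencies live in commonSettings, the ++scalaTest suffix can be dropped from the individual dependency sequences in Settings.scala):

lazy val commonSettings = Seq(
  organization := "com.example",             // placeholder, adjust to your project
  scalaVersion := Settings.versions.scala,
  scalacOptions ++= Settings.scalacOptions,
  libraryDependencies ++= Settings.scalaTest // the test-only deps every project needs
)

// apply the same commonSettings to client and the shared cross-project as well
lazy val server = (project in file("server"))
  .settings(commonSettings: _*)
  .settings(libraryDependencies ++= Settings.jvmDependencies.value)
  .enablePlugins(SbtLess, SbtWeb)
  .aggregate(clients.map(projectToRef): _*)
  .dependsOn(sharedJVM)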
I'm getting the following error when compiling the following toy class:
package com.example
import scala.tools.reflect.ToolBox
import scala.reflect.runtime.{currentMirror => m}
object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
[info] Loading project definition from /Users/me/Temp/Bar/project
[info] Set current project to Bar (in build file:/Users/me/Temp/Bar/)
[info] Compiling 1 Scala source to /Users/me/Temp/Bar/target/scala-2.11/classes...
[error] /Users/me/Temp/Bar/src/main/scala/com/example/Hello.scala:3: object tools is not a member of package scala
[error] import scala.tools.reflect.ToolBox
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
This is my build.sbt file:
name := """Bar"""
version := "1.0"
scalaVersion := "2.11.8"
// Change this to another test framework if you prefer
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
libraryDependencies += "org.scala-lang" % "scala-reflect" % "2.11.8"
// Uncomment to use Akka
//libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.11"
The following dependency fixed the issue:
libraryDependencies += "org.scala-lang" % "scala-compiler" % "2.11.8"
Is this the best solution?
The ToolBox class is part of the compiler, not the public reflection API, so the scala-compiler dependency is indeed what you need.
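If you keep that dependency, a small refinement (my suggestion, not part of the original setup) is to pin both artifacts to the project's Scala version so they cannot drift apart when you bump scalaVersion:

// build.sbt: keep the reflection/compiler artifacts in lockstep with scalaVersion
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value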
I have Scala code like the below:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))

  def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[(String)] = {
    val data = sc.wholeTextFiles("INPUT_PATH")
    val files = data.map { case (filename, content) => filename }
    (files)
  }
}
When I compile this code using sbt it gives me the error:
value wholeTextFiles is not a member of org.apache.spark.SparkContext.
I am importing everything that is required, but it is still giving me this error.
However, when I compile the code after replacing wholeTextFiles with textFile, it compiles.
What might be the problem here and how do I resolve it?
Thanks in advance!
Environment:
Scala compiler version 2.10.2
spark-1.2.0
Error:
[info] Set current project to RecipeIO (in build file:/home/akshat/RecipeIO/)
[info] Compiling 1 Scala source to /home/akshat/RecipeIO/target/scala-2.10.4/classes...
[error] /home/akshat/RecipeIO/src/main/scala/RecipeIO.scala:14: value wholeTexFiles is not a member of org.apache.spark.SparkContext
[error] val data = sc.wholeTexFiles(INPUT_PATH)
[error] ^
[error] one error found
[error] {file:/home/akshat/RecipeIO/}default-55aff3/compile:compile: Compilation failed
[error] Total time: 16 s, completed Jun 15, 2015 11:07:04 PM
My build.sbt file looks like this :
name := "RecipeIO"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"
libraryDependencies += "org.eclipse.jetty" % "jetty-server" % "8.1.2.v20120308"
ivyXML :=
<dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016">
<artifact name="javax.servlet" type="orbit" ext="jar"/>
</dependency>
You have a typo: it should be wholeTextFiles instead of wholeTexFiles.
As a side note, I think you want sc.wholeTextFiles(INPUT_PATH) and not sc.wholeTextFiles("INPUT_PATH") if you really want to use the INPUT_PATH variable.
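Applying both fixes, the read method would look roughly like this (nothing else changed):

def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[String] = {
  // fixed spelling, and pass the INPUT_PATH variable rather than the literal string "INPUT_PATH"
  val data = sc.wholeTextFiles(INPUT_PATH)
  val files = data.map { case (filename, content) => filename }
  files
}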