';' expected but 'import' found - Scala and Spark

I'm trying to work with Spark and Scala, compiling a standalone application. I don't know why I'm getting this error:
topicModel.scala:2: ';' expected but 'import' found.
[error] import org.apache.spark.mllib.clustering.LDA
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
This is the build.sbt code:
name := "topicModel"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.3.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.1"
And these are the imports:
import scala.collection.mutable
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
object Simple {
def main(args: Array[String]) {

Could this be because your file has old Macintosh line endings (a bare \r as the line separator)?
See "Why do I need semicolons after these imports?" for more details.
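If you want to confirm which line endings the file actually uses before changing editor settings, a quick check like the following will count them (a minimal sketch; the file name is the one from the question):
import scala.io.Source

object LineEndingCheck {
  def main(args: Array[String]): Unit = {
    // Read the whole source file as a single string.
    val text = Source.fromFile("topicModel.scala", "UTF-8").mkString
    val crlf = "\r\n".r.findAllIn(text).length // Windows endings
    val lf   = text.count(_ == '\n') - crlf    // Unix endings
    val cr   = text.count(_ == '\r') - crlf    // old Macintosh endings (bare \r)
    println(s"CRLF: $crlf, LF: $lf, bare CR: $cr")
  }
}
A file consisting only of bare CRs is exactly the case where scalac sees the imports as one long line and asks for semicolons between them.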

Related

sbt assembly failing with error: object spark is not a member of package org.apache even though spark-core and spark-sql libraries are included

I am attempting to use sbt assembly on a Spark project. sbt compile and sbt package work, but when I attempt sbt assembly I get the following error:
object spark is not a member of package org.apache
I have included the spark core and spark sql libraries and have sbt-assembly in my plugins file. Why is assembly producing these errors?
build.sbt:
name := "redis-record-loader"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.1"
val scalatestVersion = "3.0.3"
val scalatest = "org.scalatest" %% "scalatest" % scalatestVersion
libraryDependencies ++=
Seq(
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.347",
"com.typesafe" % "config" % "1.3.1",
"net.debasishg" %% "redisclient" % "3.0",
"org.slf4j" % "slf4j-log4j12" % "1.7.12",
"org.apache.commons" % "commons-lang3" % "3.0" % "test,it",
"org.apache.hadoop" % "hadoop-aws" % "2.8.1" % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.mockito" % "mockito-core" % "2.21.0" % Test,
scalatest
)
val integrationTestsKey = "it"
val integrationTestLibs = scalatest % integrationTestsKey
lazy val IntegrationTestConfig = config(integrationTestsKey) extend Test
lazy val root = project.in(file("."))
.configs(IntegrationTestConfig)
.settings(inConfig(IntegrationTestConfig)(Defaults.testSettings): _*)
.settings(libraryDependencies ++= Seq(integrationTestLibs))
test in assembly := Seq(
(test in Test).value,
(test in IntegrationTestConfig).value
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
plugins.sbt:
logLevel := Level.Warn
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
full error message:
/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:11: object spark is not a member of package org.apache
[error] import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: type SparkSession
[error] implicit val spark: SparkSession = SparkSession.builder
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: value SparkSession
[error] implicit val spark: SparkSession = SparkSession.builder
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: not found: type DataFrame
[error] val testDataframe0: DataFrame = testData0.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: value toDF is not a member of Seq[(String, String)]
[error] val testDataframe0: DataFrame = testData0.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: not found: type DataFrame
[error] val testDataframe1: DataFrame = testData1.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: value toDF is not a member of Seq[(String, String)]
[error] val testDataframe1: DataFrame = testData1.toDF()
[error] ^
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access term spark in package org.apache,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access type SparkSession in value org.apache.sql,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.sql.
[error] 9 errors found
I can't comment on the assembly setup, but I can say that I doubt the AWS SDK and hadoop-aws versions are going to work together. You need the hadoop-aws version to match the hadoop-common JAR on your classpath exactly (it is all one project that releases in sync, after all), and the AWS SDK version it was built against was 1.10. The AWS SDK has a habit of (a) breaking APIs on every point release, (b) aggressively pulling in new versions of Jackson even when they are incompatible, and (c) causing regressions in the hadoop-aws code.
If you really want to work with S3A, it is best to go for Hadoop 2.9, which pulls in a shaded 1.11.x version of the SDK.
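One way to keep those versions aligned is sketched below; the hadoopVersion value is an assumption, so check which Hadoop your Spark build actually ships with:
// Sketch only, not the poster's exact build: keep hadoop-aws in lockstep with the
// hadoop-common already on the classpath, and let hadoop-aws pull in the AWS SDK
// version it was built against instead of pinning aws-java-sdk-s3 yourself.
val hadoopVersion = "2.8.1" // assumption: must match the hadoop-common JAR on your classpath

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion % Provided
  // no explicit "com.amazonaws" % "aws-java-sdk-s3" pin here
)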

Spark-Kafka invalid dependency detected

I have some basic Spark-Kafka code and I am trying to run the following:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
import java.util.regex.Pattern
import java.util.regex.Matcher
import org.apache.spark.streaming.kafka._
import kafka.serializer.StringDecoder
import Utilities._
object WordCount {
def main(args: Array[String]): Unit = {
val ssc = new StreamingContext("local[*]", "KafkaExample", Seconds(1))
setupLogging()
// Construct a regular expression (regex) to extract fields from raw Apache log lines
val pattern = apacheLogPattern()
// hostname:port for Kafka brokers, not Zookeeper
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
// List of topics you want to listen for from Kafka
val topics = List("testLogs").toSet
// Create our Kafka stream, which will contain (topic,message) pairs. We tack a
// map(_._2) at the end in order to only get the messages, which contain individual
// lines of data.
val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
ssc, kafkaParams, topics).map(_._2)
// Extract the request field from each log line
val requests = lines.map(x => {val matcher:Matcher = pattern.matcher(x); if (matcher.matches()) matcher.group(5)})
// Extract the URL from the request
val urls = requests.map(x => {val arr = x.toString().split(" "); if (arr.size == 3) arr(1) else "[error]"})
// Reduce by URL over a 5-minute window sliding every second
val urlCounts = urls.map(x => (x, 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(300), Seconds(1))
// Sort and print the results
val sortedResults = urlCounts.transform(rdd => rdd.sortBy(x => x._2, false))
sortedResults.print()
// Kick it off
ssc.checkpoint("/home/")
ssc.start()
ssc.awaitTermination()
}
}
I am using the IntelliJ IDE and created the Scala project with sbt. The details of the build.sbt file are as follows:
name := "Sample"
version := "1.0"
organization := "com.sundogsoftware"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.4.1",
"org.apache.spark" %% "spark-streaming-kafka" % "1.4.1",
"org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
)
However, when I try to build the code, it produces the following error:
Error:scalac: missing or invalid dependency detected while loading class file 'StreamingContext.class'.
Could not access type Logging in package org.apache.spark,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.)
A full rebuild may help if 'StreamingContext.class' was compiled against an incompatible version of org.apache.spark.
Error:scalac: missing or invalid dependency detected while loading class file 'DStream.class'.
Could not access type Logging in package org.apache.spark,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.)
A full rebuild may help if 'DStream.class' was compiled against an incompatible version of org.apache.spark.
When using different Spark libraries together, their versions should always match.
Also, the version of Kafka you use matters; the artifact name should reflect it, for example spark-streaming-kafka-0-10_2.11:
...
scalaVersion := "2.11.8"
val sparkVersion = "2.2.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10_2.11" % sparkVersion,
"org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
)
This is a useful site if you need to check the exact dependencies you should use:
https://search.maven.org/
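Note that %% already appends the Scala binary version to the artifact name, so the _2.11 suffix should not be spelled out as well when using %%. As a small sketch, either of these declarations resolves to spark-streaming-kafka-0-10_2.11:
// %% appends the Scala binary version for you; % takes the artifact name verbatim.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10"      % sparkVersion
libraryDependencies += "org.apache.spark"  % "spark-streaming-kafka-0-10_2.11" % sparkVersion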

object compress is not a member of package org.apache.commons

I've been trying to use Apache Commons, but compilation fails with the following errors.
I have no idea how to fix it. Do I need to add something to build.sbt?
$ sbt
> clean
> compile
[error] hw.scala:3: object compress is not a member of package org.apache.commons
[error] import org.apache.commons.compress.utils.IOUtils
[error] ^
[error] hw.scala:24: not found: value IOUtils
[error] val bytes = IOUtils.toByteArray(new FileInputStream(imgFile))
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
hw.scala
import org.apache.commons.codec.binary.{ Base64 => ApacheBase64 }
import org.apache.commons.compress.utils.IOUtils
...
build.sbt
name := "hello"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"commons-codec" % "commons-codec" % "1.10",
"commons-io" % "commons-io" % "2.4"
)
Add this in your build.sbt:
// https://mvnrepository.com/artifact/org.apache.commons/commons-compress
libraryDependencies += "org.apache.commons" % "commons-compress" % "1.14"
To find this yourself in the future, search on https://mvnrepository.com/
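With that dependency added, the imports from hw.scala resolve. A minimal sketch of the usage (the image path is a placeholder, not taken from the question):
import java.io.FileInputStream
import org.apache.commons.codec.binary.{Base64 => ApacheBase64}
import org.apache.commons.compress.utils.IOUtils

object Hw {
  def main(args: Array[String]): Unit = {
    val imgFile = "image.png" // placeholder path
    val in = new FileInputStream(imgFile)
    // Read the file into a byte array, then Base64-encode it with commons-codec.
    val bytes = try IOUtils.toByteArray(in) finally in.close()
    val encoded = ApacheBase64.encodeBase64String(bytes)
    println(encoded.take(80))
  }
}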

Play 2.4.1, PlayEbean not found

After updating my Java project from Play 2.2 to 2.4, I followed the instructions on the Migration page, but I am getting an error saying that the value PlayEbean was not found.
What am I doing wrong? As far as I can tell, I only have to add that one line to the plugins.sbt file and it should work, right?
EDIT: I tried 2.4.2; the exact same problem occurred.
For clarity's sake: there is no build.sbt file, only a Build.scala file plus a BuildKeys.scala and a BuildPlugin.scala file, though those last two have no relation to this problem.
The files:
project/Build.scala:
import sbt._
import Keys._
import play.sbt.PlayImport._
import PlayKeys._
object BuildSettings {
val appVersion = "0.1"
val buildScalaVersion = "2.11.7"
val buildSettings = Seq (
version := appVersion,
scalaVersion := buildScalaVersion
)
}
object Resolvers {
val typeSafeRepo = "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
val localRepo = "Local Maven Repositor" at "file://"+Path.userHome.absolutePath+"/.m2/repository"
val bintrayRepo = "scalaz-bintray" at "https://dl.bintray.com/scalaz/releases"
val sbtRepo = "Public SBT repo" at "https://dl.bintray.com/sbt/sbt-plugin-releases/"
val myResolvers = Seq (
typeSafeRepo,
localRepo,
bintrayRepo,
sbtRepo
)
}
object Dependencies {
val mindrot = "org.mindrot" % "jbcrypt" % "0.3m"
val libThrift = "org.apache.thrift" % "libthrift" % "0.9.2"
val commonsLang3 = "org.apache.commons" % "commons-lang3" % "3.4"
val commonsExec = "org.apache.commons" % "commons-exec" % "1.3"
val guava = "com.google.guava" % "guava" % "18.0"
val log4j = "org.apache.logging.log4j" % "log4j-core" % "2.3"
val jacksonDataType = "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.5.3"
val jacksonDataformat = "com.fasterxml.jackson.dataformat" % "jackson-dataformat-xml" % "2.5.3"
val postgresql = "postgresql" % "postgresql" % "9.3-1103.jdbc41"
val myDeps = Seq(
// Part of play
javaCore,
javaJdbc,
javaWs,
cache,
// User defined
mindrot,
libThrift,
commonsLang3,
commonsExec,
guava,
log4j,
jacksonDataType,
jacksonDataformat,
postgresql
)
}
object ApplicationBuild extends Build {
import Resolvers._
import Dependencies._
import BuildSettings._
val appName = "sandbox"
val main = Project(
appName,
file("."),
settings = buildSettings ++ Seq (resolvers := myResolvers, libraryDependencies := myDeps)
)
.enablePlugins(play.PlayJava, PlayEbean)
.settings(jacoco.settings: _*)
.settings(parallelExecution in jacoco.Config := false)
.settings(javaOptions in Test ++= Seq("-Xmx512M"))
.settings(javaOptions in Test ++= Seq("-XX:MaxPermSize=512M"))
}
project/plugins.sbt:
// Use the Play sbt plugin for Play projects
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.4.1")
// The Typesafe repository
resolvers ++= Seq(
"Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/",
"Local Maven Repositor" at "file://"+Path.userHome.absolutePath+"/.m2/repository",
"scalaz-bintray" at "https://dl.bintray.com/scalaz/releases",
"Public SBT repo" at "https://dl.bintray.com/sbt/sbt-plugin-releases/"
)
libraryDependencies ++= Seq(
"com.puppycrawl.tools" % "checkstyle" % "6.8",
"com.typesafe.play" %% "play-java-ws" % "2.4.1",
"org.jacoco" % "org.jacoco.core" % "0.7.1.201405082137" artifacts(Artifact("org.jacoco.core", "jar", "jar")),
"org.jacoco" % "org.jacoco.report" % "0.7.1.201405082137" artifacts(Artifact("org.jacoco.report", "jar", "jar"))
)
// Plugin for code coverage
addSbtPlugin("de.johoop" % "jacoco4sbt" % "2.1.6")
// Play enhancer - this automatically generates getters/setters for public fields
// and rewrites accessors of these fields to use the getters/setters. Remove this
// plugin if you prefer not to have this feature, or disable on a per project
// basis using disablePlugins(PlayEnhancer) in your build.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-play-enhancer" % "1.1.0")
// Play Ebean support, to enable, uncomment this line, and enable in your build.sbt using
// enablePlugins(SbtEbean). Note, uncommenting this line will automatically bring in
// Play enhancer, regardless of whether the line above is commented out or not.
addSbtPlugin("com.typesafe.sbt" % "sbt-play-ebean" % "1.0.0")
I have tried adding javaEbean to the myDeps variable; the output remains the same.
Also, contrary to all the examples and tutorials, if I want to enable PlayJava I have to do it via play.PlayJava. What is up with that?
For the error not found: value PlayEbean, you must import play.ebean.sbt.PlayEbean in Build.scala.
Then you will get a not-found error for jacoco; for that you must import de.johoop.jacoco4sbt.JacocoPlugin.jacoco.
After that comes a NoClassDefFoundError, which you fix by upgrading sbt to 0.13.8 in project/build.properties.
Finally, the postgresql dependency is incorrect and doesn't resolve.
With that, the sbt part should work; in my case it fails later only because I don't have any Ebean entities in the project.
Patch version:
diff a/project/Build.scala b/project/Build.scala
--- a/project/Build.scala
+++ b/project/Build.scala
@@ -1,3 +1,5 @@
+import de.johoop.jacoco4sbt.JacocoPlugin.jacoco
+import play.ebean.sbt.PlayEbean
import play.sbt.PlayImport._
import sbt.Keys._
import sbt._
@@ -35,7 +37,7 @@
val log4j = "org.apache.logging.log4j" % "log4j-core" % "2.3"
val jacksonDataType = "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.5.3"
val jacksonDataformat = "com.fasterxml.jackson.dataformat" % "jackson-dataformat-xml" % "2.5.3"
- val postgresql = "postgresql" % "postgresql" % "9.3-1103.jdbc41"
+ val postgresql = "org.postgresql" % "postgresql" % "9.3-1103-jdbc41"
val myDeps = Seq(
// Part of play
diff a/project/build.properties b/project/build.properties
--- a/project/build.properties
+++ b/project/build.properties
@@ -1,1 +1,1 @@
-sbt.version=0.13.5
+sbt.version=0.13.8
EDIT: How I ended up with this: the latest versions of the Scala plugin for IntelliJ IDEA allow better editing of sbt configs than before, but (for now) you need to get the sbt build to load once before importing the project (e.g. by commenting out the suspicious lines). Once the project is imported, you can use autocompletion, auto-import and other niceties. I hope this will be useful with crossScalaVersions. On that subject, keep in mind that Play 2.4 is Java 8+ only and that Scala 2.10 does not fully support Java 8 (see the first section of the "Play 2.4 Migration Guide").

SqlContext is not a member of package org.apache.spark.sql

This is my build.sbt file:
name := "words"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.3.0",
"org.apache.spark" %% "spark-sql" % "1.3.0"
)
sbt.version=0.13.8-RC1
When I compile the program, I get the following error:
[error] D:\projects\bd\words\src\main\scala\test.scala:8:
type SqlContext is not a member of package org.apache.spark.sql
[error] val sqlContext = new org.apache.spark.sql.SqlContext(sc)
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
It's SQLContext, not SqlContext.
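The class name is spelled with "SQL" fully capitalised. A minimal sketch for Spark 1.3 (the object name and master URL are placeholders, not from the question):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object Words {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("words").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc) // SQLContext, not SqlContext
    // ... use sqlContext here
    sc.stop()
  }
}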