I am trying to list all the objects in a bucket and then read some or all of them as CSV. I have spent two days now trying to do both, but I can only get one working at a time when I use Google's libraries.
I think the problem is an incompatibility between Google's own libraries, but I'm not entirely sure. First, let me show how I'm doing each thing.
This is how I'm reading a single file. In my setup, you can use a gs:// URL directly with spark.read.csv:
val jsonKeyFile = "my-local-keyfile.json"
spark.sparkContext.hadoopConfiguration.set("google.cloud.auth.service.account.json.keyfile", jsonKeyFile)
val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .option("inferSchema", "false")
  .option("mode", "FAILFAST")
  .csv(gcsFile)
This actually works alone, and I get a working DF from it. Then the problem arises when I try to add Google's Storage library:
libraryDependencies += "com.google.cloud" % "google-cloud-storage" % "1.70.0"
If I try to run the same code again, I get this bad boy from the .csv call:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/05/14 16:38:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
An exception or error caused a run to abort: Class com.google.common.base.Suppliers$SupplierOfInstance does not implement the requested interface java.util.function.Supplier
java.lang.IncompatibleClassChangeError: Class com.google.common.base.Suppliers$SupplierOfInstance does not implement the requested interface java.util.function.Supplier
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getGcsFs(GoogleHadoopFileSystemBase.java:1488)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1659)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:683)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:646)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
...(lots more trace, probably irrelevant)
Then, you might ask, why don't you just not use the library? Well... This is the code that lists the objects in a bucket:
import java.io.FileInputStream
import scala.collection.JavaConverters._

StorageOptions
  .newBuilder()
  .setCredentials(ServiceAccountCredentials.fromStream(
    new FileInputStream(jsonKeyFile)))
  .build()
  .getService
  .list(bucket)
  .getValues
  .asScala
  .map(irrelevant)
  .toSeq
  .toDF("irrelevant")
And I have not yet found a way to do this easily without the specified library.
I found out what caused the problem: Guava 27.1-android was pulled in transitively by some library at some point (I don't know which, or how it got there), but it was the version in use. In that version of Guava, the Supplier interface does not extend java.util.function.Supplier.
I fixed it by adding Guava 27.1-jre to my dependencies. I don't know if the order matters, but I don't dare touch anything at this point. Here is where I placed it:
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1" % "provided"
libraryDependencies += "com.google.guava" % "guava" % "27.1-jre"
libraryDependencies += "com.google.cloud" % "google-cloud-storage" % "1.70.0"
// BQ samples as of 27 Feb 2019 use hadoop2, but hadoop3 seems to work fine and is recommended elsewhere
libraryDependencies += "com.google.cloud.bigdataoss" % "bigquery-connector" % "hadoop3-0.13.16" % "provided"
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-1.9.16" % "provided"
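On the ordering worry: instead of relying on where the line sits, sbt's dependencyOverrides pins a version regardless of declaration order. A sketch (works in both sbt 0.13 and 1.x):

```scala
// Force one Guava version across the whole dependency graph,
// no matter which library pulls it in transitively
dependencyOverrides += "com.google.guava" % "guava" % "27.1-jre"
```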
Hope this prevents some other poor soul from spending 2 days on this bs.
@Andy That's a lifesaver! After spending time investigating, I found that some of the libraries I'm using pull in different versions of Guava, so you should load this 27.1-jre version into your environment.
In my case using Gradle I had to do something like the following:
// This is here *as the first line* to support GCS files [Do not remove or move!]
implementation group: 'com.google.guava', name: 'guava', version: '27.1-jre'

api(project(':other.project')) {
    exclude group: 'com.google.guava', module: 'guava'
}

implementation(group: 'other.library', name: 'library.name', version: 'library.version') {
    exclude group: 'com.google.guava', module: 'guava'
}
I have a Scala project that I build with sbt. It uses the sryza/spark-timeseries library.
I am trying to run the following simple code:
val tsAirPassengers = new DenseVector(Array(
112.0,118.0,132.0,129.0,121.0,135.0,148.0,148.0,136.0,119.0,104.0,118.0,115.0,126.0,
141.0,135.0,125.0,149.0,170.0,170.0,158.0,133.0,114.0,140.0,145.0,150.0,178.0,163.0,
172.0,178.0,199.0,199.0,184.0,162.0,146.0,166.0,171.0,180.0,193.0,181.0,183.0,218.0,
230.0,242.0,209.0,191.0,172.0,194.0,196.0,196.0,236.0,235.0,229.0,243.0,264.0,272.0,
237.0,211.0,180.0,201.0,204.0,188.0,235.0,227.0,234.0,264.0,302.0,293.0,259.0,229.0,
203.0,229.0,242.0,233.0,267.0,269.0,270.0,315.0,364.0,347.0,312.0,274.0,237.0,278.0,
284.0,277.0,317.0,313.0,318.0,374.0,413.0,405.0,355.0,306.0,271.0,306.0,315.0,301.0,
356.0,348.0,355.0,422.0,465.0,467.0,404.0,347.0,305.0,336.0,340.0,318.0,362.0,348.0,
363.0,435.0,491.0,505.0,404.0,359.0,310.0,337.0,360.0,342.0,406.0,396.0,420.0,472.0,
548.0,559.0,463.0,407.0,362.0,405.0,417.0,391.0,419.0,461.0,472.0,535.0,622.0,606.0,
508.0,461.0,390.0,432.0
))
val period = 12
val model = HoltWinters.fitModel(tsAirPassengers, period, "additive", "BOBYQA")
It builds fine, but when I try to run it, I get this error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at com.cloudera.sparkts.models.HoltWintersModel.convolve(HoltWinters.scala:252)
at com.cloudera.sparkts.models.HoltWintersModel.initHoltWinters(HoltWinters.scala:277)
at com.cloudera.sparkts.models.HoltWintersModel.getHoltWintersComponents(HoltWinters.scala:190)
...
The error occurs on this line:
val model = HoltWinters.fitModel(tsAirPassengers, period, "additive", "BOBYQA")
My build.sbt includes:
name := "acme-project"
version := "0.0.1"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive" % "1.6.0",
  "net.liftweb" %% "lift-json" % "2.5+",
  "com.github.seratch" %% "awscala" % "0.3.+",
  "org.apache.spark" % "spark-mllib_2.10" % "1.6.2"
)
I have placed sparkts-0.4.0-SNAPSHOT.jar in the lib folder of my project. (I would have preferred to add a libraryDependency, but spark-ts does not appear to be on Maven Central.)
What is causing this run-time error?
The library requires Scala 2.11, not 2.10, and Spark 2.0, not 1.6.2, as you can see from
<scala.minor.version>2.11</scala.minor.version>
<scala.complete.version>${scala.minor.version}.8</scala.complete.version>
<spark.version>2.0.0</spark.version>
in pom.xml. You can try changing these and seeing whether it still compiles; find an older version of sparkts that is compatible with your Scala and Spark versions; or update your project's Scala and Spark versions (in that case, don't miss spark-mllib_2.10).
Also, if you put the jar into the lib folder, you have to put its dependencies there too (and their dependencies, and so on), or add them to libraryDependencies. Better yet, publish sparkts into your local repository using mvn install (IIRC) and add it to libraryDependencies, which will let sbt resolve its dependencies for you.
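If you take the upgrade route, the build.sbt changes might look roughly like this (versions are illustrative; check which sparkts release actually matches your Spark):

```scala
name := "acme-project"
version := "0.0.1"

// Scala 2.11 and Spark 2.0.x, matching what sparkts's pom.xml expects
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive"  % "2.0.0",
  // %% picks the artifact for the current Scala version,
  // so the hard-coded _2.10 suffix goes away
  "org.apache.spark" %% "spark-mllib" % "2.0.0"
)
```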
I am having trouble using JodaTime in a spark scala program. I tried the solutions posted in the past in Stackoverflow and they don't seem to fix the issue for me.
When I try to spark-submit it comes back with an error like the following:
15/09/04 17:51:57 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://sparkDriver#100.100.100.81:56672]
Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/DateTimeZone
at com.ttams.xrkqz.GenerateCsv$.main(GenerateCsv.scala:50)
at com.ttams.xrkqz.GenerateCsv.main(GenerateCsv.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
After sbt package, which seems to work fine, I invoke spark-submit like this ...
~/spark/bin/spark-submit --class "com.ttams.xrkqz.GenerateCsv" --master local target/scala-2.10/scala-xrkqz_2.10-1.0.jar
In my build.sbt file, I have
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies ++= Seq ("joda-time" % "joda-time" % "2.8.2",
"org.joda" % "joda-convert" % "1.7"
)
I have tried multiple versions of joda-time and joda-convert but am not able to use spark-submit from the command line. However, it seems to work when I run within the IDE (Scala IDE).
Let me know if you have any suggestions or ideas.
It seems that you are missing the dependencies on your classpath. You could do a few things: one is to manually add the Joda-Time jars to spark-submit with the --jars argument; the other is to use the assembly plugin and build an assembly jar (you will likely want to mark spark-core as "provided" so it doesn't end up in your assembly), which will contain all of your dependencies.
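A sketch of the assembly route (plugin version illustrative; adjust to your sbt):

```scala
// project/assembly.sbt — add the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt — mark spark-core "provided" so it stays out of the fat jar;
// Joda stays a normal dependency, so it ships inside the assembly
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
libraryDependencies ++= Seq(
  "joda-time" % "joda-time" % "2.8.2",
  "org.joda"  % "joda-convert" % "1.7"
)
```

Then `sbt assembly` produces an `...-assembly-...jar` that you pass to spark-submit instead of the plain package jar. The quicker alternative is to leave the build as-is and add `--jars joda-time-2.8.2.jar,joda-convert-1.7.jar` to the spark-submit command.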
I'm new to Scala & sbt, and I'm trying to compile a project that imports akka.stm._.
When I run sbt compile, the compilation fails with an error message pointing at that import.
I tried using https://github.com/paulp/sbt-extras so that the exact sbt version is pinned for the build, but it did not help.
I downloaded the akka-2.0.3.tgz and opened the files, but I don't understand exactly how to install them by default or how to tell sbt to use them.
I also noticed that the build.sbt file contains:
resolvers += "repo.codahale.com" at "http://repo.codahale.com"
libraryDependencies ++= Seq(
// "com.codahale" % "simplespec_2.9.0-1" % "0.3.4"
"com.codahale" % "simplespec_2.9.0-1" % "0.4.1"
// "se.scalablesolutions.akka" %% "akka-sbt-plugin" % "2.0-SNAPSHOT"
I tried uncommenting the "se.scalablesolutions.akka" line (assuming the programmer used the akka library of that version), but it only printed this message:
[error] Error parsing expression. Ensure that there are no blank lines within a setting.
(There are no blank lines, I just deleted the '//' and replaced the double '%' with a single one)
How do I tell the sbt to find the akka jar files in their place? Can I add another resolver to solve this problem?
I know this kind of question doesn't fit on Stack Overflow, but can you at least refer me to some manuals I should read in order to solve this?
OK, first of all I want to apologize for the newbie question.
(Stackoverflow should make a separate "newbie" section)
First, the elements in the Seq part should be separated by commas.
Second, the akka snapshots were moved to http://repo.akka.io/snapshots so I fixed the build.sbt to:
resolvers += "repo.codahale.com" at "http://repo.codahale.com"
resolvers += "akka" at "http://repo.akka.io/snapshots"
libraryDependencies ++= Seq(
  // "com.codahale" % "simplespec_2.9.0-1" % "0.3.4"
  "com.codahale" % "simplespec_2.9.0-1" % "0.4.1",
  "com.typesafe.akka" % "akka-stm" % "2.0-SNAPSHOT"
)
And the compilation finished successfully.
I don't know if this is the exact configuration in which the original compilation was done, but this issue doesn't disturb me at the moment.
I've been playing around with Redis and Scala separately, and thought it'd be neat to combine them in a simple Lift app.
I've done a fair bit of Googling and can't find any examples of a Lift app that uses Redis. Is there a reason for this?
What drivers/APIs do you recommend for using Redis w/Lift? I'm currently working with Jedis (https://github.com/xetorthio/jedis).
I use Scalatra with Jedis as the connector to Redis, and it works fine as well. Java data types are converted to their Scala equivalents implicitly when scala.collection.JavaConversions._ is imported (Scala 2.8 or later). To use Jedis, simply add this line to your project definition file in sbt 0.7.x:
val jedis = "redis.clients" % "jedis" % "2.0.0"
or this in sbt 0.10.x:
libraryDependencies += "redis.clients" % "jedis" % "2.0.0"
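As a side note, newer Scala versions prefer the explicit scala.collection.JavaConverters over the implicit JavaConversions. Converting the Java collections that Jedis returns then looks like this (a standalone sketch; no Redis server needed):

```scala
import scala.collection.JavaConverters._

// Stand-in for a Java collection as returned by Jedis commands like lrange
val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

// .asScala wraps the Java list in a Scala view; toList makes an immutable copy
val scalaSeq: List[String] = javaList.asScala.toList
println(scalaSeq.mkString(", "))
```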
I have tested a couple of scala redis connectors - settled on https://github.com/debasishg/scala-redis for further testing.
Simply
val scalaredis = "net.debasishg" % "redisclient_2.9.0" % "2.3.1"
in SBT
According to http://mvnrepository.com/artifact/net.debasishg/redisclient_2.9.1,
libraryDependencies += "net.debasishg" %% "redisclient" % "2.7"