SBT project java.io.FileNotFoundException: HADOOP_HOME unset - scala

I'm trying to use an AvroParquetWriter to convert a file in Avro format to a parquet file. I load up the schema
val schema: org.apache.avro.Schema = ... getSchema(...)
val parquetFile = new Path("Location/for/parquetFile.txt")
val writer = new AvroParquetWriter[GenericRecord](parquetFile,schema)
My code runs fine up until it gets to initializing the AvroParquetWriter. Then it throws this error:
> java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
>     at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:722)
>     at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:256)
>     at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:273)
>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:235)
>     ...etc
The advice it seems to give, and the advice I'm finding, is related to how to fix this if you are running a Hadoop cluster on your machine. However, I am not running a Hadoop cluster, nor am I aiming to. I have imported some of its libraries to use with various other pieces of my program in my SBT file, but this does not spin up a local cluster.
It just started doing this. Of my two other colleagues, one is able to run this without issue, and the other just started getting the same error as me. Here are the relevant parts of my build.sbt:
lazy val root = (project in file("."))
  .settings(
    commonSettings,
    name := "My project",
    version := "0.1",
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.9.0",
      "com.typesafe.akka" %% "akka-actor" % "2.5.2",
      "com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.9",
      "com.enragedginger" % "akka-quartz-scheduler_2.12" % "1.6.0-akka-2.4.x",
      "com.typesafe.akka" % "akka-agent_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-remote_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-stream_2.12" % "2.5.2",
      "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
      "com.typesafe.akka" %% "akka-stream-kafka" % "0.16",
      "com.typesafe.akka" %% "akka-persistence" % "2.5.2",
      "org.iq80.leveldb" % "leveldb" % "0.7",
      "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
      "javax.mail" % "javax.mail-api" % "1.5.6",
      "com.sun.mail" % "javax.mail" % "1.5.6",
      "commons-io" % "commons-io" % "2.5",
      "org.apache.avro" % "avro" % "1.8.1",
      "net.liftweb" % "lift-json_2.12" % "3.1.0-M1",
      "com.google.code.gson" % "gson" % "2.8.1",
      "org.json4s" %% "json4s-jackson" % "3.5.2",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.149",
      //"com.amazonaws" % "aws-java-sdk" % "1.11.286",
      "org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
      "org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
      "org.scalikejdbc" % "scalikejdbc-interpolation_2.12" % "3.0.2",
      "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8",
      "org.apache.commons" % "commons-pool2" % "2.4.2",
      "commons-pool" % "commons-pool" % "1.6",
      "com.jcraft" % "jsch" % "0.1.54",
      "ch.qos.logback" % "logback-classic" % "1.2.3",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.scalactic" %% "scalactic" % "3.0.4",
      "mysql" % "mysql-connector-java" % "8.0.8-dmr",
      "org.scalatest" %% "scalatest" % "3.0.4" % "test"
    )
  )
Any ideas as to why it cannot use the Hadoop-related dependencies?

The answer was to follow the suggestion in the error message:
I downloaded the latest version of the winutils.exe from
https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin
Then I manually created this directory structure: C:/Users/MyName/Hadoop/bin - note, the bin MUST be there. You can call the Hadoop/ directory whatever you want, but the bin/ must be one level inside it.
I placed the winutils.exe in the bin.
In my code, I had to put this line before initializing the parquet writer (I'd imagine it can run at any point before the writer is initialized) to set the Hadoop home directory:
System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")
val writer = new AvroParquetWriter[GenericRecord](parquetFile,iInfo.schema)
Optional - if you want to keep this within your project rather than have it live on your local machine (for example, if others will pull this repo, or you want to pack it into a jar to ship elsewhere), create a directory structure within your project and store the winutils.exe inside it.
So, say you create the directory structure src/main/resources/HadoopResources/bin in your project and place the winutils.exe in the bin. Then, to make use of the winutils.exe, you set the Hadoop home like this:
val file = new File("src/main/resources/HadoopResources")
System.setProperty("hadoop.home.dir", file.getAbsolutePath)

Related

How to specify a different resolver for certain dependencies

I am in a situation where I need to specify a custom resolver for my SBT project, but only to download one or two dependencies. I want all the other dependencies to be fetched from the Maven repository.
Here is my build.sbt file:
...Project definition...
resolvers := Seq(
  "Maven" at "https://repo1.maven.org/"
)
//Akka dependencies
libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-actor" % akkaActorsVersion,
  "com.typesafe.akka" %% "akka-testkit" % akkaActorsVersion % Test,
  "com.typesafe.akka" %% "akka-stream" % akkaStreamsVersion,
  "com.typesafe.akka" %% "akka-stream-testkit" % akkaStreamsVersion % Test,
  "com.typesafe.akka" %% "akka-http" % akkaHttpVersion,
  "com.typesafe.akka" %% "akka-http-testkit" % akkaHttpVersion % Test,
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.3.0",
  "com.typesafe.akka" %% "akka-http-spray-json" % akkaHttpVersion,
  "io.spray" %% "spray-json" % "1.3.5",
  "de.heikoseeberger" %% "akka-http-circe" % "1.23.0",
  "io.circe" %% "circe-generic" % "0.10.0",
  "com.pauldijou" %% "jwt-core" % "0.13.0",
  "com.pauldijou" %% "jwt-circe" % "0.13.0",
  "org.slf4j" % "slf4j-simple" % "1.6.4",
  "com.microsoft.azure" % "azure-storage" % "8.4.0",
  "com.datastax.cassandra" % "cassandra-driver-extras" % "3.1.4",
  "io.jvm.uuid" %% "scala-uuid" % "0.3.0",
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  "org.cassandraunit" % "cassandra-unit" % "3.1.1.0" % "test",
  "io.monix" %% "monix" % "3.0.0-8084549",
  "org.bouncycastle" % "bcpkix-jdk15on" % "1.48"
)
resolvers := Seq("Artifactory" at "http://10.3.1.6:8081/artifactory/libs-release-local/")
Credentials += Credentials("Artifactory Realm", "10.3.1.6", ARTIFACTORY_USER, ARTIFACTORY_PASSWORD)
libraryDependencies ++=
Seq(
"com.org" % "common-layer_2.11" % "0.3",
)
However, the build fails with errors saying that SBT is trying to fetch libraries from Artifactory instead of from Maven. For example, for the Cassandra driver dependency:
unresolved dependency: com.datastax.cassandra#cassandra-driver-extras;3.1.4: Artifactory: unable to get resource for com/datastax/cassandra#cassandra-driver-extras;3.1.4: res=http://10.3.1.6:8081/artifactory/libs-release-local/com/datastax/cassandra/cassandra-driver-extras/3.1.4/cassandra-driver-extras-3.1.4.pom
I have searched the internet and the documentation and I don't see a clear way to handle this, which surprises me because it seems like a common problem.
Any ideas about how I could enforce the priority/ordering of resolvers in SBT?
Please note that when you write
resolvers := Seq("resolver" at "https://path")
you are overriding the existing user-defined additional resolvers. Therefore, if you write:
resolvers := Seq("resolver1" at "https://path1")
resolvers := Seq("resolver2" at "https://path2")
you end up with only resolver2.
In order to have both resolvers, you need to do something like:
resolvers ++= Seq(
  "resolver1" at "https://path1",
  "resolver2" at "https://path2"
)
SBT searches for dependencies in the order of the given resolvers. This means that in the example above, it will look in resolver1 first, and only if the artifact is not found there will it go on to resolver2.
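Applied to the build in the question, here is a sketch that lists Maven Central first, so that only the internal common-layer artifact falls through to Artifactory (note that Central's full URL ends in /maven2/):
resolvers ++= Seq(
  "Maven" at "https://repo1.maven.org/maven2/",
  "Artifactory" at "http://10.3.1.6:8081/artifactory/libs-release-local/"
)
credentials += Credentials("Artifactory Realm", "10.3.1.6", ARTIFACTORY_USER, ARTIFACTORY_PASSWORD)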
Another thing to know is that SBT has predefined resolvers.
You can read more about sbt resolvers at: https://www.scala-sbt.org/1.x/docs/Resolvers.html

Error: java.lang.NoClassDefFoundError: scala/Product$class

Test project Build.sbt
scalaVersion := "2.12.8"
crossScalaVersions := Seq("2.11.12", "2.12.8")
val scalaTestVersion = "3.0.5"
val rocksDBVersion = "5.17.2"
val kafkaVersion = "2.2.0"
lazy val deps = Seq(
  // "javax.ws.rs" % "javax.ws.rs-api" % "2.1" jar(),
  "org.apache.kafka" % "kafka-clients" % kafkaVersion,
  "org.apache.kafka" % "kafka-clients" % kafkaVersion classifier "test",
  "org.apache.kafka" % "kafka-streams" % kafkaVersion,
  "org.apache.kafka" % "kafka-streams" % kafkaVersion classifier "test",
  "org.apache.kafka" %% "kafka" % kafkaVersion,
  "org.apache.kafka" % "kafka-streams-test-utils" % kafkaVersion,
  "org.scalatest" %% "scalatest" % scalaTestVersion % "test",
  "org.rocksdb" % "rocksdbjni" % rocksDBVersion % "test"
)
Test Project plugins.sbt
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
This is my test application's sbt configuration file. When I run sbt package, it creates a jar file that I then have to use in my other project.
Other project sbt file
scalaVersion := "2.12.6"
object Dependencies {
  val deps = Seq(
    "org.json4s" %% "json4s-native" % "3.5.3",
    "ch.qos.logback" % "logback-classic" % "1.2.3",
    "cool.graph" % "cuid-java" % "0.1.1",
    "org.apache.kafka" % "kafka-streams" % "2.2.0",
    "com.typesafe" % "config" % "1.3.3",
    "org.scalaz" %% "scalaz-core" % "7.2.20",
    "org.apache.commons" % "commons-lang3" % "3.8.1",
    // Test
    "org.scalatest" %% "scalatest" % "3.0.4" % Test,
    "org.mockito" % "mockito-all" % "1.10.19" % Test,
    "com.abc" %% "kafka-streams-test-kit" % "2.2.0" % Test
  )
}
Other Project plugins.sbt
addSbtPlugin("se.marcuslonnberg" % "sbt-docker" % "1.5.0")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
I'm new to Scala and I don't know what I did wrong: when I put my test jar into the other project, it gives me the error below.
Error:
java.lang.NoClassDefFoundError: scala/Product$class
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$Builder.<init>(MockedStreams.scala:17)
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$.apply(MockedStreams.scala:15)
[error] at com.shepherd.integration.WindowedThresholdSpec.$anonfun$new$4(WindowedThresholdSpec.scala:38)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Can anyone please help me with this? I'm new to this, and I have already seen java.lang.NoClassDefFoundError: scala/Product$class, but it did not help me, so I have posted my problem here.
I have tried both ways, adding the jar directly to the project and pulling it from S3, but the error still exists.
resolvers += "Java.net Maven2 Repository" at "https://repo1.maven.org/maven2/"
resolvers += "S3" at "https://s3-eu-west-1.amazonaws.com/com.abc/" // test jar file
NOTE: I have set the same Scala version ("2.12.6") in both applications, but the issue still exists.
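For what it's worth, scala/Product$class errors typically point at a Scala binary version mismatch, such as a _2.11 jar on a 2.12 classpath. A sketch of what to check, assuming the usual sbt artifact-suffix conventions:
// In the test-kit project: prefixing a task with "+" runs it once per entry
// in crossScalaVersions, producing both _2.11 and _2.12 artifacts:
//   sbt> +package   (or +publishLocal / +publish)

// In the consuming project: %% appends the consumer's Scala binary suffix,
// so with scalaVersion := "2.12.6" this resolves kafka-streams-test-kit_2.12,
// not the _2.11 artifact:
libraryDependencies += "com.abc" %% "kafka-streams-test-kit" % "2.2.0" % Test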

How to convert RDD[some case class] to csv file using scala?

I have an RDD[some case class] and I want to convert it to a csv file. I am using Spark 1.6 and Scala 2.10.5.
stationDetails.toDF.coalesce(1).write.format("com.databricks.spark.csv").save("data/myData.csv")
gives the error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:219)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
I am not able to add the dependency for "com.databricks.spark.csv" in my build.sbt file. The dependencies I added in the build.sbt file are:
libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-csv" % "1.1",
  "com.univocity" % "univocity-parsers" % "1.5.1",
  "org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.novocode" % "junit-interface" % "0.9" % "test"
)
I also tried this:
stationDetails.toDF.coalesce(1).write.csv("data/myData.csv")
but it gives the error: csv cannot be resolved.
Please change your build.sbt to the following:
libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-csv" % "1.1",
  "com.databricks" %% "spark-csv" % "1.4.0",
  "com.univocity" % "univocity-parsers" % "1.5.1",
  "org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.novocode" % "junit-interface" % "0.9" % "test"
)
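With spark-csv on the classpath, the original write call should then resolve. A sketch of the usage, with the optional header setting added for illustration:
stationDetails.toDF
  .coalesce(1)
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true") // optional: emit column names as the first row
  .save("data/myData.csv")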

SBT - Class not found, continuing with a stub

I am currently migrating my Play 2 Scala API project and encounter 10 warnings during compilation indicating:
[warn] Class play.core.enhancers.PropertiesEnhancer$GeneratedAccessor not found - continuing with a stub.
All of them are the same, and I don't have any other indications. I've searched a bit for other similar cases; it's often caused by the JDK version and so on, but I'm already on 1.8.
Here's what I have in plugins.sbt:
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.5.3")
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "0.8.0")
addSbtPlugin("com.sksamuel.scapegoat" %% "sbt-scapegoat" % "1.0.4")
and in build.sbt:
libraryDependencies ++= Seq(
  cache,
  ws,
  "org.reactivemongo" %% "play2-reactivemongo" % "0.10.5.0.akka23",
  "org.reactivemongo" %% "reactivemongo" % "0.10.5.0.akka23",
  "org.mockito" % "mockito-core" % "1.10.5" % "test",
  "org.scalatestplus" %% "play" % "1.2.0" % "test",
  "com.amazonaws" % "aws-java-sdk" % "1.8.3",
  "org.cvogt" %% "play-json-extensions" % "0.8.0",
  javaCore,
  "com.clever-age" % "play2-elasticsearch" % "1.1.0" excludeAll(
    ExclusionRule(organization = "org.scala-lang"),
    ExclusionRule(organization = "com.typesafe.play"),
    ExclusionRule(organization = "org.apache.commons", artifact = "commons-lang3")
  )
)
Don't hesitate if you need anything else :)
It's not something that blocks me, but I'd prefer to avoid these 10 warnings every time I recompile my application.
Thank you! :)
It seems something in your code is trying to use the Play enhancer and is failing to find it. Are you using Ebean or something that may require the enhancer?
You can try adding the plugin to your plugins.sbt:
addSbtPlugin("com.typesafe.sbt" % "sbt-play-enhancer" % "1.1.0")
This should make the warning go away. You can then disable it if you like:
// In build.sbt
playEnhancerEnabled := false

sbt different libraryDependencies in test than in normal mode

Because of conflicting transitive dependencies (elasticsearch / lucene / jackrabbit), I want different libraryDependencies in test than when normally running the app. I solved it with the setup below, but this requires running activator with -Dtest, which prevents my app from running normally when I'm done testing. The other way around, i.e. running just activator, will run my app but will not run my tests. So it's not very convenient, and I think this can be done much better (btw, I'm very new to sbt/scala).
name := """example"""
version := "0.1"
lazy val root = (project in file(".")).enablePlugins(PlayJava)
scalaVersion := "2.11.1"
// fork in Test := true
javaOptions in Test += "-Dconfig.file=conf/application.test.conf"
javaOptions in Test += "-Dlogger.file=conf/test-logger.xml"
// run activator -Dtest
if (sys.props.contains("test")) {
  Seq[Project.Setting[_]](
    libraryDependencies ++= {
      Seq(
        javaJdbc,
        javaEbean,
        cache,
        javaWs,
        "org.webjars" %% "webjars-play" % "2.3.0-2",
        "org.webjars" % "bootstrap" % "3.3.6",
        "org.webjars" % "font-awesome" % "4.5.0",
        "be.objectify" %% "deadbolt-java" % "2.3.3",
        "org.apache.lucene" % "lucene-core" % "3.6.0",
        "org.elasticsearch" % "elasticsearch" % "1.7.4" exclude("org.apache.lucene", "lucene-core"),
        "javax.jcr" % "jcr" % "2.0",
        "org.apache.jackrabbit" % "jackrabbit-core" % "2.11.2",
        "org.apache.jackrabbit" % "jackrabbit-jcr2dav" % "2.11.2",
        "org.apache.tika" % "tika-parsers" % "1.11",
        "org.apache.tika" % "tika-core" % "1.11",
        "commons-io" % "commons-io" % "2.4",
        "com.typesafe.akka" % "akka-testkit_2.11" % "2.3.14" % "test"
      )
    }
  )
} else {
  Seq[Project.Setting[_]](
    libraryDependencies ++= {
      Seq(
        javaJdbc,
        javaEbean,
        cache,
        javaWs,
        "org.webjars" %% "webjars-play" % "2.3.0-2",
        "org.webjars" % "bootstrap" % "3.3.6",
        "org.webjars" % "font-awesome" % "4.5.0",
        "be.objectify" %% "deadbolt-java" % "2.3.3",
        "org.elasticsearch" % "elasticsearch" % "1.7.4",
        "javax.jcr" % "jcr" % "2.0",
        "org.apache.jackrabbit" % "jackrabbit-core" % "2.11.2",
        "org.apache.jackrabbit" % "jackrabbit-jcr2dav" % "2.11.2",
        "org.apache.tika" % "tika-parsers" % "1.11",
        "org.apache.tika" % "tika-core" % "1.11",
        "commons-io" % "commons-io" % "2.4",
        "com.typesafe.akka" % "akka-testkit_2.11" % "2.3.14" % "test"
      )
    }
  )
}
//.. our private nexus repo left out here
resolvers += "JBoss Repository" at "https://repository.jboss.org/nexus/content/repositories"
resolvers += "JBoss Third-Party Repository" at "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases"
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
resolvers += Resolver.url("Objectify Play Repository", url("http://deadbolt.ws/releases/"))(Resolver.ivyStylePatterns)
I don't have a setup where I can really test whether this works, but from how I understand sbt dependencies it should:
Dependencies can have a kind of scope called a configuration. Typically, this is used to define test only dependencies:
"com.typesafe.akka" % "akka-testkit_2.11" % "2.3.14" % "test"
But you should also be able to scope dependencies to compile time and run time using "compile" and "runtime" instead.
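For instance, a sketch with placeholder artifacts showing the three configurations side by side:
libraryDependencies ++= Seq(
  "org.example" % "lib-a" % "1.0" % "compile", // compile, runtime and test classpaths
  "org.example" % "lib-b" % "1.0" % "runtime", // runtime and test classpaths, not compile
  "org.example" % "lib-c" % "1.0" % "test"     // test classpath only
)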
sbt prints a warning if I use the same dependency with different versions. The problem is that this setup compiles the code against one version of a dependency and then runs the tests against another, so it runs against a different version than it was compiled with. There are of course libraries where this works, especially if you run with a newer version than the one you compiled against.
If you really need to compile your application twice with different dependencies and use one build for running and one for testing, I fear there won't be a solution without extending sbt or something like that.
You could try to make two modules, one with the main code and one for testing, and then try to cross-build two different versions of the first module. Sbt can easily cross-build over multiple Scala versions, but I don't think it can do that out of the box for multiple versions of a library.
Thanks @dth, you put me on the right track. The settings below worked for me:
libraryDependencies ++= {
  Seq(
    javaJdbc,
    javaEbean,
    cache,
    javaWs,
    "org.webjars" %% "webjars-play" % "2.3.0-2",
    "org.webjars" % "bootstrap" % "3.3.6",
    "org.webjars" % "font-awesome" % "4.5.0",
    "be.objectify" %% "deadbolt-java" % "2.3.3",
    "org.apache.lucene" % "lucene-core" % "3.6.0" % "compile,test",
    "org.elasticsearch" % "elasticsearch" % "1.7.4" % "compile,runtime",
    "org.elasticsearch" % "elasticsearch" % "1.7.4" % "test" exclude("org.apache.lucene", "lucene-core"),
    "javax.jcr" % "jcr" % "2.0",
    "org.apache.jackrabbit" % "jackrabbit-core" % "2.11.2",
    "org.apache.jackrabbit" % "jackrabbit-jcr2dav" % "2.11.2",
    "org.apache.tika" % "tika-parsers" % "1.11",
    "org.apache.tika" % "tika-core" % "1.11",
    "commons-io" % "commons-io" % "2.4",
    "com.typesafe.akka" % "akka-testkit_2.11" % "2.3.14" % "test"
  )
}