I have no clue why in the world this is causing me this much grief, but it is. How can I simply grab a txt file from the resources directory packaged in my uber jar when I run spark-submit, and pass it to spark.read? Yes, in the IDE it's simple and works. But running with spark-submit is plaguing me with the good old-fashioned:
Path does not exist: file:/opt/spark/jars/<myjar>.jar!/datasets/mllib/sample_kmeans_data.txt
My folder structure is very standard:
src
  main
    resources
      sample_kmeans_data.txt
My very vanilla loader:
val kmeansData =
getClass.getClassLoader.getResource("datasets/mllib/sample_kmeans_data.txt").getPath
val dataset: DataFrame = spark.read
.format("libsvm")
.load(kmeansData)
dataset.show
I have also confirmed that the datasets folder is at the root level after extracting the jar, and I've tried many different variations of getClassLoader/getResource, all leading to the same error. Lastly, reading the file as a stream or buffered input without Spark works fine, so under spark-submit the file is clearly reachable inside the jar. What trips me up is what Spark's loader expects as an input path for a file that lives inside the jar.
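A likely explanation, as far as I can tell: getResource returns a jar: URL, and calling getPath on it leaves `file:/...jar!/datasets/...`, which Spark passes to Hadoop's filesystem layer as if it were an ordinary file path, and no such file exists. Since reading the resource as a stream already works, one workaround is to copy the stream to a temporary file and give Spark that file's path. Below is a minimal sketch assuming a single machine (local mode) where executors can see the driver's temp directory; `resourceToTempFile` is a hypothetical helper name, and the ClassLoader parameter exists only to make the sketch easy to test.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceToTempFile {
  /** Copies a classpath resource (for example one packaged inside an uber jar)
    * to a temporary file on the local filesystem and returns that file's path.
    * Hypothetical helper for this sketch, not a Spark or JDK API.
    */
  def resourceToTempFile(
      name: String,
      loader: ClassLoader = getClass.getClassLoader
  ): Path = {
    // ClassLoader resource names are relative to the classpath root (no leading "/")
    val in: InputStream = loader.getResourceAsStream(name)
    if (in == null)
      throw new IllegalArgumentException(s"Resource not found on classpath: $name")
    try {
      // Keep the original file name as the suffix so the temp file is recognisable
      val suffix = "-" + name.split('/').last
      val tmp = Files.createTempFile("resource-", suffix)
      tmp.toFile.deleteOnExit()
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
      tmp
    } finally in.close()
  }
}
```

With that helper, the loader from the question would become something like `spark.read.format("libsvm").load(ResourceToTempFile.resourceToTempFile("datasets/mllib/sample_kmeans_data.txt").toUri.toString)`; `toUri` yields a `file://` URI so the path is not resolved against a default HDFS filesystem. On a real cluster, copying the file to storage every executor can reach (HDFS, S3 and so on) is the safer route.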
Related
When I run my app from the sbt console with my-app/run (my-app is a module), I get this error:
java.io.FileNotFoundException: file:.../target/bg-jobs/sbt_eea980c4/job-3/target/aaa32a5e/b2227e4f/my-app_2.13-0.1.0-SNAPSHOT.jar!/my_file.csv
My directory structure is my-app->src->main->resources->my_file.csv. The way I am accessing the file in my code is:
val file = new File(getClass.getResource("/my_file.csv").getPath)
What am I missing here?
sbt packaged your resources into a jar, so you can't access them as a java.io.File. You can use getClass.getResourceAsStream("/my_file.csv") instead.
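A minimal sketch of that suggestion, reading the CSV from the resource stream instead of a File. `MyFileLoader`, `parseCsv` and `loadMyFile` are hypothetical names, the naive comma split assumes no quoted fields containing commas, and `scala.util.Using` needs Scala 2.13 or later (the question's jar name suggests 2.13).

```scala
import java.io.InputStream
import scala.io.Source
import scala.util.Using

object MyFileLoader {
  // Naive CSV split: assumes no quoted fields that contain commas
  def parseCsv(in: InputStream): List[Vector[String]] =
    Using.resource(Source.fromInputStream(in, "UTF-8")) { src =>
      // toList forces the lines before Using closes the source
      src.getLines().filter(_.nonEmpty).map(_.split(",", -1).toVector).toList
    }

  // Class.getResourceAsStream with a leading "/" resolves from the classpath root,
  // so this works from target/classes and from inside the packaged jar alike.
  def loadMyFile(): List[Vector[String]] = {
    val in = getClass.getResourceAsStream("/my_file.csv")
    if (in == null) throw new IllegalStateException("/my_file.csv not found on classpath")
    parseCsv(in)
  }
}
```

Splitting the parsing from the lookup keeps the resource handling in one place and lets the parser be exercised with any InputStream.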
I'm using the sbt-assembly plugin to create a standalone jar file. My project folder structure looks like this:
MyProject
 - src
   - main
     - scala
       - mypackages and source files
     - conf // contains application.conf, application.test.conf and so on
   - test
 - project // contains all the build related files
 - README.md
I now want to be able to run the fat jar that I produce against a version of the application.conf that I specify as a System property!
So here is what I do in my unit test!
System.setProperty("environment", "test")
And this is how I load the config in one of the files in my src folder:
val someEnv = Option(System.getProperty("environment", "")).filter(_.nonEmpty) // gives me Some(test)
val name = s"application.${someEnv.get}.conf"
I can see that the system property is set and the environment name is passed in. But later on, I load application.test.conf as below:
ConfigFactory.load(name).resolve()
However, it loads only the default application.conf and not the one that I specify!
What is wrong in my case? Where should I put the conf folder? I'm trying to run it against my unit test, which is inside the test folder!
I believe you need to specify the full resource name of the configuration file; the .conf extension itself is optional. Try:
ConfigFactory.load(s"application.${someEnv.get}").resolve()
The docs for ConfigFactory.load(String) indicate you need to supply
name (optionally without extension) of a resource on classpath
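Before changing the name passed to ConfigFactory.load, it can help to check whether the file is on the classpath at all. This is a small diagnostic sketch using only the JDK's ClassLoader.getResource; `ConfOnClasspath` and its methods are hypothetical helpers, not part of Typesafe Config.

```scala
object ConfOnClasspath {
  // Same naming scheme as the question: application.<env>.conf
  def resourceName(env: String): String = s"application.$env.conf"

  // Some(url) if the file is visible to `loader`, None if it is not on the classpath
  def envConfResource(env: String, loader: ClassLoader): Option[java.net.URL] =
    Option(loader.getResource(resourceName(env)))
}
```

Calling `ConfOnClasspath.envConfResource("test", getClass.getClassLoader)` from the unit test and getting None would point at the folder location rather than the name passed to load.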
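OK, here is what I had to do: change the name of the folder where the config file is located. I originally had it as conf and had to rename it to resources, and bang, it worked! (By default sbt only treats src/main/resources, not an arbitrary folder such as conf, as a resource directory that ends up on the classpath.)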
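If you would rather keep the folder named conf, an alternative is to tell sbt to treat it as an extra resource directory. A sketch for build.sbt, assuming sbt 1.x slash syntax and that conf sits at src/main/conf (the question's layout doesn't state this outright):

```scala
// build.sbt: also treat src/main/conf as a resource directory, so the
// application.*.conf files are copied onto the classpath and into the assembly jar.
// (Compile / sourceDirectory) is src/main by default.
Compile / unmanagedResourceDirectories += (Compile / sourceDirectory).value / "conf"
```

Because the test classpath includes the main resources, the unit test should then see the files as well.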
I have a Scala script that I want to call from sbt. This Scala script refers to some dependencies. One of those dependencies uses a properties file. This properties file is normally provided by the runtime, since this dependency runs as a separate application.
Just to have the possibility to run that property-using dependency as a standalone, I wrote this Scala script that I want to call from sbt.
val fis = new FileInputStream("my.properties") // Fails here
val props = new Properties()
When I run the above code, it fails with an exception in my dependency where the properties file is loaded.
How can I make this properties file available to the script under sbt?
Place the file my.properties in src/main/resources and use Source.fromURL(getClass.getResource("/my.properties")) instead. This gives you more flexibility about where the file lives on the filesystem, as long as it is on the CLASSPATH.
To find out where the file is expected when "bare" File* types are in use, print its absolute path:
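Since the dependency ultimately needs a java.util.Properties, a minimal sketch of loading one from the classpath rather than from the working directory follows. `ClasspathProperties.load` is a hypothetical helper; the explicit ClassLoader parameter is a choice made for this sketch so it can be tested, and `scala.util.Using` needs Scala 2.13 or later.

```scala
import java.util.Properties
import scala.util.Using

object ClasspathProperties {
  // Loads a .properties resource visible to `loader`.
  // ClassLoader names take no leading "/" (unlike Class.getResource("/my.properties")).
  def load(name: String, loader: ClassLoader): Properties = {
    val in = loader.getResourceAsStream(name)
    if (in == null) throw new IllegalArgumentException(s"$name not found on classpath")
    val props = new Properties()
    // Using closes the stream even if parsing fails
    Using.resource(in)(stream => props.load(stream))
    props
  }
}
```

From the script this would be called as `ClasspathProperties.load("my.properties", getClass.getClassLoader)` once the file sits in src/main/resources.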
println(new java.io.File("my.properties").getAbsolutePath)
Since the current working directory under sbt is the project's top-level directory, the file is looked up at $PROJECT_ROOT_DIR/my.properties.
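The lookup rule can be checked directly: a relative java.io.File is resolved against the JVM's working directory, which the JDK exposes as the "user.dir" system property, and the classpath plays no part. A tiny sketch; `WorkingDirLookup` is a hypothetical name.

```scala
import java.io.File

object WorkingDirLookup {
  // Where a bare relative File will be looked for: <user.dir>/<name>
  def whereBareFileLooks(name: String): String = new File(name).getAbsolutePath

  // The directory the JVM was started from
  def workingDir: String = System.getProperty("user.dir")
}
```

Comparing the two under sbt and under the packaged jar shows why the same relative name can work in one and fail in the other.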
I'm just starting with Scala and have run into a problem that has me stumped, but I'm guessing that I'm missing something easy.
I was following instructions to use the Clapper ClassFinder:
http://thoughts.inphina.com/2011/09/15/building-a-plugin-based-architecture-in-scala/
val classpath = List("./plugins").map(new File(_))
val finder = ClassFinder(classpath)
val classes = finder.getClasses
val classMap = ClassFinder.classInfoMap(classes)
After executing the first line, I see that classpath is set simply to
List(.\plugins)
I'm running this on Windows, so the swapped slash seems to be OK.
But I expected to see a list of File objects, although I am not sure about this Scala syntax, and perhaps I'm missing something in the Scala IDE. The value for classes shows an "empty iterator".
It doesn't seem to find any files in the path that I specified. I tried using an absolute path, but got the same result. I have a single jar file in the plugins directory that I'm hoping it will find. The plugins directory is at the root of the Play 2 project I'm using.
Edit ---
I did find that when I explicitly list the path to one jar that it is able to find it:
val classpath = List("./plugins/myPlugin.jar").map(new File(_))
But I want to find all jar files in the directory.
The following didn't work:
val classpath = List("./plugins/*").map(new File(_))
Nor did this:
val classpath = List("./plugins/*.jar").map(new File(_))
Judging by this issue on the ClassFinder repo on GitHub, it may be a bug.
I think you need to create an explicit list of jar files or to list the ones contained in your folder like:
val classpath = new File("./plugins").listFiles.filter(_.getName.endsWith(".jar"))
EDIT: from a cursory glance at ClassFinder's source on GitHub, I think it's not a bug. ClassFinder searches for .class files in jars, in zip files, or directly in folders, but it does not seem to combine these recursively: if you give it a folder, it looks for classes directly in that folder, but not for classes inside jars in that folder.
If your objective is to list all jar files, you can use the following code:
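Given that behaviour, one option is to expand each classpath entry before handing the list to ClassFinder: keep the directory itself (for loose .class files) and add the jars found directly inside it. A minimal sketch; `PluginClasspath.expand` is a hypothetical helper, the scan is deliberately non-recursive, and the result would be passed as `ClassFinder(PluginClasspath.expand(Seq(new File("./plugins"))))`.

```scala
import java.io.File

object PluginClasspath {
  // A directory contributes itself plus any .jar files directly inside it;
  // any other entry (e.g. an explicit jar path) is kept as-is.
  def expand(entries: Seq[File]): List[File] =
    entries.toList.flatMap { entry =>
      if (entry.isDirectory) {
        // listFiles returns null on I/O errors, hence the Option
        val jars = Option(entry.listFiles).getOrElse(Array.empty[File])
          .filter(f => f.isFile && f.getName.toLowerCase.endsWith(".jar"))
          .sortBy(_.getName)
          .toList
        entry :: jars
      } else List(entry)
    }
}
```

Sorting by name is only there to make the order predictable.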
val classpath = List("./plugins").map(path => Option(new File(path).listFiles).getOrElse(Array.empty[java.io.File]) filter(file => file.isFile && file.getName.endsWith(".jar"))).flatten
I have a directory structure like this:
src
  main
    resources
      text.txt
    scala
      hello
        world.scala
  test
    same as main folder
pom.xml
In the IDE (IntelliJ 10), I could access it with a relative path ("src/main/resource/text.txt"), but it seems I cannot do that once it is compiled into a jar. How do I read that file?
Also, I found that text.txt is copied into the root of the jar. Is this normal behavior? I fear it will clash with other resource files in src/test/resources.
thanks
From http://www.java-forums.org/advanced-java/5356-text-image-files-within-jar-files.html -
Once the file is inside the jar, you cannot access it with standard FileReader streams since it is treated as a resource. You will need to use Class.getResourceAsStream().
The text.txt being copied into the root is not normal behavior and is probably a setting in your IDE.
8 years later, I am also facing the same question. To ease the life of future developers, here is the answer:
Being copied into the root is normal behaviour: the resources folder is treated like a src folder, so its content is copied, not the folder itself.
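On the clash worry: since resources land at the classpath root, every directory or jar with a text.txt at its root answers to the same name, and getResource returns whichever comes first on the classpath. This sketch lists every visible copy using the JDK's ClassLoader.getResources; `DuplicateResources.allCopies` is a hypothetical helper.

```scala
import java.net.URL

object DuplicateResources {
  // Every copy of `name` the loader can see, in lookup order;
  // getResource(name) would return only the first of these.
  def allCopies(name: String, loader: ClassLoader): List[URL] = {
    val e = loader.getResources(name)
    val buf = List.newBuilder[URL]
    while (e.hasMoreElements) buf += e.nextElement()
    buf.result()
  }
}
```

A common way to avoid such collisions is to namespace resources in a subfolder (for example src/main/resources/hello/text.txt, read as "hello/text.txt").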
Now concerning the how-to question:
import scala.io.Source

val name = "text.txt"
// ClassLoader resource names have no leading "/"; text.txt sits at the jar root
val source: Source = Source.fromInputStream(getClass.getClassLoader.getResourceAsStream(name))
// getLines() strips line terminators, so put "\n" back as the separator
val resourceAsString: String = source.getLines().mkString("\n")
// Don't forget to close the source
source.close()