Loading files from JAR in Scala

I have the following code structure:
Projects/
  classes/
    performance/AcPerformance.class
  resources/
    Aircraft/
      allAircraft.txt
I have the contents of the classes folder in a JAR, and my AcPerformance Scala code is trying to read the text files in the Aircraft folder. My code:
val AircraftPerf = getClass.getResource("resources/Aircraft").getFile
val dataDir = new File(AircraftPerf)
val acFile = new File(dataDir, "allAircraft.txt")
for (line <- linesFromResource(acFile)) {
// read in lines
}
When I try to run the code I get the following error:
Caused by: java.io.FileNotFoundException: C:\Projects\file:\C:\Projects\libraries\aircraft.jar!\Aircraft\allAircraft.txt (The filename, directory name, or volume label syntax is incorrect)
Is this the correct way to read the contents of a JAR? Thanks!

No, URL's getFile isn't going to do what you want here—the path it gives you isn't a file system path that you could use in a File constructor. You'd be best off using getResourceAsStream and the full path to the resource:
val in = getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")
Note that you also need to preface the path with / to make it absolute—in your current version you're looking for a resources directory under performance.
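If it helps, here's a minimal sketch of actually reading that stream line by line with scala.io.Source (same resource path as above); this works both when running from the IDE and from inside the JAR:
import scala.io.Source

val in = getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")
val source = Source.fromInputStream(in)
try {
  for (line <- source.getLines()) {
    // process one line of allAircraft.txt
    println(line)
  }
} finally {
  source.close()
}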

Related

Merging configurations for spark using typesafe library and extraJavaOptions

I'm trying to merge two config files (or create a config file based on a single reference file) using:
lazy val finalConfig =
  Option(System.getProperty("user.resource"))
    .map(ConfigFactory.load)
    .map(_.withFallback(ConfigFactory.load(System.getProperty("config.resource"))).resolve())
    .getOrElse(ConfigFactory.load(System.getProperty("config.resource")))
I'm defining my Java properties in Spark using spark-submit ....... --conf spark.driver.extraJavaOptions=-Dconfig.resource=./reference.conf,-Duser.resource=./user.conf ...
My goal is to be able to point to a file that is not inside my jar, to be used by System.getProperty("..") in my code. I changed the folder for testing (cd ..) and keep getting the same error, so I guess Spark doesn't care about my Java arguments?
Is there a way to point to a file (or even 2 files in my case) so that they can be merged?
I also tried to include the reference.conf file but not the user.conf file: it recognizes the reference.conf but not the user.conf that I gave with --conf spark.driver.extraJavaOptions=-Duser.resource=./user.conf.
Is there a way to do that? Thanks if you can help
I don't see you doing ConfigFactory.parseFile to load a file containing properties.
Typesafe Config automatically reads any .properties file on the classpath and all -D parameters passed to the JVM, and then merges them.
I am reading an external property file which is not part of the jar as follows. The file "application.conf" is placed in the same directory where the jar is kept.
import java.io.File
import scala.util.Try
import com.typesafe.config.ConfigFactory

val applicationRootPath = System.getProperty("user.dir")
val config = Try {
  ConfigFactory.parseFile(new File(applicationRootPath + "/" + "application.conf"))
}.getOrElse(ConfigFactory.empty())
appConfig = config.withFallback(ConfigFactory.load()).resolve()
ConfigFactory.load() already contains all the properties present in the properties files on the classpath and the -D parameters. I am giving priority to my external "application.conf" and falling back on default values. For matching keys, "application.conf" takes precedence over the other sources.
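To illustrate the precedence described above with a hypothetical key (db.url is made up for this example):
// application.conf next to the jar:  db.url = "jdbc:postgresql://prod/mydb"
// reference.conf inside the jar:     db.url = "jdbc:h2:mem:test"
val dbUrl = appConfig.getString("db.url")  // yields the external value, "jdbc:postgresql://prod/mydb"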

Spark-SQL: access file in current worker node directory

I need to read a file using spark-sql, and the file is in the current directory.
I use this command to decompress a list of files I have stored on HDFS.
val decompressCommand = Seq(laszippath, "-i", inputFileName , "-o", "out.las").!!
The file is written to the current worker node directory, and I know this because by executing "ls -a".!! through Scala I can see that the file is there. I then try to access it with the following command:
val dataFrame = sqlContext.read.las("out.las")
I assumed that the sql context would try to find the file in the current directory, but it doesn't. Also, it doesn't throw an error but a warning stating that the file could not be found (so spark continues to run).
I attempted to add the file using: sparkContext.addFile("out.las") and then access the location using: val location = SparkFiles.get("out.las") but this didn't work either.
I even ran the command val locationPt = "pwd"!! and then did val fullLocation = locationPt + "/out.las" and attempted to use that value but it didn't work either.
The actual exception that gets thrown is the following:
User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'x' given input columns: [];
org.apache.spark.sql.AnalysisException: cannot resolve 'x' given input columns: []
And this happens when I try to access column "x" from a dataframe. I know that column 'X' exists because I've downloaded some of the files from HDFS, decompressed them locally and ran some tests.
I need to decompress files one by one because I have 1.6TB of data and so I cannot decompress it at one go and access them later.
Can anyone tell me what I can do to access files which are written to the worker node directory? Or should I be doing it some other way?
So I managed to do it now. What I'm doing is saving the file to HDFS, and then retrieving it using the sql context through HDFS. I overwrite "out.las" each time in HDFS so that I don't take up too much space.
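A rough sketch of that workflow (the HDFS target directory /user/me/tmp/ is made up here, and read.las is the same data source used in the question):
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
// copy the locally decompressed file into HDFS, overwriting the previous copy
fs.copyFromLocalFile(false, true, new Path("out.las"), new Path("/user/me/tmp/out.las"))
// read it back through the sql context, now from HDFS
val dataFrame = sqlContext.read.las("hdfs:///user/me/tmp/out.las")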
I have used the Hadoop API before to get to files; I dunno if it will help you here.
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

val filePath = "/user/me/dataForHDFS/"
val fs: FileSystem = FileSystem.get(new java.net.URI(filePath + "out.las"), sc.hadoopConfiguration)
And I've not tested the below, but it should give an idea of what to do afterward: open the file and read its contents into a byte array.
val path = new Path(filePath + "out.las")
val fileIn: FSDataInputStream = fs.open(path)
// size the buffer from the file length, then read everything starting at offset 0
val readIn: Array[Byte] = new Array[Byte](fs.getFileStatus(path).getLen.toInt)
fileIn.readFully(0, readIn)
fileIn.close()

Unable to access file in relative path in Scala for test resource

I have set up my Scala project using Maven. Now I am writing a test and need to access a file under a subdirectory of the resource path, like:
src/test/resource/abc/123.sql
Now I am doing the following:
val relativepath = "/setup/setup/script/test.sql"
val path = getClass.getResource(relativepath).getPath
println(path)
but this is pointing to the src/main/resource folder instead of the test resources. Does anyone have an idea what I am doing wrong here?
Just like in Java, it is a good practice to put your resource files under src/main/resources and src/test/resources, as Scala provides a nice API for retrieving resource files.
Considering you put your test.sql file under src/test/resources/setup/setup/script/test.sql, you can easily read the file by doing the following:
Scala 2.12
import scala.io.Source
val relativePath = "setup/setup/script/test.sql"
val sqlFile : Iterator[String] = Source.fromResource(relativePath).getLines
Prior Scala versions
import java.io.InputStream
import scala.io.Source
val relativePath = "setup/setup/script/test.sql"
// prefix with "/" so the path is resolved from the classpath root rather than this class's package
val stream : InputStream = getClass.getResourceAsStream("/" + relativePath)
val sqlFile : Iterator[String] = Source.fromInputStream(stream).getLines
Doing so, you can even have the same file under the same relative path in src/main/resources; when accessing the resource from a test, the file from src/test/resources will take precedence.
I hope this is helpful.

How to do File creation and manipulation in functional style?

I need to write a program where I run a set of instructions and create a file in a directory. Once the file is created, when the same code block is run again, it should not run the same set of instructions, since it has already been executed before; here the file is used as a guard.
import java.io.File

var Directory: String = "Dir1"
var dir: File = new File(Directory)
dir.mkdir()
var FileName: String = Directory + File.separator + "samplefile" + ".log"
val FileObj: File = new File(FileName)
if (!FileObj.exists()) {
  // blahblah
} else {
  // set of instructions to create the file
}
When the program runs initially, the file won't be present, so it should run the set of instructions in the else branch and also create the file; on the second run it should exit, since the file exists.
The problem is that I do not understand new File and when the file is actually created. Should I use file.createNewFile()? Also, how do I write this in a functional style using case?
It's important to understand that a java.io.File is not a physical file on the file system, but a representation of a pathname -- per the javadoc: "An abstract representation of file and directory pathnames". So new File(...) has nothing to do with creating an actual file - you are just defining a pathname, which may or may not correspond to an existing file.
To create an empty file, you can use:
import java.io.File

val file = new File("filepath/filename")
file.createNewFile()
If running on JRE 7 or higher, you can use the new java.nio.file API:
import java.nio.file.{Files, Paths}

val path = Paths.get("filepath/filename")
Files.createFile(path)
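Since the question uses the file as a guard, here is a minimal sketch of that pattern with the same java.nio.file API (directory and file names taken from the question's snippet):
import java.nio.file.{Files, Paths}

val guard = Paths.get("Dir1", "samplefile.log")
if (Files.notExists(guard)) {
  // first run: do the real work here, then create the guard file
  Files.createDirectories(guard.getParent)
  Files.createFile(guard)
} else {
  // guard file already exists: nothing to do
}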
If you're not happy with the default IO APIs, you can consider a number of alternatives. Scala-specific ones that I know of are:
scala-io
rapture.io
Or you can use libraries from the Java world, such as Google Guava or Apache Commons IO.
Edit: One thing I did not consider initially: I understood "creating a file" as "creating an empty file"; but if you intend to write something immediately in the file, you generally don't need to create an empty file first.
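For instance, a minimal sketch with java.nio (reusing the placeholder path from above): writing the content creates the file in a single call.
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

// creates (or truncates) the file and writes the content in one step
Files.write(Paths.get("filepath/filename"), "first run done".getBytes(StandardCharsets.UTF_8))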

Can't load a file in Play (always not found)

I can't load a file in Play:
val filePath1 = "/views/layouts/mylayout.scala.html"
val filePath2 = "views/layouts/mylayout.scala.html"
Play.getExistingFile(filePath1)
Play.getExistingFile(filePath2)
Play.resourceAsStream(filePath1)
Play.resourceAsStream(filePath2)
None of these works; they all return None.
You are essentially trying to read a source file at runtime, which is not something you should usually do. If you want to read a file at runtime, then I'd recommend putting it somewhere that will end up on the classpath and then using Play.resourceAsStream to read the file. The files in the conf directory and non-compiled files in the app dir should end up on the classpath.
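For example, with a file placed at conf/data/mylayout.txt (a made-up path; anything under conf ends up on the classpath), something along these lines should work, since Play.resourceAsStream returns an Option[InputStream]:
import play.api.Play
import play.api.Play.current
import scala.io.Source

val contents: Option[String] =
  Play.resourceAsStream("data/mylayout.txt").map { in =>
    try Source.fromInputStream(in).mkString
    finally in.close()
  }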