Cannot Use Relative Path in scala.io.Source

I am trying to read a CSV file into Scala. I can read fine using the absolute path, but would like to be able to use the relative path.
val filename = "sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
for (line <- Source.fromFile(filename).getLines()) { println(line) }
throws the error:
java.io.FileNotFoundException: sample_data\yahoo_finance\AAPL\AAPL_Historical.csv
(The system cannot find the path specified)
However:
val filename = "C:/Users/hansb/Desktop/Scala Project/src/main/" +
"resources/sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
for (line <- Source.fromFile(filename).getLines()) { println(line) }
works just fine.
My understanding was that scala.io.Source knew to look in the resources folder for the relative path.
What am I missing?
Working code using Phasmid's suggestion:
val relativePath = "/sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
val csv = getClass.getResource(relativePath)
for (line <- Source.fromURL(csv).getLines()) { println(line) }

This is one of the worst things about Java (and, thus, Scala). I imagine many millions of hours have been spent on this kind of problem.
If you want to get a resource from a relative path (i.e. from the class path) you need to treat the resource as a resource. So, something like the following:
getClass.getResource("AAPL_Historical.csv")
which yields a URL that can then be converted into a Stream, or whatever you need. This form expects to find the resource in the same (relative) folder as the class, but under the resources directory rather than the scala directory.
If you want to put the resource into the top level of the resources folder, then use:
getClass.getResource("/AAPL_Historical.csv")
It may be that there is some other magic which works but I haven't found it.
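For completeness, here is a minimal sketch of the stream variant (the path is the one from the question; getResourceAsStream returns null for a missing resource, hence the Option):
import scala.io.Source

// Look the file up on the classpath; null means it was not found,
// so wrap the result in Option instead of dereferencing it blindly.
val maybeStream = Option(getClass.getResourceAsStream(
  "/sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"))

maybeStream match {
  case Some(in) =>
    val source = Source.fromInputStream(in)
    try source.getLines().foreach(println)
    finally source.close()
  case None =>
    println("Resource not found on the classpath")
}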

Related

'Undefined constant: "eq" simpdata.ML' while trying to load Imperative_Quicksort theory in scala-isabelle of Isabelle/HOL

I am trying to use https://github.com/dominique-unruh/scala-isabelle for loading and parsing https://isabelle.in.tum.de/library/HOL/HOL-Imperative_HOL/Imperative_Quicksort.html - Imperative_Quicksort.thy. I am using scala-isabelle code from IntelliJ (path to source is C:\Workspace-IntelliJ\scala-isabelle):
val isabelleHome = "C:\\Homes\\Isabelle2020\\Isabelle2020"
// Differs from example in README: we skip building to make tests faster
val setup = Isabelle.Setup(isabelleHome = Path.of(isabelleHome), logic = "HOL", build=false)
implicit val isabelle: Isabelle = new Isabelle(setup)
// Load the Isabelle/HOL theory "Main" and create a context object
//val ctxt = Context("Main")
//val ctxt = Context("Imperative_Quicksort")
//val ctxt = Context("C:\\Homes\\Isabelle2020\\Isabelle2020\\src\\HOL\\Imperative_HOL\\ex\\Imperative_Quicksort")
val ctxt = Context("HOL.Imperative_HOL.ex.Imperative_Quicksort")
This configuration gives a strange error message when loading some required theories, e.g.
Exception in thread "main" de.unruh.isabelle.control.IsabelleException: No such file: "/cygdrive/c/Workspace-IntelliJ/scala-isabelle/Old_Datatype.thy"
The error(s) above occurred for theory "HOL-Library.Old_Datatype" (line 10 of "/cygdrive/c/Workspace-IntelliJ/scala-isabelle/Countable.thy")
(required by "HOL.Imperative_HOL.ex.Imperative_Quicksort" via "HOL.Imperative_HOL.ex.Imperative_HOL" via "HOL.Imperative_HOL.ex.Array" via "HOL.Imperative_HOL.ex.Heap_Monad" via "HOL.Imperative_HOL.ex.Heap" via "HOL-Library.Countable")
at de.unruh.isabelle.control.Isabelle.de$unruh$isabelle$control$Isabelle$$parseIsabelle(Isabelle.scala:268)
My guess is that theories imported using quoted paths produce such error messages, and I have been resolving them one by one by copying the required theories into my C:\Workspace-IntelliJ\scala-isabelle. Not good, but I am just trying to load this theory, so if it works, it is fine.
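(Aside: the paths in the errors above are resolved against the process working directory, so one untested idea is to point the setup at the Isabelle distribution instead of copying theories. The sessionRoots and workingDirectory parameters below are my assumption about Isabelle.Setup, not a verified fix; please check against the scala-isabelle docs.)
// Hypothetical variant of the setup above, NOT verified: make quoted theory
// paths resolve inside the Isabelle distribution rather than the workspace.
val setup = Isabelle.Setup(
  isabelleHome = Path.of(isabelleHome),
  logic = "HOL",
  sessionRoots = Seq(Path.of(isabelleHome, "src", "HOL")),
  workingDirectory = Path.of(isabelleHome, "src", "HOL"),
  build = false)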
In the end simpdata.ML was required, but there are 5 simpdata.ML files in the Isabelle sources (ZF / Sequents / HOL/Tools / FOL / FOLP). I copied the one from FOL (because simpdata.ML was required by FOL.thy) but now I get this error message:
Exception in thread "main" de.unruh.isabelle.control.IsabelleException: Failed to load theory "ZF.Bool" (unresolved "ZF.pair")
...
Undefined constant: "eq" (line 12 of "/cygdrive/c/Workspace-IntelliJ/scala-isabelle/simpdata.ML")completionline=12offset=313end_offset=315file=/cygdrive/c/Workspace-IntelliJ/scala-isabelle/simpdata.MLid=258:2:::IFOL.eq::constant:IFOL.eq::Pure.eq::constant:Pure.eq
At command "ML_file" (line 11 of "/cygdrive/c/Workspace-IntelliJ/scala-isabelle/pair.thy")
at de.unruh.isabelle.control.Isabelle.de$unruh$isabelle$control$Isabelle$$parseIsabelle(Isabelle.scala:268)
I tried to copy the other simpdata.ML files, but they gave similar messages for different undefined constants. So what is wrong with eq? My guess is that it is a very basic function.
How can this undefined eq be resolved? By some other import? But FOL/simpdata.ML does not have any imports, and no missing source file is reported either. How to proceed from here?
My intention was to load Imperative_Quicksort as a scala-isabelle Context variable; then I will try to reflect/digest the resulting class tree of the Context and use graph neural networks to encode it (I suppose it represents the abstract syntax tree of the Imperative_Quicksort theory).
I am aware that there is an Isabelle mailing list, but this is a pretty technical question, so it can/should be resolved here on SO.
Facts added:
simpdata.ML is just an include file, included in FOL.thy with
ML_file \<open>simpdata.ML\<close>
So simpdata.ML uses the imports and definitions of the enclosing FOL.thy file, and eq is indeed available there; it is referenced some 100 lines before the simpdata.ML include:
ML \<open>
structure Blast = Blast
(
  structure Classical = Cla
  val Trueprop_const = dest_Const \<^const>\<open>Trueprop\<close>
  val equality_name = \<^const_name>\<open>eq\<close>
  val not_name = \<^const_name>\<open>Not\<close>
  val notE = #{thm notE}
  val ccontr = #{thm ccontr}
  val hyp_subst_tac = Hypsubst.blast_hyp_subst_tac
);
val blast_tac = Blast.blast_tac;
\<close>
So maybe there are problems with the load order...
It turned out that two theory files (pair.thy and FOL.thy, which I had copied to my IntelliJ workspace) each use their own simpdata.ML include from their respective directories. So I copied the simpdata.ML for pair.thy as simpdata_pair.ML and modified the include command in pair.thy to:
ML_file \<open>simpdata_pair.ML\<close>
This resolved my issue and allowed me to proceed with importing the theory in this funny manner.

Access resources inside library

I have a project L which is published as a library.
Inside L, there is a resources folder and a snippet of code that access the content.
I created a second module in L to add a main as a test, and it works.
Now when I include L in my other project, it doesn't find the folder located inside the resources of L.
In everything below, path is equal to folder_i_am_looking_for, and I generate variants with /$path, $path/, $path and /$path/. (I am desperate at this point.)
I tried:
getClass.getClassLoader.getResource(path).getPath,
getClass.getClassLoader.getResource(path).toExternalForm,
getClass.getResource(path).getPath,
getClass.getResource(path).toExternalForm,
as well as
val location: URL = this.getClass.getProtectionDomain.getCodeSource.getLocation
Paths
  .get(location.toURI)
  .resolve("../classes/" + path)
  .normalize()
  .toFile
  .getAbsoluteFile
  .toString
from https://stackoverflow.com/a/54142960
and for each generated path, I've tried:
input ::
  input.split('!').last ::
  input.split("file:/").last ::
  input.split("file:").last ::
  Nil
from https://stackoverflow.com/a/5346020
Each time, I get a path that looks like:
/path/to/projectFolder/jar:file:/path/to/library/lib_2.11-version.jar!/folder
I'll let you imagine all the variety of different paths I get with all the above operations.
None of them find the directory.
I am testing all paths simultaneously with ZIO.validateFirstPar, so if you have any ideas, I can add more to test.
Thank you.
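(A note on why those paths fail: a URL like ...lib_2.11-version.jar!/folder points at an entry inside a jar, not at a directory on disk, so no File or Paths lookup can ever resolve it. Below is a minimal sketch of listing such a folder by mounting the jar as a NIO file system; folder_i_am_looking_for is the name from the question, and the else branch covers running from unpacked classes:)
import java.nio.file.{FileSystems, Files, Path, Paths}

// Resolve the resource; null-check omitted for brevity.
val uri = getClass.getResource("/folder_i_am_looking_for").toURI

def printEntries(dir: Path): Unit = {
  val stream = Files.list(dir)
  try {
    val it = stream.iterator()
    while (it.hasNext) println(it.next())
  } finally stream.close()
}

if (uri.getScheme == "jar") {
  // Inside a jar: mount it as a zip file system and list through that.
  val fs = FileSystems.newFileSystem(uri, new java.util.HashMap[String, String]())
  try printEntries(fs.getPath("/folder_i_am_looking_for"))
  finally fs.close()
} else {
  // Unpacked classes (e.g. running from sbt or the IDE): a plain path works.
  printEntries(Paths.get(uri))
}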

How to check a file/folder is present using pyspark without getting exception

I am trying to check whether a file is present before reading it from pyspark in Databricks, to avoid exceptions. I tried the code snippets below, but I get an exception when the file is not present.
from pyspark.sql import *
from pyspark.conf import SparkConf
SparkSession.builder.config(conf=SparkConf())
try:
    df = sqlContext.read.format('com.databricks.spark.csv').option("delimiter",",").options(header='true', inferschema='true').load('/FileStore/tables/HealthCareSample_dumm.csv')
    print("File Exists")
except IOError:
    print("file not found")
When the file is there, it reads it and prints "File Exists", but when the file is not there it throws "AnalysisException: 'Path does not exist: dbfs:/FileStore/tables/HealthCareSample_dumm.csv;'".
Thanks @Dror and @Kini. I run Spark on a cluster, and I had to add sc._jvm.java.net.URI.create("s3://" + path.split("/")[2]); here s3 is the prefix of the file system of your cluster.
def path_exists(path):
    # spark is a SparkSession
    sc = spark.sparkContext
    fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
        sc._jvm.java.net.URI.create("s3://" + path.split("/")[2]),
        sc._jsc.hadoopConfiguration(),
    )
    return fs.exists(sc._jvm.org.apache.hadoop.fs.Path(path))
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("path/to/SUCCESS.txt"))
The answer posted by @rosefun worked for me, but it took me a lot of time to get it working. So I am giving some details about how the solution works and what you should avoid.
def path_exists(path):
    # spark is a SparkSession
    sc = spark.sparkContext
    fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
        sc._jvm.java.net.URI.create("s3://" + path.split("/")[2]),
        sc._jsc.hadoopConfiguration(),
    )
    return fs.exists(sc._jvm.org.apache.hadoop.fs.Path(path))
The function is the same, and it works fine to check whether a file exists in the S3 bucket path that you provide.
You will have to adjust this function based on how you specify the path value you pass to it.
path = f"s3://bucket-name/import/data/"
pathexists = path_exists(path)
If the path variable you define has the s3 prefix, then it works.
Also, the portion of the code that splits the string gets you just the bucket name, as follows:
path.split("/")[2] will give you `bucket-name`
But if you don't have the s3 prefix in the path, then you will have to change the function slightly, as below:
def path_exists(path):
    # spark is a SparkSession
    sc = spark.sparkContext
    fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
        sc._jvm.java.net.URI.create("s3://" + path),
        sc._jsc.hadoopConfiguration(),
    )
    return fs.exists(sc._jvm.org.apache.hadoop.fs.Path("s3://" + path))
Looks like you should change except IOError: to except AnalysisException:.
Spark throws different errors/exceptions than regular Python in a lot of cases. It's not doing typical Python IO operations when reading a file, so it makes sense for it to throw a different exception.
Nice to see you on Stack Overflow.
I second dijksterhuis's solution, with one caveat:
AnalysisException is a very general exception in Spark, and may result from various causes, not only from a missing file.
If you want to check whether the file exists or not, you'll need to bypass Spark's FS abstraction and access the storage system directly (whether it is S3, POSIX, or something else). The downside of this solution is the lack of abstraction: once you change your underlying FS, you will need to change your code as well.
You can validate existence of a file as seen here:
import os

if os.path.isfile('/path/file.csv'):
    print("File Exists")
    my_df = spark.read.load("/path/file.csv")
    ...
else:
    print("File doesn't exist")
dbutils.fs.ls(file_location)
Do not import dbutils. It's already there when you start your cluster.

Netcdf error when calling http in scala

I am trying to get http://dd.weather.gc.ca/ensemble/naefs/grib2/raw/12/018/CMC_naefs-geps-raw_RH_TGL_2m_latlon0p5x0p5_2018070712_P018_allmbrs.grib2 file with NetCDF:
def read(path: String): NetcdfDataset = {
  NetcdfDataset.openDataset(path)
}
but I get
java.nio.file.InvalidPathException: Illegal char <:> at index 4:
http://dd.weather.gc.ca/ensemble/naefs/grib2/raw/12/018/CMC_naefs-geps-raw_RH_TGL_2m_latlon0p5x0p5_2018070712_P018_allmbrs.grib2
I have "edu.ucar" % "netcdfAll" % "4.6.3". What should I do to get this file? I already tried to load grib2 file from disk with this method and it goes OK.
It looks like NetcdfDataset.openDataset doesn't accept URLs, only local paths. I suggest you download the .grib2 file to your computer and then pass the path of the downloaded file to openDataset,
e.g.
NetcdfDataset.openDataset("/home/kuba/Downloads/CMC_naefs-geps-raw_RH_TGL_2m_latlon0p5x0p5_2018070712_P018_allmbrs.grib2")

Gatling where to place JSON files?

I'm having issues with loading a JSON file in Gatling. It works with an absolute path but not with a relative one. Where should JSON files be stored? I've tried /home/dev/gatling-charts-highcharts-bundle-2.3.0/user-files/data but the file could not be found.
Piece of my code:
def addCredential(status_code: Option[Seq[Int]], username: Option[String]) = {
  feed(random_user)
    .exec(http("[POST] /users/[user]/credentials")
      .post("/users/%s/credentials".format(username getOrElse "${username}"))
      .body(RawFileBody("credential.json")).asJSON
      .check(status.in(202, 404, 409)))
}
The file credential.json is found if I give the absolute path, but this is not optimal because several people use the simulations.
You can configure the folder where the bodies are located in gatling.conf.
directory {
  bodies = user-files/bodies # Folder where bodies are located
}
Then you can put your file in the configured path /your-project/user-files/bodies/credential.json.
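For reference, that fragment nests under gatling.core in the full config file, so a complete gatling.conf would look roughly like this (double-check against the gatling-defaults.conf shipped with your bundle):
gatling {
  core {
    directory {
      bodies = user-files/bodies # Folder where bodies are located
    }
  }
}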