Private assets in Play 2.1 - scala

In a Play 2.1 application, where is the proper place to store private assets?
By "private asset", I mean a data file that is used by the application but not accessible to the user.
For example, if I have a text file (Foo.json) that contains sample data that is parsed every time the application starts, what would be the proper directory in the project to store it?
Foo.json needs to be included in the deployment, and needs to be uniformly accessible from the code in both development and production.

Some options:
Usually the files goes to conf folder. ie: conf/privatefiles/Foo.json
If they are subject of often change you can consider adding to your application.conf path to the external folder somwhere in the filesystem (full path), in such case you'll be able to edit the content easily without redeploying the apps: /home/scrapdog/privatefiles/Foo.json
You can store them in database as well, benefits are the same as in previous option - easy editing.
In all cases consider using memory cache to avoid reading it from filesystem/database every time when required.

I simply use a folder called data at the application root. You can use the name you want or better, store the actual name in the configuration file.
To resolve its path, I use the following snippet:
lazy val rootPath = {
import play.api.Play.current
play.api.Play.application.path.getPath
}
lazy val dataPath = rootPath + "/data/"

You can do what I did, I got the answer from #Marius Soutier here. Please upvote his answer there if you like it:
You can put "internal" documents in the conf folder, it's the equivalent to resources in standard sbt projects.
Basically create a dir under conf called json and to access it, you'd use Play.resourceAsStream(). Note that this gives you a java.io.InputStream because your file will be part of the JAR created by activator dist.
My example is using it in a view but you can modify it as you want.
Play.resourceAsStream("json/Foo.json") map { inputStream =>
Ok(views.html.xxx(XXX.do_something_with_stream(inputStream)))
} getOrElse (InternalServerError)
You can also use Play.resource(), this will give you a java.net.URL, you can use getFile() to get the java.io.File out of it.
Play.resource("json/Foo.json") map { fileURL =>
Ok(views.html.xxx(XXX.do_something_with_file(fileURL.getFile())))
} getOrElse (InternalServerError)

Related

Load CSV file as dataframe from resources within an Uber Jar

So, I made an Scala Application to run in Spark, and created the Uber Jar using sbt> assembly.
The file I load is a lookup needed by the application, thus the idea is to package it together. It works fine from within InteliJ using the path "src/main/resources/lookup01.csv"
I am developing in Windows, testing locally, to after deploy it to a remote test server.
But when I call spark-submit on the Windows machine, I get the error :
"org.apache.spark.sql.AnalysisException: Path does not exist: file:/H:/dev/Spark/spark-2.4.3-bin-hadoop2.7/bin/src/main/resources/"
Seems it tries to find the file in the sparkhome location instead of from inside the JAr file.
How could I express the Path so it works looking the file from within the JAR package?
Example code of the way I load the Dataframe. After loading it I transform it into other structures like Maps.
val v_lookup = sparkSession.read.option( "header", true ).csv( "src/main/resources/lookup01.csv")
What I would like to achieve is getting as way to express the path so it works in every environment I try to run the JAR, ideally working also from within InteliJ while developing.
Edit: scala version is 2.11.12
Update:
Seems that to get a hand in the file inside the JAR, I have to read it as a stream, the bellow code worked, but I cant figure out a secure way to extract the headers of the file such as SparkSession.read.option has.
val fileStream = scala.io.Source.getClass.getResourceAsStream("/lookup01.csv")
val inputDF = sparkSession.sparkContext.makeRDD(scala.io.Source.fromInputStream(fileStream).getLines().toList).toDF
When the makeRDD is applied, I get the RDD and then can convert it to a dataframe, but it seems I lost the ability tu use the option from "read" that parsed out the headers as the schema.
Any way around it when using makeRDD ?
Other problem with this is that seems that I will have to manually parse the lines into columns.
You have to get the correct path from classPath
Considering that your file is under src/main/resources:
val path = getClass.getResource("/lookup01.csv")
val v_lookup = sparkSession.read.option( "header", true ).csv(path)
So, it all points to that after the file is inside JAR, it can only be accessed as a inputstream to read the chunk of data from within the compressed file.
I arrived at a solution, even though its not pretty it does what I need, that is to read a csv file, take the 2 first columns and make it into a dataframe and after load it inside a key-value structure (in this case i created a case class to hold these pairs).
I am considering migrating these lookups to a HOCON file, that may make the process less convoluted to load these lookups
import sparkSession.implicits._
val fileStream = scala.io.Source.getClass.getResourceAsStream("/lookup01.csv")
val input = sparkSession.sparkContext.makeRDD(scala.io.Source.fromInputStream(fileStream).getLines().toList).toDF()
val myRdd = input.map {
line =>
val col = utils.Utils.splitCSVString(line.getString(0))
KeyValue(col(0), col(1))
}
val myDF = myRdd.rdd.map(x => (x.key, x.value)).collectAsMap()
fileStream.close()

"How to embed resources" or "How to access a Resource"

I am struggling with embedded resources or resources in general with Dynamics365. My goal is to add a xml-file as resource to a model and use that resource in some testcode.
I tried to add the xml as resource-element but it seems this does not embedd the xml into the compiled dll so i don't know how to pick up that xml-file in my testcode. Currently my testcode loads the xml from "C:\Temp\test.xml" where i copied my xml to, but thats not a viable solution and i thought adding the xml as resource would be ok. Or is there a better approach to this scenario ?
You can use class SysResource to interact with resources. I used the following code in one of my unit tests to load the content of a file resource into a file and create a CommaStreamIo instance from that file. You should be able to modify that to do your stuff with an xml file.
ResourceNode textFileResourceNode = SysResource::getResourceNode(resourceStr(MyTextFileResourceName));
str textFilename = SysResource::saveToTempFile(textFileResourceNode);
CommaStreamIo commaStreamIo = CommaStreamIo::constructForRead(File::UseFileFromURL(textFilename));
Also take a look at reading a resource into a string.
You could also take a look at how some of the standard resources are used. For example, there are several .xslt file resources that are used to transform bank statement formats.

Is there a way to know filenames generated with MultiResourceItemWriter?

I'm writing a spring-batch application with spring-boot support and I'm looking for a way to know which files were generated by MultiResourceItemWriter. The first solution I have in mind is to have a folder for only the files generated and check the content, but if there is something already implemented on spring-batch would be great!
The intention is to encrypt and then upload each file to an sftp server.
The file names generated by the MultiResourceItemWriter are the combination of the resource name + the suffix created by the ResourceSuffixCreator. For example, if you create the writer like the following:
MultiResourceItemWriter<String> writer = new MultiResourceItemWriter<>();
writer.setResource(new FileSystemResource(new File("data.txt")));
writer.setResourceSuffixCreator(index -> "part" + index);
Then the generated files will be data.txt.part1, data.txt.part2, etc.
MultiResourceItemWriter doesn't perform write directly but delegate this job to other components.
All those components are ResourceAwareItemWriterItemStream implementors so you may write a ResourceAwareItemWriterItemStreamDelegate, intercept setResource() method and store resource into current step execution-context as a collection.
If you want to pass this list of resources to next steps you may use an ExecutionContextPromotionListener.

How to load a spark-nlp pre-trained model from disk

From the spark-nlp Github page I downloaded a .zip file containing a pre-trained NerCRFModel. The zip contains three folders: embeddings, fields, and metadata.
How do I load that into a Scala NerCrfModel so that I can use it? Do I have to drop it into HDFS or the host where I launch my Spark Shell? How do I reference it?
you just need to provide the path where the folders you mentioned are contained,
import com.johnsnowlabs.nlp.annotators.ner.crf.NerCrfModel
val path = "path/to/unziped/file/folder"
val model = NerCrfModel.read.load(path)
// use your model
model.setInputCols(someCol)
model.transform(yourData) // which contains 'someCol',
As long as I remember, you can place the folder in local FS or distributed FS, hope this helps other users as well!.
best,
Alberto.

Exporting an JAR file in Eclipse and referencing a file

I have a project with a image stored as a logo that I wish to use.
URL logoPath = new MainApplication().getClass().getClassLoader().getResource("img/logo.jpg");
Using that method I get the URL for the file and convert it to string. I then have to substring that by 5 to get rid of this output "file:/C:/Users/Stephen/git/ILLA/PoC/bin/img/logo.jpg"
However when I export this as a jar and run it I run into trouble. The URL now reads /ILLA.jar!/ and my image is just blank. I have a gut feeling that it's tripping me up so how do I fix this?
Cheers
You are almost there.
Images in a jar are treated as resources. You need to refer to them using the classpath
Just use getClass().getResource: something like:
getClass().getResource("/images/logo.jpg"));
where "images" is a package inside the jar file, with the path as above
see the leading / in the call - this will help accessing the path correctly (using absolute instead of relative). Just make sure the path is correct
Also see:
How to includes all images in jar file using eclipse
See here: Create a file object from a resource path to an image in a jar file
String imgName = "/resources/images/image.jpg";
InputStream in = getClass().getResourceAsStream(imgName);
ImageIcon img = new ImageIcon(ImageIO.read(in));
Note it looks like you need to use a stream for a resource inside an archive.