How to read text files from a subdirectory in Scala

I have a subdirectory called Memory under resources, where I put all my text files:
src/test/resources/Memory
I know how to read a file from resources directly with:
val sdf = "file1.txt"
val file:File = new File("file1.txt")
....
But with a subdirectory I don't know how. I am trying something like:
val path = scala.reflect.io.Path("src/test/resources/Memory/file1.txt")
but I get java.lang.NullPointerException: null
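A minimal sketch of reading the file through the classpath instead of the filesystem, assuming an sbt or Maven layout where src/test/resources is on the test classpath, so the file is reachable under the resource name Memory/file1.txt. `ResourceLines` and `readLines` are hypothetical helper names. ClassLoader resource names never start with a slash, unlike the equivalent `getClass.getResource("/Memory/file1.txt")`.

```scala
import java.io.InputStream
import scala.io.Source
import scala.util.Using

object ResourceLines {
  // Reads a text resource from the given class loader. Names use '/'
  // separators and are relative to the classpath root (no leading slash).
  // Returns None when the resource is missing instead of failing with an NPE.
  def readLines(loader: ClassLoader, name: String): Option[List[String]] =
    Option(loader.getResourceAsStream(name)).map { (in: InputStream) =>
      // Closing the Source also closes the underlying stream.
      Using.resource(Source.fromInputStream(in, "UTF-8"))(_.getLines().toList)
    }

  // Convenience overload using this object's class loader, e.g.
  // readLines("Memory/file1.txt") for src/test/resources/Memory/file1.txt.
  def readLines(name: String): Option[List[String]] =
    readLines(getClass.getClassLoader, name)
}
```

Returning an Option makes a wrong path show up as None at the call site rather than as a NullPointerException further down.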

Related

Creating temporary resource test files in Scala

I am currently writing tests for a function that takes file paths and loads a dataset from them. I am not able to change the function. Currently, to test it, I create files on each run of the test function. I am worried that simply creating files and then deleting them is bad practice. Is there a better way to create temporary test files in Scala?
import java.io.{File, PrintWriter}
val testFile = new File("src/main/resources/temp.txt" )
val pw = new PrintWriter(testFile)
val testLines = List("this is a text line", "this is the next text line")
testLines.foreach(line => pw.println(line))
pw.close()
// test logic here
testFile.delete()
I would generally prefer java.nio over java.io. You can create a temporary file like so:
import java.nio.file.Files
val tmp = Files.createTempFile("test", ".txt")
You can delete it using Files.delete. To ensure that the file is deleted even if an error occurs, put the delete call into a finally block.
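A minimal sketch of that advice as a reusable test helper, assuming the function under test accepts a path (call `.toString` or `.toFile` on the `Path` if it wants a String or File). `withTempFile` is a hypothetical name; it uses the loan pattern so the finally block always runs.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object TempFiles {
  // Creates a temp file in the system temp directory, fills it with `lines`,
  // hands its path to `body`, and always deletes it afterwards, even when
  // `body` (i.e. the test) throws.
  def withTempFile[A](lines: Seq[String])(body: Path => A): A = {
    val tmp = Files.createTempFile("test-data", ".txt")
    try {
      Files.write(tmp, lines.asJava, StandardCharsets.UTF_8)
      body(tmp)
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

A test would then look like `TempFiles.withTempFile(testLines) { path => loadDataset(path.toString) }`, where `loadDataset` stands in for the function you cannot change.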

Adding to an existing zip file in scala

I have a zip file in HDFS, and I need to add a file to the zip and save it in the same HDFS location. Any examples would be appreciated.
I have the following code.
val filePattern = s"${hdfsFolderPath}/${filePath}.txt"
val zipFilePath = hdfsWrapper.getFileNameFromPattern(s"${targetFilePath}/*.zip")
if (hdfsWrapper.filter(filePattern).size() > 0) {
  Try {
    val zipEntry = new ZipEntry(filePattern)
    val zos: ZipOutputStream = new ZipOutputStream(new FileOutputStream(zipFilePath))
    zos.putNextEntry(zipEntry)
    zos.closeEntry()
    zos.close()
  }
}
Is the above code correct?
I believe that your code would result in the zip file being replaced with a new one containing just the new file. See Appending files to a zip file with Java for an example of adding a file to a zip archive.
I'm not very familiar with HDFS, but I suspect that you can't write to it directly either; you probably have to create the new zip file and then replace the old one in HDFS.
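A minimal sketch of the copy-and-replace approach described above, using java.util.zip on the local filesystem. It assumes the new entry name is not already in the archive (ZipOutputStream throws a ZipException on duplicate names). For HDFS, the input and output streams would have to come from Hadoop's FileSystem API instead of java.nio.file.Files; that part is not shown or tested here. `ZipAppend.addEntry` is a hypothetical name.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream}
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.util.Using

object ZipAppend {
  // Copies every entry of `zip` into a new archive, appends `newEntryName`
  // with `data`, then replaces the original file with the new archive.
  def addEntry(zip: Path, newEntryName: String, data: Array[Byte]): Unit = {
    val tmp = Files.createTempFile(zip.toAbsolutePath.getParent, "zip-append", ".zip")
    Using.resources(
      new ZipInputStream(new BufferedInputStream(Files.newInputStream(zip))),
      new ZipOutputStream(new BufferedOutputStream(Files.newOutputStream(tmp)))
    ) { (in, out) =>
      val buf = new Array[Byte](8192)
      var entry = in.getNextEntry
      while (entry != null) {
        // Copy the entry's name and content; other metadata is not preserved.
        out.putNextEntry(new ZipEntry(entry.getName))
        var n = in.read(buf)
        while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
        out.closeEntry()
        entry = in.getNextEntry
      }
      // Append the new entry after all existing ones.
      out.putNextEntry(new ZipEntry(newEntryName))
      out.write(data)
      out.closeEntry()
    } // the output stream is closed here, which writes the central directory
    Files.move(tmp, zip, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

Writing to a temp file and moving it over the original means a failure halfway through leaves the old archive intact.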

Cannot Use Relative Path in Scala.IO.Source

I am trying to read a CSV file into Scala. I can read fine using the absolute path, but would like to be able to use the relative path.
val filename = "sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
for (line <- Source.fromFile(filename).getLines()) { println(line) }
throws the error:
java.io.FileNotFoundException: sample_data\yahoo_finance\AAPL\AAPL_Historical.csv
(The system cannot find the path specified)
However:
val filename = "C:/Users/hansb/Desktop/Scala Project/src/main/" +
"resources/sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
for (line <- Source.fromFile(filename).getLines()) { println(line) }
works just fine.
My understanding was that scala.io.Source knew to look in the resources folder for the relative path.
What am I missing?
Working code using Phasmid's suggestion:
val relativePath = "/sample_data/yahoo_finance/AAPL/AAPL_Historical.csv"
val csv = getClass.getResource(relativePath)
for (line <- Source.fromURL(csv).getLines()){ println(line) }
This is one of the worst things about Java (and, thus, Scala). I imagine many millions of hours have been spent on this kind of problem.
If you want to get a resource from a relative path (i.e. from the class path) you need to treat the resource as a resource. So, something like the following:
getClass.getResource("AAPL_Historical.csv")
which yields a URL that you can then convert into a stream, or whatever you need. This form expects to find the resource in the same (relative) folder as the class, but under resources rather than the scala source directory.
If you want to put the resource into the top level of the resources folder, then use:
getClass.getResource("/AAPL_Historical.csv")
It may be that there is some other magic which works but I haven't found it.
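To make the two lookups concrete, here is a small sketch. `resolvedName` mirrors the rule documented for `Class.getResource` (a leading '/' means the classpath root, otherwise the class's package is prepended with '.' replaced by '/'), and `fromFileLocation` shows that `Source.fromFile` ignores the classpath and resolves relative paths against the JVM's working directory. Both helper names are hypothetical.

```scala
import java.nio.file.{Path, Paths}

object ResourcePaths {
  // Classpath name that Class.getResource(name) would look up for `cls`.
  def resolvedName(cls: Class[_], name: String): String =
    if (name.startsWith("/")) name.substring(1)
    else {
      val className = cls.getName
      val dot = className.lastIndexOf('.')
      if (dot < 0) name // class in the default package
      else className.substring(0, dot).replace('.', '/') + "/" + name
    }

  // Filesystem location Source.fromFile(relative) would try to open:
  // the path resolved against the working directory (user.dir).
  def fromFileLocation(relative: String): Path =
    Paths.get(relative).toAbsolutePath
}
```

For a class in a hypothetical package `finance`, `resolvedName(classOf[MyClass], "AAPL_Historical.csv")` would give finance/AAPL_Historical.csv, while the "/sample_data/..." form maps to sample_data/... at the root of resources. Printing `fromFileLocation(filename)` shows why the question's relative path failed.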

How to save RDD data into json files, not folders

I am receiving streaming data, myDStream (DStream[String]), that I want to save in S3 (for this question it doesn't really matter where exactly I save the outputs, but I mention it just in case).
The following code works well, but it saves folders with names like jsonFile-19-45-46.json, and inside each folder it saves the files _SUCCESS and part-00000.
Is it possible to save each RDD[String] (these are JSON strings) data into the JSON files, not the folders? I thought that repartition(1) had to make this trick, but it didn't.
myDStream.foreachRDD { rdd =>
// datetimeString = ....
rdd.repartition(1).saveAsTextFile("s3n://mybucket/keys/jsonFile-"+datetimeString+".json")
}
AFAIK there is no option to save it as a single file. Spark is a distributed processing framework, and writing to a single file is not good practice; instead, each partition writes its own file under the specified path.
You can only pass the output directory where you want to save the data. The OutputWriter creates one or more files inside that path (depending on the number of partitions), named with the part- prefix.
As an alternative to rdd.collect.mkString("\n"), you can use the Hadoop FileSystem library to clean up the output by moving the part-00000 file into its place. The code below works on the local filesystem and HDFS, but I'm unable to test it with S3:
val outputPath = "path/to/some/file.json"
rdd.saveAsTextFile(outputPath + "-tmp")
import org.apache.hadoop.fs.Path
val fs = org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.rename(new Path(outputPath + "-tmp/part-00000"), new Path(outputPath))
fs.delete(new Path(outputPath + "-tmp"), true)
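The same move-then-delete idea can be sketched with plain java.nio for a local output directory, which is handy for checking the logic in a unit test without HDFS or S3. This is a local-filesystem analogue under that assumption, not the Hadoop FileSystem code above; `SinglePartFile.promote` is a hypothetical name, and it only handles the single-partition case.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._
import scala.util.Using

object SinglePartFile {
  // Given a directory written by saveAsTextFile, move its only part-* file
  // to `target`, then delete the directory (including _SUCCESS and .crc files).
  def promote(outputDir: Path, target: Path): Unit = {
    val parts = Using.resource(Files.list(outputDir)) { stream =>
      stream.iterator().asScala.filter(_.getFileName.toString.startsWith("part-")).toList
    }
    require(parts.size == 1, s"expected exactly one part file, found ${parts.size}")
    Files.move(parts.head, target, StandardCopyOption.REPLACE_EXISTING)
    deleteRecursively(outputDir)
  }

  private def deleteRecursively(dir: Path): Unit = {
    // Delete deepest paths first so every directory is empty when deleted.
    val all = Using.resource(Files.walk(dir))(_.iterator().asScala.toList)
    all.sortBy(_.getNameCount)(Ordering[Int].reverse).foreach(p => Files.delete(p))
  }
}
```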
For Java, I implemented the following. Hope it helps:
FileSystem fs = FileSystem.get(spark.sparkContext().hadoopConfiguration());
File dir = new File(System.getProperty("user.dir") + "/my.csv/");
File[] files = dir.listFiles((d, name) -> name.endsWith(".csv"));
fs.rename(new Path(files[0].toURI()), new Path(System.getProperty("user.dir") + "/csvDirectory/newData.csv"));
fs.delete(new Path(System.getProperty("user.dir") + "/my.csv/"), true);

Can't write into a .txt file in netbeans

I have a file called SAVE.txt. It is in the same package as the class k. The problem is I can't write anything in the .txt file using the following code inside k:
File saveButton = new File ("SAVE.txt");
BufferedWriter output = new BufferedWriter (new FileWriter (saveButton));
output.write("something");
output.close();
Can anyone help me with this?
BufferedWriter bw = new BufferedWriter(new FileWriter("filepath", true)); // true = append
bw.write("Hello World!");
bw.write("\n");
bw.write("Hello World 2 !\n");
bw.write("Hello World 3 !" + "\n");
bw.close();
Try this?
Did you try something easy like this:
FileWriter f = new FileWriter("test.txt");
f.write("hello");
f.close();
When you write new File ("SAVE.txt"), since you specified a relative path, it refers to a file SAVE.txt in the current working directory. The current directory is in general completely separate from the directory corresponding to your Java package.
When you run code in Netbeans, it should be possible to specify the working directory (look in the project settings). Set it to some well-defined location, like the root of your project. Now specify the path relative to that working directory. For example, you could use new File ("out/SAVE.txt").
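A Scala sketch of the same advice (the question is Java, but the java.nio calls are identical): resolve the file name against an explicit base directory instead of relying on the working directory, and return the absolute path so you can see where the file actually went. `SaveFile`, `save`, and `workingDirectory` are hypothetical names, and the base directory is assumed to be whatever location you configure, such as the project root.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object SaveFile {
  // Resolves `relative` against `base`, creates missing parent directories,
  // writes `text` (replacing any existing content), and returns the absolute path.
  def save(base: Path, relative: String, text: String): Path = {
    val target = base.resolve(relative).toAbsolutePath
    Files.createDirectories(target.getParent)
    Files.write(target, text.getBytes(StandardCharsets.UTF_8))
    target
  }

  // The directory a bare relative path such as "SAVE.txt" resolves against.
  def workingDirectory: Path = Paths.get("").toAbsolutePath
}
```

Printing `SaveFile.workingDirectory` from inside the IDE is a quick way to see where `new File("SAVE.txt")` would have been written.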