How to delete the last line of a file in Scala?

I am trying to append to a file, but first I want to delete its last line and then start appending. However, I can't figure out how to delete the last line of the file.
I am appending to the file as follows:
val fw = new FileWriter("src/file.txt", true) ;
fw.write("new item");
Can anybody please help me?
EDIT:
val lines_list = Source.fromFile("src/file.txt").getLines().toList
val new_lines = lines_list.dropRight(1)
val pw = new PrintWriter(new File("src/file.txt"))
new_lines.foreach(pw.write)
pw.write("\n")
pw.close()
After following your method I am trying to write back to the file, but when I do this all the contents (with the last line deleted) come out on a single line; I want them to stay on separate lines.

For very large files, a simple solution is to rely on OS-level tools such as sed (the stream editor); for instance, consider a call like this,
import sys.process._
Seq("sed", "-i", "$d", "src/file.txt").!
which will remove the last line of the text file. This approach is not very Scala-idiomatic, yet it solves the problem without leaving Scala.
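A minimal sketch of tying this back to the original goal (assuming GNU sed is on the PATH; the exit-code check and the appended text are illustrative): strip the last line first, then append in the usual way.
import java.io.FileWriter
import sys.process._

val exitCode = Seq("sed", "-i", "$d", "src/file.txt").!  // delete the last line in place
if (exitCode == 0) {
  val fw = new FileWriter("src/file.txt", true)          // reopen in append mode
  try fw.write("new item\n") finally fw.close()
}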

Return a random-access file positioned at the start of the last line.
import java.io.{RandomAccessFile, File}

def randomAccess(file: File) = {
  val random = new RandomAccessFile(file, "rw")
  val result = findLastLine(random, 0)
  random.seek(result)
  random
}

// Walk the file line by line: `previous` holds the offset at which the most recently
// read line started; when readLine returns null (end of file), that offset is the
// start of the last line.
def findLastLine(random: RandomAccessFile, previous: Long): Long = {
  val pointer = random.getFilePointer
  if (random.readLine == null) previous
  else findLastLine(random, pointer)
}
val file = new File("build.sbt")
val random = randomAccess(file)
And test:
val line = random.readLine()
println(line)
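To actually delete the last line and then append, as the original question asks, one possible follow-up (a sketch, not part of the original answer) is to truncate the file at the returned position and write from there:
// Sketch: truncate at the start of the last line, then append the new content.
val raf = randomAccess(new File("src/file.txt"))
raf.setLength(raf.getFilePointer) // everything from the last line onwards is removed
raf.writeBytes("new item\n")      // append the replacement line
raf.close()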

My Scala is a bit rusty, so people can probably give you a nicer solution:
import scala.io.Source
import java.io._
object Test00 {
  def main(args: Array[String]) = {
    val lines = Source.fromFile("src/file.txt").getLines().toList.dropRight(1)
    val pw = new PrintWriter(new File("src/out.txt"))
    (lines :+ "another line").foreach(pw.println)
    pw.close()
  }
}
Sorry for the hard-coded appending; I used it just to test that everything works. Note that foreach(pw.println) writes a newline after every line, which avoids the single-line problem from the edit above.

Related

remove header from csv while reading from txt or csv file in spark scala

I am trying to remove the header from a given input file, but I couldn't make it work.
This is what I have written. Can someone help me remove the header from the txt or csv file?
import org.apache.spark.{SparkConf, SparkContext}

object SalesAmount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(getClass.getName).setMaster("local")
    val sc = new SparkContext(conf)

    val salesRDD = sc.textFile(args(0), 2)
    val salesPairRDD = salesRDD.map(rec => {
      val fieldArr = rec.split(",")
      (fieldArr(1), fieldArr(3).toDouble)
    })
    val totalAmountRDD = salesPairRDD.reduceByKey(_ + _).sortBy(_._2, false)
    val discountAmountRDD = totalAmountRDD.map(t => {
      if (t._2 > 1000) (t._1, t._2 * 0.9)
      else t
    })
    discountAmountRDD.foreach(println)
  }
}
Skipping the first row when manually parsing text files using the RDD API is a bit tricky:
val salesPairRDD =
  salesRDD
    .mapPartitionsWithIndex((i, it) => if (i == 0) it.drop(1) else it)
    .map(rec => {
      val fieldArr = rec.split(",")
      (fieldArr(1), fieldArr(3).toDouble)
    })
The header line will be the first item in the first partition, so mapPartitionsWithIndex is used to iterate over the partitions and to skip the first item if the partition index is 0.
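If the input is plain CSV and a SparkSession is available, an alternative (not part of the original answer, assuming Spark 2.x or later) is to let the DataFrame reader handle the header and only then drop down to an RDD:
import org.apache.spark.sql.SparkSession

// Sketch: the header option tells the reader to skip the first line of the file.
val spark = SparkSession.builder().appName("SalesAmount").master("local").getOrCreate()
val salesDF = spark.read.option("header", "true").csv(args(0))
val salesPairRDD = salesDF.rdd.map(row => (row.getString(1), row.getString(3).toDouble))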

Not able to read file in scala

I am a beginner in Scala, trying to read a file, but I am getting a java.io.FileNotFoundException. Can someone help?
package standardscala

case class TempData(day: Int, doy: Int, month: Int, year: Int, precip: Double,
                    snow: Double, tave: Double, tmax: Double, tmin: Double)

object TempData {
  def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile("DATA/MN212.csv")
    val lines = source.getLines().drop(1) // get the lines of the file; drop(1) skips the header
    val data = lines.map { line =>
      val p = line.split(",")
      TempData(p(0).toInt, p(1).toInt, p(2).toInt, p(4).toInt, p(5).toDouble,
               p(6).toDouble, p(7).toDouble, p(8).toDouble, p(9).toDouble)
    }.toArray
    source.close() // closing the connection
    data.take(5) foreach println
  }
}
Try using an absolute path, and the problem should disappear.
One option would be to move your csv file into a resources folder and load it as a resource like:
val f = new File(getClass.getClassLoader.getResource("your/csv/file.csv").getPath)
Or you could try loading it from an absolute path!
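On Scala 2.12 or later there is also Source.fromResource, which reads a classpath resource directly (a sketch, not from the original answer; the resource path is the same hypothetical one as above):
val source = scala.io.Source.fromResource("your/csv/file.csv")
val lines = source.getLines().toList
source.close()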
Please read this post about reading CSV by Alvin Alexander, author of the Scala Cookbook:
object CSVDemo extends App {
  println("Month, Income, Expenses, Profit")
  val bufferedSource = io.Source.fromFile("/tmp/finance.csv")
  for (line <- bufferedSource.getLines) {
    val cols = line.split(",").map(_.trim)
    // do whatever you want with the columns here
    println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)}")
  }
  bufferedSource.close
}
As Silvio Manolo pointed out, you should not use fromFile with an absolute path, as your code will then require the same file hierarchy to run. In a first draft this is acceptable, so you can move on and test the real job!

Read from GZIPInputStream to String without using Source

I am using Scala. I need to read a large gzip file and turn it into a string, and I need to remove the first line.
This is how I read the file:
val fis = new FileInputStream(filename)
val gz = new GZIPInputStream(fis)
Then I tried Source.fromInputStream(gz).getLines.drop(1).mkString(""), but it causes an out-of-memory error.
Therefore, I am thinking of reading it line by line and maybe putting it into a byte array, so that I can convert it into a single String at the end.
But I have no idea how to do this. Any suggestions? Any better method is also welcome.
If your gzipped file is huge, you can go with BufferedReader. Here is an example: it copies all characters from the gzipped file to an uncompressed one, but skips the first line.
import java.util.zip.GZIPInputStream
import java.io._
import java.nio.charset.StandardCharsets
import scala.annotation.tailrec
import scala.util.Try

val bufferSize = 4096
val pathToGzFile = "/tmp/text.txt.gz"
val pathToOutputFile = "/tmp/text_without_first_line.txt"
val charset = StandardCharsets.UTF_8

val inStream = new FileInputStream(pathToGzFile)
val outStream = new FileOutputStream(pathToOutputFile)
try {
  val inGzipStream = new GZIPInputStream(inStream)
  val inReader = new InputStreamReader(inGzipStream, charset)
  val outWriter = new OutputStreamWriter(outStream, charset)
  val bufferedReader = new BufferedReader(inReader)
  val closeables = Array[Closeable](inGzipStream, inReader, outWriter, bufferedReader)

  // Read the first line up front, so the copy method never sees it - it is skipped
  val firstLine = bufferedReader.readLine()
  println(s"First line: $firstLine")

  @tailrec
  def copy(in: Reader, out: Writer, buffer: Array[Char]): Unit = {
    // Copy until end of file
    val readChars = in.read(buffer, 0, buffer.length)
    if (readChars > 0) {
      out.write(buffer, 0, readChars)
      copy(in, out, buffer)
    }
  }

  // Copy chars from bufferedReader to outWriter using the buffer
  copy(bufferedReader, outWriter, Array.ofDim[Char](bufferSize))

  // Close all closeables
  closeables.foreach(c => Try(c.close()))
} finally {
  Try(inStream.close())
  Try(outStream.close())
}
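If the goal really is a single in-memory String (and it fits comfortably in the heap), here is a minimal sketch along the same lines that skips the first line while reading (the method name is illustrative):
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream

def gzipToStringWithoutFirstLine(path: String): String = {
  val reader = new BufferedReader(
    new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), "UTF-8"))
  try {
    reader.readLine() // read and discard the first line
    val sb = new StringBuilder
    var line = reader.readLine()
    while (line != null) {
      sb.append(line).append('\n')
      line = reader.readLine()
    }
    sb.toString
  } finally reader.close()
}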

Writing data generated in scala to a text file

I was hoping somebody could help; I'm new to Scala and I'm having some issues writing my output to a text file.
I have a data table, and I've written some code to read it in one line at a time and do what I want it to do; now I need it to write that line to a text file.
So, for example, I have the following table of data:
Name, Date, goX, goY, stopX, stopY
1, 12/01/01, 1166, 2299, 3300, 4477
My code takes the first characters of goX and goY and creates a new number, in this instance 1.2, and does the same for stopX and stopY, so in this case you get 3.4.
What I want to get in the text file is essentially the following:
go, stop
1.2, 3.4
and I want it to go through hundreds of lines doing this until I have a long list of go and stop values in the text file.
My current code is as follows; this is almost certainly not the most elegant solution, but it is my first ever Scala/Java code:
import scala.io.Source

object FT2 extends App {
  for (line <- Source.fromFile("C://Users//Data.csv").getLines) {
    val array = line.split(",")
    val gox = array(2)
    val xStringGo = gox.toString
    val goX = xStringGo.dropRight(1 | 2) // note: 1|2 is bitwise OR, i.e. dropRight(3)
    val goy = array(3)
    val yStringGo = goy.toString
    val goY = yStringGo.dropRight(1 | 2)
    val goXY = goX + "." + goY
    val stopx = array(4)
    val xStringStop = stopx.toString
    val stopX = xStringStop.dropRight(1 | 2)
    val stopy = array(5)
    val yStringStop = stopy.toString
    val stopY = yStringStop.dropRight(1 | 2)
    val stopXY = stopX + "." + stopY
    val GoStop = List(goXY, stopXY)
    // This is where I want to print GoStop to a text file
  }
}
Any help is much appreciated!
This should do it:
import java.io._
val data = List("everything", "you", "want", "to", "write", "to", "the", "file")
val file = "whatever.txt"
val writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file)))
for (x <- data) {
  writer.write(x + "\n") // however you want to format it
}
writer.close()
But you can make it a little nicer by creating a method that will automatically close stuff for you:
def using[T <: Closeable, R](resource: T)(block: T => R): R = {
  try { block(resource) }
  finally { resource.close() }
}

using(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file)))) { writer =>
  for (x <- data) {
    writer.write(x + "\n") // however you want to format it
  }
}
So:
using(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.txt")))) { writer =>
  for (line <- io.Source.fromFile("input.txt").getLines) {
    writer.write(line + "\n") // however you want to format it
  }
}
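On Scala 2.13 or later, the standard library already ships this pattern as scala.util.Using, so a roughly equivalent sketch would be:
import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter}
import scala.util.Using

// Using closes the writer automatically and wraps the result in a Try.
Using(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.txt")))) { writer =>
  for (line <- io.Source.fromFile("input.txt").getLines())
    writer.write(line + "\n")
}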

How to close enumerated file?

Say, in an action I have:
val linesEnu = {
  val is = new java.io.FileInputStream(path)
  val isr = new java.io.InputStreamReader(is, "UTF-8")
  val br = new java.io.BufferedReader(isr)
  import scala.collection.JavaConversions._
  val rows: scala.collection.Iterator[String] = br.lines.iterator
  Enumerator.enumerate(rows)
}
Ok.feed(linesEnu).as(HTML)
How to close readers/streams?
There is an onDoneEnumerating callback that functions like finally (it will always be called whether or not the Enumerator fails). You can close the streams there.
val linesEnu = {
  val is = new java.io.FileInputStream(path)
  val isr = new java.io.InputStreamReader(is, "UTF-8")
  val br = new java.io.BufferedReader(isr)
  import scala.collection.JavaConversions._
  val rows: scala.collection.Iterator[String] = br.lines.iterator
  Enumerator.enumerate(rows).onDoneEnumerating {
    is.close()
    // ... anything else you want to execute when the Enumerator finishes
  }
}
The IO tools provided by Enumerator give you this kind of resource management out of the box—e.g. if you create an enumerator with fromStream, the stream is guaranteed to get closed after running (even if you only read a single line, etc.).
So for example you could write the following:
import play.api.libs.iteratee._

val splitByNl = Enumeratee.grouped(
  Traversable.splitOnceAt[Array[Byte], Byte](_ != '\n'.toByte) &>>
    Iteratee.consume()
) compose Enumeratee.map(new String(_, "UTF-8"))

def fileLines(path: String): Enumerator[String] =
  Enumerator.fromStream(new java.io.FileInputStream(path)).through(splitByNl)
It's a shame that the library doesn't provide a linesFromStream out of the box, but I personally would still prefer to use fromStream with hand-rolled splitting, etc. over using an iterator and providing my own resource management.
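For completeness, wiring this back into the action from the question would look like the following (reusing the asker's Ok.feed call); fromStream takes care of closing the underlying stream once the enumerator has been consumed:
// Usage in the original action: resource management is handled by fromStream.
Ok.feed(fileLines(path)).as(HTML)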