How to replace particular text in a txt file using Scala [closed]

I have a text file that looks like the one below:
a~ϕ~b~ϕ~c~ϕ~d~ϕ~e
1~ϕ~2~ϕ~3~ϕ~4~ϕ~5
I want the below output to be written to the text file:
a,b,c,d,e
1,2,3,4,5

Here is another approach, which replaces the ~ϕ~ separator with , using a temporary file to store the intermediate results of the replacement:
import java.io.{File, PrintStream}
import scala.io.{Codec, Source}

object ReplaceIOStringExample {
  val Sep = "~ϕ~"

  def main(args: Array[String]): Unit = {
    replaceFile("/tmp/test.data")
  }

  def replaceFile(path: String): Unit = {
    val inputStream = Source.fromFile(path)(Codec.UTF8)
    val outputLines = inputStream.getLines()
    val out = new PrintStream(path + ".bak")
    outputLines.foreach { line =>
      // split on the separator and re-join the fields with commas
      val formatted = line.split(Sep).mkString(",") + "\n"
      out.write(formatted.getBytes("UTF-8"))
    }
    out.close()
    inputStream.close()
    // delete the old file
    new File(path).delete()
    // rename the .bak file to the initial file name
    new File(path + ".bak").renameTo(new File(path))
  }
}
Notice that val outputLines = inputStream.getLines() returns an Iterator[String], which means the file is read lazily. This approach lets us format each line and write it to the output file without holding the whole file in memory.

You can also do the replacement with replaceAll and a regular expression, or filter the characters directly:
"a~ϕ~b~ϕ~c~ϕ~d~ϕ~e".filter(c => c.isLetter || c.isDigit).mkString(",")

Related

How to use a Future in Scala? [closed]

I don't understand why Future doesn't work. Can someone help me? I got this code from the official Scala website. It doesn't compile; the error is "Future.type does not take parameters".
import scala.concurrent.Future
import java.lang.Thread.sleep

object Future extends App {
  def getStockPrice(stockSymbol: String): Future[Double] = Future {
    val r = scala.util.Random
    val randomSleepTime = r.nextInt(3000)
    val randomPrice = r.nextDouble * 1000
    sleep(randomSleepTime)
    randomPrice
  }
}
Change the object name from Future to something else. You also need an implicit ExecutionContext available. The following will work.
import scala.concurrent.Future
import java.lang.Thread.sleep
import scala.concurrent.ExecutionContext.Implicits.global

object FutureApp extends App {
  def getStockPrice(stockSymbol: String): Future[Double] = Future {
    val r = scala.util.Random
    val randomSleepTime = r.nextInt(3000)
    val randomPrice = r.nextDouble * 1000
    sleep(randomSleepTime)
    randomPrice
  }
}
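If you want to see a value come back, a minimal usage sketch (the stock symbol is arbitrary; in real code prefer map/onComplete over blocking with Await):
import scala.concurrent.Await
import scala.concurrent.duration._

// block the current thread for up to 5 seconds while the future completes
val price = Await.result(FutureApp.getStockPrice("AAPL"), 5.seconds)
println(f"price: $price%.2f")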

How to implement my current Scala server to use Akka concurrency? [closed]

I currently have a Scala echo server; how do I implement Akka in the code so it serves the HTML files I've written?
I don't even know how to begin working this out. Would I have to import something from Akka, or change my dependencies? Instead of the Java and Scala imports, would I switch to Akka ones?
My echo server
import java.net._
import java.io._
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.Breaks

object EchoServer {
  def read_and_write(in: BufferedReader, out: BufferedWriter): Unit = {
    out.write(in.readLine())
    out.flush()
    in.close()
    out.close()
  }

  def serve(server: ServerSocket): Unit = {
    val s = server.accept()
    val in = new BufferedReader(new InputStreamReader(s.getInputStream))
    val out = new BufferedWriter(new OutputStreamWriter(s.getOutputStream))
    read_and_write(in, out)
    s.close()
  }

  def getListOfFiles(dir: String): List[File] = {
    val file = new File(dir)
    if (file.exists && file.isDirectory) {
      file.listFiles.filter(_.isFile).toList
    } else {
      List[File]()
    }
  }

  def main(args: Array[String]) {
    val server = new ServerSocket(9999)
    var readString = "C:\\Users\\Desktop\\EchoServer\\src\\main\\htmlFiles"
    println("What file are you looking for with extension?")
    implicit val ec = ExecutionContext.global
    while (true) {
      Future {
        implicit val ec = ExecutionContext.global
        val userInput = scala.io.StdIn.readLine()
        val result = getListOfFiles(readString)
        val loop = new Breaks
        try {
          result.foreach { file =>
            if (file.getName.endsWith(".html") == (userInput.endsWith(".html"))) {
              if (file.getName.equals(userInput)) {
                println(file.getName + " file found")
              } else {
                println("file not found")
              }
            }
          }
        } catch {
          case e: FileNotFoundException => println("Couldn't find that file.")
        }
        serve(server)
      }
    }
  }
}
I have implemented futures, but I'm not sure it even serves HTML files correctly. How would I change this to Akka concurrency?

Is there a way to split by a custom delimiter in Spark (with Scala), rather than reading line by line, in order to read a set of key-value pairs?

I have an input .txt file in the following format:
Record
ID||1
Word||ABC
Language||English
Count||2
Record
ID||2
Word||DEF
Language||French
Count||4
and so on.
I'm new to Apache Spark/Scala.
I see that there are options to read a file line by line using the .textFile method, or to read a whole file with the .wholeTextFiles method. We can also read files that are in CSV format.
But let's say I want to read such a file and create a case class out of it, with the members id, word, language, and count. How can I go about this?
Assuming your input format is consistent (no random whitespace, always terminated with "Record\n"), the following code works.
The key is the Hadoop configuration property "textinputformat.record.delimiter".
case class Foo(ID : Long, Word : String, Language : String, Count : Long)
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("stackOverflow")
val sc = new SparkContext(conf)
// split the input into records at every "Record\n" instead of at line breaks
sc.hadoopConfiguration.set("textinputformat.record.delimiter", "Record\n")
val rdd = sc.textFile("C:\\TEMP\\stack.txt")
  .flatMap(record => {
    if (record.isEmpty) None // needed to remove the first empty string delimited by "Record\n"
    else {
      val lines = record.split("\n").map(_.split("\\|\\|"))
      //lines.foreach(x => println(x.mkString(",")))
      Some(Foo(
        lines(0)(1).toLong,
        lines(1)(1),
        lines(2)(1),
        lines(3)(1).toLong
      ))
    }
  })
rdd.foreach(println)
The output is
Foo(2,DEF,French,4)
Foo(1,ABC,English,2)

Spark: How to get String value while generating output file

I have two files
--------Student.csv---------
StudentId,City
101,NDLS
102,Mumbai
-------StudentDetails.csv---
StudentId,StudentName,Course
101,ABC,C001
102,XYZ,C002
Requirement
StudentId in the first file should be replaced with the StudentName and Course from the second file.
Once replaced, I need to generate a new CSV with the complete details, like:
ABC,C001,NDLS
XYZ,C002,Mumbai
Code used
val studentRDD = sc.textFile(file path);
val studentdetailsRDD = sc.textFile(file path);
val studentB = sc.broadcast(studentdetailsRDD.collect)

//Generating CSV
studentRDD.map { student =>
  val name = getName(student.StudentId)
  val course = getCourse(student.StudentId)
  Array(name, course, student.City)
}.mapPartitions { data =>
  val stringWriter = new StringWriter();
  val csvWriter = new CSVWriter(stringWriter);
  csvWriter.writeAll(data.toList)
  Iterator(stringWriter.toString())
}.saveAsTextFile(outputPath)

//Functions defined to get details
def getName(studentId : String) {
  studentB.value.map { stud => if (studentId == stud.StudentId) stud.StudentName }
}

def getCourse(studentId : String) {
  studentB.value.map { stud => if (studentId == stud.StudentId) stud.Course }
}
Problem
The file gets generated, but the values are object representations instead of String values.
How can I get the string values instead of objects ?
As suggested in another answer, Spark's DataFrame API is especially suitable for this, as it easily supports joining two DataFrames, and writing CSV files.
However, if you insist on staying with the RDD API, it looks like the main issue with your code is in the lookup functions: getName and getCourse basically do nothing, because their return type is Unit (they are written with procedure syntax, i.e. no = before the body). On top of that, using an if without an else means that for some inputs the body produces no value anyway.
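A tiny sketch of what goes wrong, with a hypothetical hard-coded lookup just to illustrate the signatures:
// procedure syntax (no '=') fixes the result type to Unit, so whatever the body
// computes is discarded
def getNameBroken(studentId: String) {
  if (studentId == "101") "ABC" // if without else: a value only exists for some inputs
}

// with '=', an explicit result type, and an else branch, a value is actually returned
def getNameFixed(studentId: String): String =
  if (studentId == "101") "ABC" else "unknown"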
To fix this, it's easier to get rid of them and simplify the lookup by broadcasting a Map:
// better to broadcast a Map instead of an Array, would make lookups more efficient
val studentB = sc.broadcast(studentdetailsRDD.keyBy(_.StudentId).collectAsMap())

// convert to RDD[String] with the wanted formatting
val resultStrings = studentRDD.map { student =>
    val details = studentB.value(student.StudentId)
    Array(details.StudentName, details.Course, student.City)
  }
  .map(_.mkString(",")) // naive CSV writing with no escaping etc., you can also use CSVWriter like you did

// save as text file
resultStrings.saveAsTextFile(outputPath)
Spark has great support for joining and writing to files. The join takes only one line of code, and so does the write.
Hand-writing that code can be error-prone, hard to read, and most likely very slow.
val df1 = Seq((101,"NDLS"),
(102,"Mumbai")
).toDF("id", "city")
val df2 = Seq((101,"ABC","C001"),
(102,"XYZ","C002")
).toDF("id", "name", "course")
val dfResult = df1.join(df2, "id").select("id", "city", "name")
dfResult.repartition(1).write.csv("hello.csv")
A directory will be created. It contains a single part file, which is the final result.
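If you want to sanity-check the output, one option (assuming the same SparkSession, here called spark) is to read the directory back as CSV:
// Spark reads all part files in the directory as one DataFrame
val check = spark.read.csv("hello.csv")
check.show()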

How to delete the last line of the file in scala?

I am trying to append to a file, but first I want to delete the last line and then start appending. However, I can't figure out how to delete the last line of the file.
I am appending to the file as follows:
val fw = new FileWriter("src/file.txt", true) ;
fw.write("new item");
Can anybody please help me?
EDIT:
val lines_list = Source.fromFile("src/file.txt").getLines().toList
val new_lines = lines_list.dropRight(1)
val pw = new PrintWriter(new File("src/file.txt"))
new_lines.foreach(pw.write)
pw.write("\n")
pw.close()
After following your method I am trying to write back to the file, but when I do, all the contents (with the last line deleted) end up on a single line, whereas I want them on separate lines.
For very large files a simple solution relies on OS-level tools, for instance sed (the stream editor), so consider a call like this,
import sys.process._
Seq("sed", "-i", "$ d", "src/file1.txt").!
which will remove the last line of the text file. This approach is not very Scala-ish, yet it solves the problem without leaving Scala.
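A minimal sketch of checking whether the call succeeded, assuming sed is available on the PATH; the .! method returns the process exit code:
import sys.process._

val exitCode = Seq("sed", "-i", "$ d", "src/file1.txt").!
if (exitCode != 0) println(s"sed failed with exit code $exitCode")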
Another option: return a random access file positioned at the start of the last line.
import java.io.{RandomAccessFile, File}

// open the file and seek to the start of its last line
def randomAccess(file: File) = {
  val random = new RandomAccessFile(file, "rw")
  val result = findLastLine(random, 0, 0)
  random.seek(result)
  random
}

// walk through the file line by line, remembering the offset of the line just read;
// when readLine returns null (end of file), the previous offset is the start of the last line
def findLastLine(random: RandomAccessFile, position: Long, previous: Long): Long = {
  val pointer = random.getFilePointer
  if (random.readLine == null) {
    previous
  } else {
    findLastLine(random, previous, pointer)
  }
}
val file = new File("build.sbt")
val random = randomAccess(file)
And test:
val line = random.readLine()
logger.debug(s"$line")
My Scala is way off, so people can probably give you a nicer solution:
import scala.io.Source
import java.io._

object Test00 {
  def main(args: Array[String]) = {
    val lines = Source.fromFile("src/file.txt").getLines().toList.dropRight(1)
    val pw = new PrintWriter(new File("src/out.txt"))
    (lines :+ "another line").foreach(pw.println)
    pw.close()
  }
}
Sorry for the hardcoded appending, I used it just to test that everything worked fine.