Reading empty files with scala - scala

I wrote the following function in scala which reads a file into a list of strings. My aim is to make sure that if the file input is empty that the returned list is empty too. Any idea how to do this in an elegant way:
def linesFromFile(file: String): List[String] = {
def materialize(buffer: BufferedReader): List[String] = materializeReverse(buffer, Nil).reverse
def materializeReverse(buffer: BufferedReader, accumulator: List[String]): List[String] = {
buffer.readLine match {
case null => accumulator
case line => materializeReverse(buffer, line :: accumulator)
}
}
val buffer = new BufferedReader(new FileReader(file))
materialize(buffer)
}

Your code should work, but it's rather inefficient in memory usage: you read in the entire file into memory, then waste more memory and processing putting the lines in the right order.
Using the Source.fromFile method in the standard library is your best bet (which also supports various file encodings), as specified in other comments/answers.
However, if you must roll your own, I think using a Stream (a lazy form of list) makes more sense than a List. You can return each line one at a time, and can terminate the stream when the end of the file is reached. This can be done as follows:
import java.io.{BufferedReader, FileReader}
def linesFromFile(file: String): Stream[String] = {
// The value of buffer is available to the following helper function. No need to pass as
// an argument.
val buffer = new BufferedReader(new FileReader(file))
// Helper: retrieve next line from file. Called only when next value requested.
def materialize: Stream[String] = {
// Uncomment to demonstrate non-recursive nature of this method.
//println("Materialize called!")
// Read the next line and wrap in an option. This avoids the hated null.
Option(buffer.readLine) match {
// If we've seen the end of the file, return an empty stream. We're done reading.
case None => {
buffer.close()
Stream.empty
}
// Otherwise, prepend the line read to another call to this helper.
case Some(line) => line #:: materialize
}
}
// Start the process.
materialize
}
Although it looks like materialize is recursive, in fact it is only called when another value needs to be retrieved, so you do not need to worry about stack overflows or recursion. You can verify this by uncommenting the println call.
For example (in a Scala REPL session):
$ scala
Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.
scala> import java.io.{BufferedReader, FileReader}
import java.io.{BufferedReader, FileReader}
scala> def linesFromFile(file: String): Stream[String] = {
|
| // The value of buffer is available to the following helper function. No need to pass as
| // an argument.
| val buffer = new BufferedReader(new FileReader(file))
|
| // Helper: retrieve next line from file. Called only when next value requested.
| def materialize: Stream[String] = {
|
| // Uncomment to demonstrate non-recursive nature of this method.
| println("Materialize called!")
|
| // Read the next line and wrap in an option. This avoids the hated null.
| Option(buffer.readLine) match {
|
| // If we've seen the end of the file, return an empty stream. We're done reading.
| case None => {
| buffer.close()
| Stream.empty
| }
|
| // Otherwise, prepend the line read to another call to this helper.
| case Some(line) => line #:: materialize
| }
| }
|
| // Start the process.
| materialize
| }
linesFromFile: (file: String)Stream[String]
scala> val stream = linesFromFile("TestFile.txt")
Materialize called!
stream: Stream[String] = Stream(Line 1, ?)
scala> stream.head
res0: String = Line 1
scala> stream.tail.head
Materialize called!
res1: String = Line 2
scala> stream.tail.head
res2: String = Line 2
scala> stream.foreach(println)
Line 1
Line 2
Materialize called!
Line 3
Materialize called!
Line 4
Materialize called!
Note how materialize is only called when we attempt to read another line from the file. Furthermore, it is not called if we've already retrieved a line (for example, both Line 1 and Line 2 in the output are only preceded by Materialize called! when first referenced).
To your point about empty files, in this case, an empty stream is returned:
scala> val empty = linesFromFile("EmptyFile.txt")
Materialize called!
empty: Stream[String] = Stream()
scala> empty.isEmpty
res3: Boolean = true

If by "empty" you mean contains absolutely nothing, your aim is already attained.
If however you meant "contains only whitespaces", you can filter out lines containing only whitespaces from your final list by modifying materializeReverse.
def materializeReverse(buffer: BufferedReader, accumulator: List[String]): List[String] = {
buffer.readLine match {
case null => accumulator
case line if line.trim.isEmpty => materializeReverse(buffer, accumulator)
case line => materializeReverse(buffer, line :: accumulator)
}
}

You can try this method
val result = Source.fromFile("C:\\Users\\1.txt").getLines.toList
Hope that helps. Please ask if you need any further clarification.

Related

Confused about cats-effect Async.memoize

I'm fairly new to cats-effect, but I think I am getting a handle on it. But I have come to a situation where I want to memoize the result of an IO, and it's not doing what I expect.
The function I want to memoize transforms String => String, but the transformation requires a network call, so it is implemented as a function String => IO[String]. In a non-IO world, I'd simply save the result of the call, but the defining function doesn't actually have access to it, as it doesn't execute until later. And if I save the constructed IO[String], it won't actually help, as that IO would repeat the network call every time it's used. So instead, I try to use Async.memoize, which has the following documentation:
Lazily memoizes f. For every time the returned F[F[A]] is bound, the
effect f will be performed at most once (when the inner F[A] is bound
the first time).
What I expect from memoize is a function that only ever executes once for a given input, AND where the contents of the returned IO are only ever evaluated once; in other words, I expect the resulting IO to act as if it were IO.pure(result), except the first time. But that's not what seems to be happening. Instead, I find that while the called function itself only executes once, the contents of the IO are still evaluated every time - exactly as would occur if I tried to naively save and reuse the IO.
I constructed an example to show the problem:
def plus1(num: Int): IO[Int] = {
println("foo")
IO(println("bar")) *> IO(num + 1)
}
var fooMap = Map[Int, IO[IO[Int]]]()
def mplus1(num: Int): IO[Int] = {
val check = fooMap.get(num)
val res = check.getOrElse {
val plus = Async.memoize(plus1(num))
fooMap = fooMap + ((num, plus))
plus
}
res.flatten
}
println("start")
val call1 = mplus1(2)
val call2 = mplus1(2)
val result = (call1 *> call2).unsafeRunSync()
println(result)
println(fooMap.toString)
println("finish")
The output of this program is:
start
foo
bar
bar
3
Map(2 -> <function1>)
finish
Although the plus1 function itself only executes once (one "foo" printed), the output "bar" contained within the IO is printed twice, when I expect it to also print only once. (I have also tried flattening the IO returned by Async.memoize before storing it in the map, but that doesn't do much).
Consider following examples
Given the following helper methods
def plus1(num: Int): IO[IO[Int]] = {
IO(IO(println("plus1")) *> IO(num + 1))
}
def mPlus1(num: Int): IO[IO[Int]] = {
Async.memoize(plus1(num).flatten)
}
Let's build a program that evaluates plus1(1) twice.
val program1 = for {
io <- plus1(1)
_ <- io
_ <- io
} yield {}
program1.unsafeRunSync()
This produces the expected output of printing plus1 twice.
If you do the same but instead using the mPlus1 method
val program2 = for {
io <- mPlus1(1)
_ <- io
_ <- io
} yield {}
program2.unsafeRunSync()
It will print plus1 just once confirming that memoization is working.
The trick with the memoization is that it should be evaluated only once to have the desired effect. Consider now the following program that highlights it.
val memIo = mPlus1(1)
val program3 = for {
io1 <- memIo
io2 <- memIo
_ <- io1
_ <- io2
} yield {}
program3.unsafeRunSync()
And it outputs plus1 twice as io1 and io2 are memoized separately.
As for your example, the foo is printed once because you're using a map and update the value when it's not found and this happens only once. The bar is printed every time when IO is evaluated as you lose the memoization effect by calling res.flatten.

Using variable second times in function not return the same value?

I start learning Scala, and i wrote that code. And I have question, why val which is constant? When i pass it second time to the same function return other value? How write pure function in scala?
Or any comment if that counting is right?
import java.io.FileNotFoundException
import java.io.IOException
import scala.io.BufferedSource
import scala.io.Source.fromFile
object Main{
def main(args: Array[String]): Unit = {
val fileName: String = if(args.length == 1) args(0) else ""
try {
val file = fromFile(fileName)
/* In file tekst.txt is 4 lines */
println(s"In file $fileName is ${countLines(file)} lines")
/* In file tekst.txt is 0 lines */
println(s"In file $fileName is ${countLines(file)} lines")
file.close
}
catch{
case e: FileNotFoundException => println(s"File $fileName not found")
case _: Throwable => println("Other error")
}
}
def countLines(file: BufferedSource): Long = {
file.getLines.count(_ => true)
}
}
val means that you cannot assign new value to it. If this is something immutable - a number, immutable collection, tuple or case class of other immutable things - then your value will not change over its lifetime - if this is val inside a function, when you assign value to it, it will stay the same until you leave that function. If this is value in class, it will stay the same between all calls to this class. If this is object it will stay the same over whole program life.
But, if you are talking about object which are mutable on their own, then the only immutable part is the reference to object. If you have a val of mutable.MutableList, then you can swap it with another mutable.MutableList, but you can modify the content of the list. Here:
val file = fromFile(fileName)
/* In file tekst.txt is 4 lines */
println(s"In file $fileName is ${countLines(file)} lines")
/* In file tekst.txt is 0 lines */
println(s"In file $fileName is ${countLines(file)} lines")
file.close
file is immutable reference to BufferedSource. You cannot replace it with another BufferedSource - but this class has internal state, it counts how many lines from file it already read, so the first time you operate on it you receive total number of lines in file, and then (since file is already read) 0.
If you wanted that code to be purer, you should contain mutability so that it won't be observable to the user e.g.
def countFileLines(fileName: String): Either[String, Long] = try {
val file = fromFile(fileName)
try {
Right(file.getLines.count(_ => true))
} finally {
file.close()
}
} catch {
case e: FileNotFoundException => Left(s"File $fileName not found")
case _: Throwable => Left("Other error")
}
println(s"In file $fileName is ${countLines(fileName)} lines")
println(s"In file $fileName is ${countLines(fileName)} lines")
Still, you are having side effects there, so ideally it should be something written using IO monad, but for now remember that you should aim for referential transparency - if you could replace each call to countLines(file) with a value from val counted = countLines(file) it would be RT. As you checked, it isn't. So replace it with something that wouldn't change behavior if it was called twice. A way to do it is to call whole computations twice without any global state preserved between them (e.g. internal counter in BufferedSource). IO monads make that easier, so go after them once you feel comfortable with syntax itself (to avoid learning too many things at once).
file.getLines returns Iterator[String] and iterator is consumable meaning we can iterate over it only once, for example, consider
val it = Iterator("a", "b", "c")
it.count(_ => true)
// val res0: Int = 3
it.count(_ => true)
// val res1: Int = 0
Looking at the implementation of count
def count(p: A => Boolean): Int = {
var res = 0
val it = iterator
while (it.hasNext) if (p(it.next())) res += 1
res
}
notice the call to it.next(). This call advances the state of the iterator and if it happens then we cannot go back to previous state.
As an alternative you could try length instead of count
val it = Iterator("a", "b", "c")
it.length
// val res0: Int = 3
it.length
// val res0: Int = 3
Looking at the definition of length which just delegates to size
def size: Int = {
if (knownSize >= 0) knownSize
else {
val it = iterator
var len = 0
while (it.hasNext) { len += 1; it.next() }
len
}
}
notice the guard
if (knownSize >= 0) knownSize
Some collections know their size without having to compute it by iterating over them. For example,
Array(1,2,3).knownSize // 3: I know my size in advance
List(1,2,3).knownSize // -1: I do not know my size in advance so I have to traverse the whole collection to find it
So if the underlying concrete collection of the Iterator knows its size, then call to length will short-cuircuit and it.next() will never execute, which means the iterator will not be consumed. This is the case for default concrete collection used by Iterator factory which is Array
val it = Iterator("a", "b", "c")
it.getClass.getSimpleName
// res6: Class[_ <: Iterator[String]] = class scala.collection.ArrayOps$ArrayIterator
however it is not true for BufferedSource. To workaround the issue consider creating an new iterator each time countLines is called
def countLines(fileName: String): Long = {
fromFile(fileName).getLines().length
}
println(s"In file $fileName is ${countLines(fileName)} lines")
println(s"In file $fileName is ${countLines(fileName)} lines")
// In file build.sbt is 22 lines
// In file build.sbt is 22 lines
Final point regarding value definitions and immutability. Consider
object Foo { var x = 42 } // object contains mutable state
val foo = Foo // value definition
foo.x
// val res0: Int = 42
Foo.x = -11 // mutation happening here
foo.x
// val res1: Int = -11
Here identifier foo is an immutable reference to mutable object.

Access code file and line number from Scala macro?

How can I access the name of the code file and line number in a Scala macro? I looked at SIP-19 and it says it can be easily implemented using macros...
EDIT:
To clarify, I want the code file and line number of the caller. I already have a debug macro and I want to modify it to print the line number and file name of whoever calls debug
You want c.macroApplication.pos, where c is for Context.
c.enclosingPosition finds the nearest macro on the stack that has a position. (See the other answer.) For instance, if your assert macro generates a tree for F"%p: $msg" but doesn't assign a position, the F macro would be positionless.
Example from a string interpolator macro, F"%p":
/* Convert enhanced conversions to something format likes.
* %Q for quotes, %p for position, %Pf for file, %Pn line number,
* %Pc column %Po offset.
*/
private def downConvert(parts: List[Tree]): List[Tree] = {
def fixup(t: Tree): Tree = {
val Literal(Constant(s: String)) = t
val r = "(?<!%)%(p|Q|Pf|Po|Pn|Pc)".r
def p = c.macroApplication.pos
def f(m: Match): String = m group 1 match {
case "p" => p.toString
case "Pf" => p.source.file.name
case "Po" => p.point.toString
case "Pn" => p.line.toString
case "Pc" => p.column.toString
case "Q" => "\""
}
val z = r.replaceAllIn(s, f _)
Literal(Constant(z)) //setPos t.pos
}
parts map fixup
}
If you mean file name and line number of the current position in the source code, for 2.10, my answer to that SO question is what your looking for:
def $currentPosition:String = macro _currentPosition
def _currentPosition(c:Context):c.Expr[String]={ import c.universe._
val pos = c.enclosingPosition
c.Expr(Literal(Constant(s"${pos.source.path}: line ${pos.line}, column ${pos.column}")))
}
That should work with 2.11 as well, although this way of creating the AST seems deprecated.
You can also have a look at that excerpt of my project Scart; it's how I use this technique to emit traces for debugging purposes.
The example in 'Writing Scala Compiler Plugins' shows how to access the line name and current number of the current position, as the others answers have mentioned.
http://www.scala-lang.org/old/node/140
In addition to the answers above, you can also get the position from the AST returned from a CompilationUnit.
For example:
def apply(unit: CompilationUnit) {
// Get the AST
val tree = unit.body
// Get the Position
// Scala.util.parsing.input.Position
val myPos = tree.pos
// Do something with the pos
unit.warning(pos, "Hello world")
}

Finding the values last values of a key in HashMap in Scala - how to write it in a functional style

I want to search the hashmap after a key and if the key is found give me the last value of the found value of the key. Here is my solution so far:
import scala.collection.mutable.HashMap
object Tmp extends Application {
val hashmap = new HashMap[String, String]
hashmap += "a" -> "288 | object | L"
def findNameInSymboltable(name: String) = {
if (hashmap.get(name) == None)
"N"
else
hashmap.get(name).flatten.last.toString
}
val solution: String = findNameInSymboltable("a")
println(solution) // L
}
Is there maybe a functional style of it which save me the overhead of locs?
Couldn't quite get your example to work. But maybe something like this would do the job?
hashmap.getOrElse("a", "N").split(" | ").last
The "getOrElse" will at least save you the if/else check.
In case your "N" is intended for display and not for computation, you can drag ouround the fact that there is no such "a" in a None until display:
val solution = // this is an Option[String], inferred
hashmap.get("a"). // if None the map is not done
map(_.split(" | ").last) // returns an Option, perhaps None
Which can alse be written:
val solution = // this is an Option[String], inferred
for(x <- hashmap.get("a"))
yield {
x.split(" | ").last
}
And finally:
println(solution.getOrElse("N"))

How should I match a pattern in Scala?

I need to do a pattern in Scala, this is a code:
object Wykonaj{
val doctype = DocType("html", PublicID("-//W3C//DTD XHTML 1.0 Strict//EN","http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"), Nil)
def main(args: Array[String]) {
val theUrl = "http://axv.pl/rss/waluty.php"
val xmlString = Source.fromURL(new URL(theUrl)).mkString
val xml = XML.loadString(xmlString)
val zawartosc= (xml \\ "description")
val pattern="""<descrition> </descrition>""".r
for(a <-zawartosc) yield a match{
case pattern=>println(pattern)
}
}
}
The problem is, I need to do val pattern=any pattern, to get from
<description><![CDATA[ <img src="http://youbookmarks.com/waluty/pic/waluty/AUD.gif"> dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010 ]]> </description>
only it dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010.
Try
import scala.util.matching.Regex
//...
val Pattern = new Regex(""".*; ([^<]*) </description>""")
//...
for(a <-zawartosc) yield a match {
case Pattern(p) => println(p)
}
It's a bit of a kludge (I don't use REs with Scala very often), but it seems to work. The CDATA is stringified as > entities, so the RE tries to find text after a semicolon and before a closing description tag.
val zawartosc = (xml \\ "description")
val pattern = """.*(dolar australijski.*)""".r
val allMatches = (for (a <- zawartosc; text = a.text) yield {text}) collect {
case pattern(value) => value }
val result = allMatches.headOption // or .head
This is mostly a matter of using the right regular expression. In this case you want to match the string that contains dolar australijski. It has to allow for extra characters before dolar. So use .*. Then use the parens to mark the start and end of what you need. Refer to the Java api for the full doc.
With respect to the for comprehension, I convert the XML element into text before doing the match and then collect the ones that match the pattern by using the collect method. Then the desired result should be the first and only element.