Why does passing the same val to a function a second time return a different value? - scala

I have started learning Scala and wrote the code below. My question: a val is supposed to be constant, so why does the same function return a different value when I pass the val to it a second time? How do I write a pure function in Scala?
Any comments on whether this way of counting is right are also welcome.
import java.io.FileNotFoundException
import java.io.IOException
import scala.io.BufferedSource
import scala.io.Source.fromFile
object Main{
def main(args: Array[String]): Unit = {
val fileName: String = if(args.length == 1) args(0) else ""
try {
val file = fromFile(fileName)
/* In file tekst.txt is 4 lines */
println(s"In file $fileName is ${countLines(file)} lines")
/* In file tekst.txt is 0 lines */
println(s"In file $fileName is ${countLines(file)} lines")
file.close
}
catch{
case e: FileNotFoundException => println(s"File $fileName not found")
case _: Throwable => println("Other error")
}
}
def countLines(file: BufferedSource): Long = {
file.getLines.count(_ => true)
}
}

val means that you cannot assign a new value to it. If the value is something immutable (a number, an immutable collection, a tuple or a case class of other immutable things), then it will not change over its lifetime. If it is a val inside a function, it stays the same from the moment you assign it until you leave that function. If it is a val in a class, it stays the same across all calls on that instance. If it is a val in an object, it stays the same for the whole life of the program.
But if the object is mutable on its own, then the only immutable part is the reference to it. If you have a val of mutable.MutableList, you cannot swap it for another mutable.MutableList, but you can still modify the content of the list. Here:
val file = fromFile(fileName)
/* In file tekst.txt is 4 lines */
println(s"In file $fileName is ${countLines(file)} lines")
/* In file tekst.txt is 0 lines */
println(s"In file $fileName is ${countLines(file)} lines")
file.close
file is an immutable reference to a BufferedSource. You cannot replace it with another BufferedSource, but this class has internal state: it tracks how far into the file it has already read. So the first time you operate on it you receive the total number of lines in the file, and the second time (since the file has already been read to the end) you receive 0.
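The difference between an immutable reference and a mutable object can be sketched with a ListBuffer standing in for BufferedSource. The object name ValReferenceSketch and its fields are made up for this illustration.

```scala
import scala.collection.mutable.ListBuffer

object ValReferenceSketch {
  // `buffer` is a val: the reference can never point at another ListBuffer.
  val buffer = ListBuffer(1, 2, 3)
  val sizeBefore = buffer.size

  // The object behind the reference is still mutable.
  buffer += 4
  val sizeAfter = buffer.size

  // buffer = ListBuffer(9) // would not compile: reassignment to val
}
```

The same val yields two different sizes, just as the same file yields two different line counts.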
If you wanted the line-counting code to be purer, you should contain the mutability so that it is not observable to the caller, e.g.
def countFileLines(fileName: String): Either[String, Long] = try {
val file = fromFile(fileName)
try {
Right(file.getLines.count(_ => true))
} finally {
file.close()
}
} catch {
case e: FileNotFoundException => Left(s"File $fileName not found")
case _: Throwable => Left("Other error")
}
println(s"In file $fileName is ${countFileLines(fileName)} lines")
println(s"In file $fileName is ${countFileLines(fileName)} lines")
Still, you have side effects there, so ideally it should be written using an IO monad. For now, remember that you should aim for referential transparency: if you could replace each call to countLines(file) with the value from val counted = countLines(file) without changing the program's behavior, the call would be referentially transparent (RT). As you observed, it isn't. So replace it with something that would not change behavior if it was called twice. One way to do that is to run the whole computation each time, without any state preserved between calls (e.g. the read position inside BufferedSource). IO monads make that easier, so look into them once you feel comfortable with the syntax itself (to avoid learning too many things at once).
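As a rough sketch of "run the whole computation each time", and assuming Scala 2.13 or later (where scala.util.Using is available), the open/count/close sequence can live inside one function so that no reader state survives between calls. The names CountLinesSketch and countLinesIn are hypothetical, and the temporary file exists only to make the example self-contained.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.{Try, Using}

object CountLinesSketch {
  // Opens the file, counts its lines, and closes it again on every call.
  // Failures (e.g. a missing file) are captured in the returned Try.
  def countLinesIn(fileName: String): Try[Long] =
    Using(Source.fromFile(fileName))(_.getLines().size.toLong)

  // Temporary three-line file used only for this demonstration.
  private val tmp = Files.createTempFile("lines", ".txt")
  Files.write(tmp, "a\nb\nc\n".getBytes("UTF-8"))

  val first  = countLinesIn(tmp.toString)
  val second = countLinesIn(tmp.toString)
}
```

Both calls give the same result as long as the file itself does not change. The function still performs I/O, so this is not pure in the strict sense.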

file.getLines returns an Iterator[String], and an iterator is consumable, meaning we can iterate over it only once. For example, consider
val it = Iterator("a", "b", "c")
it.count(_ => true)
// val res0: Int = 3
it.count(_ => true)
// val res1: Int = 0
Looking at the implementation of count
def count(p: A => Boolean): Int = {
var res = 0
val it = iterator
while (it.hasNext) if (p(it.next())) res += 1
res
}
notice the call to it.next(). This call advances the state of the iterator, and once that has happened we cannot go back to the previous state.
As an alternative you could try length instead of count
val it = Iterator("a", "b", "c")
it.length
// val res0: Int = 3
it.length
// val res1: Int = 3
Looking at the definition of length which just delegates to size
def size: Int = {
if (knownSize >= 0) knownSize
else {
val it = iterator
var len = 0
while (it.hasNext) { len += 1; it.next() }
len
}
}
notice the guard
if (knownSize >= 0) knownSize
Some collections know their size without having to compute it by iterating over them. For example,
Array(1,2,3).knownSize // 3: I know my size in advance
List(1,2,3).knownSize // -1: I do not know my size in advance so I have to traverse the whole collection to find it
So if the underlying concrete collection of the Iterator knows its size, then the call to length will short-circuit and it.next() will never execute, which means the iterator will not be consumed. This is the case for the default concrete collection used by the Iterator factory, which is Array
val it = Iterator("a", "b", "c")
it.getClass
// res6: Class[_ <: Iterator[String]] = class scala.collection.ArrayOps$ArrayIterator
However, this is not true for BufferedSource. To work around the issue, consider creating a new iterator each time countLines is called
def countLines(fileName: String): Long = {
fromFile(fileName).getLines().length
}
println(s"In file $fileName is ${countLines(fileName)} lines")
println(s"In file $fileName is ${countLines(fileName)} lines")
// In file build.sbt is 22 lines
// In file build.sbt is 22 lines
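Another option, sketched here under the assumption that the content fits in memory, is to consume the iterator exactly once into an immutable List and do all further counting on that List. The name MaterializeSketch is made up, and a small in-memory iterator stands in for file.getLines.

```scala
object MaterializeSketch {
  // Consume the iterator exactly once; the resulting List is immutable.
  val lines: List[String] = Iterator("a", "b", "c").toList

  // Counting on the List does not consume anything, so it is repeatable.
  val firstCount  = lines.count(_ => true)
  val secondCount = lines.count(_ => true)
}
```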
Final point regarding value definitions and immutability. Consider
object Foo { var x = 42 } // object contains mutable state
val foo = Foo // value definition
foo.x
// val res0: Int = 42
Foo.x = -11 // mutation happening here
foo.x
// val res1: Int = -11
Here the identifier foo is an immutable reference to a mutable object.

Related

Functional way to generate file names in Scala

Ok, I want to generate temp file names. So, I created a class with var tempFileName and fileNo such that it creates files like
BSirCN_0.txt
BSirCN_1.txt
BSirCN_2.txt
But to do this I have to keep a count, and the way I am doing it is by calling the next() function of the class, which returns the next filename in the sequence (it should return BSirCN_4 in the above case). This goes against FP, as I am modifying state, i.e. the count of names in the object. How do I do it in a functional way? One way I can think of is keeping the count where the function is called and just concatenating. Are there any other ways?
Just return a new object:
case class FileGenerator(tempFileName: String, fileNo: Long = 0) {
lazy val currentFileName = tempFileName + "_" + fileNo
lazy val next = FileGenerator(tempFileName, fileNo + 1)
}
You can then do:
val generator = FileGenerator("BSirCN")
val first = generator.currentFileName
val next = generator.next.currentFileName
You can avoid the mutations using an Iterator (or any other kind of infinite & lazy collection).
final class TempFileNamesGenerator(prefix: String) {
private[this] val generator =
Iterator
.from(start = 0)
.map(i => s"${prefix}_${i}.txt")
def next(): String =
generator.next()
}
val generator = new TempFileNamesGenerator(prefix = "BSirCN")
generator.next() // BSirCN_0.txt
generator.next() // BSirCN_1.txt
generator.next() // BSirCN_2.txt
A similar solution to the one proposed by @Luis, but using streams:
def namesStream(prefix: String, suffix: String): Stream[String] = Stream.from(0).map(n => s"$prefix$n$suffix")
Then use it like this:
val stream = namesStream("BSirCN_", ".txt")
stream.take(5) // BSirCN_0.txt, BSirCN_1.txt, BSirCN_2.txt, BSirCN_3.txt, BSirCN_4.txt
// or
stream.drop(10).take(2) // BSirCN_10.txt, BSirCN_11.txt
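Stream is deprecated as of Scala 2.13 in favor of LazyList, so on newer versions the same idea could be sketched as follows. TempNamesSketch and names are illustrative names, not part of the answers above.

```scala
object TempNamesSketch {
  // Infinite, lazily evaluated sequence of file names; no counter is mutated.
  def names(prefix: String, suffix: String): LazyList[String] =
    LazyList.from(0).map(n => s"$prefix$n$suffix")

  // Numbering starts at 0 because LazyList.from(0) does.
  val firstThree = names("BSirCN_", ".txt").take(3).toList
  val later      = names("BSirCN_", ".txt").drop(10).take(2).toList
}
```

Since the sequence is an immutable value, asking for the same positions twice always yields the same names.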

Scala writing and reading from a file inside a while loop

I have an application where I am to write values into a file and read them back into the program in a while loop. This fails because the file is only written when I exit the loop and not in every iteration. Therefore, in the next iterations, I cannot access values that were supposed to be written into the file in the previous iterations. How can I make every iteration write into the file as opposed to writing all values at the end of the while loop?
My application uses Scalafix. It reads in a test suite Scala file and duplicates its test cases at each iteration. The important details are explained by my series of 8 comments. Is there something about the working of the FileWriter that makes it wait until the last round of the loop to write back to file as it does not write back to file at every iteration of the loop?
object Printer{
//1 . This is my filePrinter which I call at every iteration to print the new test file with its test cases duplicated.
def saveFile(filename:String, data: String): Unit ={
val fileWritter: FileWriter = new FileWriter(filename)
val bufferWritter: BufferedWriter = new BufferedWriter(fileWritter)
bufferWritter.write(data)
bufferWritter.flush()
bufferWritter.close()
}
}
object Main extends App {
//2. my loop starts here.
var n = 2
do {
// read in a semanticDocument (function provided below)
val ( sdoc1,base,filename)=SemanticDocumentBuilder.buildSemanticDocument()
implicit val sdoc = sdoc1 //4. P3 is a scalafix "patch" that collects all the test cases of
// test suite and duplicates them. It works just fine, see the next comment.
val p3 =sdoc.tree.collect {
case test @ Term.ApplyInfix(Term.ApplyInfix(_,Term.Name(smc), _,
List(Lit.String(_))), Term.Name("in"), _, params) =>
Patch.addRight(test,"\n" +test.toString())
}.asPatch
//5. I collect the test cases in the next line and print
//out how many they are. At this moment, I have not
// applied the duplicate function, so they are still as
//originally read from the test file.
val staticAnalyzer = new StaticAnalyzer()
val testCases: List[Term.ApplyInfix] =
staticAnalyzer.collectTestCases()
println("Tests cases count: "+ testCases.length)
val r3 = RuleName(List(RuleIdentifier("r3")))
val map:Map[RuleName, Patch] = Map(r3->p3)
val r = PatchInternals(map, v0.RuleCtx(sdoc.tree), None)
//6. After applying the p3 patch in the previous three lines,
//I indeed print out the newly created test suite file
//and it contains each test case duplicated as shown
// by the below println(r._1.getClass).
println(r._1.getClass)
//7. I then call the my save file (see this function above - first lines of this code)
Printer.saveFile(base+"src/test/scala/"+filename,r._1)
n-=1
//8. Since I have saved my file with the duplicates,
//I would expect that it will save the file back to the
//file (overwrite the original file as I have not used "append = true".
//I would then expect that the next length of test cases will
//have doubled but this is never the case.
//The save function with FileWriter only works in the last loop.
//Therefore, no matter the number of loops, it only doubles once!
println("Loop: "+ n)
} while(n>0)
}
Edit: factored out the reading of the SemanticDocument. This function simply returns a SemanticDocument and two strings representing my file path and filename.
object SemanticDocumentBuilder{
def buildSemanticDocument(): (SemanticDocument,String,String) ={
val base = "/Users/soft/Downloads/simpleAkkaProject/"
val local = new File(base)
//val dependenceisSBTCommand = s"sbt -ivy ./.ivy2 -Dsbt.ivy.home=./.ivy2 -Divy.home=./.ivy2
//val sbtCmd = s"sbt -ivy ./ivy2 -Dsbt.ivy.home=./ivy2 -Divy.home=./ivy2 -Dsbt.boot.directo
val result = sys.process.Process(Seq("sbt","semanticdb"), local).!
val jars = FileUtils.listFiles(local, Array("jar"), true).toArray(new Array[File](0))
.toList
.map(f => Classpath(f.getAbsolutePath))
.reduceOption(_ ++ _)
val classes = FileUtils.listFilesAndDirs(local, TrueFileFilter.INSTANCE, DirectoryFileFilte
.toList
.filter(p => p.isDirectory && !p.getAbsolutePath.contains(".sbt") && p.getAbsolutePath.co
.map(f => Classpath(f.getAbsolutePath))
.reduceOption(_ ++ _)
val classPath = ClassLoader.getSystemClassLoader.asInstanceOf[URLClassLoader].getURLs
.map(url => Classpath(url.getFile))
.reduceOption(_ ++ _)
val all = (jars ++ classes ++ classPath).reduceOption(_ ++ _).getOrElse(Classpath(""))
val symbolTable = GlobalSymbolTable(all)
val filename = "AkkaQuickstartSpec.scala"
val root = AbsolutePath(base).resolve("src/test/scala/")
println(root)
val abspath = root.resolve(filename)
println(root)
val relpath = abspath.toRelative(AbsolutePath(base))
println(relpath)
val sourceFile = new File(base+"src/test/scala/"+filename)
val input = Input.File(sourceFile)
println(input)
if (n == firstRound){
doc = SyntacticDocument.fromInput(input)
}
//println(doc.tree.structure(30))
var documents: Map[String, TextDocument] = Map.empty
Locator.apply(local.toPath)((path, db) => db.documents.foreach({
case document @ TextDocument(_, uri, text, md5, _, _, _, _, _) if !md5.isEmpty => { // skip
if (n == firstRound){
ast= sourceFile.parse[Source].getOrElse(Source(List()))
}
documents = documents + (uri -> document)
println(uri)
}
println(local.canWrite)
if (editedSuite != null){
Printer.saveFile(sourceFile,editedSuite)
}
}))
//println(documents)
val impl = new InternalSemanticDoc(doc, documents(relpath.toString()), symbolTable)
implicit val sdoc = new SemanticDocument(impl)
val symbols = sdoc.tree.collect {
case t @ Term.Name("<") => {
println(s"symbol for $t")
println(t.symbol.value)
println(symbolTable.info(t.symbol.value))
}
}
(sdoc,base,filename)
}
}
In saveFile, closing the BufferedWriter also flushes it and closes the underlying FileWriter, so the explicit flush is not needed. Do make sure that close is always reached, for example by wrapping the write in try/finally.
You should also close all the other File objects that you create in the loop, because they may be holding on to stale file handles. (e.g. local, ast)
More generally, clean up the code by putting code in functions with meaningful names. There is also a lot of code that can be outside the loop. Doing this will make it easier to see what is going on and allow you create a Minimal, Complete, and Verifiable example. As it stands it is really difficult to work out what is going on.
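As a starting point for such a minimal example, here is a sketch of a save/read round trip that closes its resources on every call. SaveFileSketch, readFile and the temporary file are assumptions made for illustration; this does not diagnose the Scalafix part of the question.

```scala
import java.io.{BufferedWriter, FileWriter}
import java.nio.file.Files
import scala.io.Source

object SaveFileSketch {
  // Overwrites the file; close() flushes the buffer and closes the wrapped FileWriter.
  def saveFile(filename: String, data: String): Unit = {
    val writer = new BufferedWriter(new FileWriter(filename))
    try writer.write(data)
    finally writer.close()
  }

  // Reads the whole file back and closes the source afterwards.
  def readFile(filename: String): String = {
    val source = Source.fromFile(filename)
    try source.mkString
    finally source.close()
  }

  // Two "iterations": each write is visible to the read that follows it.
  val path = Files.createTempFile("suite", ".scala").toString
  saveFile(path, "round 1")
  val afterFirst = readFile(path)
  saveFile(path, "round 1\nround 2")
  val afterSecond = readFile(path)
}
```

If a round trip like this behaves as expected on your machine, the stale data is more likely to come from how the document is rebuilt on each iteration than from FileWriter itself.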

Cats Writer Vector is empty

I wrote this simple program in my attempt to learn how Cats Writer works
import cats.data.Writer
import cats.syntax.applicative._
import cats.syntax.writer._
import cats.instances.vector._
object WriterTest extends App {
type Logged2[A] = Writer[Vector[String], A]
Vector("started the program").tell
val output1 = calculate1(10)
val foo = new Foo()
val output2 = foo.calculate2(20)
val (log, sum) = (output1 + output2).pure[Logged2].run
println(log)
println(sum)
def calculate1(x : Int) : Int = {
Vector("came inside calculate1").tell
val output = 10 + x
Vector(s"Calculated value ${output}").tell
output
}
}
class Foo {
def calculate2(x: Int) : Int = {
Vector("came inside calculate 2").tell
val output = 10 + x
Vector(s"calculated ${output}").tell
output
}
}
The program works and the output is
> run-main WriterTest
[info] Compiling 1 Scala source to /Users/Cats/target/scala-2.11/classes...
[info] Running WriterTest
Vector()
50
[success] Total time: 1 s, completed Jan 21, 2017 8:14:19 AM
But why is the vector empty? Shouldn't it contain all the strings on which I used the "tell" method?
When you call tell on your Vectors, each time you create a Writer[Vector[String], Unit]. However, you never actually do anything with your Writers, you just discard them. Further, you call pure to create your final Writer, which simply creates a Writer with an empty Vector. You have to combine the writers together in a chain that carries your value and message around.
type Logged[A] = Writer[Vector[String], A]
val (log, sum) = (for {
_ <- Vector("started the program").tell
output1 <- calculate1(10)
foo = new Foo()
output2 <- foo.calculate2(20)
} yield output1 + output2).run
def calculate1(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate1").tell
output = 10 + x
_ <- Vector(s"Calculated value ${output}").tell
} yield output
class Foo {
def calculate2(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate2").tell
output = 10 + x
_ <- Vector(s"calculated ${output}").tell
} yield output
}
Note the use of for notation. The definition of calculate1 is really
def calculate1(x: Int): Logged[Int] = Vector("came inside calculate1").tell.flatMap { _ =>
val output = 10 + x
Vector(s"Calculated value ${output}").tell.map { _ => output }
}
flatMap is the monadic bind operation, which means it understands how to take two monadic values (in this case Writer) and join them together to get a new one. In this case, it makes a Writer containing the concatenation of the logs and the value of the one on the right.
Note how there are no side effects. There is no global state by which Writer can remember all your calls to tell. You instead make many Writers and join them together with flatMap to get one big one at the end.
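To see why a discarded tell leaves no trace, here is a deliberately tiny stand-in for Writer written in plain Scala. Logged, tell and MiniWriterSketch are hypothetical names for this sketch, and this is not how Cats implements Writer. It only mirrors the idea that flatMap concatenates the logs.

```scala
object MiniWriterSketch {
  // Hypothetical stand-in for Writer[Vector[String], A]: a log paired with a value.
  final case class Logged[A](log: Vector[String], value: A) {
    def map[B](f: A => B): Logged[B] = Logged(log, f(value))
    // flatMap keeps this log and appends the log of the next step.
    def flatMap[B](f: A => Logged[B]): Logged[B] = {
      val next = f(value)
      Logged(log ++ next.log, next.value)
    }
  }

  def tell(message: String): Logged[Unit] = Logged(Vector(message), ())

  // Chained: every tell takes part in the flatMap chain, so its message is kept.
  def calculate1(x: Int): Logged[Int] = {
    val output = 10 + x
    for {
      _ <- tell("came inside calculate1")
      _ <- tell(s"Calculated value $output")
    } yield output
  }

  // Discarded: the value returned by tell is thrown away, as in the question.
  def calculateLossy(x: Int): Logged[Int] = {
    tell("came inside calculateLossy") // result ignored, so nothing is recorded
    Logged(Vector.empty, 10 + x)
  }

  val chained = calculate1(10)
  val lossy   = calculateLossy(10)
}
```

Both functions compute 20, but only the chained version carries its two messages along.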
The problem with your example code is that you're not using the result of the tell method.
If you take a look at its signature, you'll see this:
final class WriterIdSyntax[A](val a: A) extends AnyVal {
def tell: Writer[A, Unit] = Writer(a, ())
}
it is clear that tell returns a Writer[A, Unit] result which is immediately discarded because you didn't assign it to a value.
The proper way to use a Writer (and any monad in Scala) is through its flatMap method. It would look similar to this:
println(
Vector("started the program").tell.flatMap { _ =>
15.pure[Logged2].flatMap { i =>
Writer(Vector("ended program"), i)
}
}
)
The code above, when executed will give you this:
WriterT((Vector(started the program, ended program),15))
As you can see, both messages and the int are stored in the result.
Now this is a bit ugly, and Scala actually provides a better way to do this: for-comprehensions. For-comprehensions are a bit of syntactic sugar that allows us to write the same code in this way:
println(
for {
_ <- Vector("started the program").tell
i <- 15.pure[Logged2]
_ <- Vector("ended program").tell
} yield i
)
Now, going back to your example, what I would recommend is that you change the return type of calculate1 and calculate2 to Writer[Vector[String], Int] and then try to make your application compile using what I wrote above.

How to use getOrElseUpdate in scala.collection.mutable.HashMap?

The example code counts each word's occurrences in given input file:
object Main {
def main(args: Array[String]) {
val counts = new scala.collection.mutable.HashMap[String, Int]
val in = new Scanner(new File("input.txt"))
while (in.hasNext()) {
val s: String = in.next()
counts(s) = counts.getOrElse(s, 0) + 1 // Here!
}
print(counts)
}
}
Can the line marked with the comment be rewritten using the getOrElseUpdate method?
P.S. I am only at the 4th part of the "Scala for the impatient", so please don't teach me now about functional Scala which, I am sure, can be more beautiful here.
Thanks.
If you look at the documentation for getOrElseUpdate, you'll see the following:
If given key is already in this map, returns associated value.
Otherwise, computes value from given expression op, stores with key in
map and returns that value.
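But getOrElseUpdate only stores a value when the key is missing. It never changes the value of an existing key, so you still need a separate update to increment the count, and getOrElseUpdate does not really help here.
Instead, you can define a default value that is returned when the key doesn't exist, and use it as follows: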
import java.io.File
import java.util.Scanner
import scala.collection.mutable.HashMap
object Main {
def main(args: Array[String]) {
val counts = HashMap[String, Int]().withDefaultValue(0)
val in = new Scanner(new File("input.txt"))
while (in.hasNext()) {
val s: String = in.next()
counts(s) += 1
}
print(counts)
}
}
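To make the difference concrete, here is a small sketch that counts the same words both ways, using an in-memory list instead of input.txt. The names WordCountSketch, viaGetOrElseUpdate and viaDefault are made up for the example.

```scala
import scala.collection.mutable

object WordCountSketch {
  val words = List("a", "b", "a", "c", "a")

  // getOrElseUpdate inserts 0 for a missing key and returns the current value,
  // but the increment still needs its own update.
  val viaGetOrElseUpdate = mutable.HashMap.empty[String, Int]
  for (w <- words) {
    val current = viaGetOrElseUpdate.getOrElseUpdate(w, 0)
    viaGetOrElseUpdate(w) = current + 1
  }

  // With a default value, += reads 0 for missing keys and writes the new count.
  val viaDefault = mutable.HashMap.empty[String, Int].withDefaultValue(0)
  for (w <- words) viaDefault(w) += 1
}
```

Both maps end up with the same counts. The second version simply needs one line per word instead of two.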

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through a log file that is too big to fit into memory and collecting two types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + " " + ip, inf) }
}
Corrections:
1) removed unnecessary type declarations
2) tuple deconstruction instead of ugly ._1
3) left fold instead of mutable accumulators
4) used the more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
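Both suggestions can be sketched with the standard library alone: a small case class names the accumulated state, and the function takes any Iterator[String], so it can be tested without a file. LogFoldSketch, LogState and summarize are hypothetical names, and the sample lines are made-up test data.

```scala
import scala.util.matching.Regex

object LogFoldSketch {
  // Named accumulator instead of a bare tuple of collections.
  final case class LogState(logins: Map[String, String], errors: Vector[(String, String)]) {
    def result: List[(String, String)] =
      errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip, date) }
  }

  // Consumes the lines one by one; the input can come from a file or from a test.
  def summarize(lines: Iterator[String], errorPat: Regex, loginPat: Regex): List[(String, String)] =
    lines.foldLeft(LogState(Map.empty, Vector.empty)) { (state, line) =>
      line match {
        case errorPat(date, ip)     => state.copy(errors = state.errors :+ (ip -> date))
        case loginPat(_, _, ip, id) => state.copy(logins = state.logins + (ip -> id))
        case _                      => state
      }
    }.result

  // In-memory sample input, so no log file is needed.
  val sample = Iterator(
    "Error: 2012/03 1.2.3.4",
    "Login: 2012/03 user 1.2.3.4 Joe",
    "Error: 2012/03 1.2.3.5"
  )

  val out = summarize(sample, "Error: (\\S+) (\\S+)".r, "Login: (\\S+) (\\S+) (\\S+) (\\S+)".r)
}
```

Only the accumulated logins and errors are held in memory, not the whole file, because foldLeft pulls one line at a time from the iterator.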
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a single line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iteratee waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should be more likely moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
@annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}