This is my first Scala program and I am stuck.
Basically, I am trying to replace the potential None value of "last" with 0L in the declaration of past.
import java.util.Date;
object TimeUtil {
var timerM = Map( "" -> new Date().getTime() );
def timeit(seq:String, comment:String) {
val last = timerM.get(seq)
val cur = new Date().getTime()
timerM += seq -> cur;
println( timerM )
if( last == None ) return;
val past = ( last == None ) ? 0l : last ;
Console.println("Time:" + seq + comment + ":" + (cur - past)/1000 )
}
def main(args : Array[String]) {
timeit("setup ", "mmm")
timeit("setup ", "done")
}
}
You should probably try and distil the problem a little more, but certainly include your actual error!
However, there is no ternary operator (? :) in Scala so instead you can use:
if (pred) a else b //instead of pred ? a : b
This is because (almost - see Kevin's comments below) everything in Scala is an expression with a return type. This is not true in Java, where some things are statements (with no type). With Scala, it's always better to use composition rather than forking (i.e. the if (expr) return; in your example). So I would rewrite this as:
val last = timerM.get(seq)
val cur = System.currentTimeMillis
timerM += (seq -> cur)
println(timerM)
last.foreach{l => println("Time: " + seq + comment + ":" + (cur - l)/1000 ) }
Note that the ternary operation in your example is extraneous anyway, because last cannot be None at that point (you have just returned if it were). My advice is to use explicit types for a short while, as you get used to Scala. So the above would then be:
val last: Option[Long] = timerM.get(seq)
val cur: Long = System.currentTimeMillis
timerM += (seq -> cur)
println(timerM)
last.foreach{l : Long => println("Time: " + seq + comment + ":" + (cur - l)/1000 ) }
(It seems in your comment that you might be trying to assign a Long value to last, which would be an error of course, because last is of type Option[Long], not Long)
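If you really do want a default in place of None, as the question describes, Option already has methods for that. Here is a minimal sketch; the object and method names are made up for illustration and are not part of your program:

```scala
object OptionDefault {
  // getOrElse supplies the fallback that the ternary in the question was reaching for.
  def elapsedSeconds(last: Option[Long], cur: Long): Long = {
    val past: Long = last.getOrElse(0L)
    (cur - past) / 1000
  }

  // map keeps the "no previous reading" case explicit instead of inventing a default.
  def elapsedIfKnown(last: Option[Long], cur: Long): Option[Long] =
    last.map(l => (cur - l) / 1000)

  // if/else is an expression, so it can be used where Java would use ?: .
  def describe(last: Option[Long]): String =
    if (last.isEmpty) "first call" else "seen before"
}
```

Whether a default of 0L makes sense is a separate question: it would report the time since the epoch on the first call, which is probably why the original code returned early.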
You have a couple of "code smells" in there, which suggest that a better, shinier design is probably just around the corner:
You're using a mutable variable, var instead of val
timeit works exclusively by side effects, it modifies state outside of the function and successive calls with the same input can have different results.
seq is slightly risky as a variable name, it's far too close to the (very common & popular) Seq type from the standard library.
So going back to first principles, how else might the same results be achieved in a more "idiomatic" style? Starting with the original design (as I understand it):
the first call to timeit(seq,comment) just notes the current time
subsequent calls with the same value of seq println the time elapsed since the previous call
Basically, you just want to time how long a block of code takes to run. If there was a way to pass a "block of code" to a function, then maybe, just maybe... Fortunately, Scala can do this with a by-name parameter:
def timeit(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
System.currentTimeMillis - start
}
Take a look at that block parameter: it looks a bit like a function with no arguments, and that's how by-name parameters are written. The last expression of the function, System.currentTimeMillis - start, is used as the return value.
By wrapping the parameter list in {} braces instead of () parentheses, you can make it look like a built-in control structure, and use it like this:
val duration = timeit {
// do stuff here
// more stuff
// do other stuff
}
println("setup time:" + duration + "ms")
Alternatively, you can push the println behaviour back into the timeit function, but that makes life harder if you later want to reuse it for timing something without printing it to the console:
def timeit(description: String)(block: => Unit) : Unit = {
val start = System.currentTimeMillis
block
val duration = System.currentTimeMillis - start
println(description + " took " + duration + "ms")
}
This is another trick, multiple parameter blocks. It allows you to use parentheses for the first block, and braces for the second:
timeit("setup") {
// do stuff here
// more stuff
// do other stuff
}
// will print "setup took XXXms"
Of course, there's a million other variants you can make on this pattern, with varying degrees of sophistication/complexity, but it should be enough to get you started...
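One such variant, sketched here under the assumption that you also want the value the block produces: make the by-name parameter generic and return the result together with the duration. The names Timing and timed are hypothetical.

```scala
object Timing {
  // Runs `block` exactly once and returns its result paired with the elapsed milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.currentTimeMillis()
    val result = block // the by-name argument is evaluated here
    (result, System.currentTimeMillis() - start)
  }
}
```

A caller could then write val (sum, ms) = Timing.timed { (1 to 10).sum } and decide separately whether to print ms.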
I am experimenting with below code:
def TestRun(n: Int): Unit = {
(1 to n)
.grouped(4)
.map(grp => { println("Group length is: " + grp.length)})
}
TestRun(100)
And I am a bit surprised that I am not able to see any output of println after executing the program. Code compiled successfully and ran, but without any expected output.
Kindly point me what mistake I am doing.
The reason there is no output is that grouped returns an Iterator, which is lazy. This means that it won't produce any data until it is asked. The Range created by (1 to n) is itself a strict collection, but map on an Iterator also returns a lazy Iterator, so the result is an Iterator that will only produce values when asked. TestRun never asks for the data, so it is never generated.
One way round this is to use foreach rather than map because foreach is eager (the opposite of lazy) and will take each value from the Iterator in turn.
Another way would be to force the Iterator to become a concrete collection using something like toList:
def TestRun(n: Int): Unit = {
(1 to n)
.grouped(4)
.map(grp => { println("Group length is: " + grp.length)})
.toList
}
TestRun(100)
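The foreach variant mentioned above could look like the sketch below. The second method is only there to make the laziness visible: it counts how often the function passed to map actually runs. LazyGroups and countEvaluations are names invented for this example.

```scala
object LazyGroups {
  // foreach consumes the Iterator, so the println runs once per group.
  def testRun(n: Int): Unit =
    (1 to n).grouped(4).foreach(grp => println("Group length is: " + grp.length))

  // Counts how many times the mapping function is invoked.
  // Without forcing, the mapped Iterator is built but never traversed.
  def countEvaluations(n: Int, force: Boolean): Int = {
    var calls = 0
    val mapped = (1 to n).grouped(4).map { grp => calls += 1; grp.length }
    if (force) mapped.foreach(_ => ())
    calls
  }
}
```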
I'm new to Scala and trying to understand the syntax of the pattern matching constructs, specifically from examples in Unfiltered (http://unfiltered.databinder.net/Try+Unfiltered.html).
Here's a simple HTTP server that echos back Hello World! and 2 parts of the path if the path is 2 parts long:
package com.hello
import unfiltered.request.GET
import unfiltered.request.Path
import unfiltered.request.Seg
import unfiltered.response.ResponseString
object HelloWorld {
val sayhello = unfiltered.netty.cycle.Planify {
case GET(Path(Seg(p :: q :: Nil))) => {
ResponseString("Hello World! " + p + " " + q);
}
};
def main(args: Array[String]) {
unfiltered.netty.Http(10000).plan(sayhello).run();
}
}
Also for reference the source code for the Path, Seg, and GET/Method objects:
package unfiltered.request
object Path {
def unapply[T](req: HttpRequest[T]) = Some(req.uri.split('?')(0))
def apply[T](req: HttpRequest[T]) = req.uri.split('?')(0)
}
object Seg {
def unapply(path: String): Option[List[String]] = path.split("/").toList match {
case "" :: rest => Some(rest) // skip a leading slash
case all => Some(all)
}
}
class Method(method: String) {
def unapply[T](req: HttpRequest[T]) =
if (req.method.equalsIgnoreCase(method)) Some(req)
else None
}
object GET extends Method("GET")
I was able to break down how most of it works, but this line leaves me baffled:
case GET(Path(Seg(p :: q :: Nil))) => {
I understand the purpose of the code, but not how it gets applied. I'm very interested in learning the ins and outs of Scala rather than simply implementing an HTTP server with it, so I've been digging into this for a couple of hours. I understand that it has something to do with extractors and the unapply method on the GET, Path, and Seg objects. I also know that when I debug, it hits unapply in GET before Path and Path before Seg.
I don't understand the following things:
Why can't I write GET.unapply(req), but I can write GET(req) or GET() and it will match any HTTP GET?
Why or how does the compiler know what values get passed to each extractor's unapply method? It seems that it will just chain them together unless one of them returns a None instead of a Some?
How does it bind the variables p and q? It knows they are Strings, it must infer that from the return type of Seg.unapply, but I don't understand the mechanism that assigns p the value of the first part of the list and q the value of the second part of the list.
Is there a way to rewrite it to make it more clear what's happening? When I first looked at this example, I was confused by the line
val sayhello = unfiltered.netty.cycle.Planify {, I dug around and rewrote it and found out that it was implicitly creating a PartialFunction and passing it to Planify.apply.
One way to understand it is to rewrite this expression the way that it gets rewritten by the Scala compiler.
unfiltered.netty.cycle.Planify expects a PartialFunction[HttpRequest[ReceivedMessage], ResponseFunction[NHttpResponse]], that is, a function that may or may not match the argument. If there's no match in either of the case statements, the request gets ignored. If there is a match -- which also has to pass all of the extractors -- the response will be returned.
Each case statement gets an instance of HttpRequest[ReceivedMessage]. Then, it applies it with left associativity through a series of unapply methods for each of the matchers:
// The request passed to us is HttpRequest[ReceivedMessage]
// GET.unapply only returns Some if the method is GET
GET.unapply(request) flatMap { getRequest =>
// this separates the path from the query
Path.unapply(getRequest) flatMap { path =>
// splits the path by "/"
Seg.unapply(path) flatMap { listOfParams =>
// Calls to unapply don't end here - the pattern p :: q :: Nil
// is matched through the extractor of the :: class,
// since a :: b is the same as ::(a, b)
::.unapply(::(listOfParams.head, listOfParams.tail)) flatMap { case (p, restOfP) =>
::.unapply(::(restOfP.head, Nil)) map { case (q, _) =>
ResponseString("Hello World! " + p + " " + q)
}
}
}
}
}
Hopefully, this gives you an idea of how the matching works behind the scenes. I'm not entirely sure if I got the :: bit right - comments are welcome.
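To experiment with this outside Unfiltered, here is a self-contained sketch. Request is a hypothetical stand-in for HttpRequest, and the three extractors are simplified copies of the ones quoted in the question. viaPattern uses the nested pattern; byHand chains the same unapply calls explicitly and lets an ordinary list pattern do the final p :: q :: Nil step.

```scala
object ExtractorDemo {
  // Hypothetical stand-in for Unfiltered's HttpRequest.
  final case class Request(method: String, uri: String)

  object GET {
    def unapply(req: Request): Option[Request] =
      if (req.method.equalsIgnoreCase("GET")) Some(req) else None
  }

  object Path {
    def unapply(req: Request): Option[String] = Some(req.uri.split('?')(0))
  }

  object Seg {
    def unapply(path: String): Option[List[String]] =
      path.split("/").toList match {
        case "" :: rest => Some(rest) // skip a leading slash
        case all        => Some(all)
      }
  }

  // The nested pattern: each extractor's result feeds the next one inward.
  def viaPattern(req: Request): Option[String] = req match {
    case GET(Path(Seg(p :: q :: Nil))) => Some(p + " " + q)
    case _                             => None
  }

  // The same chain written out by hand; a None at any step short-circuits the rest.
  def byHand(req: Request): Option[String] =
    GET.unapply(req).flatMap { r =>
      Path.unapply(r).flatMap { path =>
        Seg.unapply(path).flatMap {
          case p :: q :: Nil => Some(p + " " + q)
          case _             => None
        }
      }
    }
}
```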
Is it possible to do anything after a pattern match in a foreach statement?
I want to do a post match step e.g. to set a variable. I also want to force a Unit return as my foreach is String => Unit, and by default Scala wants to return the last statement.
Here is some code:
Iteratee.foreach[String](_ match {
case "date" => out.push("Current date: " + new Date().toString + "<br/>")
case "since" => out.push("Last command executed: " + (ctm - last) + "ms before now<br/>")
case unknown => out.push("Command: " + unknown + " not recognized <br/>")
} // here I would like to set "last = ctm" (will be a Long)
)
UPDATED:
New code and context. Also new questions added :) They are embedded in the comments.
def socket = WebSocket.using[String] { request =>
// Comment from an answer below, but what are the side effects?
// By convention, methods with side effects take an empty argument list
def ctm(): Long = System.currentTimeMillis
var last: Long = ctm
// Command handlers
// Comment from an answer below, but what are the side effects?
// By convention, methods with side effects take an empty argument list
def date() = "Current date: " + new Date().toString + "<br/>"
def since(last: Long) = "Last command executed: " + (ctm - last) + "ms before now<br/>"
def unknown(cmd: String) = "Command: " + cmd + " not recognized <br/>"
val out = Enumerator.imperative[String] {}
// How to transform into the mapping strategy given in lpaul7's nice answer.
lazy val in = Iteratee.foreach[String](_ match {
case "date" => out.push(date)
case "since" => out.push(since(last))
case unknown => out.push(unknown)
} // Here I want to update the variable last to "last = ctm"
).mapDone { _ =>
println("Disconnected")
}
(in, out)
}
I don't know what your ctm is, but you could always do this:
val xs = List("date", "since", "other1", "other2")
xs.foreach { str =>
str match {
case "date" => println("Match Date")
case "since" => println("Match Since")
case unknow => println("Others")
}
println("Put your post step here")
}
Note you should use {} instead of () when you want to use a block of code as the argument of foreach().
I will not answer your question, but I should note that reassigning variables in Scala is bad practice. I suggest you rewrite your code to avoid vars.
First, transform your strings to something else:
val strings = it map {
case "date" => "Current date: " + new Date().toString + "<br/>"
case "since" => "Last command executed: " + (ctm - last) + "ms before now<br/>"
case unknown => "Command: " + unknown + " not recognized <br/>"
}
Next, push it
strings map { out.push(_) }
It looks like your implementation of push has side effects. Bad for you, because such methods make your program unpredictable. You can easily avoid side effects by making push return a tuple:
def push(s: String) = {
...
(ctm, last)
}
And using it like:
val (ctm, last) = out.push(str)
Update:
Of course side effects are needed to make programs useful. I only meant that methods depending on outer variables are less predictable than pure ones and harder to reason about. It is easier to test methods without side effects.
Yes, you should prefer vals over vars, it makes your program more "functional" and stateless. Stateless algorithms are thread safe and very predictable.
It seems like your program is stateful by nature. At least, try to stay as "functional" and stateless as you can :)
My suggested solution of your problem is:
// By convention, methods with side effects take an empty argument list
def ctm(): Long = System.currentTimeMillis // Get current time
// Command handlers
def date() = "Current date: " + new Date().toString + "<br/>"
def since(last: Long) = "Last command executed: " + (ctm() - last) + "ms before now<br/>"
def unknown(cmd: String) = "Command: " + cmd + " not recognized <br/>"
// In your cmd processing loop
// First, map inputs to responses
val cmds = inps map {
case "date" => date()
case "since" => since(last)
case unk => unknown(unk)
}
// Then push responses and update state
cmds map { response =>
out.push(response)
// It is a good place to update your state
last = ctm()
}
It is hard to test this without context of your code, so you should fit it to your needs yourself. I hope I've answered your question.
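If you want to see what "staying stateless" could look like for the last variable, here is a rough sketch that threads it through a fold instead of reassigning it. It assumes the commands arrive as a finite list of (command, timestamp) pairs, which is not how an Iteratee delivers them, so treat it as an illustration of the idea rather than a drop-in replacement. CommandLog, respond and run are invented names, and the "date" command is left out to keep the output deterministic.

```scala
object CommandLog {
  // Pure handler: the response depends only on its arguments.
  def respond(cmd: String, now: Long, last: Long): String = cmd match {
    case "since" => "Last command executed: " + (now - last) + "ms before now<br/>"
    case other   => "Command: " + other + " not recognized <br/>"
  }

  // Threads the previous timestamp through the fold; no var is reassigned.
  // Returns the final timestamp and the responses in input order.
  def run(inputs: List[(String, Long)], start: Long): (Long, List[String]) = {
    val (last, reversed) = inputs.foldLeft((start, List.empty[String])) {
      case ((prev, out), (cmd, now)) => (now, respond(cmd, now, prev) :: out)
    }
    (last, reversed.reverse)
  }
}
```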
I am trying to figure out memory-efficient AND functional ways to process large amounts of string data in Scala. I have read many things about lazy collections and have seen quite a few code examples. However, I run into "GC overhead exceeded" or "Java heap space" issues again and again.
Often the problem is that I try to construct a lazy collection, but evaluate each new element when I append it to the growing collection (I don't know any other way to do so incrementally). Of course, I could try something like initializing a lazy collection first and then yield the collection holding the desired values by applying the resource-critical computations with map or so, but often I simply do not know the exact size of the final collection a priori to initialize that lazy collection.
Maybe you could help me by giving me hints or explanations on how to improve the following code as an example, which splits a FASTA (definition below) formatted file into two separate files according to the rule that odd sequence pairs belong to one file and even ones to another one ("separation of strands"). The "most" straightforward way to do so would be in an imperative way by looping through the lines and printing into the corresponding files via open file streams (and this of course works excellently). However, I just don't enjoy the style of reassigning to variables holding header and sequences, thus the following example code uses (tail-)recursion, and I would appreciate finding a way to maintain a similar design without running into resource problems!
The example works perfectly for small files, but already with files of around 500 MB the code will fail with the standard JVM setup. I do want to process files of "arbitrary" size, say 10-20 GB or so.
val fileName = args(0)
val in = io.Source.fromFile(fileName) getLines
type itType = Iterator[String]
type sType = Stream[(String, String)]
def getFullSeqs(ite: itType) = {
//val metaChar = ">"
val HeadPatt = "(^>)(.+)" r
val SeqPatt = "([\\w\\W]+)" r
@annotation.tailrec
def rec(it: itType, out: sType = Stream[(String, String)]()): sType =
if (it hasNext) it next match {
case HeadPatt(_,header) =>
// introduce new header-sequence pair
rec(it, (header, "") #:: out)
case SeqPatt(seq) =>
val oldVal = out head
// concat subsequences
val newStream = (oldVal._1, oldVal._2 + seq) #:: out.tail
rec(it, newStream)
case _ =>
println("something went wrong my friend, oh oh oh!"); Stream[(String, String)]()
} else out
rec(ite)
}
def printStrands(seqs: sType) {
import java.io.PrintWriter
import java.io.File
def printStrand(seqse: sType, strand: Int) {
// only use sequences of one strand
val indices = List.tabulate(seqs.size/2)(_*2 + strand - 1).view
val p = new PrintWriter(new File(fileName + "." + strand))
indices foreach { i =>
p.print(">" + seqse(i)._1 + "\n" + seqse(i)._2 + "\n")
}; p.close
println("Done bro!")
}
List(1,2).par foreach (s => printStrand(seqs, s))
}
printStrands(getFullSeqs(in))
Three questions arise for me:
A) Let's assume one needs to maintain a large data structure obtained by processing the initial iterator you get from getLines like in my getFullSeqs method (note the different size of in and the output of getFullSeqs), because transformations on the whole(!) data is required repeatedly, because one does not know which part of the data one will require at any step. My example might not be the best, but how to do so? Is it possible at all??
B) What when the desired data structure is not inherently lazy, say one would like to store the (header -> sequence) pairs into a Map()? Would you wrap it in a lazy collection?
C) My implementation of constructing the stream might reverse the order of the inputted lines. When calling reverse, all elements will be evaluated (in my code, they already are, so this is the actual problem). Is there any way to post-process "from behind" in a lazy fashion? I know of reverseIterator, but is this already the solution, or will this not actually evaluate all elements first, too (as I would need to call it on a list)? One could construct the stream with newVal #:: rec(...), but I would lose tail-recursion then, wouldn't I?
So what I basically need is to add elements to a collection, which are not evaluated by the process of adding. So lazy val elem = "test"; elem :: lazyCollection is not what I am looking for.
EDIT: I have also tried using by-name parameter for the stream argument in rec .
Thank you so much for your attention and time, I really appreciate any help (again :) ).
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
FASTA is defined as a sequential set of sequences delimited by a single header line. A header is defined as a line starting with ">". Every line below the header is called part of the sequence associated with the header. A sequence ends when a new header is present. Every header is unique. Example:
>HEADER1
abcdefg
>HEADER2
hijklmn
opqrstu
>HEADER3
vwxyz
>HEADER4
zyxwv
Thus, sequence 2 is twice as big as seq 1. My program would split that file into a file A containing
>HEADER1
abcdefg
>HEADER3
vwxyz
and a second file B containing
>HEADER2
hijklmn
opqrstu
>HEADER4
zyxwv
The input file is assumed to consist of an even number of header-sequence pairs.
The key to working with really large data structures is to hold in memory only that which is critical to perform whatever operation you need. So, in your case, that's
Your input file
Your two output files
The current line of text
and that's it. In some cases you may need to store information such as how long a sequence is; in such cases, you build the data structures in a first pass and use them on a second pass. Let's suppose, for example, that you decide that you want to write three files: one for even records, one for odd, and one for entries where the total length is less than 300 nucleotides. You would do something like this (warning--it compiles but I never ran it, so it may not actually work):
final def findSizes(
data: Iterator[String], sz: Map[String,Long] = Map(),
currentName: String = "", currentSize: Long = 0
): Map[String,Long] = {
def currentMap = if (currentName != "") sz + (currentName->currentSize) else sz
if (!data.hasNext) currentMap
else {
val s = data.next
if (s(0) == '>') findSizes(data, currentMap, s, 0)
else findSizes(data, sz, currentName, currentSize + s.length)
}
}
Then, for processing, you use that map and pass through again:
import java.io._
final def writeFiles(
source: Iterator[String], targets: Array[PrintWriter],
sizes: Map[String,Long], count: Int = -1, which: Int = 0
) {
if (!source.hasNext) targets.foreach(_.close)
else {
val s = source.next
if (s(0) == '>') {
val w = if (sizes.get(s).exists(_ < 300)) 2 else (count+1)%2
targets(w).println(s)
writeFiles(source, targets, sizes, count+1, w)
}
else {
targets(which).println(s)
writeFiles(source, targets, sizes, count, which)
}
}
}
You then use Source.fromFile(f).getLines() twice to create your iterators, and you're all set. Edit: in some sense this is the key step, because this is your "lazy" collection. However, it's important not just because it doesn't read the whole file into memory immediately ("lazy"), but because it doesn't store any previous strings either!
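For comparison, the first pass can also be phrased as a fold over the line iterator. This is only a sketch of the same idea as findSizes above, not a tested replacement: like the original it keys the map by the full header line (including the ">"), and it additionally ignores any lines that appear before the first header. TwoPass and sizes are hypothetical names.

```scala
object TwoPass {
  // First pass: total sequence length per header, holding only the map and the current line.
  def sizes(lines: Iterator[String]): Map[String, Long] = {
    val (acc, _) = lines.foldLeft((Map.empty[String, Long], "")) {
      case ((m, current), line) =>
        if (line.startsWith(">")) (m.updated(line, 0L), line)  // new record starts
        else if (current.isEmpty) (m, current)                 // sequence line before any header
        else (m.updated(current, m(current) + line.length), current)
    }
    acc
  }
}
```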
More generally, Scala can't save you from thinking carefully about what information you need to have in memory and what you can fetch off disk as needed. Lazy evaluation can sometimes help, but there's no magic formula because you can easily express the requirement to have all your data in memory in a lazy way. Scala can't interpret your commands to access memory as, secretly, instructions to fetch stuff off the disk instead. (Well, not unless you write a library to cache results from disk which does exactly that.)
One could construct the stream with newVal #:: rec(...), but I would
lose tail-recursion then, wouldn't I?
Actually, no.
So, here's the thing... with your present tail recursion, you fill ALL of the Stream with values. Yes, Stream is lazy, but you are computing all of the elements, stripping it of any laziness.
Now say you do newVal #:: rec(...). Would you lose tail recursion? No. Why? Because you are not recursing. How come? Well, Stream is lazy, so it won't evaluate rec(...).
And that's the beauty of it. Once you do it that way, getFullSeqs returns on the first iteration, and only computes the "recursion" when printStrands asks for it. Unfortunately, that won't work as is...
The problem is that you are constantly modifying the Stream -- that's not how you use a Stream. With Stream, you always append to it. Don't keep "rewriting" the Stream.
Now, there are three other problems I could readily identify with printStrands. First, it calls size on seqs, which will cause the whole Stream to be processed, losing laziness. Never call size on a Stream. Second, you call apply on seqse, accessing it by index. Never call apply on a Stream (or List) -- that's highly inefficient. It's O(n), which makes your inner loop O(n^2) -- yes, quadratic in the number of headers in the input file! Finally, printStrands keeps a reference to seqs throughout the execution of printStrand, preventing processed elements from being garbage collected.
So, here's a first approximation:
def inputStreams(fileName: String): (Stream[String], Stream[String]) = {
val in = (io.Source fromFile fileName).getLines.toStream
val SeqPatt = "^[^>]".r
def demultiplex(s: Stream[String], skip: Boolean): Stream[String] = {
if (s.isEmpty) Stream.empty
else if (skip) demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = false)
else s.head #:: (s.tail takeWhile (SeqPatt findFirstIn _ nonEmpty)) #::: demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = true)
}
(demultiplex(in, skip = false), demultiplex(in, skip = true))
}
The problem with the above (and I'm showing that code just to further illustrate the issues with laziness) is that the instant you do this:
val (a, b) = inputStreams(fileName)
You'll keep a reference to the head of both streams, which prevents garbage collecting them. You can't keep a reference to them, so you have to consume them as soon as you get them, without ever storing them in a "val" or "lazy val". A "var" might do, but it would be tricky to handle. So let's try this instead:
def inputStreams(fileName: String): Vector[Stream[String]] = {
val in = (io.Source fromFile fileName).getLines.toStream
val SeqPatt = "^[^>]".r
def demultiplex(s: Stream[String], skip: Boolean): Stream[String] = {
if (s.isEmpty) Stream.empty
else if (skip) demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = false)
else s.head #:: (s.tail takeWhile (SeqPatt findFirstIn _ nonEmpty)) #::: demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = true)
}
Vector(demultiplex(in, skip = false), demultiplex(in, skip = true))
}
inputStreams(fileName).zipWithIndex.par.foreach {
case (stream, strand) =>
val p = new PrintWriter(new File("FASTA" + "." + strand))
stream foreach p.println
p.close
}
That still doesn't work, because stream inside inputStreams works as a reference, keeping the whole stream in memory even while they are printed.
So, having failed again, what do I recommend? Keep it simple.
def in = (scala.io.Source fromFile fileName).getLines.toStream
def inputStream(in: Stream[String], strand: Int = 1): Stream[(String, Int)] = {
if (in.isEmpty) Stream.empty
else if (in.head startsWith ">") (in.head, 1 - strand) #:: inputStream(in.tail, 1 - strand)
else (in.head, strand) #:: inputStream(in.tail, strand)
}
val printers = Array.tabulate(2)(i => new PrintWriter(new File("FASTA" + "." + i)))
inputStream(in) foreach {
case (line, strand) => printers(strand) println line
}
printers foreach (_.close)
Now this won't keep any more in memory than necessary. I still think it's too complex, however. This can be done more easily like this:
def in = (scala.io.Source fromFile fileName).getLines
val printers = Array.tabulate(2)(i => new PrintWriter(new File("FASTA" + "." + i)))
def printStrands(in: Iterator[String], strand: Int = 1) {
if (in.hasNext) {
val next = in.next
if (next startsWith ">") {
printers(1 - strand).println(next)
printStrands(in, 1 - strand)
} else {
printers(strand).println(next)
printStrands(in, strand)
}
}
}
printStrands(in)
printers foreach (_.close)
Or just use a while loop instead of recursion.
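The while-loop version might look like the sketch below. It follows the same convention as the recursive printStrands above (the strand index flips on every header, so the first record goes to printers(0)), but takes the writers as a parameter so it can be tried against in-memory writers. StrandSplitter and split are names chosen for this example.

```scala
import java.io.PrintWriter

object StrandSplitter {
  // Sends each record (header plus its sequence lines) to alternating writers,
  // holding only the current line in memory.
  def split(lines: Iterator[String], printers: Array[PrintWriter]): Unit = {
    var strand = 1 // flipped on every header line
    while (lines.hasNext) {
      val line = lines.next()
      if (line.startsWith(">")) strand = 1 - strand
      printers(strand).println(line)
    }
    printers.foreach(_.flush()) // closing the writers is left to the caller
  }
}
```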
Now, to the other questions:
B) It might make sense to do so while reading it, so that you do not have to keep two copies of the data: the Map and a Seq.
C) Don't reverse a Stream -- you'll lose all of its laziness.
For example suppose I have
for (line <- myData) {
println("}, {")
}
Is there a way to get the last line to print
println("}")
Can you refactor your code to take advantage of built-in mkString?
scala> List(1, 2, 3).mkString("{", "}, {", "}")
res1: String = {1}, {2}, {3}
Before going any further, I'd recommend you avoid println in a for-comprehension. It can sometimes be useful for tracking down a bug that occurs in the middle of a collection, but otherwise leads to code that's harder to refactor and test.
More generally, life usually becomes easier if you can restrict where any sort of side-effect occurs. So instead of:
for (line <- myData) {
println("}, {")
}
You can write:
val lines = for (line <- myData) yield "}, {"
println(lines mkString "\n")
I'm also going to take a guess here that you wanted the content of each line in the output!
val lines = for (line <- myData) yield (line + "}, {")
println(lines mkString "\n")
Though you'd be better off still if you just used mkString directly - that's what it's for!
val lines = myData.mkString("{", "\n}, {", "}")
println(lines)
Note how we're first producing a String, then printing it in a single operation. This approach can easily be split into separate methods and used to implement toString on your class, or to inspect the generated String in tests.
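As a small illustration of that last point, here is a sketch of a class whose toString is built with mkString, which makes the generated text easy to check without capturing console output. Rendering and Block are hypothetical names.

```scala
object Rendering {
  // Builds the whole String first; printing it is a separate, single step.
  final case class Block(lines: List[String]) {
    override def toString: String = lines.mkString("{", "\n}, {", "}")
  }
}
```

A test can then compare Block(List("a", "b")).toString against the expected text directly.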
I agree fully with what has been said before about using mkString, and distinguishing the first iteration rather than the last one. Should you still need to distinguish the last one, Scala collections have an init method, which returns all elements but the last.
So you can do
for(x <- coll.init) workOnNonLast(x)
workOnLast(coll.last)
(init and last being sort of the opposite of head and tail, which are the first and all but the first). Note however that, depending on the structure, they may be costly. On Vector, all of them are fast. On List, while head and tail are basically free, init and last are both linear in the length of the list. headOption and lastOption may help you when the collection may be empty, replacing workOnLast by
for (x <- coll.lastOption) workOnLast(x)
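Putting the two together for the question's output format, a sketch that is safe on an empty collection could use dropRight(1) in place of init (which throws on an empty collection) together with lastOption. InitLast and render are made-up names, and Vector is chosen because these operations are cheap on it.

```scala
object InitLast {
  // Every element except the last gets the separator suffix; the last gets the closing one.
  // An empty input yields an empty result instead of an exception.
  def render(coll: Vector[String]): Vector[String] =
    coll.dropRight(1).map(_ + "}, {") ++ coll.lastOption.map(_ + "}")
}
```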
You may take the addString function of the TraversableOnce trait as an example.
def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder = {
var first = true
b append start
for (x <- self) {
if (first) {
b append x
first = false
} else {
b append sep
b append x
}
}
b append end
b
}
In your case, the separator is }, { and the end is }
If you don't want to use built-in mkString function, you can make something like
for (line <- lines)
if (line == lines.last) println("last")
else println(line)
UPDATE: As didierd mentioned in comments, this solution is wrong because the last value can occur several times; he provides a better solution in his answer.
It is fine for Vectors, because the last function takes "effectively constant time" for them. For Lists it takes linear time, so you can use pattern matching instead:
@tailrec
def printLines[A](l: List[A]) {
l match {
case Nil =>
case x :: Nil => println("last")
case x :: xs => println(x); printLines(xs)
}
}
Other answers rightfully point to mkString, and for a normal amount of data I would also use that.
However, mkString builds (accumulates) the end-result in-memory through a StringBuilder. This is not always desirable, depending on the amount of data we have.
In this case, if all we want is to "print" we don't need to build the big-result first (and maybe we even want to avoid this).
Consider the implementation of this helper function:
def forEachIsLast[A](iterator: Iterator[A])(operation: (A, Boolean) => Unit): Unit = {
while(iterator.hasNext) {
val element = iterator.next()
val isLast = !iterator.hasNext // if there is no "next", this is the last one
operation(element, isLast)
}
}
It iterates over all elements and invokes operation passing each element in turn, with a boolean value. The value is true if the element passed is the last one.
In your case it could be used like this:
forEachIsLast(myData) { (line, isLast) =>
if(isLast)
println("}")
else
println("}, {")
}
We have the following advantages here:
It operates on each element, one by one, without necessarily accumulating the result in memory (unless you want to).
It does not need to load the whole collection into memory to check its size; it's enough to ask the Iterator whether it's exhausted. You could read data from a big file, or from the network, etc.