How to correctly get the current loop count from an Iterator in Scala

I am looping over the following lines from a CSV file to parse them. I want to identify the first line, since it's the header. What's the best way of doing this without keeping a var counter?
var counter = 0
for (line <- lines) {
  println(CsvParser.parse(line, counter))
  counter += 1
}
I know there has got to be a better way to do this; I'm a newbie to Scala.

Try zipWithIndex:
for (line <- lines.zipWithIndex) {
  println(CsvParser.parse(line._1, line._2))
}
@tenshi suggested the following improvement with pattern matching:
for ((line, count) <- lines.zipWithIndex) {
  println(CsvParser.parse(line, count))
}
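A minimal sketch of the same idea applied to the header case, assuming lines is an Iterator[String]. The parse function below is a hypothetical stand-in for CsvParser.parse from the question, not its real implementation:

```scala
object HeaderByIndex {
  // Hypothetical stand-in for CsvParser.parse: treats index 0 as the header.
  def parse(line: String, index: Int): String =
    if (index == 0) s"header: $line" else s"row $index: $line"

  def parseAll(lines: Iterator[String]): List[String] =
    // zipWithIndex on an Iterator is lazy: pairs are produced as they are consumed.
    lines.zipWithIndex.map { case (line, index) => parse(line, index) }.toList
}
```

Calling HeaderByIndex.parseAll(Iterator("id,name", "1,Ann")) pairs the first line with index 0, so the parser can recognise the header without a mutable counter.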

I totally agree with the given answer, but there is something important to point out. I initially planned to put it in a simple comment, but it would be quite long, so let me post it as an alternative answer.
It's perfectly true that the zip* methods are helpful for pairing up list elements, but the drawback is that they traverse the whole list in order to build the result.
So a common recommendation is to sequence the operations on the list in a view, so that they are combined and applied only when a result is actually required. A result is required when the operation returns something that isn't an Iterable, as foreach does, for instance.
Now, about the first answer: if lines is the list of lines in a very big file (or even an enumeratee over it), zipWithIndex will go through all of them and produce a table (an Iterable of tuples). Then the for-comprehension will go through the same number of items again.
In the end, you've increased the running time by n, where n is the length of lines, and added a memory footprint of roughly m + n*16, where m is the footprint of lines.
Proposition
lines.view.zipWithIndex map Function.tupled(CsvParser.parse) foreach println
A few last words (I promise): lines.view creates something like a scala.collection.SeqView, which holds all subsequent "mapping" functions that would otherwise produce a new Iterable, as zipWithIndex and map do.
Moreover, I think the expression is more elegant because it reads in the same order as the logic:
"For lines, create a view that zips each item with its index; the result has to be mapped through the parser, and each parsed result must be printed".
HTH.
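To make the deferral visible, here is a small sketch. The parsed counter and the string formatting are illustrative assumptions standing in for CsvParser.parse; the point is only that building the view does no work until something consumes it:

```scala
object ViewSketch {
  // Counts how many lines the mapping function has actually touched.
  var parsed = 0

  def numbered(lines: List[String]): Iterable[String] =
    // Nothing below runs yet: the view only records zipWithIndex and map.
    lines.view.zipWithIndex.map { case (line, index) =>
      parsed += 1
      s"$index: $line"
    }
}
```

After calling ViewSketch.numbered(...), parsed stays at 0 until the result is traversed (for example with foreach or toList), at which point each line is indexed and mapped in a single pass.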

Related

An efficient and composable foreach in Scala

Goal: Efficiently do something for each element in a list, and then return the original list, so that I can do something else with original list. For example, let lst be a very large list, and suppose we do many operations to it before applying our foreach. What I want to do is something like this:
lst.many_operations().foreach(x => f(x)).something_else()
However, foreach returns a unit. I seek a way to iterate through the list and return the original list supplied, so that I can do something_else() to it. To reduce the memory impact, I need to avoid saving the result of lst.many_operations() to a variable.
An obvious, but imperfect, solution is to replace foreach with map. Then the code looks like:
lst
  .many_operations()
  .map(x => {
    f(x)
    x
  }).something_else()
However, this is not good because map constructs a new list, effectively duplicating the very large list that it iterated through.
What is the right way to do this in Scala?
The simplest way seems to be:
lst.foreach(many_operations)
lst.foreach(something_else)
However: using side effects is really not a good idea. I would urge you to revisit your design to use explicit pure transformations rather than side effects and mutations.
To address your concern about having multiple lists in memory at the same time, you can use view or iterator to emulate streaming processing, and discard intermediate results you do not need to use again:
val newList = lst.iterator
  .map(foo)
  .map(bar)
  .map(baz)
  .toList
(lst will get garbage collected if you do not reference it again).
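If you are on Scala 2.13 or later (an assumption; the question does not say), tapEach on an iterator runs a side effect per element and passes the elements through unchanged, which comes close to the "composable foreach" asked for. A sketch, where the doubling, the logging, and the filter are hypothetical stand-ins for many_operations, f, and something_else:

```scala
object TapEachSketch {
  def run(lst: List[Int]): (List[Int], List[String]) = {
    val log = List.newBuilder[String]
    val result = lst.iterator
      .map(_ * 2)                      // stands in for many_operations()
      .tapEach(x => log += s"saw $x")  // side effect; elements pass through unchanged
      .filter(_ > 2)                   // stands in for something_else()
      .toList
    (result, log.result())
  }
}
```

Because the chain runs on an iterator, no intermediate list is built between the steps; only the final toList allocates a collection.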

Scala Array.view memory usage

I'm learning Scala and have been trying some LeetCode problems with it, but I'm having trouble with the memory limit being exceeded. One problem I have tried goes like this:
A swap is defined as taking two distinct positions in an array and swapping the values in them.
A circular array is defined as an array where we consider the first element and the last element to be adjacent.
Given a binary circular array nums, return the minimum number of swaps required to group all 1's present in the array together at any location.
and my attempted solution looks like
object Solution {
  def minSwaps(nums: Array[Int]): Int = {
    val count = nums.count(_==1)
    if (count == 0) return 0
    val circular = nums.view ++ nums.view
    circular.sliding(count).map(_.count(_==0)).min
  }
}
however, when I submit it, I'm hit with Memory Limit Exceeded for one of the test cases where nums is very large.
My understanding is that, because I'm using .view, I shouldn't be allocating more than O(1) memory. Is that understanding incorrect? To be clear, I realise this isn't the most time-efficient way of solving this, but I didn't expect it to be memory inefficient.
The version used is Scala 2.13.7, in case that makes a difference.
Update
I did some inspection of the types, and it seems circular is only a View unless I replace ++ with concat, which makes it an IndexedSeqView. Why is that? I thought ++ was just an alias for concat.
If I make the above change, and replace circular.sliding(count) with (0 to circular.size - count).view.map(i => circular.slice(i, i + count)) it "succeeds" in hitting the time limit instead, so I think sliding might not be optimised for IndexedSeqView.
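For comparison, and without claiming to diagnose why the view-based version hits the limit, here is a sketch of a different technique: a fixed-size sliding window tracked with a running count of zeros and modular indexing. It is not taken from the thread; it allocates no intermediate collections and visits each window start once:

```scala
object MinSwapsSketch {
  def minSwaps(nums: Array[Int]): Int = {
    val n = nums.length
    val ones = nums.count(_ == 1)
    if (ones == 0 || ones == n) return 0

    // Zeros inside the first window [0, ones).
    var zeros = 0
    var i = 0
    while (i < ones) { if (nums(i) == 0) zeros += 1; i += 1 }

    var best = zeros
    var start = 1
    while (start < n) {
      if (nums(start - 1) == 0) zeros -= 1               // element leaving the window
      if (nums((start + ones - 1) % n) == 0) zeros += 1  // element entering, wrapping around
      if (zeros < best) best = zeros
      start += 1
    }
    best
  }
}
```

Each zero inside a window of length count must be swapped with a 1 outside it, so the minimum number of zeros over all circular windows is the answer.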

Lazily generate partial sums in Scala

I want to produce a lazy list of partial sums and stop when I have found a "suitable" sum. For example, I want to do something like the following:
val str = Stream.continually {
  val i = Random.nextInt
  println("generated " + i)
  List(i)
}
str
  .take(5)
  .scanLeft(List[Int]())(_ ++ _)
  .find(l => !l.forall(_ > 0))
This produces output like the following:
generated -354822103
generated 1841977627
z: Option[List[Int]] = Some(List(-354822103))
This is nice because I've avoided producing the entire list of lists before finding a suitable list. However, it's suboptimal because I generated one extra random number that I don't need (i.e., the second, positive number in this test run). I know I can hand code a solution to do what I want, but is there a way to use the core scala collection library to achieve this result without writing my own recursion?
The above example is just a toy, but the real application involves heavy-duty network traffic for each "retry" as I build up a map until the map is "complete".
EDIT: Note that even substituting take(1) for find(...) results in the generation of a random number even though the returned value List() does not depend on the number. Does anyone know why the number is being generated in this case? I would think scanLeft does not need to fetch an element of the iterable receiving the call to scanLeft in this case.
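One possible sketch, assuming Scala 2.13 or later, where (as far as I can tell from the library source) Iterator.scanLeft pulls from the underlying iterator only when the next partial result is requested; older versions may look one element ahead. The fetch function and its canned values are hypothetical stand-ins for the expensive retry:

```scala
object LazyPartialSums {
  // Counts how many times the (hypothetical) expensive step has run.
  var generated = 0
  private val canned = Iterator(3, 7, -2, 9, 4)

  // Stand-in for the network call in the real application.
  def fetch(): List[Int] = { generated += 1; List(canned.next()) }

  def firstSuitable(): Option[List[Int]] =
    Iterator.continually(fetch())
      .take(5)
      .scanLeft(List.empty[Int])(_ ++ _)
      .find(l => !l.forall(_ > 0))
}
```

Under that assumption, the search stops at the first partial list containing a non-positive number, and generated should equal the length of that list, so no extra retry is paid for.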

Best way to represent a readline loop in Scala?

Coming from a C/C++ background, I'm not very familiar with the functional style of programming so all my code tends to be very imperative, as in most cases I just can't see a better way of doing it.
I'm just wondering if there is a way of making this block of Scala code more "functional"?
var line:String = "";
var lines:String = "";
do {
  line = reader.readLine();
  lines += line;
} while (line != null)
How about this?
val lines = Iterator.continually(reader.readLine()).takeWhile(_ != null).mkString
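A self-contained sketch of that one-liner, using a BufferedReader over an in-memory string as an assumed stand-in for the real reader, and collecting the lines into a List instead of concatenating them:

```scala
import java.io.{BufferedReader, StringReader}

object ReadAllLines {
  def readAll(reader: BufferedReader): List[String] =
    Iterator
      .continually(reader.readLine()) // calls readLine() each time an element is requested
      .takeWhile(_ != null)           // readLine() returns null at end of input
      .toList
}
```

Note that mkString with no argument joins the lines without any separator, just as the original loop does; pass a separator such as "\n" if the line breaks matter.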
Well, in Scala you can actually say:
val lines = scala.io.Source.fromFile("file.txt").mkString
But this is just library sugar. See "Read entire file in Scala?" for other possibilities. What you are actually asking is how to apply the functional paradigm to this problem. Here is a hint:
Source.fromFile("file.txt").getLines().foreach {println}
Do you get the idea behind this? For each line in the file, execute the println function. BTW, don't worry: getLines() returns an iterator, not the whole file. Now something more serious:
lines filter {_.startsWith("ab")} map {_.toUpperCase} foreach {println}
See the idea? Take lines (it can be an array, list, set, iterator, or anything else that can be filtered and whose items have a startsWith method) and filter it, keeping only the items starting with "ab". Then take every remaining item and map it by applying the toUpperCase method. Finally, for each resulting item, print it.
One last thought: you are not limited to a single type. For instance, say you have a file containing integers, one per line. If you want to read that file, parse the numbers, and sum them up, simply say:
lines.map(_.toInt).sum
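Both pipelines can be tried without a file. In this sketch the in-memory iterators are an assumed stand-in for getLines(), and the object and method names are made up for illustration:

```scala
object LinePipelines {
  // Keep lines starting with "ab" and upper-case them.
  def shout(lines: Iterator[String]): List[String] =
    lines.filter(_.startsWith("ab")).map(_.toUpperCase).toList

  // Parse one integer per line and add them up.
  def total(lines: Iterator[String]): Int =
    lines.map(_.trim.toInt).sum
}
```

For example, LinePipelines.shout(Iterator("abc", "xyz")) keeps only "abc" and upper-cases it, and LinePipelines.total(Iterator("1", "2", "3")) adds the parsed numbers.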
To add to the above: the same can be achieved using the (formerly new) NIO file API, which I would recommend because it has several advantages:
val path: Path = Paths.get("foo.txt")
val lines = Source.fromInputStream(Files.newInputStream(path)).getLines()
// Now we can iterate the file or do anything we want,
// e.g. using functional operations such as map. Or simply concatenate.
val result = lines.mkString
Don't forget to close the stream afterwards.
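One way to make the closing automatic, assuming Scala 2.13 or later, is scala.util.Using. This is a sketch rather than part of the original answer, and the object and method names are made up for illustration:

```scala
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.Using

object NioRead {
  def readAll(path: Path): String =
    // Using.resource closes the Source (and the stream beneath it) even if reading throws.
    Using.resource(Source.fromInputStream(Files.newInputStream(path))) { source =>
      source.getLines().mkString("\n")
    }
}
```

The iterator returned by getLines() must be fully consumed inside the block, because the stream is closed as soon as the block returns.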
I find that Stream is a pretty nice approach: it creates a re-traversable (if needed) sequence:
def loadLines(in: java.io.BufferedReader): Stream[String] = {
  val line = in.readLine
  if (line == null) Stream.Empty
  else Stream.cons(line, loadLines(in))
}
Each Stream element has a value (a String, line, in this case), and calls a function (loadLines(in), in this example) which will yield the next element, lazily, on demand. This makes for a good memory usage profile, especially with large data sets -- lines aren't read until they're needed, and aren't retained unless something is actually still holding onto them. Yet you can also go back to a previous Stream element and traverse forward again, yielding the exact same result.
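On Scala 2.13 and later, Stream is deprecated in favour of LazyList. A roughly equivalent sketch under that assumption (not the original author's code) is:

```scala
import java.io.{BufferedReader, StringReader}

object LazyLines {
  // LazyList memoizes its elements, so a second traversal does not touch the reader again.
  def loadLines(in: BufferedReader): LazyList[String] =
    LazyList.continually(in.readLine()).takeWhile(_ != null)
}
```

As long as you hold a reference to the head of the list, traversing it twice yields the same lines, but it also keeps every line that has been read in memory.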

Fill List with values from a for loop in Scala

I'm pretty new to scala and I am not able to solve this (pretty) trivial problem.
I know I can instantiate a List with predefined values like this:
val myList = List(1,2)
I want to fill a List with all integers from 1 to 100000. My goal is to avoid using a var for the List and a loop to fill it.
Is there any "functional" way of doing this?
Either of these will do the trick. (If you try them in the REPL, though, be advised that it's going to try to print all hundred thousand entries, which is generally not going to work.)
List.range(1,100001)
(1 to 100000).toList
I am also very new to Scala, it's pretty awesome isn't it.
Rex has the absolutely correct answer, but as food for thought: if you want a list that is not evaluated up front (perhaps the computations involved in evaluating the items in the list are expensive, or you just want to make things lazy), you can use a Stream.
Stream.from(1).takeWhile(_<=100000)
This can be used in most situations where you'd use a List.
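A short sketch comparing the strict variants and a lazy one. It assumes Scala 2.13 or later, where LazyList is the successor of Stream; the object and value names are made up for illustration:

```scala
object RangeLists {
  // List.range excludes its upper bound, so 100001 is needed to include 100000.
  val viaRange: List[Int] = List.range(1, 100001)
  // "to" includes its upper bound.
  val viaTo: List[Int] = (1 to 100000).toList
  // Lazy variant: elements are computed only when they are requested.
  val lazily: LazyList[Int] = LazyList.from(1).takeWhile(_ <= 100000)
}
```

The two strict lists are equal, and the lazy one yields the same numbers but only evaluates as far as you traverse it.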