views in collections in scala - scala

I understand that a view is a lightweight, lazy collection. I would like to understand what makes a view lightweight.
Say I have a list of 1000 random numbers. I'd like to find the even numbers in this list and pick only the first 10 of them. I believe a view is better here because it avoids creating an intermediate list, especially since I only need the first 10 even numbers. Initially, I thought the optimization came from the filter function not being executed until force is called, but I now believe this is incorrect. I am struggling to understand what makes the view better in this scenario. Or have I picked the wrong example?
val r = scala.util.Random
val l:List[Int] = List.tabulate(1000)(x=>r.nextInt())
//without view, I'll get an intermediate list. The function x%2==0 will be applied to each element of l
val l1 = l.filter(x=>(x%2 == 0))
//this gives the size of l1. I got 508, but yours may differ depending on the random numbers generated
l1.size
//pick 1st 10 even numbers
val l2 = l1.take(10)
//using view. I thought that x%2==0 would not be executed at this point
val lv1 = l.view.filter(x=>(x%2 == 0))
//REPL output: lv1: scala.collection.SeqView[Int,List[Int]] = SeqViewF(...)
lv1.size //this is the same as l1.size, so my assumption that x%2==0 is not executed must be wrong; otherwise lv1.size would differ from l1.size
val lv2 = lv1.take(10).force
Question 1 - If I use a view, how is the processing optimised?
Question 2 - lv1 is of type SeqViewF. The F is related to filter, but what does it mean?
Question 3 - What do the elements of lv1 look like (the elements of l1, for example, are integers)?

You wrote:
lv1.size //this is same as l1 size so my assumption that x%2==0 will
not be executed is wrong else lv1.size will not be same as l1.size
Your assumption is actually correct; it's just that your means of measuring the difference is faulty.
val l:List[Int] = List.fill(10)(scala.util.Random.nextInt()) // ten random Ints
// print every Int that gets tested in the filter
val lv1 = l.view.filter{x => println(x); x%2 == 0} // no lines printed
lv1.size // ten Ints sent to STDOUT
So, as you see, taking the size of your view also forces its completion.
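To observe the laziness without calling size, one option is to count how often the predicate runs. The following is only a sketch: the object name ViewLaziness, the helper run, and the deterministic input (0 to 999 standing in for the random numbers) are assumptions made for illustration and are not part of the original post.

```scala
object ViewLaziness {
  // Returns (predicate calls before forcing, first ten evens, predicate calls after forcing).
  def run(): (Int, List[Int], Int) = {
    val numbers = List.tabulate(1000)(i => i) // deterministic stand-in for the random list
    var calls = 0
    // Building the filtered view does not run the predicate.
    val evens = numbers.view.filter { x => calls += 1; x % 2 == 0 }
    val callsBefore = calls
    // toList forces the view, but only as far as take(10) requires.
    val firstTen = evens.take(10).toList
    (callsBefore, firstTen, calls)
  }

  def main(args: Array[String]): Unit = {
    val (before, firstTen, after) = run()
    println(s"predicate calls before forcing: $before")
    println(s"first ten evens: $firstTen")
    println(s"predicate calls after forcing: $after (the list has 1000 elements)")
  }
}
```

The count after forcing should be far below 1000, because the pipeline stops once ten even numbers have been found.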

Yeah, that's not a very fitting example. What you are doing is better done with an iterator: list.iterator.filter(_ % 2 == 0).take(10). This doesn't create intermediate collections and does not scan the list past the tenth even element (a view wouldn't either; it's just a bit of an overcomplication for this case).
A view is a sequence of delayed operations. It has a reference to the collection, and a bunch of operations to be applied when it is forced. The way operations to be applied are recorded is rather complicated, and not really important. You guessed right - SeqViewF means a view of a sequence with a filter applied. If you map over it, you'll get a SeqViewFM etc.
When would this be needed?
One example is when you need to "massage" a sequence that you are passing somewhere else. Suppose you have a function that somehow combines the elements of a sequence you pass in:
def combine(s: Seq[Int]) = s.iterator.zipWithIndex.map {
case(x, i) if i % 2 == 0 => x
case(x, _) => -x
}.sum
Now suppose you have a huge stream of numbers, and you want to combine only the even ones while dropping the others. You can use your existing function for that:
val result = combine(stream.view.filter(_ % 2 == 0))
Of course, if combine's parameter had been declared as an iterator to begin with, you would not need the view, but that is not always possible; sometimes you just have to use a standard interface that wants a sequence.
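A runnable sketch of this idea follows. One assumption is stated loudly: the parameter is widened from Seq[Int] to Iterable[Int], because in Scala 2.13 views no longer extend Seq, so a filtered view would not be accepted by a Seq[Int] parameter there. The object name CombineWithView and the sample numbers are made up for illustration.

```scala
object CombineWithView {
  // Alternates signs by position: +x at even indices, -x at odd indices.
  // Assumption: Iterable[Int] instead of Seq[Int], so that a view type-checks on 2.12 and 2.13.
  def combine(s: Iterable[Int]): Int =
    s.iterator.zipWithIndex.map {
      case (x, i) if i % 2 == 0 => x
      case (x, _)               => -x
    }.sum

  def main(args: Array[String]): Unit = {
    val numbers = (1 to 10).toList
    // No intermediate list of even numbers is built; the filter runs while combine iterates.
    // Evens are 2, 4, 6, 8, 10, so the result is 2 - 4 + 6 - 8 + 10 = 6.
    println(combine(numbers.view.filter(_ % 2 == 0)))
  }
}
```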
Here is a fancier example, that also takes advantage of the fact that the elements are computed on access:
def notifyUsers(users: Seq[User]) = users
.iterator
.filter(_.needsNotification)
.foreach(_.notify)
timer.schedule(60 seconds) { notifyUsers(userIDs.view.map(getUser)) }
So, I have some ids of users that may need to be notified of some external events. I have them stored in userIDs.
Every minute a task runs that finds all users that need to be notified and sends a notification to each of them.
Here is the trick: notifyUsers takes a collection of User as a parameter. But what we are really passing in is a view, composed of the initial set of user ids and a .map operation that gets the User object for each of them. As a result, every time the task runs, a new User object is obtained for each id (perhaps from the database), so if the needsNotification flag has changed, the new value is picked up.
Surely, I could change notifyUsers to receive the list of ids and call getUser on its own instead, but that wouldn't be as neat. First, this way it is easier to unit-test: I can just pass a list of test objects directly in, without bothering to mock out getUser. And second, a generic utility like this is more useful: User could be a trait, for example, representing many different domain objects.
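The "computed on access" behaviour can be checked with a lookup counter. This is a sketch under stated assumptions: User, getUser and idsToNotify here are hypothetical stand-ins (the real getUser might hit a database), and the parameter is typed as Iterable[User] so that the code also compiles on Scala 2.13, where a view is not a Seq.

```scala
object ViewRecomputes {
  // Hypothetical domain object, for illustration only.
  final case class User(id: Int, needsNotification: Boolean)

  // Returns (ids selected on the first pass, lookups after first pass, lookups after second pass).
  def run(): (List[Int], Int, Int) = {
    var lookups = 0
    // Hypothetical lookup; counts how often it is called.
    def getUser(id: Int): User = { lookups += 1; User(id, needsNotification = id % 2 == 0) }
    def idsToNotify(users: Iterable[User]): List[Int] =
      users.iterator.filter(_.needsNotification).map(_.id).toList

    val users = List(1, 2, 3, 4).view.map(getUser) // no lookups happen here
    val first = idsToNotify(users)                 // first traversal: 4 lookups
    val afterFirst = lookups
    idsToNotify(users)                             // second traversal: 4 more lookups
    (first, afterFirst, lookups)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Each traversal of the view calls getUser again, which is what lets the scheduled task pick up fresh values on every run.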

Related

Scala Array.view memory usage

I'm learning Scala and have been trying some LeetCode problems with it, but I'm having trouble with the memory limit being exceeded. One problem I have tried goes like this:
A swap is defined as taking two distinct positions in an array and swapping the values in them.
A circular array is defined as an array where we consider the first element and the last element to be adjacent.
Given a binary circular array nums, return the minimum number of swaps required to group all 1's present in the array together at any location.
and my attempted solution looks like
object Solution {
def minSwaps(nums: Array[Int]): Int = {
val count = nums.count(_==1)
if (count == 0) return 0
val circular = nums.view ++ nums.view
circular.sliding(count).map(_.count(_==0)).min
}
}
However, when I submit it, I'm hit with Memory Limit Exceeded for one of the test cases where nums is very large.
My understanding is that, because I'm using .view, I shouldn't be allocating more than O(1) memory. Is that understanding incorrect? To be clear, I realise this isn't the most time-efficient way of solving this, but I didn't expect it to be memory inefficient.
The version used is Scala 2.13.7, in case that makes a difference.
Update
I did some inspection of the types, and it seems circular is only a View unless I replace ++ with concat, which makes it an IndexedSeqView. Why is that? I thought ++ was just an alias for concat.
If I make the above change and replace circular.sliding(count) with (0 to circular.size - count).view.map(i => circular.slice(i, i + count)), it "succeeds" in hitting the time limit instead, so I think sliding might not be optimised for IndexedSeqView.
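This does not explain why sliding allocates, but as a point of comparison, here is a sketch that avoids views and windows altogether: it keeps a running count of zeros in a window of length count and wraps the index with a modulo. The object name MinSwapsSketch is made up, and this is one possible approach rather than a statement about what the collections library does internally.

```scala
object MinSwapsSketch {
  def minSwaps(nums: Array[Int]): Int = {
    val n = nums.length
    val ones = nums.count(_ == 1)
    if (ones == 0 || ones == n) return 0

    // Count zeros in the first window [0, ones).
    var zeros = 0
    var i = 0
    while (i < ones) { if (nums(i) == 0) zeros += 1; i += 1 }

    // Slide the window start around the circle, updating the count in O(1) per step.
    var best = zeros
    var start = 1
    while (start < n) {
      if (nums(start - 1) == 0) zeros -= 1              // element leaving the window
      if (nums((start + ones - 1) % n) == 0) zeros += 1 // element entering the window (wraps around)
      if (zeros < best) best = zeros
      start += 1
    }
    best
  }

  def main(args: Array[String]): Unit =
    println(minSwaps(Array(0, 1, 0, 1, 1, 0, 0)))
}
```

Apart from the input array, this uses only a handful of Int variables, and each element is visited a constant number of times.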

Use of View in Scala

I was solving the problem Filter Positions in a List:(For a given list with integers, return a new list removing the elements at odd positions)
I came up with
arr.zipWithIndex.filter(_._2 %2 == 1).map(_._1)
But someone has suggested that below code would be faster
arr.view.zipWithIndex.filter{ _._2 % 2 != 0 }.map { _._1}.force.toList
I know that a view creates a non-strict collection (lazy evaluation), but at which step (method call) does it help us?
Meaning:
arr.view creates a view on which zipWithIndex will work, and zipWithIndex still has to process each element to create pairs of value and index. I guess there is no optimization up to this point.
The filter method has to look at every element before it can skip or select it.
I am not sure how adding a view would help in this case.
Using view means that all the operations can happen at once, instead of one at a time.
arr.zipWithIndex.filter(_._2 %2 == 1).map(_._1)
This works, but it creates 3 new lists in the process. First it runs zipWithIndex, producing a new list with the result. Then it passes that new list to filter, creating another list, and finally it calls map on that list, producing the final list.
So in that version we create two intermediate lists that we don't really need.
arr.view.zipWithIndex.filter{ _._2 % 2 != 0 }.map { _._1}.force.toList
This version uses a view, so it can perform all those operations in one pass, without needing to create an intermediate collection at each step.
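A small sketch to check that both pipelines give the same result. The object and method names are made up, and toList is used instead of force.toList since it works on views in both Scala 2.12 and 2.13.

```scala
object ViewPipeline {
  // Strict version: each step builds a full intermediate list.
  def strict(arr: List[Int]): List[Int] =
    arr.zipWithIndex.filter(_._2 % 2 == 1).map(_._1)

  // View version: the steps are fused, and only the final toList builds a list.
  def viaView(arr: List[Int]): List[Int] =
    arr.view.zipWithIndex.filter(_._2 % 2 != 0).map(_._1).toList

  def main(args: Array[String]): Unit = {
    val arr = List(10, 20, 30, 40, 50)
    println(strict(arr))  // List(20, 40)
    println(viaView(arr)) // List(20, 40)
  }
}
```

Whether the view version is actually faster depends on the size of the input; for short lists the overhead of the view can outweigh the saved allocations.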

Lazily generate partial sums in Scala

I want to produce a lazy list of partial sums and stop when I have found a "suitable" sum. For example, I want to do something like the following:
val str = Stream.continually {
val i = Random.nextInt
println("generated " + i)
List(i)
}
str
.take(5)
.scanLeft(List[Int]())(_ ++ _)
.find(l => !l.forall(_ > 0))
This produces output like the following:
generated -354822103
generated 1841977627
z: Option[List[Int]] = Some(List(-354822103))
This is nice because I've avoided producing the entire list of lists before finding a suitable list. However, it's suboptimal because I generated one extra random number that I don't need (i.e., the second, positive number in this test run). I know I can hand code a solution to do what I want, but is there a way to use the core scala collection library to achieve this result without writing my own recursion?
The above example is just a toy, but the real application involves heavy-duty network traffic for each "retry" as I build up a map until the map is "complete".
EDIT: Note that even substituting take(1) for find(...) results in the generation of a random number even though the returned value List() does not depend on the number. Does anyone know why the number is being generated in this case? I would think scanLeft does not need to fetch an element of the iterable receiving the call to scanLeft in this case.
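One possible way to avoid the extra element, sketched here as an assumption rather than an established answer, is to use an Iterator and accumulate the prefix in a map step instead of scanLeft. Iterator.continually only evaluates its argument when next() is called, and find stops at the first match. The names LazyPrefixes and firstSuitablePrefix are hypothetical; the empty initial list that scanLeft would emit is skipped, which is harmless here because it can never satisfy the predicate.

```scala
object LazyPrefixes {
  // Pulls at most `limit` values from `generate`, one at a time, and returns the first
  // prefix that contains a non-positive number. Nothing is generated past that prefix.
  def firstSuitablePrefix(generate: () => Int, limit: Int): Option[List[Int]] = {
    var prefix = List.empty[Int]
    Iterator
      .continually(generate())
      .take(limit)
      .map { i => prefix = prefix :+ i; prefix }
      .find(l => !l.forall(_ > 0))
  }

  def main(args: Array[String]): Unit = {
    val random = new scala.util.Random
    val result = firstSuitablePrefix(
      () => { val i = random.nextInt(); println("generated " + i); i },
      5
    )
    println(result)
  }
}
```

The mutable prefix keeps the sketch short; the same shape works if each step performs an expensive request instead of drawing a random number.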

comparing an element with all elements in a list

I'm learning Scala now, and I have a scenario where I have to compare an element (say num) with all the elements in a list.
Assume,
val MyList = List(1, 2, 3, 4)
If num is equal to any one of the elements in the list, I need to return true. I know how to do it recursively using the head and tail functions, but is there a simpler way to do it (I think I'll be able to do it using foreach, but I'm not sure how to implement it exactly)?
There are a number of possibilities:
val x = 3
MyList.contains(x)
!MyList.forall(y => y != x) // early exit, basically the same as .contains
If you plan to do it frequently, you may consider converting your list to a Set, because every .contains lookup on a list is, in the worst case, proportional to the number of elements, whereas on a Set it is effectively constant:
val mySet = MyList.toSet
mySet.contains(x)
or simply:
mySet(x)
A contains method is pretty standard for lists in any language. Scala's List has it too:
http://www.scala-lang.org/api/current/scala/collection/immutable/List.html
As others have answered, the contains method on the list will do exactly this, and it's the most understandable/performant way.
Looking at your closing comments though, you wouldn't be able to do it (in an elegant fashion) with foreach, since that returns Unit. Foreach "does" something for each element, but you don't get any result back. It's useful for logging/println statements, but it doesn't act as a transformation.
If you want to run a function on every element individually, you would use map, which returns a List of the results of applying the function. So assuming num = 3, then MyList.map(_ == num) would return List(false, false, true, false). Since you're looking for a single result, and not a list of results, then this is not what you're after.
In order to collapse a sequence of things into a single result, you would use a fold over the data. Folding involves a function that takes two arguments (the result so far, and the current thing in the list) and returns the new running result. So that this can work on the very first element, you also need to provide the initial value to use for the ongoing result (usually some sort of zero).
In your particular case, then, you want a Boolean answer at the end - "was an element found that was equal to num". So the running result would be "have I seen an element so far that was equal to num". Which means the initial value is false. And the function itself should return true if an element has already been seen, or if the current element is equal to num.
Putting this together, it would look like this:
MyList.foldLeft(false) { case (runningResult, listElem) =>
// return true if runningResult is true, or if listElem is the target number
runningResult || listElem == num
}
This doesn't have the nice aspect of stopping as soon as the target value has been found - and it's nowhere near as concise as calling MyList.contains. But as an instructional example, this is how you could implement this yourself from the primitive functional operations on a list.
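To make the difference visible, the sketch below counts how many elements each approach visits. exists is the standard library method that takes a predicate and stops at the first match; the object and helper names are made up for illustration.

```scala
object ShortCircuitSearch {
  // foldLeft always walks the whole list, even after the target has been seen.
  def viaFold(list: List[Int], num: Int): (Boolean, Int) = {
    var visited = 0
    val found = list.foldLeft(false) { (acc, elem) => visited += 1; acc || elem == num }
    (found, visited)
  }

  // exists stops as soon as the predicate returns true.
  def viaExists(list: List[Int], num: Int): (Boolean, Int) = {
    var visited = 0
    val found = list.exists { elem => visited += 1; elem == num }
    (found, visited)
  }

  def main(args: Array[String]): Unit = {
    val myList = List(1, 2, 3, 4)
    println(viaFold(myList, 3))   // (true,4)
    println(viaExists(myList, 3)) // (true,3)
  }
}
```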
List has a method for that:
val found = MyList.contains(num)

How to correctly get current loop count from a Iterator in scala

I am looping over the lines of a csv file to parse them. I want to identify the first line, since it's the header. What's the best way of doing this without keeping a var counter?
var counter = 0
for (line <- lines) {
println(CsvParser.parse(line, counter))
counter += 1 // Scala has no ++ operator
}
I know there has got to be a better way to do this; I'm a newbie to Scala.
Try zipWithIndex:
for (line <- lines.zipWithIndex) {
println(CsvParser.parse(line._1, line._2))
}
@tenshi suggested the following improvement with pattern matching:
for ((line, count) <- lines.zipWithIndex) {
println(CsvParser.parse(line, count))
}
I totally agree with the given answer, but I have to point out something important. I initially planned to put it in a simple comment,
but it would be quite long, so let me post it as a variant answer.
It's perfectly true that the zip* methods are helpful for pairing up lists, but their drawback is that they loop over the lists in order to build the result.
So a common recommendation is to chain the required operations on a view, so that they are combined and applied only when a result has to be produced. A result is produced by any operation whose return value isn't an Iterable; foreach, for instance.
Now, regarding the first answer: if lines is the list of lines in a very big file (or even an enumeratee over it), zipWithIndex will go through all of them and produce a table (an Iterable of tuples). Then the for-comprehension will go through the same number of items again.
In the end, you've increased the running time by n, where n is the length of lines, and added a memory footprint of roughly m + n*16, where m is the footprint of lines.
Proposition
lines.view.zipWithIndex map Function.tupled(CsvParser.parse) foreach println
A few more words (I promise): lines.view will create something like a scala.collection.SeqView that holds all further "mapping" functions that produce a new Iterable, such as zipWithIndex and map.
Moreover, I think the expression is more elegant because it follows the logical reading order:
"For lines, create a view that zips each item with its index; the result has to be mapped through the parser, and that result must be printed".
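A self-contained sketch of the same pipeline. CsvParser is not shown in the thread, so the parse function below is a hypothetical stand-in that simply labels the header line; the object name CsvIndexSketch is also made up.

```scala
object CsvIndexSketch {
  // Hypothetical stand-in for CsvParser.parse: treats index 0 as the header.
  def parse(line: String, index: Int): String =
    if (index == 0) s"header: $line" else s"row $index: $line"

  // The view fuses zipWithIndex and map; no intermediate collection of tuples is built.
  def parseAll(lines: Seq[String]): List[String] =
    lines.view.zipWithIndex.map { case (line, i) => parse(line, i) }.toList

  def main(args: Array[String]): Unit =
    parseAll(Seq("name,age", "alice,30", "bob,25")).foreach(println)
}
```

If lines is already an Iterator (for example from scala.io.Source.getLines()), zipWithIndex on it is lazy as well, and the view is not needed.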
HTH.