Variable not updated - scala

This is perhaps very basic but I am certainly missing something here.
It is all within a method, where I increment/alter a variable within a child/sub scope (it could be within a if block, or like here, within a map)
However, the result is unchanged variable. e.g. here, sum remains zero after the map. whereas, it should come up to 3L.
What am I missing ?
val resultsMap = scala.collection.mutable.Map.empty[String, Long]
resultsMap("0001") = 0L
resultsMap("0003") = 2L
resultsMap("0007") = 1L
var sum = 0L
resultsMap.mapValues(x => {sum = sum + x})
// I first wrote this, but then got worried and wrote more explicit version too, same behaviour
// resultMap.mapValues(sum+=_)
println("total of counts for txn ="+sum) // sum still 0
-- Update
I have similar behaviour where a loop is not updating the variable outside the loop. looking for text on variable scoping, but not found the golden source yet. all help is appreciated.
var cnt : Int = 0
rdd.foreach(article => {
if (<something>) {
println(<something>) // being printed
cnt += 1
println("counter is now "+cnt) // printed correctly
}
})

You should proceed like this:
val sum = resultsMap.values.reduce(_+_)
You just get your values and then add them up with reduce.
EDIT:
The reason sum stays unchanged is that mapValues produces a view, which means (among other things) the new map won't be computed unless the resulting view is acted upon, so in this case - the code block updating sum is simply never executed.
To see this - you can "force" the view to "materialize" (compute the new map) and see that sum is updated, as expected:
var sum = 0L
resultsMap.mapValues(x => {sum = sum + x}).view.force
println("SUM: " + sum) // prints 3
See related discussion here: Scala: Why mapValues produces a view and is there any stable alternatives?

Related

How to increment for loop variable dynamically in scala?

How to increment for-loop variable dynamically as per some condition.
For example.
var col = 10
for (i <- col until 10) {
if (Some condition)
i = i+2; // Reassignment to val, compile error
println(i)
}
How it is possible in scala.
Lots of low level languages allow you to do that via the C like for loop but that's not what a for loop is really meant for. In most languages, a for loop is used when you know in advance (when the loop starts) how many iterations you will need. Otherwise, a while loop is used.
You should use a while loop for that in scala.
var i = 0
while(i<10) {
if (Some condition)
i = i+2
println(i)
i+=1
}
If you dont want to use mutable variables you can try functional way for this
def loop(start: Int) {
if (some condition) {
loop(start + 2)
} else {
loop(start - 1) // whatever you want to do.
}
}
And as normal recursion function you'll need some condition to break the flow, I just wanted to give an idea of what can be done.
Hope this helps!!!
Ideally, you wouldn't use var for this. fold works pretty well with immutable values, whether it is an int, list, map...
It lets you set a default value (e.g. 0) to the variable you want to return and also iterate through the values(e.g i) changing that value (e.g accumulator) on every iteration.
val value = (1 to 10).fold(0)((loopVariable,i) => {
if(i == condition)
loopVariable+1
else
loopVariable
})
println(value)
Example

What is the most efficient way to iterate a collection in parallel in Scala (TOP N pattern)

I am new to Scala and would like to build a real-time application to match some people. For a given Person, I would like to get the TOP 50 people with the highest matching score.
The idiom is as follows :
val persons = new mutable.HashSet[Person]() // Collection of people
/* Feed omitted */
val personsPar = persons.par // Make it parall
val person = ... // The given person
res = personsPar
.filter(...) // Some filters
.map{p => (p,computeMatchingScoreAsFloat(person, p))}
.toList
.sortBy(-_._2)
.take(50)
.map(t => t._1 + "=" + t._2).mkString("\n")
In the sample code above, HashSet is used, but it can be any type of collection, as I am pretty sure it is not optimal
The problem is that persons contains over 5M elements, the computeMatchingScoreAsFloat méthods computes a kind a correlation value with 2 vectors of 200 floats. This computation takes ~2s on my computer with 6 cores.
My question is, what is the fastest way of doing this TOPN pattern in Scala ?
Subquestions :
- What implementation of collection (or something else?) should I use ?
- Should I use Futures ?
NOTE: It has to be computed in parallel, the pure computation of computeMatchingScoreAsFloat alone (with no ranking/TOP N) takes more than a second, and < 200 ms if multi-threaded on my computer
EDIT: Thanks to Guillaume, compute time has been decreased from 2s to 700ms
def top[B](n:Int,t: Traversable[B])(implicit ord: Ordering[B]):collection.mutable.PriorityQueue[B] = {
val starter = collection.mutable.PriorityQueue[B]()(ord.reverse) // Need to reverse for us to capture the lowest (of the max) or the greatest (of the min)
t.foldLeft(starter)(
(myQueue,a) => {
if( myQueue.length <= n ){ myQueue.enqueue(a);myQueue}
else if( ord.compare(a,myQueue.head) < 0 ) myQueue
else{
myQueue.dequeue
myQueue.enqueue(a)
myQueue
}
}
)
}
Thanks
I would propose a few changes:
1- I believe that the filter and map steps requires traversing the collection twice. Having a lazy collection would reduce it to one. Either have a lazy collection (like Stream) or converting it to one, for instance for a list:
myList.view
2- the sort step requires sorting all elements. Instead, you can do a FoldLeft with an accumulator storing the top N records. See there for an example of an implementation:
Simplest way to get the top n elements of a Scala Iterable . I would probably test a Priority Queue instead of a list if you want maximum performance (really falling into its wheelhouse). For instance, something like this:
def IntStream(n:Int):Stream[(Int,Int)] = if(n == 0) Stream.empty else (util.Random.nextInt,util.Random.nextInt) #:: IntStream(n-1)
def top[B](n:Int,t: Traversable[B])(implicit ord: Ordering[B]):collection.mutable.PriorityQueue[B] = {
val starter = collection.mutable.PriorityQueue[B]()(ord.reverse) // Need to reverse for us to capture the lowest (of the max) or the greatest (of the min)
t.foldLeft(starter)(
(myQueue,a) => {
if( myQueue.length <= n ){ myQueue.enqueue(a);myQueue}
else if( ord.compare(a,myQueue.head) < 0 ) myQueue
else{
myQueue.dequeue
myQueue.enqueue(a)
myQueue
}
}
)
}
def diff(t2:(Int,Int)) = t2._2
top(10,IntStream(10000))(Ordering.by(diff)) // select top 10
I really think that you problem requires a SINGLE collection traverse so you be able to get your run time down to below 1 sec
Good luck!

Different result returned using Scala Collection par in a series of runs

I have tasks that I want to execute concurrently and each task takes substantial amount of memory so I have to execute them in batches of 2 to conserve memory.
def runme(n: Int = 120) = (1 to n).grouped(2).toList.flatMap{tuple =>
tuple.par.map{x => {
println(s"Running $x")
val s = (1 to 100000).toList // intentionally to make the JVM allocate a sizeable chunk of memory
s.sum.toLong
}}
}
val result = runme()
println(result.size + " => " + result.sum)
The result I expected from the output was 120 => 84609924480 but the output was rather random. The returned collection size differed from execution to execution. Most of the time there was missing count even though all the futures were executed looking at the console. I thought flatMap waits the parallel executions in map to complete before returning the complete. What should I do to always get the right result using par? Thanks
Just for the record: changing the underlying collection in this case shouldn't change the output of your program. The problem is related to this known bug. It's fixed from 2.11.6, so if you use that (or higher) Scala version, you should not see the strange behavior.
And about the overflow, I still think that your expected value is wrong. You can check that the sum is overflowing because the list is of integers (which are 32 bit) while the total sum exceeds the integer limits. You can check it with the following snippet:
val n = 100000
val s = (1 to n).toList // your original code
val yourValue = s.sum.toLong // your original code
val correctValue = 1l * n * (n + 1) / 2 // use math formula
var bruteForceValue = 0l // in case you don't trust math :) It's Long because of 0l
for (i ← 1 to n) bruteForceValue += i // iterate through range
println(s"yourValue = $yourValue")
println(s"correctvalue = $correctValue")
println(s"bruteForceValue = $bruteForceValue")
which produces the output
yourValue = 705082704
correctvalue = 5000050000
bruteForceValue = 5000050000
Cheers!
Thanks #kaktusito.
It worked after I changed the grouped list to Vector or Seq i.e. (1 to n).grouped(2).toList.flatMap{... to (1 to n).grouped(2).toVector.flatMap{...

For loop in scala without sequence?

So, while working my way through "Scala for the Impatient" I found myself wondering: Can you use a Scala for loop without a sequence?
For example, there is an exercise in the book that asks you to build a counter object that cannot be incremented past Integer.MAX_VALUE. In order to test my solution, I wrote the following code:
var c = new Counter
for( i <- 0 to Integer.MAX_VALUE ) c.increment()
This throws an error: sequences cannot contain more than Int.MaxValue elements.
It seems to me that means that Scala is first allocating and populating a sequence object, with the values 0 through Integer.MaxValue, and then doing a foreach loop on that sequence object.
I realize that I could do this instead:
var c = new Counter
while(c.value < Integer.MAX_VALUE ) c.increment()
But is there any way to do a traditional C-style for loop with the for statement?
In fact, 0 to N does not actually populate anything with integers from 0 to N. It instead creates an instance of scala.collection.immutable.Range, which applies its methods to all the integers generated on the fly.
The error you ran into is only because you have to be able to fit the number of elements (whether they actually exist or not) into the positive part of an Int in order to maintain the contract for the length method. 1 to Int.MaxValue works fine, as does 0 until Int.MaxValue. And the latter is what your while loop is doing anyway (to includes the right endpoint, until omits it).
Anyway, since the Scala for is a very different (much more generic) creature than the C for, the short answer is no, you can't do exactly the same thing. But you can probably do what you want with for (though maybe not as fast as you want, since there is some performance penalty).
Wow, some nice technical answers for a simple question (which is good!) But in case anyone is just looking for a simple answer:
//start from 0, stop at 9 inclusive
for (i <- 0 until 10){
println("Hi " + i)
}
//or start from 0, stop at 9 inclusive
for (i <- 0 to 9){
println("Hi " + i)
}
As Rex pointed out, "to" includes the right endpoint, "until" omits it.
Yes and no, it depends what you are asking for. If you're asking whether you can iterate over a sequence of integers without having to build that sequence first, then yes you can, for instance using streams:
def fromTo(from : Int, to : Int) : Stream[Int] =
if(from > to) {
Stream.empty
} else {
// println("one more.") // uncomment to see when it is called
Stream.cons(from, fromTo(from + 1, to))
}
Then:
for(i <- fromTo(0, 5)) println(i)
Writing your own iterator by defining hasNext and next is another option.
If you're asking whether you can use the 'for' syntax to write a "native" loop, i.e. a loop that works by incrementing some native integer rather than iterating over values produced by an instance of an object, then the answer is, as far as I know, no. As you may know, 'for' comprehensions are syntactic sugar for a combination of calls to flatMap, filter, map and/or foreach (all defined in the FilterMonadic trait), depending on the nesting of generators and their types. You can try to compile some loop and print its compiler intermediate representation with
scalac -Xprint:refchecks
to see how they are expanded.
There's a bunch of these out there, but I can't be bothered googling them at the moment. The following is pretty canonical:
#scala.annotation.tailrec
def loop(from: Int, until: Int)(f: Int => Unit): Unit = {
if (from < until) {
f(from)
loop(from + 1, until)(f)
}
}
loop(0, 10) { i =>
println("Hi " + i)
}

Scala vals vs vars

I'm pretty new to Scala but I like to know what is the preferred way of solving this problem. Say I have a list of items and I want to know the total amount of the items that are checks. I could do something like so:
val total = items.filter(_.itemType == CHECK).map(._amount).sum
That would give me what I need, the sum of all checks in a immutable variable. But it does it with what seems like 3 iterations. Once to filter the checks, again to map the amounts and then the sum. Another way would be to do something like:
var total = new BigDecimal(0)
for (
item <- items
if item.itemType == CHECK
) total += item.amount
This gives me the same result but with 1 iteration and a mutable variable which seems fine too. But if I wanted to to extract more information, say the total number of checks, that would require more counters or mutable variables but I wouldn't have to iterate over the list again. Doesn't seem like the "functional" way of achieving what I need.
var numOfChecks = 0
var total = new BigDecimal(0)
items.foreach { item =>
if (item.itemType == CHECK) {
numOfChecks += 1
total += item.amount
}
}
So if you find yourself needing a bunch of counters or totals on a list is it preferred to keep mutable variables or not worry about it do something along the lines of:
val checks = items.filter(_.itemType == CHECK)
val total = checks.map(_.amount).sum
return (checks.size, total)
which seems easier to read and only uses vals
Another way of solving your problem in one iteration would be to use views or iterators:
items.iterator.filter(_.itemType == CHECK).map(._amount).sum
or
items.view.filter(_.itemType == CHECK).map(._amount).sum
This way the evaluation of the expression is delayed until the call of sum.
If your items are case classes you could also write it like this:
items.iterator collect { case Item(amount, CHECK) => amount } sum
I find that speaking of doing "three iterations" is a bit misleading -- after all, each iteration does less work than a single iteration with everything. So it doesn't automatically follows that iterating three times will take longer than iterating once.
Creating temporary objects, now that is a concern, because you'll be hitting memory (even if cached), which isn't the case of the single iteration. In those cases, view will help, even though it adds more method calls to do the same work. Hopefully, JVM will optimize that away. See Moritz's answer for more information on views.
You may use foldLeft for that:
(0 /: items) ((total, item) =>
if(item.itemType == CHECK)
total + item.amount
else
total
)
The following code will return a tuple (number of checks -> sum of amounts):
((0, 0) /: items) ((total, item) =>
if(item.itemType == CHECK)
(total._1 + 1, total._2 + item.amount)
else
total
)