How to increment for loop variable dynamically in scala? - scala

How to increment for-loop variable dynamically as per some condition.
For example.
var col = 10
for (i <- col until 10) {
if (Some condition)
i = i+2; // Reassignment to val, compile error
println(i)
}
How it is possible in scala.

Lots of low level languages allow you to do that via the C like for loop but that's not what a for loop is really meant for. In most languages, a for loop is used when you know in advance (when the loop starts) how many iterations you will need. Otherwise, a while loop is used.
You should use a while loop for that in scala.
var i = 0
while(i<10) {
if (Some condition)
i = i+2
println(i)
i+=1
}

If you dont want to use mutable variables you can try functional way for this
def loop(start: Int) {
if (some condition) {
loop(start + 2)
} else {
loop(start - 1) // whatever you want to do.
}
}
And as normal recursion function you'll need some condition to break the flow, I just wanted to give an idea of what can be done.
Hope this helps!!!

Ideally, you wouldn't use var for this. fold works pretty well with immutable values, whether it is an int, list, map...
It lets you set a default value (e.g. 0) to the variable you want to return and also iterate through the values(e.g i) changing that value (e.g accumulator) on every iteration.
val value = (1 to 10).fold(0)((loopVariable,i) => {
if(i == condition)
loopVariable+1
else
loopVariable
})
println(value)
Example

Related

Variable not updated

This is perhaps very basic but I am certainly missing something here.
It is all within a method, where I increment/alter a variable within a child/sub scope (it could be within a if block, or like here, within a map)
However, the result is unchanged variable. e.g. here, sum remains zero after the map. whereas, it should come up to 3L.
What am I missing ?
val resultsMap = scala.collection.mutable.Map.empty[String, Long]
resultsMap("0001") = 0L
resultsMap("0003") = 2L
resultsMap("0007") = 1L
var sum = 0L
resultsMap.mapValues(x => {sum = sum + x})
// I first wrote this, but then got worried and wrote more explicit version too, same behaviour
// resultMap.mapValues(sum+=_)
println("total of counts for txn ="+sum) // sum still 0
-- Update
I have similar behaviour where a loop is not updating the variable outside the loop. looking for text on variable scoping, but not found the golden source yet. all help is appreciated.
var cnt : Int = 0
rdd.foreach(article => {
if (<something>) {
println(<something>) // being printed
cnt += 1
println("counter is now "+cnt) // printed correctly
}
})
You should proceed like this:
val sum = resultsMap.values.reduce(_+_)
You just get your values and then add them up with reduce.
EDIT:
The reason sum stays unchanged is that mapValues produces a view, which means (among other things) the new map won't be computed unless the resulting view is acted upon, so in this case - the code block updating sum is simply never executed.
To see this - you can "force" the view to "materialize" (compute the new map) and see that sum is updated, as expected:
var sum = 0L
resultsMap.mapValues(x => {sum = sum + x}).view.force
println("SUM: " + sum) // prints 3
See related discussion here: Scala: Why mapValues produces a view and is there any stable alternatives?

Scala: For loop that matches ints in a List

New to Scala. I'm iterating a for loop 100 times. 10 times I want condition 'a' to be met and 90 times condition 'b'. However I want the 10 a's to occur at random.
The best way I can think is to create a val of 10 random integers, then loop through 1 to 100 ints.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
whenever i == to a number in z: 'Condition a met: do something'
else {
'condition b met: do something else'
}
}
I tried using contains and == and =! but nothing seemed to work. How else can I do this?
Your generation of random numbers could yield duplicates... is that OK? Here's how you can easily generate 10 unique numbers 1-100 (by generating a randomly shuffled sequence of 1-100 and taking first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition a range 1-100 into those who are contained in your randomly generated list and those who are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
Of course, you can always simply go through the list one by one:
(1 to 100).map {
case i if (r.contains(i)) => println("yes: " + i) // or whatever
case i => println("no: " + i)
}
What you consider to be a simple for-loop actually isn't one. It's a for-comprehension and it's a syntax sugar that de-sugares into chained calls of maps, flatMaps and filters. Yes, it can be used in the same way as you would use the classical for-loop, but this is only because List is in fact a monad. Without going into too much details, if you want to do things the idiomatic Scala way (the "functional" way), you should avoid trying to write classical iterative for loops and prefer getting a collection of your data and then mapping over its elements to perform whatever it is that you need. Note that collections have a really rich library behind them which allows you to invoke cool methods such as partition.
EDIT (for completeness):
Also, you should avoid side-effects, or at least push them as far down the road as possible. I'm talking about the second example from my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it like this is bad. Btw note that you could use foreach instead of map in that case, because you're not collecting results, just performing the side effects.
Good way would be to compute the needed stuff by modifying each element into an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
case i if (r.contains(i)) => ("yes: " + i) // or whatever
case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a list of a hundred "yes x" and "no x" strings, but you didn't do the ugly thing and perform logging as a side effect in the mapping process. Instead, you mapped each element of the collection into a corresponding string (note that original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection) and now you can do whatever you want with it, e.g. pass it on to the logger. Yes, at some point you need to do "the ugly side effect thing" and log the stuff, but at least you will have a special part of code for doing that and you will not be mixing it into your mapping logic which checks if number is contained in the random sequence.
(1 to 100).foreach { x =>
if(z.contains(x)) {
// do something
} else {
// do something else
}
}
or you can use a partial function, like so:
(1 to 100).foreach {
case x if(z.contains(x)) => // do something
case _ => // do something else
}

What is the most efficient way to iterate a collection in parallel in Scala (TOP N pattern)

I am new to Scala and would like to build a real-time application to match some people. For a given Person, I would like to get the TOP 50 people with the highest matching score.
The idiom is as follows :
val persons = new mutable.HashSet[Person]() // Collection of people
/* Feed omitted */
val personsPar = persons.par // Make it parall
val person = ... // The given person
res = personsPar
.filter(...) // Some filters
.map{p => (p,computeMatchingScoreAsFloat(person, p))}
.toList
.sortBy(-_._2)
.take(50)
.map(t => t._1 + "=" + t._2).mkString("\n")
In the sample code above, HashSet is used, but it can be any type of collection, as I am pretty sure it is not optimal
The problem is that persons contains over 5M elements, the computeMatchingScoreAsFloat méthods computes a kind a correlation value with 2 vectors of 200 floats. This computation takes ~2s on my computer with 6 cores.
My question is, what is the fastest way of doing this TOPN pattern in Scala ?
Subquestions :
- What implementation of collection (or something else?) should I use ?
- Should I use Futures ?
NOTE: It has to be computed in parallel, the pure computation of computeMatchingScoreAsFloat alone (with no ranking/TOP N) takes more than a second, and < 200 ms if multi-threaded on my computer
EDIT: Thanks to Guillaume, compute time has been decreased from 2s to 700ms
def top[B](n:Int,t: Traversable[B])(implicit ord: Ordering[B]):collection.mutable.PriorityQueue[B] = {
val starter = collection.mutable.PriorityQueue[B]()(ord.reverse) // Need to reverse for us to capture the lowest (of the max) or the greatest (of the min)
t.foldLeft(starter)(
(myQueue,a) => {
if( myQueue.length <= n ){ myQueue.enqueue(a);myQueue}
else if( ord.compare(a,myQueue.head) < 0 ) myQueue
else{
myQueue.dequeue
myQueue.enqueue(a)
myQueue
}
}
)
}
Thanks
I would propose a few changes:
1- I believe that the filter and map steps requires traversing the collection twice. Having a lazy collection would reduce it to one. Either have a lazy collection (like Stream) or converting it to one, for instance for a list:
myList.view
2- the sort step requires sorting all elements. Instead, you can do a FoldLeft with an accumulator storing the top N records. See there for an example of an implementation:
Simplest way to get the top n elements of a Scala Iterable . I would probably test a Priority Queue instead of a list if you want maximum performance (really falling into its wheelhouse). For instance, something like this:
def IntStream(n:Int):Stream[(Int,Int)] = if(n == 0) Stream.empty else (util.Random.nextInt,util.Random.nextInt) #:: IntStream(n-1)
def top[B](n:Int,t: Traversable[B])(implicit ord: Ordering[B]):collection.mutable.PriorityQueue[B] = {
val starter = collection.mutable.PriorityQueue[B]()(ord.reverse) // Need to reverse for us to capture the lowest (of the max) or the greatest (of the min)
t.foldLeft(starter)(
(myQueue,a) => {
if( myQueue.length <= n ){ myQueue.enqueue(a);myQueue}
else if( ord.compare(a,myQueue.head) < 0 ) myQueue
else{
myQueue.dequeue
myQueue.enqueue(a)
myQueue
}
}
)
}
def diff(t2:(Int,Int)) = t2._2
top(10,IntStream(10000))(Ordering.by(diff)) // select top 10
I really think that you problem requires a SINGLE collection traverse so you be able to get your run time down to below 1 sec
Good luck!

For loop in scala without sequence?

So, while working my way through "Scala for the Impatient" I found myself wondering: Can you use a Scala for loop without a sequence?
For example, there is an exercise in the book that asks you to build a counter object that cannot be incremented past Integer.MAX_VALUE. In order to test my solution, I wrote the following code:
var c = new Counter
for( i <- 0 to Integer.MAX_VALUE ) c.increment()
This throws an error: sequences cannot contain more than Int.MaxValue elements.
It seems to me that means that Scala is first allocating and populating a sequence object, with the values 0 through Integer.MaxValue, and then doing a foreach loop on that sequence object.
I realize that I could do this instead:
var c = new Counter
while(c.value < Integer.MAX_VALUE ) c.increment()
But is there any way to do a traditional C-style for loop with the for statement?
In fact, 0 to N does not actually populate anything with integers from 0 to N. It instead creates an instance of scala.collection.immutable.Range, which applies its methods to all the integers generated on the fly.
The error you ran into is only because you have to be able to fit the number of elements (whether they actually exist or not) into the positive part of an Int in order to maintain the contract for the length method. 1 to Int.MaxValue works fine, as does 0 until Int.MaxValue. And the latter is what your while loop is doing anyway (to includes the right endpoint, until omits it).
Anyway, since the Scala for is a very different (much more generic) creature than the C for, the short answer is no, you can't do exactly the same thing. But you can probably do what you want with for (though maybe not as fast as you want, since there is some performance penalty).
Wow, some nice technical answers for a simple question (which is good!) But in case anyone is just looking for a simple answer:
//start from 0, stop at 9 inclusive
for (i <- 0 until 10){
println("Hi " + i)
}
//or start from 0, stop at 9 inclusive
for (i <- 0 to 9){
println("Hi " + i)
}
As Rex pointed out, "to" includes the right endpoint, "until" omits it.
Yes and no, it depends what you are asking for. If you're asking whether you can iterate over a sequence of integers without having to build that sequence first, then yes you can, for instance using streams:
def fromTo(from : Int, to : Int) : Stream[Int] =
if(from > to) {
Stream.empty
} else {
// println("one more.") // uncomment to see when it is called
Stream.cons(from, fromTo(from + 1, to))
}
Then:
for(i <- fromTo(0, 5)) println(i)
Writing your own iterator by defining hasNext and next is another option.
If you're asking whether you can use the 'for' syntax to write a "native" loop, i.e. a loop that works by incrementing some native integer rather than iterating over values produced by an instance of an object, then the answer is, as far as I know, no. As you may know, 'for' comprehensions are syntactic sugar for a combination of calls to flatMap, filter, map and/or foreach (all defined in the FilterMonadic trait), depending on the nesting of generators and their types. You can try to compile some loop and print its compiler intermediate representation with
scalac -Xprint:refchecks
to see how they are expanded.
There's a bunch of these out there, but I can't be bothered googling them at the moment. The following is pretty canonical:
#scala.annotation.tailrec
def loop(from: Int, until: Int)(f: Int => Unit): Unit = {
if (from < until) {
f(from)
loop(from + 1, until)(f)
}
}
loop(0, 10) { i =>
println("Hi " + i)
}

Scala vals vs vars

I'm pretty new to Scala but I like to know what is the preferred way of solving this problem. Say I have a list of items and I want to know the total amount of the items that are checks. I could do something like so:
val total = items.filter(_.itemType == CHECK).map(._amount).sum
That would give me what I need, the sum of all checks in a immutable variable. But it does it with what seems like 3 iterations. Once to filter the checks, again to map the amounts and then the sum. Another way would be to do something like:
var total = new BigDecimal(0)
for (
item <- items
if item.itemType == CHECK
) total += item.amount
This gives me the same result but with 1 iteration and a mutable variable which seems fine too. But if I wanted to to extract more information, say the total number of checks, that would require more counters or mutable variables but I wouldn't have to iterate over the list again. Doesn't seem like the "functional" way of achieving what I need.
var numOfChecks = 0
var total = new BigDecimal(0)
items.foreach { item =>
if (item.itemType == CHECK) {
numOfChecks += 1
total += item.amount
}
}
So if you find yourself needing a bunch of counters or totals on a list is it preferred to keep mutable variables or not worry about it do something along the lines of:
val checks = items.filter(_.itemType == CHECK)
val total = checks.map(_.amount).sum
return (checks.size, total)
which seems easier to read and only uses vals
Another way of solving your problem in one iteration would be to use views or iterators:
items.iterator.filter(_.itemType == CHECK).map(._amount).sum
or
items.view.filter(_.itemType == CHECK).map(._amount).sum
This way the evaluation of the expression is delayed until the call of sum.
If your items are case classes you could also write it like this:
items.iterator collect { case Item(amount, CHECK) => amount } sum
I find that speaking of doing "three iterations" is a bit misleading -- after all, each iteration does less work than a single iteration with everything. So it doesn't automatically follows that iterating three times will take longer than iterating once.
Creating temporary objects, now that is a concern, because you'll be hitting memory (even if cached), which isn't the case of the single iteration. In those cases, view will help, even though it adds more method calls to do the same work. Hopefully, JVM will optimize that away. See Moritz's answer for more information on views.
You may use foldLeft for that:
(0 /: items) ((total, item) =>
if(item.itemType == CHECK)
total + item.amount
else
total
)
The following code will return a tuple (number of checks -> sum of amounts):
((0, 0) /: items) ((total, item) =>
if(item.itemType == CHECK)
(total._1 + 1, total._2 + item.amount)
else
total
)