Scala: For loop that matches ints in a List

New to Scala. I'm iterating a for loop 100 times. 10 times I want condition 'a' to be met and 90 times condition 'b'. However I want the 10 a's to occur at random.
The best way I can think is to create a val of 10 random integers, then loop through 1 to 100 ints.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
whenever i == to a number in z: 'Condition a met: do something'
else {
'condition b met: do something else'
}
}
I tried using contains and == and =! but nothing seemed to work. How else can I do this?

Your generation of random numbers could yield duplicates... is that OK? Here's how you can easily generate 10 unique numbers 1-100 (by generating a randomly shuffled sequence of 1-100 and taking first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition the range 1-100 into the numbers that are contained in your randomly generated list and those that are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
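For reference, here is a self-contained sketch of the same partition approach. It assumes a fixed seed (42, chosen only so that runs are repeatable) and converts the picked numbers to a Set so that the membership check is a cheap lookup; the names rng and picked are made up for this example.

```scala
import scala.util.Random

// Assumption: a seeded generator, used only to make the example repeatable.
val rng = new Random(42)

// Shuffle 1..100 and keep the first ten: ten distinct numbers in random order.
val picked: Set[Int] = rng.shuffle((1 to 100).toList).take(10).toSet

// Split the range into the numbers that were picked and the rest.
val (listOfA, listOfB) = (1 to 100).partition(picked.contains)

println(s"a: ${listOfA.mkString(",")}")
println(s"b: ${listOfB.size} remaining numbers")
```

Because the ten numbers come from a shuffle rather than from repeated random draws, there are always exactly 10 a's and 90 b's.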
Of course, you can always simply go through the list one by one:
(1 to 100).map {
case i if (r.contains(i)) => println("yes: " + i) // or whatever
case i => println("no: " + i)
}
What you consider to be a simple for loop actually isn't one. It's a for-comprehension, which is syntactic sugar that desugars into chained calls to map, flatMap, withFilter and foreach. Yes, it can be used in the same way as you would use the classical for loop, but this is only because collections such as List and Range define those methods (List is, in fact, a monad). Without going into too much detail, if you want to do things the idiomatic Scala way (the "functional" way), you should avoid trying to write classical iterative for loops and prefer getting a collection of your data and then mapping over its elements to perform whatever it is that you need. Note that collections have a really rich library behind them, which gives you useful methods such as partition.
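To make the desugaring concrete, here is a small sketch (the names xs and ys are made up for illustration) that puts a for-comprehension next to roughly the chained calls the compiler produces for it:

```scala
val xs = List(1, 2, 3)
val ys = List(10, 20)

// A for-comprehension with two generators and a guard...
val viaFor = for {
  x <- xs
  if x % 2 == 1
  y <- ys
} yield x * y

// ...corresponds roughly to this chain of withFilter, flatMap and map.
val viaCalls = xs.withFilter(x => x % 2 == 1).flatMap(x => ys.map(y => x * y))

// Without yield, the same syntax turns into foreach calls instead.
for (x <- xs) println(x)
xs.foreach(x => println(x))
```

Both viaFor and viaCalls hold the same list, which is why anything you can write as a for-comprehension can also be written with the collection methods directly.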
EDIT (for completeness):
Also, you should avoid side-effects, or at least push them as far down the road as possible. I'm talking about the second example from my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it like this is bad. Btw note that you could use foreach instead of map in that case, because you're not collecting results, just performing the side effects.
A better way would be to compute the needed stuff by transforming each element into an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
case i if (r.contains(i)) => ("yes: " + i) // or whatever
case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a list of a hundred "yes x" and "no x" strings, but you didn't do the ugly thing and perform logging as a side effect in the mapping process. Instead, you mapped each element of the collection into a corresponding string (note that original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection) and now you can do whatever you want with it, e.g. pass it on to the logger. Yes, at some point you need to do "the ugly side effect thing" and log the stuff, but at least you will have a special part of code for doing that and you will not be mixing it into your mapping logic which checks if number is contained in the random sequence.

(1 to 100).foreach { x =>
if(z.contains(x)) {
// do something
} else {
// do something else
}
}
or you can use a partial function, like so:
(1 to 100).foreach {
case x if(z.contains(x)) => // do something
case _ => // do something else
}

Related

Optimal Way to Achieve Traditional Loop Based Tasks in Scala

I am new to Scala and working on implementing an algorithm. In C#, this would have been a much easier task with necessary loops, but it is a bit confusing to implement with Scala functional programming semantics.
Assume I have to fill a spreadsheet (S) with N rows and M cols with values that I have in a one-dimensional list (L).
While filling an individual cell in the spreadsheet, there is a back and forth logic involved.
2a. The system will walk through the items in L sequentially and will fill the same in next empty cell in sheet S
2b. While filling the item value of the currently processed item from L in a cell, the system will check, can the current cell accept the item value. If yes, it will fill, and move on to the next item and follow Step 2a. If not, it will see if it could fill the next item from L. Until it finds a value that could fit in, the system will continue to evaluate till it runs out of values and will leave it blank.
2c. The system after filling the cell in Step 2b will move to the next cell. Now, it will first check whether any of the unprocessed values from the previous step (2b) could be accepted by the currently processed cell. If yes, it will fill the same and continue to do work with unprocessed values. If it cannot find an unprocessed value that could fit in, it will pull the next item from L based on the position of the pointer on Step 2b.
It would be great if I could get ideas on how to structure this with Scala. As I mentioned earlier, in C# this would have been easy with foreach loops, but I am not sure what the best way is to do this in a functional programming style.

You can remember that imperative:
for (init; condition; afterEach) {
instructions
}
is just a syntactic sugar for:
init
while (condition) {
instructions
afterEach
}
(at least until you use break or continue). So if you are able to rewrite your for-loop code into while-loop code the translation is pretty straightforward.
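As a small illustration of that translation in Scala, here is a sketch of the C-style loop `for (int i = 0; i < 5; i++) sum += i;` rewritten as a while loop (the variable names are arbitrary):

```scala
var sum = 0
var i = 0          // init
while (i < 5) {    // condition
  sum += i         // instructions
  i += 1           // afterEach
}
println(sum)
```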
If you are not interested in such solution you could do something like
val indices = for {
  i <- (0 until n).toStream // or .to(LazyList) if on 2.13
  j <- (0 until m).toStream // or .to(LazyList) if on 2.13
} yield i -> j
indices.foldLeft(allItemsToInsert) { case (itemsLeft, (i, j)) =>
  itemsLeft.find(item => /* predicate if item can be inserted at (i, j) */) match {
    case Some(item) =>
      // insert item to spreadsheet
      itemsLeft diff List(item) // remove found element - use other data structure if you find this too costly
    case None =>
      itemsLeft // nothing could be inserted, move on
  }
}
This would go through all indices one after another, and then try to find the first element which can be inserted. If it does it would insert it and take it off the list, if it cannot be inserted move on.
You can tweak the logic to e.g. partition on items that can be inserted if there could be more than one:
indices.foldLeft(allItemsToInsert) { case (itemsLeft, (i, j)) =>
val (insertable, nonInsertable) = itemsLeft.partition(item => /* predicate if item can be inserted */)
// insert insertable
nonInsertable // pass non-insertable items on to the next index
}
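To see the fold run end to end, here is a minimal sketch under stated assumptions: the acceptance rule canInsert is hypothetical (a cell takes a value whose parity matches row + column), and the sheet is modelled as an immutable Map from (row, column) to value. Neither comes from the question; they only stand in for the real rules.

```scala
// Hypothetical rule, for illustration only: a cell accepts a value
// when the value's parity matches the parity of row + column.
def canInsert(item: Int, i: Int, j: Int): Boolean = item % 2 == (i + j) % 2

val n = 2 // rows
val m = 2 // columns
val allItemsToInsert = List(1, 3, 4, 6, 7)

val indices = for {
  i <- 0 until n
  j <- 0 until m
} yield (i, j)

// The accumulator carries both the remaining items and the sheet built so far,
// so nothing is mutated along the way.
val (leftover, sheet) =
  indices.foldLeft((allItemsToInsert, Map.empty[(Int, Int), Int])) {
    case ((itemsLeft, filled), (i, j)) =>
      itemsLeft.find(item => canInsert(item, i, j)) match {
        case Some(item) =>
          // remove only the first occurrence of the item that was placed
          (itemsLeft diff List(item), filled + ((i, j) -> item))
        case None =>
          (itemsLeft, filled) // leave the cell blank and move on
      }
  }

println(sheet)
println(leftover)
```

Items that were skipped for one cell stay in the accumulator, so they are offered first to the next cell, which is the "unprocessed values" behaviour from step 2c.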
Alternatively you could also use tail recursion if you really need to go back and forth:
@scala.annotation.tailrec
def insertValues(items: List[Item], i: Int, j: Int): Unit = {
  if (items.nonEmpty) {
    // insert what you can into spreadsheet
    val itemsLeft = ... // items that you haven't inserted
    val (newI, newJ) = ... // position of the next cell to fill
    insertValues(itemsLeft, newI, newJ)
  }
}
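A runnable variant of the tail-recursive shape could look like the sketch below. It flattens the sheet to a single row of cells and uses a hypothetical rule (accepts: a cell takes values with the same parity as its index); both are assumptions made only to keep the example short.

```scala
import scala.annotation.tailrec

// Hypothetical rule, assumed for this example.
def accepts(cell: Int, value: Int): Boolean = value % 2 == cell % 2

@tailrec
def fill(
    cell: Int,
    cells: Int,
    pending: List[Int],
    filled: Vector[Option[Int]]
): (Vector[Option[Int]], List[Int]) =
  if (cell == cells) (filled, pending) // every cell visited: return sheet and leftovers
  else
    pending.find(accepts(cell, _)) match {
      case Some(v) => fill(cell + 1, cells, pending diff List(v), filled :+ Some(v))
      case None    => fill(cell + 1, cells, pending, filled :+ None) // cell stays blank
    }

val (row, unused) = fill(0, 4, List(1, 3, 5, 2), Vector.empty)
println(row)
println(unused)
```

The recursive calls are the last thing each branch does, so the compiler accepts the @tailrec annotation and turns the recursion into a loop.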

Scala filter by set

Say I have a map that looks like this
val map = Map("Shoes" -> 1, "heels" -> 2, "sneakers" -> 3, "dress" -> 4, "jeans" -> 5, "boyfriend jeans" -> 6)
And also I have a set or collection that looks like this:
val set = Array(Array("Shoes", "heels", "sneakers"), Array("dress", "maxi dress"), Array("jeans", "boyfriend jeans", "destroyed jeans"))
I would like to perform a filter operation on my map so that only one element in each of my set retains. Expected output should be something like this:
map = Map("Shoes" -> 1, "dress" -> 4 ,"jeans" -> 5)
The purpose of doing this is so that if I have multiple sets that indicate different categories of outfits, my output map doesn't "repeat" itself on technically the same objects.
Any help is appreciated, thanks!

So first get rid of the confusion that your sets are actually arrays. For the rest of the example I will use this definition instead:
val arrays = Array(Array("Shoes", "heels", "sneakers"), Array("dress", "maxi dress"), Array("jeans", "boyfriend jeans", "destroyed jeans"))
So in a sense you have an array of arrays of equivalent objects and want to remove all but one of them?
Well, first you have to find which of the elements in an array are actually used as keys in the map. So, for a single inner array (called array below), we keep only the elements that are used as keys:
array.filter(map.keySet)
Now, we have to choose one element. As you said, we just take the first one:
array.filter(map.keySet).head
As your "sets" are actually arrays, this really is the first element in your array that is also used as a key. If you actually used sets, this code would still work, as sets also have a "first element". Which element that is, however, is highly implementation-specific, and it might not even be deterministic across executions of the same program. At least for immutable sets it should be deterministic over several calls to head, i.e., you should always get the same element.
Instead of the first element we are actually interested in all other elements, as we want to remove them from the map:
array.filter(map.keySet).tail
Now, we just have to remove those from the map:
map -- array.filter(map.keySet).tail
And to do it for all arrays:
map -- arrays.flatMap(_.filter(map.keySet).tail)
This works fine as long as the arrays are disjoint. If they are not, we cannot use the initial map to filter the array in every step. Instead, we have to use one array to compute a new map, then process the next array starting from that result, and so on. Luckily, we do not have to do much:
arrays.foldLeft(map){(m,a) => m -- a.filter(m.keySet).tail}
Note: Sets are also functions from elements to Boolean, which is why this solution works.
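Put together with the data from the question, the fold can be tried as in the sketch below. One deliberate deviation, labelled as such: it uses drop(1) where the steps above use tail, because tail throws on an empty array, which would happen for a group in which none of the names is a key of the map.

```scala
val map = Map("Shoes" -> 1, "heels" -> 2, "sneakers" -> 3,
              "dress" -> 4, "jeans" -> 5, "boyfriend jeans" -> 6)
val arrays = Array(
  Array("Shoes", "heels", "sneakers"),
  Array("dress", "maxi dress"),
  Array("jeans", "boyfriend jeans", "destroyed jeans"))

// For each group, keep the first name that is a key of the current map
// and remove the remaining names of that group from the map.
val deduped = arrays.foldLeft(map) { (m, a) => m -- a.filter(m.keySet).drop(1) }

println(deduped)
```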

This code solves the problem:
var newMap = map
set.foreach { list =>
var remove = false
list.foreach { _key =>
if (remove) {
newMap -= _key
}
if (newMap.contains(_key)) {
remove = true
}
}
}
I'm completely new to Scala and took this as my first Scala example, so any hints from Scala gurus are welcome.

The basic idea is to use groupBy. Something like
map.groupBy{ case (k,v) => g(k) }.
map{ case (_, kvs) => kvs.head }
This is the general way to group similar things (using some function g). Now the question is just how to make the g that you need. One way is
val g = set.zipWithIndex.
flatMap{ case (a, i) => a.map(x => x -> i) }.
toMap
which labels each set with a number, and then forms a map so you can look it up. Maps have an apply function, so you can use it as above.
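Here is a sketch of the groupBy idea with the question's data filled in. The arrays are converted to lists first, which is a choice made for this example only. Note that which entry of a group survives depends on the iteration order of the map, so this version guarantees one entry per group but not necessarily the first name listed.

```scala
val map = Map("Shoes" -> 1, "heels" -> 2, "sneakers" -> 3,
              "dress" -> 4, "jeans" -> 5, "boyfriend jeans" -> 6)
val set = Array(
  Array("Shoes", "heels", "sneakers"),
  Array("dress", "maxi dress"),
  Array("jeans", "boyfriend jeans", "destroyed jeans"))

// Label every name with the index of the group it belongs to.
val g: Map[String, Int] =
  set.toList.zipWithIndex.flatMap { case (a, i) => a.toList.map(x => x -> i) }.toMap

// Group the map entries by label and keep one entry per group.
val onePerGroup: Map[String, Int] =
  map.groupBy { case (k, _) => g(k) }.map { case (_, kvs) => kvs.head }

println(onePerGroup)
```

This sketch assumes every key of the map appears in some group; a key that is missing from g would make g(k) throw.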

A slightly simpler version
set.flatMap(_.find(map.contains).map(y => y -> map(y)))

Calculate sums of even/odd pairs on Hadoop?

I want to create a parallel scanLeft (which computes prefix sums for an associative operator) function for Hadoop (Scalding in particular; see below for how this is done).
Given a sequence of numbers in a hdfs file (one per line) I want to calculate a new sequence with the sums of consecutive even/odd pairs. For example:
input sequence:
0,1,2,3,4,5,6,7,8,9,10
output sequence:
0+1, 2+3, 4+5, 6+7, 8+9, 10
i.e.
1,5,9,13,17,10
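On a small in-memory sequence (not on Hadoop), the pairing step itself can be sketched with grouped from the standard collections; this only illustrates the arithmetic above and says nothing about how to split the work across mappers:

```scala
val input = (0 to 10).toVector

// grouped(2) yields the pairs (0,1), (2,3), ..., (8,9) and a final lone 10;
// summing each group gives the collapsed sequence.
val pairSums = input.grouped(2).map(_.sum).toVector

println(pairSums.mkString(","))
```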
I think in order to do this, I need to write an InputFormat and InputSplits classes for Hadoop, but I don't know how to do this.
See this section 3.3 here. Below is an example algorithm in Scala:
// for simplicity assume input length is a power of 2
def scanadd(input: IndexedSeq[Int]): IndexedSeq[Int] =
  if (input.length == 1)
    input
  else {
    // calculate a new collapsed sequence which is the sum of sequential even/odd pairs
    val collapsed = IndexedSeq.tabulate(input.length / 2)(i => input(2 * i) + input(2 * i + 1))
    // recursively scan collapsed values
    val scancollapse = scanadd(collapsed)
    // now we can use the scan of the collapsed seq to calculate the full sequence
    IndexedSeq.tabulate(input.length) { i =>
      // with 0-based indices, an odd index closes a pair, so we can just look
      // into the collapsed scan and get the value
      if (i % 2 == 1) scancollapse(i / 2)
      // index 0 has nothing before it
      else if (i == 0) input(0)
      // otherwise look just before it and add the value at the current index
      else scancollapse(i / 2 - 1) + input(i)
    }
  }
I understand that this might need a fair bit of optimization for it to work nicely with Hadoop. Translating this directly would, I think, lead to pretty inefficient Hadoop code. For example, in Hadoop you obviously can't use an IndexedSeq. I would appreciate hearing about any specific problems you see. I think it can probably be made to work well, though.

Superfluous. You meant this code?
val vv = (0 to 1000000).grouped(2).toVector
vv.par.foldLeft((0L, 0L, false))((a, v) =>
if (a._3) (a._1, a._2 + v.sum, !a._3) else (a._1 + v.sum, a._2, !a._3))

This was the best tutorial I found for writing an InputFormat and RecordReader. I ended up reading the whole split as one ArrayWritable record.

For loop in scala without sequence?

So, while working my way through "Scala for the Impatient" I found myself wondering: Can you use a Scala for loop without a sequence?
For example, there is an exercise in the book that asks you to build a counter object that cannot be incremented past Integer.MAX_VALUE. In order to test my solution, I wrote the following code:
var c = new Counter
for( i <- 0 to Integer.MAX_VALUE ) c.increment()
This throws an error: sequences cannot contain more than Int.MaxValue elements.
It seems to me that means that Scala is first allocating and populating a sequence object, with the values 0 through Integer.MaxValue, and then doing a foreach loop on that sequence object.
I realize that I could do this instead:
var c = new Counter
while(c.value < Integer.MAX_VALUE ) c.increment()
But is there any way to do a traditional C-style for loop with the for statement?

In fact, 0 to N does not actually populate anything with integers from 0 to N. It instead creates an instance of scala.collection.immutable.Range, which applies its methods to all the integers generated on the fly.
The error you ran into is only because you have to be able to fit the number of elements (whether they actually exist or not) into the positive part of an Int in order to maintain the contract for the length method. 1 to Int.MaxValue works fine, as does 0 until Int.MaxValue. And the latter is what your while loop is doing anyway (to includes the right endpoint, until omits it).
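A quick sketch that shows both points: a Range only stores its bounds and step, so even a range of Int.MaxValue elements is created and queried instantly, and to and until differ only in whether the right endpoint is included.

```scala
val r = 0 until Int.MaxValue

// Only start, end and step are stored, so these calls return immediately
// instead of walking two billion elements.
println(r.length)
println(r.last)
println((1 to Int.MaxValue).length)

// `to` includes the right endpoint, `until` omits it.
println((0 to 9) == (0 until 10))
```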
Anyway, since the Scala for is a very different (much more generic) creature than the C for, the short answer is no, you can't do exactly the same thing. But you can probably do what you want with for (though maybe not as fast as you want, since there is some performance penalty).

Wow, some nice technical answers for a simple question (which is good!) But in case anyone is just looking for a simple answer:
//start from 0, stop at 9 inclusive
for (i <- 0 until 10){
println("Hi " + i)
}
//or start from 0, stop at 9 inclusive
for (i <- 0 to 9){
println("Hi " + i)
}
As Rex pointed out, "to" includes the right endpoint, "until" omits it.

Yes and no, it depends what you are asking for. If you're asking whether you can iterate over a sequence of integers without having to build that sequence first, then yes you can, for instance using streams:
def fromTo(from : Int, to : Int) : Stream[Int] =
if(from > to) {
Stream.empty
} else {
// println("one more.") // uncomment to see when it is called
Stream.cons(from, fromTo(from + 1, to))
}
Then:
for(i <- fromTo(0, 5)) println(i)
Writing your own iterator by defining hasNext and next is another option.
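A minimal sketch of that iterator option follows. The helper name fromToIterator is made up for this example; it counts up to and including `to`, and it stops by flag instead of by comparison so that it also works when `to` is Int.MaxValue.

```scala
// Hypothetical helper: yields from, from + 1, ..., to (inclusive), one value at a time.
def fromToIterator(from: Int, to: Int): Iterator[Int] = new Iterator[Int] {
  private var current = from
  private var done = from > to

  def hasNext: Boolean = !done

  def next(): Int = {
    if (done) throw new NoSuchElementException("next on empty iterator")
    val result = current
    // stop on the last value instead of incrementing past it (avoids overflow)
    if (current == to) done = true else current += 1
    result
  }
}

for (i <- fromToIterator(0, 5)) println(i)
```

Since Iterator defines foreach, map, flatMap and withFilter, it can be used on the right-hand side of a for generator just like a collection.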
If you're asking whether you can use the 'for' syntax to write a "native" loop, i.e. a loop that works by incrementing some native integer rather than iterating over values produced by an instance of an object, then the answer is, as far as I know, no. As you may know, 'for' comprehensions are syntactic sugar for a combination of calls to flatMap, filter, map and/or foreach (all defined in the FilterMonadic trait), depending on the nesting of generators and their types. You can try to compile some loop and print its compiler intermediate representation with
scalac -Xprint:refchecks
to see how they are expanded.

There's a bunch of these out there, but I can't be bothered googling them at the moment. The following is pretty canonical:
@scala.annotation.tailrec
def loop(from: Int, until: Int)(f: Int => Unit): Unit = {
if (from < until) {
f(from)
loop(from + 1, until)(f)
}
}
loop(0, 10) { i =>
println("Hi " + i)
}

Scala vals vs vars

I'm pretty new to Scala but I like to know what is the preferred way of solving this problem. Say I have a list of items and I want to know the total amount of the items that are checks. I could do something like so:
val total = items.filter(_.itemType == CHECK).map(_.amount).sum
That would give me what I need, the sum of all checks in an immutable variable. But it does it with what seems like 3 iterations: once to filter the checks, again to map the amounts, and then the sum. Another way would be to do something like:
var total = BigDecimal(0)
for (
item <- items
if item.itemType == CHECK
) total += item.amount
This gives me the same result but with 1 iteration and a mutable variable, which seems fine too. But if I wanted to extract more information, say the total number of checks, that would require more counters or mutable variables, although I wouldn't have to iterate over the list again. It doesn't seem like the "functional" way of achieving what I need.
var numOfChecks = 0
var total = BigDecimal(0)
items.foreach { item =>
if (item.itemType == CHECK) {
numOfChecks += 1
total += item.amount
}
}
So if you find yourself needing a bunch of counters or totals on a list, is it preferred to keep mutable variables, or to not worry about it and do something along the lines of:
val checks = items.filter(_.itemType == CHECK)
val total = checks.map(_.amount).sum
return (checks.size, total)
which seems easier to read and only uses vals?

Another way of solving your problem in one iteration would be to use views or iterators:
items.iterator.filter(_.itemType == CHECK).map(_.amount).sum
or
items.view.filter(_.itemType == CHECK).map(_.amount).sum
This way the evaluation of the expression is delayed until the call of sum.
If your items are case classes you could also write it like this:
items.iterator.collect { case Item(amount, CHECK) => amount }.sum
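To see the delayed evaluation in action, here is a sketch with a hypothetical Item model (the question does not show its types, so the case class, the ItemType objects and the sample amounts are all assumptions). A counter records how many items the filter has looked at.

```scala
// Hypothetical model, assumed for this example.
sealed trait ItemType
case object CHECK extends ItemType
case object CASH extends ItemType
case class Item(amount: BigDecimal, itemType: ItemType)

val items = List(
  Item(BigDecimal(10), CHECK),
  Item(BigDecimal(5), CASH),
  Item(BigDecimal("2.5"), CHECK))

var visited = 0
val lazyAmounts = items.iterator
  .filter { item => visited += 1; item.itemType == CHECK }
  .map(_.amount)

// Building the pipeline has not touched any element yet.
val visitedBeforeSum = visited

// sum pulls the elements through filter and map in a single pass.
val total = lazyAmounts.sum

// The collect variant does the filtering and the mapping in one step.
val totalViaCollect = items.iterator.collect { case Item(amount, CHECK) => amount }.sum

println(s"visited before sum: $visitedBeforeSum, after: $visited, total: $total")
```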

I find that speaking of doing "three iterations" is a bit misleading. After all, each iteration does less work than a single iteration with everything. So it doesn't automatically follow that iterating three times will take longer than iterating once.
Creating temporary objects, now that is a concern, because you'll be hitting memory (even if cached), which isn't the case of the single iteration. In those cases, view will help, even though it adds more method calls to do the same work. Hopefully, JVM will optimize that away. See Moritz's answer for more information on views.

You may use foldLeft for that:
// (z /: xs)(op) is the older, symbolic spelling of xs.foldLeft(z)(op)
items.foldLeft(BigDecimal(0)) { (total, item) =>
  if (item.itemType == CHECK)
    total + item.amount
  else
    total
}
The following code will return a tuple (number of checks -> sum of amounts):
items.foldLeft((0, BigDecimal(0))) { (total, item) =>
  if (item.itemType == CHECK)
    (total._1 + 1, total._2 + item.amount)
  else
    total
}
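For a version that can be run as is, the sketch below assumes a hypothetical Item model with BigDecimal amounts (the types are not shown in the question) and destructures the accumulator with a pattern instead of using _1 and _2:

```scala
// Hypothetical model, assumed for this example.
sealed trait ItemType
case object CHECK extends ItemType
case object CASH extends ItemType
case class Item(amount: BigDecimal, itemType: ItemType)

val items = List(
  Item(BigDecimal(10), CHECK),
  Item(BigDecimal(5), CASH),
  Item(BigDecimal("2.5"), CHECK))

// One pass over the list, no mutable variables: the accumulator is a
// (count, sum) pair that is rebuilt for every matching item.
val (numOfChecks, total) = items.foldLeft((0, BigDecimal(0))) {
  case ((count, sum), item) =>
    if (item.itemType == CHECK) (count + 1, sum + item.amount)
    else (count, sum)
}

println(s"$numOfChecks checks totalling $total")
```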