Scala filter by set - scala

Say I have a map that looks like this
val map = Map("Shoes" -> 1, "heels" -> 2, "sneakers" -> 3, "dress" -> 4, "jeans" -> 5, "boyfriend jeans" -> 6)
And also I have a set or collection that looks like this:
val set = Array(Array("Shoes", "heels", "sneakers"), Array("dress", "maxi dress"), Array("jeans", "boyfriend jeans", "destroyed jeans"))
I would like to perform a filter operation on my map so that only one element in each of my set retains. Expected output should be something like this:
map = Map("Shoes" -> 1, "dress" -> 4 ,"jeans" -> 5)
The purpose of doing this is so that if I have multiple sets that indicate different categories of outfits, my output map doesn't "repeat" itself on technically the same objects.
Any help is appreciated, thanks!

So first get rid of the confusion that your sets are actually arrays. For the rest of the example I will use this definition instead:
val arrays = Array(Array("Shoes", "heels", "sneakers"), Array("dress", "maxi dress"), Array("jeans", "boyfriend jeans", "destroyed jeans"))
So in a sense you have an array of arrays of equivalent objects and want to remove all but one of them?
Well first you have to find which of the elements in an array are actually used as keys in the mep. So we just filter out all elements that are not used as keys:
array.filter(map.keySet)
Now, we have to chose one element. As you said, we just take the first one:
array.filter(map.keySet).head
As your "sets" are actually arrays, this is really the first element in your array that is also used as a key. If you would actually use sets this code would still work as sets actually have a "first element". It is just highly implementations specific and it might not even be deterministic over various executions of the same program. At least for immutable sets it should however be deterministic over several calls to head, i.e., you should always get the same element.
Instead of the first element we are actually interested in all other elements, as we want to remove them from the map:
array.filter(map.keySet).tail
Now, we just have to remove those from the map:
map -- array.filter(map.keySet).tail
And to do it for all arrays:
map -- arrays.flatMap(_.filter(map.keySet).tail)
This works fine as long as the arrays are disjoined. If they are not, we can not take the initial map to filter the array in every step. Instead, we have to use one array to compute a new map, then take the next starting with the result from the last and so on. Luckily, we do not have to do much:
arrays.foldLeft(map){(m,a) => m -- a.filter(m.keySet).tail}
Note: Sets are also functions from elements to Boolean, this is, why this solution works.

This code solves the problem:
var newMap = map
set.foreach { list =>
var remove = false
list.foreach { _key =>
if (remove) {
newMap -= _key
}
if (newMap.contains(_key)) {
remove = true
}
}
}
I'm completely new at Scala. I have taken this as my first Scala
example, please any hints from Scala's Gurus is welcome.

The basic idea is to use groupBy. Something like
map.groupBy{ case (k,v) => g(k) }.
map{ case (_, kvs) => kvs.head }
This is the general way to group similar things (using some function g). Now the question is just how to make the g that you need. One way is
val g = set.zipWithIndex.
flatMap{ case (a, i) => a.map(x => x -> i) }.
toMap
which labels each set with a number, and then forms a map so you can look it up. Maps have an apply function, so you can use it as above.

A slightly simpler version
set.flatMap(_.find(map.contains).map(y => y -> map(y)))

Related

Processing of two dimension list in scala

I want process to two dimension list in scala: eg:
Input:
List(
List(‘John’, ‘Will’, ’Steven’),
List(25,28,34),
List(‘M’,’M’,’M’)
)
O/P:
John|25|M
Will|28|M
Steven|34|M
You need to work with indexes, and while working with indexes List is not probably the wisest choice, and also your input is not a well-structured two dimensional list, I would suggest you to have a dedicated data structure for this:
// you can name this better
case class Input(names: Array[String], ages: Array[Int], genders: Array[Char])
val input = Input(Array("John", "Will", "Steven"), Array(25, 28, 34), Array('M', 'M', 'M'))
By essence, this question is meant to use indexes with, so this would be a solution using indexes:
input.names.zipWithIndex.map {
case (name, index) =>
(name, inp.ages(index), inp.genders(index))
}
// or use a for instead
But always try to keep an eye on IndexOutOfBoundsException while working with array indexes. A safer and more "Scala-ish" approach would be zipping, which also take care of arrays with non-equal sizes:
input.names zip inp.ages zip inp.genders map { // flattening tuples
case ((name, age), gender) => (name, age, gender)
}
But this is a little bit slower (one iteration per zipping, and an iteration in mapping stage, I don't know if there's an optimization for this).
And in case you didn't want to go on with this kind of modeling, the algorithm would stay just the same, except for explicit unsafe conversions you will need to do from Any to String and Int and Char.
I got the expected o/p from for loop
for(i<-0 to 2)
{
println()
for(j<-0 to 2)
{
print(twodlist(j)(i)+"|")
}
}
Note: In loop we can use list length instead of 2 to make it more generic.

Iterating through Seq[row] till a particular condition is met using Scala

I need to iterate a scala Seq of Row type until a particular condition is met. i dont need to process further post the condition.
I have a seq[Row] r->WrappedArray([1/1/2020,abc,1],[1/2/2020,pqr,1],[1/3/2020,stu,0],[1/4/2020,opq,1],[1/6/2020,lmn,0])
I want to iterate through this collection for r.getInt(2) until i encounter 0. As soon as i encounter 0, i need to break the iteration and collect r.getString(1) till then. I dont need to look into any other data post that.
My output should be: Array(abc,pqr,stu)
I am new to scala programming. This seq was actually a Dataframe. I know how to handle this using Spark dataframes, but due to some restriction put forth by my organization, windows function, createDataFrame function are not available/working in our environment. Hence i have resort to Scala programming to achieve the same.
All I could come up was something like below, but not really working!
breakable{
for(i <- r)
var temp = i.getInt(3)===0
if(temp ==true)
{
val = i.getInt(2)
break()
}
}
Can someone please help me here!
You can use the takeWhile method to grab the elements while it's value is 1
s.takeWhile(_.getInt(2) == 1).map(_.getString(1))
Than will give you
List(abc, pqr)
So you still need to get the first element where the int values 0 which you can do as follows:
s.find(_.getInt(2)== 0).map(_.getString(1)).get
Putting all together (and handle possible nil values):
s.takeWhile(_.getInt(2) == 1).map(_.getString(1)) ++ s.find(_.getInt(2)== 0).map(r => List(r.getString(1))).getOrElse(Nil)
Result:
Seq[String] = List(abc, pqr, stu)

Scala: For loop that matches ints in a List

New to Scala. I'm iterating a for loop 100 times. 10 times I want condition 'a' to be met and 90 times condition 'b'. However I want the 10 a's to occur at random.
The best way I can think is to create a val of 10 random integers, then loop through 1 to 100 ints.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
whenever i == to a number in z: 'Condition a met: do something'
else {
'condition b met: do something else'
}
}
I tried using contains and == and =! but nothing seemed to work. How else can I do this?
Your generation of random numbers could yield duplicates... is that OK? Here's how you can easily generate 10 unique numbers 1-100 (by generating a randomly shuffled sequence of 1-100 and taking first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition a range 1-100 into those who are contained in your randomly generated list and those who are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
Of course, you can always simply go through the list one by one:
(1 to 100).map {
case i if (r.contains(i)) => println("yes: " + i) // or whatever
case i => println("no: " + i)
}
What you consider to be a simple for-loop actually isn't one. It's a for-comprehension and it's a syntax sugar that de-sugares into chained calls of maps, flatMaps and filters. Yes, it can be used in the same way as you would use the classical for-loop, but this is only because List is in fact a monad. Without going into too much details, if you want to do things the idiomatic Scala way (the "functional" way), you should avoid trying to write classical iterative for loops and prefer getting a collection of your data and then mapping over its elements to perform whatever it is that you need. Note that collections have a really rich library behind them which allows you to invoke cool methods such as partition.
EDIT (for completeness):
Also, you should avoid side-effects, or at least push them as far down the road as possible. I'm talking about the second example from my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it like this is bad. Btw note that you could use foreach instead of map in that case, because you're not collecting results, just performing the side effects.
Good way would be to compute the needed stuff by modifying each element into an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
case i if (r.contains(i)) => ("yes: " + i) // or whatever
case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a list of a hundred "yes x" and "no x" strings, but you didn't do the ugly thing and perform logging as a side effect in the mapping process. Instead, you mapped each element of the collection into a corresponding string (note that original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection) and now you can do whatever you want with it, e.g. pass it on to the logger. Yes, at some point you need to do "the ugly side effect thing" and log the stuff, but at least you will have a special part of code for doing that and you will not be mixing it into your mapping logic which checks if number is contained in the random sequence.
(1 to 100).foreach { x =>
if(z.contains(x)) {
// do something
} else {
// do something else
}
}
or you can use a partial function, like so:
(1 to 100).foreach {
case x if(z.contains(x)) => // do something
case _ => // do something else
}

Calculate sums of even/odd pairs on Hadoop?

I want to create a parallel scanLeft(computes prefix sums for an associative operator) function for Hadoop (scalding in particular; see below for how this is done).
Given a sequence of numbers in a hdfs file (one per line) I want to calculate a new sequence with the sums of consecutive even/odd pairs. For example:
input sequence:
0,1,2,3,4,5,6,7,8,9,10
output sequence:
0+1, 2+3, 4+5, 6+7, 8+9, 10
i.e.
1,5,9,13,17,10
I think in order to do this, I need to write an InputFormat and InputSplits classes for Hadoop, but I don't know how to do this.
See this section 3.3 here. Below is an example algorithm in Scala:
// for simplicity assume input length is a power of 2
def scanadd(input : IndexedSeq[Int]) : IndexedSeq[Int] =
if (input.length == 1)
input
else {
//calculate a new collapsed sequence which is the sum of sequential even/odd pairs
val collapsed = IndexedSeq.tabulate(input.length/2)(i => input(2 * i) + input(2*i+1))
//recursively scan collapsed values
val scancollapse = scanadd(collapse)
//now we can use the scan of the collapsed seq to calculate the full sequence
val output = IndexedSeq.tabulate(input.length)(
i => i.evenOdd match {
//if an index is even then we can just look into the collapsed sequence and get the value
// otherwise we can look just before it and add the value at the current index
case Even => scancollapse(i/2)
case Odd => scancollapse((i-1)/2) + input(i)
}
output
}
I understand that this might need a fair bit of optimization for it to work nicely with Hadoop. Translating this directly I think would lead to pretty inefficient Hadoop code. For example, Obviously in Hadoop you can't use an IndexedSeq. I would appreciate any specific problems you see. I think it can probably be made to work well, though.
Superfluous. You meant this code?
val vv = (0 to 1000000).grouped(2).toVector
vv.par.foldLeft((0L, 0L, false))((a, v) =>
if (a._3) (a._1, a._2 + v.sum, !a._3) else (a._1 + v.sum, a._2, !a._3))
This was the best tutorial I found for writing an InputFormat and RecordReader. I ended up reading the whole split as one ArrayWritable record.

How to generate a list of random numbers?

This might be the least important Scala question ever, but it's bothering me. How would I generate a list of n random number. What I have so far:
def n_rands(n : Int) = {
val r = new scala.util.Random
1 to n map { _ => r.nextInt(100) }
}
Which works, but doesn't look very Scalarific to me. I'm open to suggestions.
EDIT
Not because it's relevant so much as it's amusing and obvious in retrospect, the following looks like it works:
1 to 20 map r.nextInt
But the index of each entry in the returned list is also the upper bound of that last. The first number must be less than 1, the second less than 2, and so on. I ran it three or four times and noticed "Hmmm, the result always starts with 0..."
You can either use Don's solution or:
Seq.fill(n)(Random.nextInt)
Note that you don't need to create a new Random object, you can use the default companion object Random, as stated above.
How about:
import util.Random.nextInt
Stream.continually(nextInt(100)).take(10)
regarding your EDIT,
nextInt can take an Int argument as an upper bound for the random number, so 1 to 20 map r.nextInt is the same as 1 to 20 map (i => r.nextInt(i)), rather than a more useful compilation error.
1 to 20 map (_ => r.nextInt(100)) does what you intended. But it's better to use Seq.fill since that more accurately represents what you're doing.