How can I accumulate my totals in a more functional manner? - scala

At the moment I have
val orders = new HashMap[Int, Int]
orders.put(36, 110)
orders.put(35, 90)
orders.put(34, 80)
orders.put(33, 60)
I would like to keep a running total so that the final mapping looks like this
36 -> 110
35 -> 200
34 -> 280
33 -> 340
At the moment I do this imperatively as follows
val keys = orders.keys.toList.sortBy(x => -x)
val accum = new HashMap[Int, Int]
accum.put(keys.head, orders(keys.head))
for (i <- 1 to keys.length - 1) {
accum.put(keys(i), orders(keys(i)) + accum(keys(i-1)))
}
accum.foreach {
x => println(x._1, x._2)
}
Is there a more functional way of doing this using mapping, folding etc.? I would be able to do it with a plain List, but I can't quite wrap my head around how to do this with a HashMap.
Edit: Ordering is important. The left column (36, 35, 34, 33) needs to be in descending order

Since HashMaps aren't sorted, it's not so simple to do this directly, so convert to an ordered sequence first:
val elems = orders.toSeq.sortBy(-_._1)
  .scanLeft((0, 0))((x, y) => (y._1, x._2 + y._2)).tail
// ArrayBuffer((36,110), (35,200), (34,280), (33,340))
If you actually want to stick these in an ordered map with reverse ordering, rather than just print them out, you could do this:
val accum = collection.SortedMap(elems: _*)(
new Ordering[Int] { def compare(x: Int, y: Int) = y compare x })
// SortedMap[Int,Int] = Map(36 -> 110, 35 -> 200, 34 -> 280, 33 -> 340)
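For anyone who wants to try the scanLeft idea in one piece, here is a minimal sketch. It assumes the orders can live in an immutable Map, and it uses ListMap (my choice, not part of the answer above) to keep the descending order; the object name RunningTotals is only for illustration.

```scala
import scala.collection.immutable.ListMap

object RunningTotals {
  // Same data as in the question, held in an immutable Map (an assumption).
  val orders: Map[Int, Int] = Map(36 -> 110, 35 -> 90, 34 -> 80, 33 -> 60)

  // Sort by key descending, then carry the running sum through scanLeft.
  // The seed (0, 0) is only a starting point and is dropped with `tail`.
  val running: Seq[(Int, Int)] =
    orders.toSeq
      .sortBy { case (k, _) => -k }
      .scanLeft((0, 0)) { case ((_, sum), (k, v)) => (k, sum + v) }
      .tail

  // ListMap iterates in insertion order, so the descending keys are kept.
  val accum: ListMap[Int, Int] = ListMap(running: _*)
}
```

RunningTotals.accum.foreach(println) should then print the pairs from 36 down to 33.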

This should work:
val orders = new HashMap[Int, Int]
orders.put(36, 110)
orders.put(35, 90)
orders.put(34, 80)
orders.put(33, 60)
val accum = new HashMap[Int, Int]
orders.toList.sortBy(-_._1).foldLeft(0){
case (sum, (k, v)) => {
accum.put(k, sum + v)
sum + v
}
}
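The fold above still writes into a mutable HashMap on the side. If you want to avoid that, one possible variation is to carry the result inside the accumulator as well. This is only a sketch under the assumption that an immutable Map and a List of pairs are acceptable; FoldTotals, reversed and totals are names I made up.

```scala
object FoldTotals {
  val orders: Map[Int, Int] = Map(36 -> 110, 35 -> 90, 34 -> 80, 33 -> 60)

  // The accumulator is a pair: (running sum, pairs collected so far, newest first).
  val (_, reversed) =
    orders.toList.sortBy(-_._1).foldLeft((0, List.empty[(Int, Int)])) {
      case ((sum, acc), (k, v)) =>
        val total = sum + v
        (total, (k -> total) :: acc)
    }

  // Prepending builds the list backwards, so reverse it once at the end.
  val totals: List[(Int, Int)] = reversed.reverse
}
```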

For the record, here is a solution using the inits method:
import scala.collection.mutable._
// use a LinkedHashMap to keep the order
val orders = new LinkedHashMap[Int, Int]
orders.put(36, 110)
orders.put(35, 90)
orders.put(34, 80)
orders.put(33, 60)
// create a list of init sequences with no empty element
orders.toSeq.inits.toList.dropRight(1).
// > this returns
// ArrayBuffer((36,110), (35,90), (34,80), (33,60))
// ArrayBuffer((36,110), (35,90), (34,80))
// ArrayBuffer((36,110), (35,90))
// ArrayBuffer((36,110))
// now take the last key of each sequence and sum the values of the sequence
map(init => (init.last._1, init.map(_._2).sum)).reverse.toMap.mkString("\n")
36 -> 110
35 -> 200
34 -> 280
33 -> 340

I think you are doing it wrong. Don't create a Map directly: create a sequence. In this case, a ListBuffer is probably the most appropriate, so that you can easily append elements to it. It also supports constant time toList, though that shouldn't matter here.
If you must use a functional approach, you can either prepend to a List and reverse it, or go the way of iteratees. I'm not comfortable enough with the latter to explain them, though.
Once you have your collection, you'll scanLeft it. Or, if you built a List, you could scanRight it instead of having to reverse it. After that, it is a simple matter of calling toMap on the result.
Roughly speaking:
var accum: List[(Int, Int)] = Nil
accum ::= 36 -> 110
accum ::= 35 -> 90
accum ::= 34 -> 80
accum ::= 33 -> 60
val orders = accum.scanRight(0 -> 0) {
case ((k, v), (_, acc)) => (k, v + acc)
}.init.toMap
The init drops the seed. I could have avoided having to do that using tail and head, but that would require a check to see if accum is empty.
The var can be removed using either iteratees, or, perhaps, using the state monad at a higher level.

var sum = 0
orders.toList.sortBy (-_._1).map (o =>
{sum += o._2; (o._1 -> sum) }).toMap
Not very elegant, since it uses a var.

Related

Don't return same type from filter for efficiency

Let's say I have a Map in scala.
Map.filter returns a Map.
That means that it has to create a Map containing all the remaining items after the filter.
Since creating a map is not cheap in general (approximately O(n log n)), this is wasteful if all I want to do is iterate over the filtered results.
For example:
val map = Map(1 -> "hello", 50 -> "world", 100 -> "hi", 1000 -> "bye")
val filtered = map.filter(x => x._1 < 100)
for(x <- filtered) println(x._2)
I don't think using map.toIterable helps, since the underlying collection is still a Map and filter is virtual.
I don't know whether map.view has the required behavior or not.
I think map.iterator would work, but that means I can't iterate over the iterator twice. I suppose I could use map.iterator.filter(x => x._1 < 100).toList?
I could do map.map(x => (x)), but that means iterating over the Map twice.
What's the simplest, most idiomatic, not unnecessarily inefficient way of doing what I want?
Note that if all you want to do is iterate in a for-comprehension or similar (i.e. flatMap, foreach, map), an intermediate collection isn't created:
for (x <- map if (x._1 < 100)) println(x._2) // Doesn't create an intermediate Map
This desugars to
map.withFilter(x => x._1 < 100).foreach(x => println(x._2))
and withFilter is non-strict.
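A small sketch of both options, in case it helps: withFilter for a single pass, and an iterator plus collect when you want a result you can traverse more than once. The names FilterWithoutCopy, seen and kept are mine, and the ListBuffer is only there so the effect can be inspected.

```scala
import scala.collection.mutable.ListBuffer

object FilterWithoutCopy {
  val map = Map(1 -> "hello", 50 -> "world", 100 -> "hi", 1000 -> "bye")

  // withFilter only remembers the predicate; no filtered Map is built.
  // The predicate is applied while foreach walks the original Map.
  val seen = ListBuffer.empty[String]
  map.withFilter { case (k, _) => k < 100 }.foreach { case (_, v) => seen += v }

  // If the filtered values are needed more than once, go through the
  // iterator and materialise a List rather than a Map.
  val kept: List[String] =
    map.iterator.collect { case (k, v) if k < 100 => v }.toList
}
```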
Use collect.
val map = Map(1 -> "hello", 50 -> "world", 100 -> "hi", 1000 -> "bye")
val filtered : Iterable[String] = map.collect{
case(x,y) if x<100 => y
}
Gives you only the values for which the key satisfies the condition

Getting the mode from an RDD

I would like to get the mode (the most common number) from an rdd using Spark + Scala.
I can get it by doing the following, but I think there could be a better way to calculate this. The most important thing is that if more than one value has the same number of repetitions, I need to return all of them.
Let's see my example code:
val l = List(3,4,4,3,3,7,7,7,9)
val rdd = spark.sparkContext.parallelize(l)
val grouped = rdd.map (e => (e, 1)).groupBy(_._1).map(e=> (e._1, e._2.size))
val maxRep = grouped.collect().maxBy(_._2)._2
val mode = grouped.filter(e => e._2 == maxRep).map(e => e._1).collect
And the result is right:
Array[Int] = Array(3, 7)
but is there a better way to do this? I mean considering the performance because the original RDD would be much bigger than this.
This should work and be a little bit more efficient.
(only if you are sure the total number of elements is small)
val counted = rdd.countByValue()
val max = counted.valuesIterator.max
val maxElements = counted.collect { case (k, v) if (v == max) => k }
If there could be many elements, consider this alternative which is memory safe.
val counted = rdd.map(x => (x, 1L)).reduceByKey(_ + _).cache()
val max = counted.values.max
val maxElements = counted.map { case (k, v) => (v, k) }.lookup(max)
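To check the logic without a cluster, the same three steps (count per value, find the maximum count, keep every value with that count) can be sketched on a plain Scala collection. This is not Spark code; LocalMode and modes are hypothetical names, and it only mirrors the idea of the answers here.

```scala
object LocalMode {
  // Returns every value that occurs with the maximum frequency.
  def modes[A](xs: Seq[A]): Set[A] =
    if (xs.isEmpty) Set.empty
    else {
      // Step 1: count occurrences of each distinct value.
      val counts: Map[A, Int] =
        xs.groupBy(identity).map { case (k, vs) => k -> vs.size }
      // Step 2: the highest count.
      val maxCount = counts.values.max
      // Step 3: all values that reach it, so ties are kept.
      counts.collect { case (k, c) if c == maxCount => k }.toSet
    }
}
```

With the list from the question, LocalMode.modes(List(3,4,4,3,3,7,7,7,9)) gives Set(3, 7).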
How about getting the max key-value pair from a double groupBy? This works even better for bigger data sizes.
rdd.groupBy(identity).mapValues(_.size).groupBy(_._2).max
// res1: (Int, Iterable[(Int, Int)]) = (3,CompactBuffer((3,3), (7,3)))
To get the element
rdd.groupBy(identity).mapValues(_.size).groupBy(_._2).max._2.map(_._1)
// res4: Iterable[Int] = List(3, 7)
The first groupBy, followed by mapValues, turns the elements into (element, count) pairs as an RDD[(Int, Int)]. The second groupBy groups those pairs by count, giving (count -> Iterable((element, count))). Then max simply picks the key-value pair with the largest key, which is the count.

How to pair each element of a Seq with the rest?

I'm looking for an elegant way to combine every element of a Seq with the rest for a large collection.
Example: Seq(1,2,3).someMethod should produce something like
Iterator(
(1,Seq(2,3)),
(2,Seq(1,3)),
(3,Seq(1,2))
)
Order of elements doesn't matter. It doesn't have to be a tuple, a Seq(Seq(1),Seq(2,3)) is also acceptable (although kinda ugly).
Note the emphasis on large collection (which is why my example shows an Iterator).
Also note that this is not combinations.
Ideas?
Edit:
In my use case, the numbers are expected to be unique. If a solution can eliminate the dupes, that's fine, but not at additional cost. Otherwise, dupes are acceptable.
Edit 2: In the end, I went with a nested for-loop, and skipped the case when i == j. No new collections were created. I upvoted the solutions that were correct and simple ("simplicity is the ultimate sophistication" - Leonardo da Vinci), but even the best ones are quadratic just by the nature of the problem, and some create intermediate collections by usage of ++ that I wanted to avoid because the collection I'm dealing with has close to 50000 elements, 2.5 billion when quadratic.
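The nested loop from Edit 2 could look roughly like the sketch below. The helper name foreachWithRest and its callback shape are my own guess at what was meant, assuming an indexed collection such as a Vector or Array.

```scala
object PairWithRest {
  // Hypothetical helper: calls f(x, other) for each element x and each
  // element at a different index, without copying the data.
  def foreachWithRest[A](xs: IndexedSeq[A])(f: (A, A) => Unit): Unit =
    for (i <- xs.indices; j <- xs.indices if i != j) f(xs(i), xs(j))
}
```

Comparing indices rather than values means duplicate values are still paired with each other, which matches the "dupes are acceptable" remark.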
The following code has constant runtime (it does everything lazily), but accessing every element of the resulting collections has constant overhead (when accessing each element, an index shift must be computed every time):
def faceMap(i: Int)(j: Int) = if (j < i) j else j + 1
def facets[A](simplex: Vector[A]): Seq[(A, Seq[A])] = {
val n = simplex.size
(0 until n).view.map { i => (
simplex(i),
(0 until n - 1).view.map(j => simplex(faceMap(i)(j)))
)}
}
Example:
println("Example: facets of a 3-dimensional simplex")
for ((i, v) <- facets((0 to 3).toVector)) {
println(i + " -> " + v.mkString("[", ",", "]"))
}
Output:
Example: facets of a 3-dimensional simplex
0 -> [1,2,3]
1 -> [0,2,3]
2 -> [0,1,3]
3 -> [0,1,2]
This code expresses everything in terms of simplices, because "omitting one index" corresponds exactly to the face maps for a combinatorially described simplex. To further illustrate the idea, here is what the faceMap does:
println("Example: how `faceMap(3)` shifts indices")
for (i <- 0 to 5) {
println(i + " -> " + faceMap(3)(i))
}
gives:
Example: how `faceMap(3)` shifts indices
0 -> 0
1 -> 1
2 -> 2
3 -> 4
4 -> 5
5 -> 6
The facets method uses the faceMaps to create a lazy view of the original collection that omits one element by shifting the indices by one starting from the index of the omitted element.
If I understand what you want correctly, in terms of handling duplicate values (i.e., duplicate values are to be preserved), here's something that should work. Given the following input:
import scala.util.Random
val nums = Vector.fill(20)(Random.nextInt)
This should get you what you need:
for (i <- Iterator.from(0).take(nums.size)) yield {
nums(i) -> (nums.take(i) ++ nums.drop(i + 1))
}
On the other hand, if you want to remove dups, I'd convert to Sets:
val numsSet = nums.toSet
for (num <- nums) yield {
num -> (numsSet - num)
}
seq.iterator.map { case x => x -> seq.filter(_ != x) }
This is quadratic, but I don't think there is very much you can do about that, because at the end of the day, creating a collection is linear, and you are going to need N of them.
import scala.annotation.tailrec
def prems(s : Seq[Int]):Map[Int,Seq[Int]]={
@tailrec
def p(prev: Seq[Int],s :Seq[Int],res:Map[Int,Seq[Int]]):Map[Int,Seq[Int]] = s match {
case x::Nil => res+(x->prev)
case x::xs=> p(x +: prev,xs, res+(x ->(prev++xs)))
}
p(Seq.empty[Int],s,Map.empty[Int,Seq[Int]])
}
prems(Seq(1,2,3,4))
res0: Map[Int,Seq[Int]] = Map(1 -> List(2, 3, 4), 2 -> List(1, 3, 4), 3 -> List(2, 1, 4),4 -> List(3, 2, 1))
I think you are looking for permutations. You can map the resulting lists into the structure you are looking for:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toList
res49: List[(Int, Seq[Int])] = List((1,List(2, 3)), (1,List(3, 2)), (2,List(1, 3)), (2,List(3, 1)), (3,List(1, 2)), (3,List(2, 1)))
Note that the final toList call is only there to trigger the evaluation of the expressions; otherwise, the result is an iterator as you asked for.
In order to get rid of the duplicate heads, toMap seems like the most straight-forward approach:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toMap
res50: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(3, 2), 2 -> List(3, 1), 3 -> List(2, 1))

Update values of Map

I have a Map like:
Map("product1" -> List(Product1ObjectTypes), "product2" -> List(Product2ObjectTypes))
where ProductObjectType has a field usage. Based on the other field (counter) I have to update all ProductXObjectTypes.
The issue is that this update depends on previous ProductObjectType, and I can't find a way to get previous item when iterating over mapValues of this map. So basically, to update current usage I need: CurrentProduct1ObjectType.counter - PreviousProduct1ObjectType.counter.
Is there any way to do this?
I started it like:
val reportsWithCalculatedUsage =
reportsRefined.flatten.flatten.toList.groupBy(_._2.product).mapValues(f)
but I don't know in mapValues how to access previous list item.
I'm not sure if I understand completely, but if you want to update the values inside the lists based on their predecessors, this can generally be done with a fold:
case class Thing(product: String, usage: Int, counter: Int)
val m = Map(
"product1" -> List(Thing("Fnord", 10, 3), Thing("Meep", 0, 5))
//... more mappings
)
//> Map(product1 -> List(Thing(Fnord,10,3), Thing(Meep,0,5)))
m mapValues { list => list.foldLeft(List[Thing]()){
case (Nil, head) =>
List(head)
case (tail, head) =>
val previous = tail.head
val current = head copy (usage = head.usage + head.counter - previous.counter)
current :: tail
} reverse }
//> Map(product1 -> List(Thing(Fnord,10,3), Thing(Meep,2,5)))
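If the fold feels heavy, an alternative sketch is to zip the list with its own tail, so each item arrives together with its predecessor. This assumes the same Thing case class as above and that the first item is left untouched; UsageFromPrevious and withUsage are made-up names.

```scala
object UsageFromPrevious {
  case class Thing(product: String, usage: Int, counter: Int)

  // Pairs every item (except the first) with the item before it and
  // adjusts usage by the difference of the two counters.
  def withUsage(list: List[Thing]): List[Thing] = list match {
    case Nil => Nil
    case first :: rest =>
      first :: list.zip(rest).map { case (prev, cur) =>
        cur.copy(usage = cur.usage + cur.counter - prev.counter)
      }
  }
}
```

It can be applied per key with something like m.map { case (k, v) => k -> UsageFromPrevious.withUsage(v) }.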
Note that a regular Map is an unordered collection; you need to use something like a TreeMap to get a predictable order of iteration.
Anyways, from what I understand you want to get pairs of all values in a map. Try something like this:
scala> val map = Map(1 -> 2, 2 -> 3, 3 -> 4)
scala> (map, map.tail).zipped.foreach((t1, t2) => println(t1 + " " + t2))
(1,2) (2,3)
(2,3) (3,4)

How do I populate a list of objects with new values

Apologies: I'm well noob
I have an items class
class item(ind:Int,freq:Int,gap:Int){}
I have an ordered list of ints
val listVar = a.toList
where a is an array
I want a list of items called metrics where
ind is the (unique) integer
freq is the number of times that ind appears in list
gap is the minimum gap between ind and the number in the list before it
so far I have:
def metrics = for {
n <- 0 until 255
listVar filter (x == n) count > 0
}
yield new item(n, (listVar filter == n).count,0)
It's crap and I know it - any clues?
Well, some of it is easy:
val freqMap = listVar groupBy identity mapValues (_.size)
This gives you ind and freq. To get gap I'd use a fold:
val gapMap = listVar.sliding(2).foldLeft(Map[Int, Int]()) {
case (map, List(prev, ind)) =>
map + (ind -> (map.getOrElse(ind, Int.MaxValue) min ind - prev))
}
Now you just need to unify them:
freqMap.keys.map( k => new item(k, freqMap(k), gapMap.getOrElse(k, 0)) )
Ideally you want to traverse the list only once and, along the way, for each distinct Int, increment a counter (the frequency) and keep track of the minimum gap.
You can use a case class to store the frequency and the minimum gap, the value stored will be immutable. Note that minGap may not be defined.
case class Metric(frequency: Int, minGap: Option[Int])
In the general case you can use a Map[Int, Metric] to look up the immutable Metric object. Finding the minimum gap is the harder part. For that, you can use the sliding(2) method. It traverses the list with a sliding window of size two, letting you compare each Int to its previous value so that you can compute the gap.
Finally you need to accumulate and update the information as you traverse the list. This can be done by folding each element of the list into your temporary result until you traverse the whole list and get the complete result.
Putting things together:
listVar.sliding(2).foldLeft(
Map[Int, Metric]().withDefaultValue(Metric(0, None))
) {
case (map, List(a, b)) =>
val metric = map(b)
val newGap = metric.minGap match {
case None => math.abs(b - a)
case Some(gap) => math.min(gap, math.abs(b - a))
}
val newMetric = Metric(metric.frequency + 1, Some(newGap))
map + (b -> newMetric)
case (map, List(a)) =>
map + (a -> Metric(1, None))
case (map, _) =>
map
}
Result for listVar: List[Int] = List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4)
scala.collection.immutable.Map[Int,Metric] = Map(2 -> Metric(4,Some(0)),
4 -> Metric(4,Some(0)), 0 -> Metric(1,Some(4)))
You can then turn the result into your desired item class using map.toSeq.map { case (i, m) => new item(i, m.frequency, m.minGap.getOrElse(-1)) }.
You can also create directly your Item object in the process, but I thought the code would be harder to read.
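One thing to watch with the sliding(2) version: in a list of two or more elements the very first element only ever appears as the left half of a window, so it is not counted (the example output shows a frequency of 4 for 2, although 2 occurs five times). A possible variation, sketched here with hypothetical names ListMetrics and metrics, folds over the elements themselves and carries the previous element in the accumulator.

```scala
object ListMetrics {
  case class Metric(frequency: Int, minGap: Option[Int])

  def metrics(xs: List[Int]): Map[Int, Metric] = {
    val empty = Map.empty[Int, Metric].withDefaultValue(Metric(0, None))
    // State carried through the fold: (previous element, if any; metrics so far).
    val (_, result) = xs.foldLeft((Option.empty[Int], empty)) {
      case ((prev, acc), x) =>
        val old = acc(x)
        // The gap to the predecessor is undefined for the first element.
        val gap = prev.map(p => math.abs(x - p))
        // Smallest of the gap seen so far and the new one, if either exists.
        val minGap = (old.minGap.toList ++ gap.toList).reduceOption(_ min _)
        (Some(x), acc + (x -> Metric(old.frequency + 1, minGap)))
    }
    result
  }
}
```

For List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4) this counts five occurrences of 2, with the same minimum gaps as above.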