GroupBy with list of keys -- Scala

I'm micro-optimising some code as a challenge.
I have a list of objects with a list of keys in each of them.
What's the most efficient way of grouping them by key, with each object appearing in every group for which it has a key?
This is what I have, but I have a feeling it can be improved.
I have many objects (100k+), each has ~2 keys, and there are fewer than 50 possible keys.
I've tried parallelising the list with listOfObjs.par, but there doesn't seem to be much of an improvement overall.
case class Obj(value: Option[Int], key: Option[List[String]])
listOfObjs
.filter(x => x.key.isDefined && x.value.isDefined)
.flatMap(x => x.key.get.map((_, x.value.get)))
.groupBy(_._1)

If you have that many objects, the logical next step would be to distribute the work by using a MapReduce framework. At the end of the day you still need to go over every single object to determine the groups it belongs in, and your worst case will be bottlenecked by that.
The best you can do here is to replace these 3 operations by a fold so you only iterate through the collection once.
Edit: Updated the order based on Luis' recommendation in the comments
listOfObjs.foldLeft(Map.empty[String, List[Int]]) { (acc, obj) =>
  (obj.key, obj.value) match {
    case (Some(k), Some(v)) =>
      // Prepend the value to the group of every key this object carries
      k.foldLeft(acc)((a, ky) => a + (ky -> (v +: a.getOrElse(ky, List.empty))))
    case _ => acc
  }
}
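To make the trade-off concrete, here is a minimal sketch that runs both the original pipeline and a single-pass fold on the same input. The sample data and the names GroupByKeysSketch, viaGroupBy and viaFold are made up for illustration, and Obj is redeclared only so the snippet stands alone. Note that the question's groupBy leaves (key, value) tuples in each group, so the sketch strips the key to make the two results comparable.

```scala
object GroupByKeysSketch {
  final case class Obj(value: Option[Int], key: Option[List[String]])

  // Hypothetical sample data: two usable objects, two that must be skipped
  val sample: List[Obj] = List(
    Obj(Some(1), Some(List("a", "b"))),
    Obj(Some(2), Some(List("b"))),
    Obj(None, Some(List("a"))),
    Obj(Some(3), None)
  )

  // Multi-pass version in the spirit of the question: expand, group, then drop the key
  def viaGroupBy(objs: List[Obj]): Map[String, List[Int]] =
    objs
      .collect { case Obj(Some(v), Some(ks)) => ks.map(_ -> v) }
      .flatten
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2) }

  // Single pass: every value is prepended, so each group ends up in reverse input order
  def viaFold(objs: List[Obj]): Map[String, List[Int]] =
    objs.foldLeft(Map.empty[String, List[Int]]) { (acc, obj) =>
      (obj.key, obj.value) match {
        case (Some(ks), Some(v)) =>
          ks.foldLeft(acc)((a, k) => a.updated(k, v :: a.getOrElse(k, Nil)))
        case _ => acc
      }
    }
}
```

Both versions produce the same groups; only the order inside each group differs, which matters only if you rely on input order.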

I got the impression you are looking for a fast alternative; thus a little bit of encapsulated mutability can help.
So, what about something like this:
import scala.collection.{immutable, mutable}
def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  // Lazily expand each object into (key, value) pairs; objects missing either field are skipped
  val iter =
    objects.iterator.flatMap {
      case Obj(Some(value), Some(keys)) =>
        keys.iterator.map(key => key -> value)
      case _ =>
        Iterator.empty[(String, Int)]
    }
  // One list builder per key, so values are appended in input order
  val m =
    mutable
      .Map
      .empty[String, mutable.Builder[Int, List[Int]]]
  iter.foreach {
    case (k, v) =>
      m.get(key = k) match {
        case Some(builder) =>
          builder.addOne(v)
        case None =>
          m.update(key = k, value = List.newBuilder[Int].addOne(v))
      }
  }
  immutable
    .Map
    .mapFactory[String, List[Int]]
    .fromSpecific(m.view.mapValues(_.result()))
}
Or if you don't care about the order of the elements of each group we can simplify and speed up the code a lot:
def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  val iter = objects.iterator.flatMap {
    case Obj(Some(value), Some(keys)) =>
      keys.iterator.map(key => key -> value)
    case _ =>
      Iterator.empty[(String, Int)]
  }
  val m = mutable.Map.empty[String, List[Int]]
  iter.foreach {
    case (k, v) =>
      // updateWith takes a remapping function, not a match on its result
      m.updateWith(key = k) {
        case Some(list) =>
          Some(v :: list)
        case None =>
          Some(v :: Nil)
      }
  }
  m.to(immutable.Map)
}

Related

concise way to filter and map a scala sequence of tuples containing options

What is the shortest way to transform a Seq of Tuples, e.g.:
val xs : Seq[(Long,Option[Double])] = Seq((1L,None),(2L,Some(2.0)),(3L,None))
to a Seq[(Long,Double)] by removing the Nones
I've used both
xs.filter(_._2.isDefined).map{case (i,x) => (i,x.get)}
and
xs.flatMap{
case (i,Some(x)) => Some(i,x)
case _ => None
}
But wonder if there is a shorter way. For a Seq[Option[Double]] I would just do flatten... but this does not work for nested Options.
You could use collect which discards what's not part of your cases:
xs.collect{ case (i, Some(x)) => (i, x) }
In this case since case (i, None) is not used, these cases will just be filtered out.
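A small sketch of why the unmatched tuples disappear: the block passed to collect is a PartialFunction, and collect only applies it where isDefinedAt is true. The names CollectSketch and keepDefined are made up for illustration.

```scala
object CollectSketch {
  val xs: Seq[(Long, Option[Double])] = Seq((1L, None), (2L, Some(2.0)), (3L, None))

  // The same case block, named so we can ask where it is defined
  val keepDefined: PartialFunction[(Long, Option[Double]), (Long, Double)] = {
    case (i, Some(x)) => (i, x)
  }

  // collect = filter by isDefinedAt, then apply, in one pass
  val viaCollect: Seq[(Long, Double)] = xs.collect(keepDefined)

  // The filter-then-map version from the question, for comparison
  val viaFilterMap: Seq[(Long, Double)] =
    xs.filter(_._2.isDefined).map { case (i, x) => (i, x.get) }
}
```

Here keepDefined.isDefinedAt((1L, None)) is false, so that element never reaches the right-hand side of the case.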
What about:
xs.map {
  case (a, b) => b.map(z => (a, z))
}.flatten
Depends on what you'd call acceptable. Couple of options:
xs.collect(Function.unlift(e => e._2.map(e._1 -> _)))
xs.map(e => e._2.map(e._1 ->)).flatten
These are shorter, although we're entering code golf territory.

Elegant way to validate scala map

My program receives a Scala map, and the requirement is to validate this map (key-value pairs), e.g. validate a key's value, change its value to an acceptable format, etc. In rare cases, we update the key as well before passing the map to the layer below. It's not always necessary to update this map, only when we detect unsupported keys or values. However, we have to check all key/value pairs. I'm doing something like this:
private def updateMap ( parameters: Map[String, String]): Map[String, String] = {
parameters.map{
case(k,v) => k match { case "checkPool" =>
(k, (if (k.contains("checkPool"))
v match {
case "1" => "true"
case _ => "false"
}
else v))
case "Newheader" => (k.replace("Newheader","header"),v)
case _ =>(k,v)
}
case _ => ("","")
}
}
Like this the code increases for doing the validation and converting the keys/values to supported ones. Is there a cleaner way of doing this validation in Scala for a map?
Thanks
It will be clearer if you put all your patterns above one another:
parameters.map {
  case (k @ "checkPool", "1") => k -> "true"
  case (k @ "checkPool", _) => k -> "false"
  case ("Newheader", v) => "header" -> v
  // put here all your other cases
  case (k, v) => k -> v // last possible case, if nothing else matches
}
For clarity, you can also put different validators in partial functions:
type Validator = PartialFunction[(String, String), (String, String)]
val checkPool: Validator = {
  case (k @ "checkPool", "1") => k -> "true"
  case (k @ "checkPool", _) => k -> "false"
}
val headers: Validator = {
  case ("Newheader", v) => "header" -> v
}
And then put all your validators one after the other in your map:
parameters.map(
checkPool orElse
headers orElse
... orElse
PartialFunction(identity[(String, String)]) //this is the same as case (k, v) => k -> v
)
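Put together, the chained validators might look like the sketch below. It is only an illustration under a couple of assumptions: the fallback is written as a catch-all case block (named passThrough here, a made-up name) rather than with PartialFunction(...), since that constructor is not available on every Scala version, and the object name ValidatorSketch is hypothetical.

```scala
object ValidatorSketch {
  type Validator = PartialFunction[(String, String), (String, String)]

  val checkPool: Validator = {
    case (k @ "checkPool", "1") => k -> "true"
    case (k @ "checkPool", _)   => k -> "false"
  }

  val headers: Validator = {
    case ("Newheader", v) => "header" -> v
  }

  // Fallback: any pair that no earlier validator claimed passes through untouched
  val passThrough: Validator = { case kv => kv }

  // orElse tries each validator in order and uses the first one that is defined
  val validateAll: Validator = checkPool orElse headers orElse passThrough

  def updateMap(parameters: Map[String, String]): Map[String, String] =
    parameters.map(kv => validateAll(kv))
}
```

For example, updateMap(Map("checkPool" -> "1", "Newheader" -> "x", "other" -> "y")) should come back with checkPool set to "true", the Newheader key renamed to header, and other left alone. Adding a rule means adding one more Validator to the chain before passThrough.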
Simple if/else condition matching seems to be the best choice.
def updateMap(parameters: Map[String, String]): Map[String, String] = {
parameters.map(kv => {
var key = kv._1
var value = kv._2
if(key.contains("checkPool")){
value = if(value.equals("1")) "true" else "false"
}
else if(key.contains("Newheader")){
key = key.replace("Newheader", "header")
}
(key, value)
})
}
You can add more else if conditions as needed.

Scala check a Sequence of Eithers

I want to update a sequence in Scala, I have this code :
def update(userId: Long): Either[String, Int] = {
Logins.findByUserId(userId) map {
logins: Login => update(login.id,
Seq(NamedParameter("random_date", "prefix-" + logins.randomDate)))
} match {
case sequence : Seq(Nil, Int) => sequence.foldLeft(Right(_) + Right(_))
case _ => Left("error.logins.update")
}
}
Where findByUserId returns a Seq[Logins] and update returns Either[String, Int], where Int is the number of updated rows
and String is the description of the error.
What I want to achieve is to return a String if an error happens while updating the list, or an Int with the total number of updated rows.
The code is not working. I think I should do something different in the match, but I don't know how to check whether every element in the Seq of Eithers is a Right value.
If you are open to using Scalaz or Cats, you can use traverse. An example using Scalaz:
import scalaz.std.either._
import scalaz.std.list._
import scalaz.syntax.traverse._
val logins = Seq(1, 2, 3)
val updateRight: Int => Either[String, Int] = Right(_)
val updateLeft: Int => Either[String, Int] = _ => Left("kaboom")
logins.toList.traverseU(updateLeft).map(_.sum) // Left(kaboom)
logins.toList.traverseU(updateRight).map(_.sum) // Right(6)
Traversing over the logins gives us an Either[String, List[Int]]; if we take the sum of the List, we get the wanted Either[String, Int].
We use toList because there is no Traverse instance for Seq.
traverse is a combination of map and sequence.
We use traverseU instead of traverse because it infers some of the types for us (otherwise we should have introduced a type alias or a type lambda).
Because we imported scalaz.std.either._ we can use map directly without using a right projection (.right.map).
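If pulling in a library is not an option, the behaviour of traverse for Either can be approximated with the standard library alone. The sketch below is a hand-rolled stand-in, not Scalaz's implementation; traverseEither and loop are hypothetical names, and it assumes a right-biased Either (Scala 2.12 or later) for the final map.

```scala
import scala.annotation.tailrec

object TraverseSketch {
  // Apply f to each element in order; stop at the first Left, otherwise collect the Rights
  def traverseEither[A, E, B](as: List[A])(f: A => Either[E, B]): Either[E, List[B]] = {
    @tailrec
    def loop(rest: List[A], acc: List[B]): Either[E, List[B]] = rest match {
      case Nil => Right(acc.reverse)
      case head :: tail =>
        f(head) match {
          case Right(b) => loop(tail, b :: acc) // keep going
          case Left(e)  => Left(e)              // short-circuit on the first error
        }
    }
    loop(as, Nil)
  }

  val logins = List(1, 2, 3)
  val updateRight: Int => Either[String, Int] = Right(_)
  val updateLeft: Int => Either[String, Int] = _ => Left("kaboom")

  val allGood: Either[String, Int] = traverseEither(logins)(updateRight).map(_.sum)
  val failed: Either[String, Int] = traverseEither(logins)(updateLeft).map(_.sum)
}
```

With the same sample functions as above, allGood is Right(6) and failed is Left("kaboom"), and in the failing case f is not called again after the first Left.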
You shouldn't really use a fold if you want to exit early. A better solution would be to recursively iterate over the list, updating and counting successes, then return the error when you encounter one.
Here's a little example function that shows the technique. You would probably want to modify this to do the update on each login instead of just counting.
val noErrors = List[Either[String,Int]](Right(10), Right(12))
val hasError = List[Either[String,Int]](Right(10), Left("oops"), Right(12))
def checkList(l: List[Either[String,Int]], goodCount: Int): Either[String, Int] = {
l match {
case Left(err) :: xs =>
Left(err)
case Right(_) :: xs =>
checkList(xs, (goodCount + 1))
case Nil =>
Right(goodCount)
}
}
val r1 = checkList(noErrors, 0)
val r2 = checkList(hasError, 0)
// r1: Either[String,Int] = Right(2)
// r2: Either[String,Int] = Left(oops)
You want to stop as soon as an update fails, don't you?
That means that you want to be doing your matching inside the map, not outside. Try is actually a more suitable construct for this purpose, than Either. Something like this, perhaps:
def update(userId: Long): Either[String, Int] = Try {
  Logins.findByUserId(userId).map { login =>
    update(login.id, whatever) match {
      case Right(x) => x
      case Left(s) => throw new Exception(s) // aborts the remaining updates
    }
  }.sum
}
  .map { n => Right(n) }
  .recover { case ex => Left(ex.getMessage) }
  .get // safe: recover has already turned any failure into a Left
BTW, a not-too-widely-known fact about Scala is that putting a return statement inside a lambda actually returns from the enclosing method. So, another, somewhat shorter way to write this would be like this:
def update(userId: Long): Either[String, Int] =
  Logins.findByUserId(userId).foldLeft[Either[String, Int]](Right(0)) { (sum, login) =>
    update(login.id, whatever) match {
      case Right(x) => sum.map(_ + x)
      case error @ Left(_) => return error // exits update itself, not just the lambda
    }
  }
Also, why in the world does findByUserId return a sequence???

Return all combinations for nested lists

I have the following data structure
val list = List(1,2,
List(3,4),
List(5,6,7)
)
I want to get this as a result
List(
List(1,2,3,5), List(1,2,3,6), List(1,2,3,7),
List(1,2,4,5), List(1,2,4,6), List(1,2,4,7)
)
Number of sub-lists in the input and number of elements in them can vary
P.S.
I'm trying to use this as a first step
list.map{
case x => List(x)
case list:List => list
}
and some for comprehension, but it won't work because I don't know how many elements each sublist of the result will have
Types like List[Any] are most often avoided in Scala – so much of the power of the language comes from its smart type system, and this kind of type impedes this. So your instinct to turn the list into a normalized List[List[Int]] is spot on:
val normalizedList = list.map {
  case x: Int => List(x)
  case list: List[Int @unchecked] => list
}
Note that this will eventually throw a runtime exception if list includes a List of some type other than Int, such as List[String], due to type erasure. This is exactly the kind of problem that arises when failing to use strong types! You can read more about strategies for dealing with type erasure here.
Once you have a normalized List[List[Int]], then you can use foldLeft to build the combinations. You are also correct in seeing that a for comprehension can work well here:
normalizedList.foldLeft(List(List.empty[Int])) { (acc, next) =>
for {
combo <- acc
num <- next
} yield (combo :+ num)
}
In each iteration of the foldLeft, we consider one more sublist (next) from the normalizedList. We look at each combination thus far constructed (each combo in acc), and then for each number num in next, we make a new combination by appending it to combo.
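To watch the accumulator grow step by step, one option is to swap foldLeft for scanLeft, which keeps every intermediate value instead of only the last one. This is just an illustrative sketch; CombinationsSketch, step and stages are made-up names, and normalizedList is written out by hand for the question's input.

```scala
object CombinationsSketch {
  val normalizedList: List[List[Int]] = List(List(1), List(2), List(3, 4), List(5, 6, 7))

  // One folding step: extend every combination built so far with every number in next
  def step(acc: List[List[Int]], next: List[Int]): List[List[Int]] =
    for {
      combo <- acc
      num <- next
    } yield combo :+ num

  // scanLeft returns the seed plus the accumulator after each sublist
  val stages: List[List[List[Int]]] =
    normalizedList.scanLeft(List(List.empty[Int]))(step)

  // The last stage is exactly what foldLeft would have returned
  val result: List[List[Int]] = stages.last
}
```

The stages go from List(List()) to List(List(1)), then List(List(1, 2)), then List(List(1, 2, 3), List(1, 2, 4)), and finally the six combinations from the question.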
As you might know, for comprehensions are really syntactic sugar for map, flatMap, and filter operations. So we can also express this with those more primitive methods:
normalizedList.foldLeft(List(List.empty[Int])) { (acc, next) =>
acc.flatMap { combo =>
next.map { num => combo :+ num }
}
}
You can even use the (somewhat silly) /: alias for foldLeft, switch the order of the maps, and use underscore syntax for ultimate brevity:
(List(List[Int]()) /: normalizedList) { (acc, next) => next.flatMap { num => acc.map(_ :+ num) } }
val list = List(1,2,
List(3,4),
List(5,6,7)
)
def getAllCombinations(list: List[Any]) : List[List[Int]] ={
//normalize head so it is always a List
val headList: List[Int] = list.head match {
case i:Int => List(i)
case l:List[Int] => l
}
if(list.tail.nonEmpty){
// recursion for tail combinations
val tailCombinations : List[List[Int]] = getAllCombinations(list.tail)
//combine head combinations with tail combinations
headList.flatMap(
{i:Int => tailCombinations.map(
{l=>List(i).++(l)}
)
}
)
}
else{
headList.map(List(_))
}
}
print(getAllCombinations(list))
This can be achieved with the use of a foldLeft, as well. In the code below, each item of the outer List is folded into the List of List's by combining each current list with each new item.
val list = List(1,2, List(3,4), List(5,6,7) )
val lxl0 = List( List[Int]() ) //start value for foldLeft
val lxl = list.foldLeft( lxl0 )( (lxl, i) => {
i match {
case i:Int => for( l <- lxl ) yield l :+ i
case newl:List[Int] => for( l <- lxl;
i <- newl ) yield l :+ i
}
})
lxl.map( _.mkString(",") ).foreach( println(_))
While I didn't use the map that you desired, I do believe that the code may be changed to do the map and make all elements List[Int]. Then, that may simplify the foldLeft to simply do the for-comprehension. I was not able to get that to work immediately, though ;)

Getting minimum Int in IndexedSeq[(Int, Future[Long])] where Long > 0

I have a Scala IndexedSeq[(Int, Future[Long])].
I would like to fill out this function:
def getMininumIfCountIsPositive(distances: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
}
If no element has a Long greater than 0, it should return a Future of None. If some elements have a Long greater than 0, it should return a Future of the minimum associated Int.
This is what I've got right now:
Future.sequence(distances.map {
case (index, count) => count.map(index -> _)
}) map {
s =>
Option(s.filter(_._2 > 0).minBy(_._1)._1)
}
But, I don't know how to handle the case where there are no elements that pass the filter, or where Futures have failed.
Map your sequence of (Int, Future[Long]) to a sequence of Future[(Int, Long)]:
val sequenceOfFutures = a map { (b: (Int, Future[Long])) => b._2 map (c => (b._1, c)) }
Then use Future.sequence to convert that sequence of Future[(Int, Long)] to a Future[IndexedSeq[(Int, Long)]]:
val futureOfSequence = Future.sequence(sequenceOfFutures)
Now you can map that Future to your Future[Option[Int]]:
val finalResult = futureOfSequence map { (iSeq: IndexedSeq[(Int, Long)]) => /* your logic goes here */ }
Here is an efficient version, derived from the one in the question:
Future.traverse(distances) {
case (index, count) => count.map(index -> _)
} map { _.foldLeft(None: Option[Int]) {
case (a, (_, x)) if x <= 0 => a
case (None, (i, _)) => Some(i)
case (Some(ai), (i, _)) => Some(ai min i)
}}
Future.traverse lets us combine the Future.sequence and map operations together. The foldLeft combines all the logic from filter and minBy and produces the appropriate Option.
Both Future.traverse and Future.sequence produce a failed future if any of the futures they are built from fails, so you already have proper failure handling.
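Here is a small runnable sketch of the three outcomes: some positive count, no positive count, and a failed future. The sample inputs, the helper run, and the names MinPositiveSketch and minIndexWithPositiveCount are made up for illustration; the global execution context and the blocking Await are there only to keep the demo short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object MinPositiveSketch {
  def minIndexWithPositiveCount(distances: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] =
    Future.traverse(distances) { case (index, count) => count.map(index -> _) }
      .map { pairs =>
        pairs.foldLeft(Option.empty[Int]) {
          case (acc, (_, c)) if c <= 0 => acc             // skip non-positive counts
          case (None, (i, _))          => Some(i)         // first qualifying index
          case (Some(best), (i, _))    => Some(best min i) // keep the smaller index
        }
      }

  // Blocks for the result and captures a failed future as a Failure (demo only)
  def run[A](f: Future[A]): Try[A] = Try(Await.result(f, 5.seconds))

  val somePositive = Vector(5 -> Future.successful(0L), 3 -> Future.successful(2L), 7 -> Future.successful(9L))
  val nonePositive = Vector(1 -> Future.successful(0L))
  val oneFailed = Vector(1 -> Future.successful(4L), 2 -> Future.failed[Long](new RuntimeException("boom")))
}
```

Running the three samples through run gives Success(Some(3)), Success(None), and a Failure carrying the original exception, so callers can deal with failures using the usual recover or onComplete.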
Rather long-winded..
def get(a: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
  Future.sequence( // Convert the Seq[Future] to Future[Seq]
    a.map { case (index, f) =>
      f.map(l => (index, l)) // Pair each Future's result with its index
        .recover { case _: Throwable => (0, 0L) } // Recover failed Futures as (0, 0); the filter below drops them
    }
  ).map { seq =>
    seq.filter(_._2 > 0) // Keep only positive counts
      .map(_._1) // Take the index
      .reduceOption(_ min _) // Minimum index, or None if nothing is left
  }
}
trait Test2 {
import scala.concurrent.Future
import scala.concurrent.Future.{traverse, successful}
implicit def context: scala.concurrent.ExecutionContext
def logic(in: IndexedSeq[(Int, Long)]): Option[Int]
def getMininumIfCountIsPositive(a: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
traverse(a) { case (i, f) => successful(i).zip(f) } map(logic)
}
}