Getting minimum Int in IndexedSeq[(Int, Future[Long])] where Long > 0 - scala

I have a Scala IndexedSeq[(Int, Future[Long])].
I would like to fill out this function:
def getMininumIfCountIsPositive(distances: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
}
If no element has a Long greater than 0, the function should return a Future of None. Otherwise, it should return a Future of the minimum Int among the elements whose Long is greater than 0.
This is what I've got right now:
Future.sequence(distances.map {
case (index, count) => count.map(index -> _)
}) map {
s =>
Option(s.filter(_._2 > 0).minBy(_._1)._1)
}
But, I don't know how to handle the case where there are no elements that pass the filter, or where Futures have failed.

Map your sequence of (Int, Future[Long]) to a sequence of Future[(Int, Long)]:
val seqOfFutures = a map { (b: (Int, Future[Long])) => b._2 map (c => (b._1, c)) }
Then use Future.sequence to convert that sequence of Future[(Int, Long)] to a Future[IndexedSeq[(Int, Long)]]:
val futureOfSeq = Future.sequence(seqOfFutures)
Now you can map that Future to your Future[Option[Int]]:
val finalResult = futureOfSeq map { (iSeq: IndexedSeq[(Int, Long)]) => /* your logic goes here */ }

Here is an efficient version, derived from the one in the question:
Future.traverse(distances) {
case (index, count) => count.map(index -> _)
} map { _.foldLeft(None: Option[Int]) {
case (a, (_, x)) if x <= 0 => a
case (None, (i, _)) => Some(i)
case (Some(ai), (i, _)) => Some(ai min i)
}}
Future.traverse lets us combine the Future.sequence and map operations together. The foldLeft combines all the logic from filter and minBy and produces the appropriate Option.
Both Future.traverse and Future.sequence produce a failed future if any of the futures they are built from fails, so you already have proper failure handling.
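To see both edge cases from the question in one place, here is a minimal sketch, assuming the global execution context. The names MinPositiveSketch, minIndex and demo are made up for illustration, and the foldLeft is swapped for collect plus reduceOption, which yields None when nothing passes the filter.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MinPositiveSketch {
  // Hypothetical helper: minimum index among entries whose count is positive
  def minIndex(distances: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] =
    Future.traverse(distances) { case (index, count) => count.map(index -> _) }
      .map { pairs =>
        pairs.collect { case (index, count) if count > 0 => index }.reduceOption(_ min _)
      }

  // Blocks with Await only for demonstration purposes
  def demo(): (Option[Int], Option[Int], Boolean) = {
    val mixed = Vector(3 -> Future.successful(5L), 1 -> Future.successful(0L), 2 -> Future.successful(7L))
    val nonePositive = Vector(1 -> Future.successful(0L))
    val broken = Vector(1 -> Future.successful(4L), 2 -> Future.failed[Long](new RuntimeException("boom")))
    (
      Await.result(minIndex(mixed), 1.second),        // smallest index with a positive count
      Await.result(minIndex(nonePositive), 1.second), // nothing passes the filter
      Await.ready(minIndex(broken), 1.second).value.exists(_.isFailure) // failure propagates
    )
  }
}
```

If a failed count should be ignored rather than fail the whole result, each Future would need its own recover before the traverse.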

Rather long-winded..
def get(a: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
  Future.sequence( // Convert the Seq[Future] to Future[Seq]
    a.map { case (index, f) =>
      f.map(l => (index, l)) // pair each Future's result with its index
        .recover { case _: Throwable => (0, 0L) } // recover failed Futures as (0, 0); the filter below discards them
    }
  ).map { seq =>
    seq.filter(_._2 > 0) // keep only positive counts
      .map(_._1) // take the indices
      .reduceOption(_ min _) // minimum index, or None if nothing is left
  }
}

trait Test2 {
import scala.concurrent.Future
import scala.concurrent.Future.{traverse, successful}
implicit def context: scala.concurrent.ExecutionContext
def logic(in: IndexedSeq[(Int, Long)]): Option[Int]
def getMininumIfCountIsPositive(a: IndexedSeq[(Int, Future[Long])]): Future[Option[Int]] = {
traverse(a) { case (i, f) => successful(i).zip(f) } map(logic)
}
}

Related

Sorting a collection of collections by indices of inner collection elements in Scala

Let us have a collection of collections as below:
type Row = IndexedSeq[Any]
type RowTable = IndexedSeq[Row]
val table: RowTable = IndexedSeq(
IndexedSeq(2, "b", ... /* some elements of type Any*/),
IndexedSeq(1, "a", ...),
IndexedSeq(2, "c", ...))
Each Row in RowTable "has the same schema", meaning that as in example if the first row in the table contains Int, String, ..., then the second row in the table contains the elements of the same type in the same order, i.e., Int, String, ....
I would like to sort Rows in a RowTable by given indices of Row's elements and the sorting direction (ascending or descending sort) for that element.
For example, the collection above would be sorted this way for Index 0 ascending and Index 1 descending and the rest of elements are not important in sorting:
1, "a", ...
2, "c", ...
2, "b", ...
Since Row is IndexedSeq[Any], we do not know the type of each element to compare it; however, we know that it may be cast to Comparable[Any] and thus has a compareTo() method to compare it with the element at the same index in another row.
The indices, as mentioned above, that will determine the sorting order are not known before we start sorting. How can I code this in Scala?
First of all, it's a bad design to compare a pair of Any.
By default, Scala doesn't provide an Ordering[Any]. Hence, if you want to compare a pair of Any, you have to implement Ordering[Any] yourself:
object AnyOrdering extends Ordering[Any] {
override def compare(xRaw: Any, yRaw: Any): Int = {
(xRaw, yRaw) match {
case (x: Int, y: Int) => Ordering.Int.compare(x, y)
case (_: Int, _) => 1
case (_, _: Int) => -1
...
case (x: String, y: String) => Ordering.String.compare(x, y)
case (_: String, _) => 1
case (_, _: String) => -1
...
case (_, _) => 0
}
}
}
In your example, you want to compare two IndexedSeq[T] recursively. Scala doesn't provide any recursive Ordering and you need to implement it too:
import scala.annotation.tailrec
def recOrdering[T](implicit ordering: Ordering[T]): Ordering[IndexedSeq[T]] = new Ordering[IndexedSeq[T]] {
override def compare(x: IndexedSeq[T], y: IndexedSeq[T]): Int = compareRec(x, y)
@tailrec
private def compareRec(x: IndexedSeq[T], y: IndexedSeq[T]): Int = {
(x.headOption, y.headOption) match {
case (Some(xHead), Some(yHead)) =>
val compare = ordering.compare(xHead, yHead)
if (compare == 0) {
compareRec(x.tail, y.tail)
} else {
compare
}
case (Some(_), None) => 1
case (None, Some(_)) => -1
case (None, None) => 0 // both sequences exhausted: they are equal
}
}
}
After that you can finally sort your collection:
table.sorted(recOrdering(AnyOrdering))
(Sorry for unidiomatic (maybe not compiling) code; I can probably help with it upon request)
We can use the code below to sort a table
table.sortWith {
case (tupleL, tupleR) => isLessThan(tupleL, tupleR)
}
where isLessThan is defined as follows (unidiomatic Scala, I know):
def isLessThan(tupleL: Row, tupleR: Row): Boolean = {
  var i = 0
  while (i < sortInfos.length) {
    val sortInfo = sortInfos(i)
    val result = tupleL(sortInfo.fieldIndex)
      .asInstanceOf[Comparable[Any]].compareTo(
        tupleR(sortInfo.fieldIndex)
          .asInstanceOf[Comparable[Any]])
    if (result != 0) {
      // The first field that differs decides the order
      return if (sortInfo.isDescending) result > 0 else result < 0
    }
    i += 1
  }
  false // all sort fields are equal, so neither row is strictly less
}
where sortInfos is IndexedSeq[SortInfo] and
case class SortInfo(fieldIndex: Int, isDescending: Boolean)
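The same idea can be packaged as an Ordering and passed to sorted. This is only a sketch: SortKey is a hypothetical stand-in for SortInfo, rowOrdering is a made-up name, and it assumes every element at a sort index is a Comparable of a compatible type, as the question states.

```scala
object RowSortSketch {
  type Row = IndexedSeq[Any]

  // Hypothetical stand-in for the SortInfo case class above
  final case class SortKey(fieldIndex: Int, isDescending: Boolean)

  def rowOrdering(keys: Seq[SortKey]): Ordering[Row] = new Ordering[Row] {
    def compare(l: Row, r: Row): Int =
      keys.iterator.map { key =>
        // Assumption: both elements can be cast to Comparable and compared with each other
        val a = l(key.fieldIndex).asInstanceOf[Comparable[Any]]
        val b = r(key.fieldIndex).asInstanceOf[Comparable[Any]]
        // Swap the operands for a descending key instead of negating the result
        if (key.isDescending) b.compareTo(a) else a.compareTo(b)
      }.find(_ != 0).getOrElse(0) // first key that differs decides; 0 if all are equal
  }

  val table: IndexedSeq[Row] = IndexedSeq(
    IndexedSeq(2, "b"),
    IndexedSeq(1, "a"),
    IndexedSeq(2, "c"))

  // Index 0 ascending, then index 1 descending, as in the question's example
  val sorted: IndexedSeq[Row] =
    table.sorted(rowOrdering(Seq(SortKey(0, isDescending = false), SortKey(1, isDescending = true))))
}
```

Because the iterator is lazy, later keys are only compared when all earlier keys tie.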
Here is a working example with Ordering[IndexedSeq[Any]]:
val table: IndexedSeq[IndexedSeq[Any]] = IndexedSeq(
IndexedSeq(2, "b", "a"),
IndexedSeq(2, "b"),
IndexedSeq("c", 2),
IndexedSeq(1, "c"),
IndexedSeq("c", "c"),
//IndexedSeq((), "c"), //it will blow in runtime
IndexedSeq(2, "a"),
)
implicit val isaOrdering:Ordering[IndexedSeq[Any]] = { (a, b) =>
a.zip(b).filter {case (a, b)=> a != b}.collectFirst {
case (a:Int, b:Int) => a compare b
case (a:String, b:String) => a compare b
case (a:String, b:Int) => 1 //prefer ints over strings
case (a:Int, b:String) => -1 //prefer ints over strings
case _ => throw new RuntimeException(s"cannot compare $a to $b")
}.getOrElse(a.length compare b.length) //shorter will be first
}
println(table.sorted) //used implicitly
println(table.sorted(isaOrdering))
//Vector(Vector(1, c), Vector(2, a), Vector(2, b), Vector(2, b, a), Vector(c, 2), Vector(c, c))
https://scalafiddle.io/sf/yvLEnYL/4
Or, if you really need to compare different types somehow, this is the best I could find:
implicit val isaOrdering:Ordering[IndexedSeq[Any]] = { (a, b) =>
a.zip(b).filter {case (a, b)=> a != b}.collectFirst {
case (a:Int, b:Int) => a compare b
case (a:String, b:String) => a compare b
//add your known types here
// ...
//below is the rule that takes care of unknown cases.
//We don't know the types at all; at best we can compare for equality.
//If they are equal then return 0... if not, we throw
//this could also be very slow (not tested)
case (a, b) =>
//not nice but it is stable at least
val ac = a.getClass.getName
val bc = b.getClass.getName
ac.compare(bc) match {
case 0 => if (ac == bc) 0 else throw new RuntimeException(s"cannot compare $a to $b")
case x => x
}
}.getOrElse(a.length compare b.length) //shorter will be first
}
https://scalafiddle.io/sf/yvLEnYL/5
This implementation will fail at runtime when the elements cannot be compared.

Best way to unpack an option field inside a map operation in scala

Let's say that I have a list of tuples:
val xs: List[(Seq[String], Option[String])] = List(
(Seq("Scala", "Python", "Javascript"), Some("Java")),
(Seq("Wine", "Beer"), Some("Beer")),
(Seq("Dog", "Cat", "Man"), None)
)
and a function that returns the index of the string if it is in the sequence of strings:
def getIndex(s: Seq[String], e: Option[String]): Option[Int] =
if (e.isEmpty) None
else Some(s.indexOf(e.get))
Now I am trying to map over xs with getIndex and keep only the entries for which a valid index was found. One way to do this is as follows:
xs.map{case (s, e) => {
val ii = getIndex(s, e) // returns an Option
ii match { // unpack the option
case Some(idx) => (e, idx)
case None => (e, -1) // give None entries a placeholder with -1
}
}}.filter(_._2 != -1) // filter out invalid entries
This approach is quite verbose and clunky to me. flatMap does not work here because I am returning a tuple instead of just the index. What is the idiomatic way to do this?
A for comprehension is one way to achieve this:
scala> val xs: List[(Seq[String], Option[String])] = List(
(Seq("Scala", "Python", "Javascript"), Some("Java")),
(Seq("Wine", "Beer"), Some("Beer")),
(Seq("Dog", "Cat", "Man"), None)
)
xs: List[(Seq[String], Option[String])] = List((List(Scala, Python, Javascript),Some(Java)), (List(Wine, Beer),Some(Beer)), (List(Dog, Cat, Man),None))
scala> def getIndex(seq: Seq[String], e: Option[String]): Option[Int] =
e.map(seq.indexOf(_)).filter(_ != -1) // notice we're doing the filter here
getIndex: getIndex[](val seq: Seq[String],val e: Option[String]) => Option[Int]
scala> for {
(seq, string) <- xs
index <- getIndex(seq, string)
s <- string
} yield (s, index)
res0: List[(String, Int)] = List((Beer,1))
There are a lot of ways to do this. One of them is this:
val result = xs.flatMap { tuple =>
val (seq, string) = tuple
string.map(s => (s, seq.indexOf(s))).filter(_._2 >= 0)
}
Maybe this looks a bit more idiomatic:
val two = xs.filter {case (s, e) => e.isDefined}
.map {case (s, e) => (e, s.indexOf(e.get)) }
.filter {case (e, i) => i >= 0} // index 0 is a valid match
We can use the collect method to combine a map and filter:
xs.collect { case (s, e) if e.isDefined => (e, s.indexOf(e.get)) }
.filter { case (e, i) => i >= 0 } // index 0 is a valid match
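The collect idea can also match on Some directly, which avoids e.get and the second filter. A minimal sketch, with CollectSketch and found as illustrative names; it yields the unwrapped String rather than the Option, and it scans the sequence twice (contains, then indexOf):

```scala
object CollectSketch {
  val xs: List[(Seq[String], Option[String])] = List(
    (Seq("Scala", "Python", "Javascript"), Some("Java")),
    (Seq("Wine", "Beer"), Some("Beer")),
    (Seq("Dog", "Cat", "Man"), None))

  // Keep only entries whose Option is defined and whose value occurs in the sequence
  val found: List[(String, Int)] = xs.collect {
    case (seq, Some(s)) if seq.contains(s) => (s, seq.indexOf(s))
  }
}
```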
map and getOrElse might get things a little clearer:
// use map you will get Some(-1) if the element doesn't exist or None if the element is None
xs.map{case (s, e) => (e, e.map(s.indexOf(_)))}.
// check if the index is non-negative and use getOrElse to return false if it's None
filter{case (e, idx) => idx.map(_ >= 0).getOrElse(false)}
// res16: List[(Option[String], Option[Int])] = List((Some(Beer),Some(1)))
Or:
xs.map{ case (s, e) => (e, e.map(s.indexOf).getOrElse(-1)) }.filter(_._2 != -1)
// res17: List[(Option[String], Int)] = List((Some(Beer),1))

Scala check a Sequence of Eithers

I want to update a sequence in Scala. I have this code:
def update(userId: Long): Either[String, Int] = {
Logins.findByUserId(userId) map {
logins: Login => update(login.id,
Seq(NamedParameter("random_date", "prefix-" + logins.randomDate)))
} match {
case sequence : Seq(Nil, Int) => sequence.foldLeft(Right(_) + Right(_))
case _ => Left("error.logins.update")
}
}
where findByUserId returns a Seq[Logins] and update returns an Either[String, Int], in which the Int is the number of updated rows
and the String is the description of the error.
What I want to achieve is to return a String if an error happens while updating the list, or an Int with the total number of updated rows.
The code is not working. I think I should do something different in the match, but I don't know how to check whether every element in the Seq of Eithers is a Right value.
If you are open to using Scalaz or Cats, you can use traverse. An example using Scalaz:
import scalaz.std.either._
import scalaz.std.list._
import scalaz.syntax.traverse._
val logins = Seq(1, 2, 3)
val updateRight: Int => Either[String, Int] = Right(_)
val updateLeft: Int => Either[String, Int] = _ => Left("kaboom")
logins.toList.traverseU(updateLeft).map(_.sum) // Left(kaboom)
logins.toList.traverseU(updateRight).map(_.sum) // Right(6)
Traversing over the logins gives us an Either[String, List[Int]]; summing the List gives the wanted Either[String, Int].
We use toList because there is no Traverse instance for Seq.
traverse is a combination of map and sequence.
We use traverseU instead of traverse because it infers some of the types for us (otherwise we should have introduced a type alias or a type lambda).
Because we imported scalaz.std.either._ we can use map directly without using a right projection (.right.map).
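Without an extra library, much the same result can be sketched with the standard library alone, assuming Scala 2.12 or later, where Either is right-biased. The names EitherSumSketch and sumAll are invented for this example, and the update function is passed in as a parameter rather than taken from the question's code.

```scala
object EitherSumSketch {
  // Sums the Right values; the first Left wins and later updates are not executed
  def sumAll[A](items: Seq[A])(update: A => Either[String, Int]): Either[String, Int] =
    items.foldLeft[Either[String, Int]](Right(0)) { (acc, item) =>
      for {
        total <- acc          // short-circuits here once acc is a Left
        n     <- update(item) // only evaluated while acc is still a Right
      } yield total + n
    }

  val ok: Either[String, Int] = sumAll(Seq(1, 2, 3))(n => Right(n))
  val failed: Either[String, Int] = sumAll(Seq(1, 2, 3))(_ => Left("kaboom"))
}
```

The fold still visits every remaining element after a failure, but it no longer calls update for them.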
You shouldn't really use a fold if you want to exit early. A better solution would be to recursively iterate over the list, updating and counting successes, then return the error when you encounter one.
Here's a little example function that shows the technique. You would probably want to modify this to do the update on each login instead of just counting.
val noErrors = List[Either[String,Int]](Right(10), Right(12))
val hasError = List[Either[String,Int]](Right(10), Left("oops"), Right(12))
def checkList(l: List[Either[String,Int]], goodCount: Int): Either[String, Int] = {
l match {
case Left(err) :: xs =>
Left(err)
case Right(_) :: xs =>
checkList(xs, (goodCount + 1))
case Nil =>
Right(goodCount)
}
}
val r1 = checkList(noErrors, 0)
val r2 = checkList(hasError, 0)
// r1: Either[String,Int] = Right(2)
// r2: Either[String,Int] = Left(oops)
You want to stop as soon as an update fails, don't you?
That means that you want to do your matching inside the map, not outside. Try is actually a more suitable construct for this purpose than Either. Something like this, perhaps:
import scala.util.{Failure, Success, Try}
def update(userId: Long): Either[String, Int] = Try {
  Logins.findByUserId(userId).map { login =>
    update(login.id, whatever) match {
      case Right(x) => x
      case Left(s) => throw new Exception(s) // aborts the map at the first failure
    }
  }.sum
} match {
  case Success(n) => Right(n)
  case Failure(ex) => Left(ex.getMessage)
}
BTW, a not-too-widely-known fact about Scala is that putting a return statement inside a lambda actually returns from the enclosing method. So another, somewhat shorter way to write this would be:
def update(userId: Long): Either[String, Int] =
  Logins.findByUserId(userId).foldLeft[Either[String, Int]](Right(0)) { (sum, login) =>
    update(login.id, whatever) match {
      case Right(x) => sum.right.map(_ + x)
      case error @ Left(_) => return error // exits the whole method, not just the lambda
    }
  }
Also, why in the world does findByUserId return a sequence???

subsets manipulation on vectors in spark scala

I have an RDD curRdd of the form
res10: org.apache.spark.rdd.RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] = ShuffledRDD[102]
with curRdd.collect() producing the following result.
Array((Vector((5,2)),1), (Vector((1,1)),2), (Vector((1,1), (5,2)),2))
Here the key is a vector of pairs of Ints and the value is a count.
Now, I want to convert it into another RDD of the same form RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] by percolating down the counts.
That is, (Vector((1,1), (5,2)),2) will contribute its count of 2 to any key that is a subset of it, so (Vector((5,2)),1) becomes (Vector((5,2)),3).
For the example above, our new RDD will have
(Vector((5,2)),3), (Vector((1,1)),4), (Vector((1,1), (5,2)),2)
How do I achieve this? Kindly help.
First, you can introduce a subsets operation for Seq:
implicit class SubSetsOps[T](val elems: Seq[T]) extends AnyVal {
def subsets: Vector[Seq[T]] = elems match {
case Seq() => Vector(elems)
case elem +: rest => {
val recur = rest.subsets
recur ++ recur.map(elem +: _)
}
}
}
The empty subset will always be the first element of the result vector, so you can omit it with .tail.
Now the task is a straightforward map-reduce, which in RDD terms is flatMap followed by reduceByKey:
val result = curRdd
.flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
.reduceByKey(_ + _)
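The same flatMap-then-reduce logic can be checked without a Spark context by running it on plain collections. In this sketch, groupBy plus sum stands in for reduceByKey, and subsetsOf is a hypothetical non-implicit variant of the subsets operation above:

```scala
object PercolateSketch {
  // All subsets of elems, with the empty subset first
  def subsetsOf[T](elems: Vector[T]): Vector[Vector[T]] =
    elems.foldRight(Vector(Vector.empty[T])) { (elem, acc) => acc ++ acc.map(elem +: _) }

  // The collected RDD contents from the question
  val data: Vector[(Vector[(Int, Int)], Int)] = Vector(
    (Vector((5, 2)), 1),
    (Vector((1, 1)), 2),
    (Vector((1, 1), (5, 2)), 2))

  val percolated: Map[Vector[(Int, Int)], Int] =
    data
      .flatMap { case (key, count) => subsetsOf(key).tail.map(_ -> count) } // emit the count for every non-empty subset
      .groupBy(_._1)                                                       // local stand-in for reduceByKey
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }
}
```

On the example data this reproduces the counts 3, 4 and 2 requested in the question.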
Update
This implementation can introduce new key sets in the result. If you want to keep only those that were present in the original collection, you can join the result with the original:
val result = curRdd
.flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
.reduceByKey(_ + _)
.join(curRdd map identity[(Seq[(Int, Int)], Int)])
.map { case (key, (v, _)) => (key, v) }
Note that the map identity step is needed to convert the key type from Vector[_] to Seq[_] in the original RDD. Instead, you can modify the SubSetsOps definition by substituting all occurrences of Seq[T] with Vector[T], or change the definition in the following way (hardcoding scala.collection):
import scala.collection.SeqLike
import scala.collection.generic.CanBuildFrom
implicit class SubSetsOps[T, F[e] <: SeqLike[e, F[e]]](val elems: F[T]) extends AnyVal {
def subsets(implicit cbf: CanBuildFrom[F[T], T, F[T]]): Vector[F[T]] = elems match {
case Seq() => Vector(elems)
case elem +: rest => {
val recur = rest.subsets
recur ++ recur.map(elem +: _)
}
}
}

n-way `span` on sequences

Given a sequence of elements and a predicate p, I would like to produce a sequence of sequences such that, in each subsequence, either all elements satisfy p or the sequence has length 1. Additionally, calling .flatten on the result should give me back my original sequence (so no re-ordering of elements).
For instance, given:
val l = List(2, 4, -6, 3, 1, 8, 7, 10, 0)
val p = (i : Int) => i % 2 == 0
I would like magic(l,p) to produce:
List(List(2, 4, -6), List(3), List(1), List(8), List(7), List(10, 0))
I know of .span, but that method stops the first time it encounters a value that doesn't satisfy p and just returns a pair.
Below is a candidate implementation. It does what I want, but, well, makes me want to cry. I would love for someone to come up with something slightly more idiomatic.
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
val loop = elems.foldLeft[(Boolean,Seq[Seq[T]])]((false,Seq.empty)) { (pr,e) =>
val (lastOK,s) = pr
if(lastOK && p(e)) {
(true, s.init :+ (s.last :+ e))
} else {
(p(e), s :+ Seq(e))
}
}
loop._2
}
(Note that I do not particularly care about preserving the actual type of the Seq.)
I would not use foldLeft. It's just a simple recursion of span with a special rule if the head doesn't match the predicate:
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] =
elems match {
case Seq() => Seq()
case Seq(head, tail @ _*) if !p(head) => Seq(head) +: magic(tail, p)
case xs =>
val (prefix, rest) = xs span p
prefix +: magic(rest, p)
}
You could also do it tail-recursive, but you need to remember to reverse the output if you're prepending (as is sensible):
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
def iter(elems: Seq[T], out: Seq[Seq[T]]) : Seq[Seq[T]] =
elems match {
case Seq() => out.reverse
case Seq(head, tail @ _*) if !p(head) => iter(tail, Seq(head) +: out)
case xs =>
val (prefix, rest) = xs span p
iter(rest, prefix +: out)
}
iter(elems, Seq())
}
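If the input is known to be a List, the same tail-recursive shape can be written with :: patterns and checked with @tailrec. This is only a sketch specialised to List; groupRuns and RunsSketch are illustrative names, and the example data is the one from the question.

```scala
import scala.annotation.tailrec

object RunsSketch {
  def groupRuns[T](elems: List[T])(p: T => Boolean): List[List[T]] = {
    @tailrec
    def loop(rest: List[T], acc: List[List[T]]): List[List[T]] = rest match {
      case Nil => acc.reverse // groups were prepended, so restore the order
      case head :: tail if !p(head) => loop(tail, List(head) :: acc) // non-matching element stands alone
      case _ =>
        // The head satisfies p here, so the run is non-empty and the loop makes progress
        val (run, remaining) = rest.span(p)
        loop(remaining, run :: acc)
    }
    loop(elems, Nil)
  }

  val l = List(2, 4, -6, 3, 1, 8, 7, 10, 0)
  val grouped: List[List[Int]] = groupRuns(l)(_ % 2 == 0)
}
```

Putting the predicate in a second parameter list lets the compiler infer T from the list before it type-checks the lambda.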
For this task you can use takeWhile and drop combined with a little pattern matching and recursion:
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
def magic(elems: Seq[T], result: Seq[Seq[T]]): Seq[Seq[T]] = elems.takeWhile(p) match {
// if elems is Nil, we have a result
case Nil if elems.isEmpty => result
// if it's not, but we don't get any values from takeWhile, we take a single elem
case Nil => magic(elems.tail, result :+ Seq(elems.head))
// takeWhile gave us something, so we add it to the result
// and drop as many elements from elems, as takeWhile gave us
case xs => magic(elems.drop(xs.size), result :+ xs)
}
magic(elems, Seq())
}
Another solution using a fold:
def magicFilter[T](seq: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
val (filtered, current) = (seq foldLeft (Seq[Seq[T]](), Seq[T]())) {
case ((filtered, current), element) if p(element) => (filtered, current :+ element)
case ((filtered, current), element) if !current.isEmpty => (filtered :+ current :+ Seq(element), Seq())
case ((filtered, current), element) => (filtered :+ Seq(element), Seq())
}
if (!current.isEmpty) filtered :+ current else filtered
}