I need a maxBy that returns all max values in case of equality.
Here is the signature and a first implementation :
def maxBy[A, B](as: Seq[A])(f: A => B)(implicit cmp: Ordering[B]) : Seq[A] =
as.groupBy(f).toList.maxBy(_._1)._2
Example :
maxBy(Seq(("a", "a1"),("a", "a2"),("b", "b1"),("b", "b2")))(_._1)
res6: Seq[(String, String)] = List(("b", "b1"), ("b", "b2"))
Updated with #thearchetypepaul comment
def maxBy[A, B](l: Seq[A])(f: A => B)(implicit cmp: Ordering[B]) : Seq[A] = {
l.foldLeft(Seq.empty[A])((b, a) =>
b.headOption match {
case None => Seq(a)
case Some(v) => cmp.compare(f(a), f(v)) match {
case -1 => b
case 0 => b.+:(a)
case 1 => Seq(a)
}
}
)
}
Is there a better way ?
(1) Ordering#compare promises to denote the three possible results by a negative, positive, or zero number, not -1, 1, or 0 necessarily.
(2) Option#fold is generally (though not universally) considered to be more idiomatic than pattern matching.
(3) You are calling f possibly multiple times per element. TraversableOnce#maxBy used to do this before it was fixed in 2.11.
(4) You only accept Seq. The Scala library works hard to use CanBuildFrom to generalize the algorithms; you might want to as well.
(5) You can use the syntactic sugar B : Ordering if you would like.
(6) You prepend to the Seq. This is faster than appending, since prepending is O(1) for List and appending is O(n). But you wind up with the results in reverse order. foldRight will correct this. (Or you can call reverse on the final result.)
If you want to allow the use of CanBuildFrom,
def maxBy[A, Repr, That, B : Ordering](elements: TraversableLike[A, Repr])(f: A => B)(implicit bf: CanBuildFrom[Repr, A, That]): That = {
val b = bf()
elements.foldLeft(Option.empty[B]) { (best, element) =>
val current = f(element)
val result = best.fold(0)(implicitly[Ordering[B]].compare(current, _))
if (result > 0) {
b.clear()
}
if (result >= 0) {
b += element
Some(current)
} else {
best
}
}
b.result
}
If you want to work with TraversableOnce,
def maxBy[A, B : Ordering](elements: TraversableOnce[A])(f: A => B): Seq[A] = {
elements.foldRight((Option.empty[B], List.empty[A])) { case (element, (best, elements)) =>
val current = f(element)
val result = best.fold(0)(implicitly[Ordering[B]].compare(current, _))
if (result > 0) {
(Some(current), List(element))
} else {
(best, if (result == 0) element +: elements else elements)
}
}._2
}
if the dataset is small the performance shouldn't concern you that much
and you can just sort, reverse, and takeWhile.
def maxBy[A,B:Ordering](l:List[A], f: A => B): List[A] = {
l.sortBy(f).reverse match {
case Nil => Nil
case h :: t => h :: t.takeWhile(x => f(x) == f(h))
}
}
where the f should be an isomorphism on A.
and you can also save f(h) before comparison
Related
I'm working through the Functional Programming in Scala book, and am on the section about lazy evaluation. There is an exercise to implement a takeWhile function using foldRight. I was able to complete it successfully, but when I added print statements I saw that it seems to be doing processing I wouldn't expect. I am very confused by this.
Here is the code:
trait McStream[+A] {
def uncons: Option[(A, McStream[A])]
def isEmpty: Boolean = uncons.isEmpty
def toList: List[A] = {
uncons match {
case Some(head -> tail) => head :: tail.toList
case None => Nil
}
}
def foldRight[B](z: => B)(f: (A, => B) => B): B = {
uncons match {
case Some(head -> tail) =>
println(s"Inside foldRight, head is $head")
f(head, tail.foldRight(z)(f))
case None => z
}
}
// TODO how does evaluate? Trace steps
// TODO it seems to be storing a deferred takeWhile in the `b` variable that evaluates during the cons
def takeWhile(p: A => Boolean): McStream[A] = {
foldRight(McStream.empty[A]) { (a, b) =>
println(s"a is $a, b is $b")
if (p(a)) {
McStream.cons(a, b)
} else {
McStream.empty
}
}
}
}
With a helper object for constructors:
object McStream {
def empty[A]: McStream[A] = new McStream[A] {
override def uncons: Option[(A, McStream[A])] = None
}
def cons[A](hd: => A, tl: => McStream[A]): McStream[A] = {
new McStream[A] {
lazy val uncons = Some((hd, tl))
}
}
def apply[A](as: A*): McStream[A] = {
if (as.isEmpty) empty
else cons(as.head, apply(as.tail: _*))
}
}
}
Here is the test I'm running:
"take while a predicate is matched" in {
val stream = McStream(1, 2)
stream.takeWhile(_ < 2).toList shouldEqual List(1)
}
And here is the output I get:
Inside foldRight, head is 1
Inside foldRight, head is 2
a is 2, b is McStream(None)
a is 1, b is McStream(None)
Inside foldRight, head is 2
a is 2, b is McStream(None)
I'm confused about the last two lines, to me it seems like it should recurse all the way to the end of the list, and then either connect the currently processed tail to the next element if the predicate matches, or replace it with an empty McStream otherwise. At that point, it should just be returning the list, and not doing the additional foldRight and evaluation.
Here is the evaluation order as far as I can understand it:
Stream(1, Stream(2, Stream.empty)).takeWhile(_ < 2)
should print Inside foldRight, head is 1
Stream(2, Stream.empty).takeWhile(_ < 2)
should print Inside foldRight, head is 2
Stream.empty.takeWhile(_ < 2)
End of recursion, starts to return
Stream(2, Stream.empty).takeWhile(_ < 2)
should print a is 2, b is Stream.empty
Stream(1, Stream.empty).takeWhile(_ < 2)
should print a is 1, b is Stream.empty
1 #:: Stream.empty
Thanks in advance!
Turns out that the very mechanisms I was using to try to understand the evaluation (println statements), were forcing evaluation and causing the above issues. When I remove the print statements it evaluates like it should.
Don't use println in your lazy evaluations!
How do I rewrite the following loop (pattern) into Scala, either using built-in higher order functions or tail recursion?
This the example of an iteration pattern where you do a computation (comparison, for example) of two list elements, but only if the second one comes after first one in the original input. Note that the +1 step is used here, but in general, it could be +n.
public List<U> mapNext(List<T> list) {
List<U> results = new ArrayList();
for (i = 0; i < list.size - 1; i++) {
for (j = i + 1; j < list.size; j++) {
results.add(doSomething(list[i], list[j]))
}
}
return results;
}
So far, I've come up with this in Scala:
def mapNext[T, U](list: List[T])(f: (T, T) => U): List[U] = {
#scala.annotation.tailrec
def loop(ix: List[T], jx: List[T], res: List[U]): List[U] = (ix, jx) match {
case (_ :: _ :: is, Nil) => loop(ix, ix.tail, res)
case (i :: _ :: is, j :: Nil) => loop(ix.tail, Nil, f(i, j) :: res)
case (i :: _ :: is, j :: js) => loop(ix, js, f(i, j) :: res)
case _ => res
}
loop(list, Nil, Nil).reverse
}
Edit:
To all contributors, I only wish I could accept every answer as solution :)
Here's my stab. I think it's pretty readable. The intuition is: for each head of the list, apply the function to the head and every other member of the tail. Then recurse on the tail of the list.
def mapNext[U, T](list: List[U], fun: (U, U) => T): List[T] = list match {
case Nil => Nil
case (first :: Nil) => Nil
case (first :: rest) => rest.map(fun(first, _: U)) ++ mapNext(rest, fun)
}
Here's a sample run
scala> mapNext(List(1, 2, 3, 4), (x: Int, y: Int) => x + y)
res6: List[Int] = List(3, 4, 5, 5, 6, 7)
This one isn't explicitly tail recursive but an accumulator could be easily added to make it.
Recursion is certainly an option, but the standard library offers some alternatives that will achieve the same iteration pattern.
Here's a very simple setup for demonstration purposes.
val lst = List("a","b","c","d")
def doSomething(a:String, b:String) = a+b
And here's one way to get at what we're after.
val resA = lst.tails.toList.init.flatMap(tl=>tl.tail.map(doSomething(tl.head,_)))
// resA: List[String] = List(ab, ac, ad, bc, bd, cd)
This works but the fact that there's a map() within a flatMap() suggests that a for comprehension might be used to pretty it up.
val resB = for {
tl <- lst.tails
if tl.nonEmpty
h = tl.head
x <- tl.tail
} yield doSomething(h, x) // resB: Iterator[String] = non-empty iterator
resB.toList // List(ab, ac, ad, bc, bd, cd)
In both cases the toList cast is used to get us back to the original collection type, which might not actually be necessary depending on what further processing of the collection is required.
Comeback Attempt:
After deleting my first attempt to give an answer I put some more thought into it and came up with another, at least shorter solution.
def mapNext[T, U](list: List[T])(f: (T, T) => U): List[U] = {
#tailrec
def loop(in: List[T], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(list, Nil)
}
I would also like to recommend the enrich my library pattern for adding the mapNext function to the List api (or with some adjustments to any other collection).
object collection {
object Implicits {
implicit class RichList[A](private val underlying: List[A]) extends AnyVal {
def mapNext[U](f: (A, A) => U): List[U] = {
#tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(underlying, Nil)
}
}
}
}
Then you can use the function like:
list.mapNext(doSomething)
Again, there is a downside, as concatenating lists is relatively expensive.
However, variable assignemends inside for comprehensions can be quite inefficient, too (as this improvement task for dotty Scala Wart: Convoluted de-sugaring of for-comprehensions suggests).
UPDATE
Now that I'm into this, I simply cannot let go :(
Concerning 'Note that the +1 step is used here, but in general, it could be +n.'
I extended my proposal with some parameters to cover more situations:
object collection {
object Implicits {
implicit class RichList[A](private val underlying: List[A]) extends AnyVal {
def mapNext[U](f: (A, A) => U): List[U] = {
#tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(underlying, Nil)
}
def mapEvery[U](step: Int)(f: A => U) = {
#tailrec
def loop(in: List[A], out: List[U]): List[U] = {
in match {
case Nil => out.reverse
case head :: tail => loop(tail.drop(step), f(head) :: out)
}
}
loop(underlying, Nil)
}
def mapDrop[U](drop1: Int, drop2: Int, step: Int)(f: (A, A) => U): List[U] = {
#tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail =>
loop(tail.drop(drop1), out ::: tail.drop(drop2).mapEvery(step) { f(head, _) } )
}
loop(underlying, Nil)
}
}
}
}
list // [a, b, c, d, ...]
.indices // [0, 1, 2, 3, ...]
.flatMap { i =>
elem = list(i) // Don't redo access every iteration of the below map.
list.drop(i + 1) // Take only the inputs that come after the one we're working on
.map(doSomething(elem, _))
}
// Or with a monad-comprehension
for {
index <- list.indices
thisElem = list(index)
thatElem <- list.drop(index + 1)
} yield doSomething(thisElem, thatElem)
You start, not with the list, but with its indices. Then, you use flatMap, because each index goes to a list of elements. Use drop to take only the elements after the element we're working on, and map that list to actually run the computation. Note that this has terrible time complexity, because most operations here, indices/length, flatMap, map, are O(n) in the list size, and drop and apply are O(n) in the argument.
You can get better performance if you a) stop using a linked list (List is good for LIFO, sequential access, but Vector is better in the general case), and b) make this a tiny bit uglier
val len = vector.length
(0 until len)
.flatMap { thisIdx =>
val thisElem = vector(thisIdx)
((thisIdx + 1) until len)
.map { thatIdx =>
doSomething(thisElem, vector(thatIdx))
}
}
// Or
val len = vector.length
for {
thisIdx <- 0 until len
thisElem = vector(thisIdx)
thatIdx <- (thisIdx + 1) until len
thatElem = vector(thatIdx)
} yield doSomething(thisElem, thatElem)
If you really need to, you can generalize either version of this code to all IndexedSeqs, by using some implicit CanBuildFrom parameters, but I won't cover that.
In scalaz 7.2.6, I want to implement sequence on Disjunction, such that if there is one or more lefts, it returns a list of those, instead of taking only the first one (as in Disjunction.sequenceU):
import scalaz._, Scalaz._
List(1.right, 2.right, 3.right).sequence
res1: \/-(List(1, 2, 3))
List(1.right, "error2".left, "error3".left).sequence
res2: -\/(List(error2, error3))
I've implemented it as follows and it works, but it looks ugly. Is there a getRight method (such as in scala Either class, Right[String, Int](3).right.get)? And how to improve this code?
implicit class RichSequence[L, R](val l: List[\/[L, R]]) {
def getLeft(v: \/[L, R]):L = v match { case -\/(x) => x }
def getRight(v: \/[L, R]):R = v match { case \/-(x) => x }
def sequence: \/[List[L], List[R]] =
if (l.forall(_.isRight)) {
l.map(e => getRight(e)).right
} else {
l.filter(_.isLeft).map(e => getLeft(e)).left
}
}
Playing around I've implemented a recursive function for that, but the best option would be to use separate:
implicit class RichSequence[L, R](val l: List[\/[L, R]]) {
def sequence: \/[List[L], List[R]] = {
def seqLoop(left: List[L], right: List[R], list: List[\/[L, R]]): \/[List[L], List[R]] =
list match {
case (h :: t) =>
h match {
case -\/(e) => seqLoop(left :+ e, right, t)
case \/-(s) => seqLoop(left, right :+ s, t)
}
case Nil =>
if(left.isEmpty) \/-(right)
else -\/(left)
}
seqLoop(List(), List(), l)
}
def sequenceSeparate: \/[List[L], List[R]] = {
val (left, right) = l.separate[\/[L, R], L, R]
if(left.isEmpty) \/-(right)
else -\/(left)
}
}
The first one just collects results and at the end decide what to do with those, the second its basically the same with the exception that the recursive function is much simpler, I didn't think about performance here, I've used :+, if you care use prepend or some other collection.
You may also want to take a look at Validation and ValidationNEL which unlike Disjunction accumulate failures.
Given a sequence of elements and a predicate p, I would like to produce a sequence of sequences such that, in each subsequence, either all elements satisfy p or the sequence has length 1. Additionally, calling .flatten on the result should give me back my original sequence (so no re-ordering of elements).
For instance, given:
val l = List(2, 4, -6, 3, 1, 8, 7, 10, 0)
val p = (i : Int) => i % 2 == 0
I would like magic(l,p) to produce:
List(List(2, 4, -6), List(3), List(1), List(8), List(7), List(10, 0))
I know of .span, but that method stops the first time it encounters a value that doesn't satisfy p and just returns a pair.
Below is a candidate implementation. It does what I want, but, well, makes we want to cry. I would love for someone to come up with something slightly more idiomatic.
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
val loop = elems.foldLeft[(Boolean,Seq[Seq[T]])]((false,Seq.empty)) { (pr,e) =>
val (lastOK,s) = pr
if(lastOK && p(e)) {
(true, s.init :+ (s.last :+ e))
} else {
(p(e), s :+ Seq(e))
}
}
loop._2
}
(Note that I do not particularly care about preserving the actual type of the Seq.)
I would not use foldLeft. It's just a simple recursion of span with a special rule if the head doesn't match the predicate:
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] =
elems match {
case Seq() => Seq()
case Seq(head, tail # _*) if !p(head) => Seq(head) +: magic(tail, p)
case xs =>
val (prefix, rest) = xs span p
prefix +: magic(rest, p)
}
You could also do it tail-recursive, but you need to remember to reverse the output if you're prepending (as is sensible):
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
def iter(elems: Seq[T], out: Seq[Seq[T]]) : Seq[Seq[T]] =
elems match {
case Seq() => out.reverse
case Seq(head, tail # _*) if !p(head) => iter(tail, Seq(head) +: out)
case xs =>
val (prefix, rest) = xs span p
iter(rest, prefix +: out)
}
iter(elems, Seq())
}
For this task you can use takeWhile and drop combined with a little pattern matching an recursion:
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
def magic(elems: Seq[T], result: Seq[Seq[T]]): Seq[Seq[T]] = elems.takeWhile(p) match {
// if elems is Nil, we have a result
case Nil if elems.isEmpty => result
// if it's not, but we don't get any values from takeWhile, we take a single elem
case Nil => magic(elems.tail, result :+ Seq(elems.head))
// takeWhile gave us something, so we add it to the result
// and drop as many elements from elems, as takeWhile gave us
case xs => magic(elems.drop(xs.size), result :+ xs)
}
magic(elems, Seq())
}
Another solution using a fold:
def magicFilter[T](seq: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
val (filtered, current) = (seq foldLeft (Seq[Seq[T]](), Seq[T]())) {
case ((filtered, current), element) if p(element) => (filtered, current :+ element)
case ((filtered, current), element) if !current.isEmpty => (filtered :+ current :+ Seq(element), Seq())
case ((filtered, current), element) => (filtered :+ Seq(element), Seq())
}
if (!current.isEmpty) filtered :+ current else filtered
}
Is it possible to rewrite the following code
for (i <- x) {
if (i==x.first) {
// do sth
} else if (i==x.last) {
// do sth
} else {
// do sth
}
}
using pattern matching like
for (i <- x) i match {
case `x.first` => // do sth
case `x.last` => // do sth
case _ => // do sth
}
I know we can use guard, or evaluate x.first and x.last in advance and store them in other vals to quote here, but that's just ugly. Any ideas? Thanks!
One clean way to do it would be to define extractors +: and :+ for yourself:
object +: {
def unapply[CC, A, That](seq: CC)(implicit asSeq: CC => Seq[A], cbf: CanBuildFrom[CC, A, That]): Option[(A, That)] = {
if (seq.nonEmpty)
Some(seq.head, cbf(seq) ++= seq.tail result)
else
None
}
}
object :+ {
def unapply[CC, A, That](seq: CC)(implicit asSeq: CC => Seq[A], cbf: CanBuildFrom[CC, A, That]): Option[(That, A)] = {
if (seq.nonEmpty)
Some(cbf(seq) ++= seq.dropRight(1) result, seq.last)
else
None
}
}
Then you can simply do:
val x = Seq(1, 2, 3, 4)
val first +: middle :+ last = x
println("first is %s".format(first))
for (y <- middle)
println("middle contains %s".format(y))
println("last is %s".format(last))
Which prints:
first is 1
middle contains 2
middle contains 3
last is 4