Related
I often find myself needing to chain collects where I want to do multiple collects in a single traversal. I also would like to return a "remainder" for things that don't match any of the collects.
For example:
sealed trait Animal
case class Cat(name: String) extends Animal
case class Dog(name: String, age: Int) extends Animal
val animals: List[Animal] =
List(Cat("Bob"), Dog("Spot", 3), Cat("Sally"), Dog("Jim", 11))
// Normal way
val cats: List[Cat] = animals.collect { case c: Cat => c }
val dogAges: List[Int] = animals.collect { case Dog(_, age) => age }
val rem: List[Animal] = Nil // No easy way to create this without repeated code
This really isn't great, it requires multiple iterations and there is no reasonable way to calculate the remainder. I could write a very complicated fold to pull this off, but it would be really nasty.
Instead, I usually opt for mutation which is fairly similar to the logic you would have in a fold:
import scala.collection.mutable.ListBuffer
// Ugly, hide the mutation away
val (cats2, dogsAges2, rem2) = {
// Lose some benefits of type inference
val cs = ListBuffer[Cat]()
val da = ListBuffer[Int]()
val rem = ListBuffer[Animal]()
// Bad separation of concerns, I have to merge all of my functions
animals.foreach {
case c: Cat => cs += c
case Dog(_, age) => da += age
case other => rem += other
}
(cs.toList, da.toList, rem.toList)
}
I don't like this one bit, it has worse type inference and separation of concerns since I have to merge all of the various partial functions. It also requires lots of lines of code.
What I want, are some useful patterns, like a collect that returns the remainder (I grant that partitionMap new in 2.13 does this, but uglier). I also could use some form of pipe or map for operating on parts of tuples. Here are some made up utilities:
implicit class ListSyntax[A](xs: List[A]) {
import scala.collection.mutable.ListBuffer
// Collect and return remainder
// A specialized form of new 2.13 partitionMap
def collectR[B](pf: PartialFunction[A, B]): (List[B], List[A]) = {
val rem = new ListBuffer[A]()
val res = new ListBuffer[B]()
val f = pf.lift
for (elt <- xs) {
f(elt) match {
case Some(r) => res += r
case None => rem += elt
}
}
(res.toList, rem.toList)
}
}
implicit class Tuple2Syntax[A, B](x: Tuple2[A, B]){
def chainR[C](f: B => C): Tuple2[A, C] = x.copy(_2 = f(x._2))
}
Now, I can write this in a way that could be done in a single traversal (with a lazy datastructure) and yet follows functional, immutable practice:
// Relatively pretty, can imagine lazy forms using a single iteration
val (cats3, (dogAges3, rem3)) =
animals.collectR { case c: Cat => c }
.chainR(_.collectR { case Dog(_, age) => age })
My question is, are there patterns like this? It smells like the type of thing that would be in a library like Cats, FS2, or ZIO, but I am not sure what it might be called.
Scastie link of code examples: https://scastie.scala-lang.org/Egz78fnGR6KyqlUTNTv9DQ
I wanted to see just how "nasty" a fold() would be.
val (cats
,dogAges
,rem) = animals.foldRight((List.empty[Cat]
,List.empty[Int]
,List.empty[Animal])) {
case (c:Cat, (cs,ds,rs)) => (c::cs, ds, rs)
case (Dog(_,d),(cs,ds,rs)) => (cs, d::ds, rs)
case (r, (cs,ds,rs)) => (cs, ds, r::rs)
}
Eye of the beholder I suppose.
How about defining a couple utility classes to help you with this?
case class ListCollect[A](list: List[A]) {
def partialCollect[B](f: PartialFunction[A, B]): ChainCollect[List[B], A] = {
val (cs, rem) = list.partition(f.isDefinedAt)
new ChainCollect((cs.map(f), rem))
}
}
case class ChainCollect[A, B](tuple: (A, List[B])) {
def partialCollect[C](f: PartialFunction[B, C]): ChainCollect[(A, List[C]), B] = {
val (cs, rem) = tuple._2.partition(f.isDefinedAt)
ChainCollect(((tuple._1, cs.map(f)), rem))
}
}
ListCollect is just meant to start the chain, and ChainCollect takes the previous remainder (the second element of the tuple) and tries to apply a PartialFunction to it, creating a new ChainCollect object. I'm not particularly fond of the nested tuples this produces, but you may be able to make it look a bit better if you use Shapeless's HLists.
val ((cats, dogs), rem) = ListCollect(animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.tuple
Scastie
Dotty's *: type makes this a bit easier:
opaque type ChainResult[Prev <: Tuple, Rem] = (Prev, List[Rem])
extension [P <: Tuple, R, N](chainRes: ChainResult[P, R]) {
def partialCollect(f: PartialFunction[R, N]): ChainResult[List[N] *: P, R] = {
val (cs, rem) = chainRes._2.partition(f.isDefinedAt)
(cs.map(f) *: chainRes._1, rem)
}
}
This does end up in the output being reversed, but it doesn't have that ugly nesting from my previous approach:
val ((owls, dogs, cats), rem) = (EmptyTuple, animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.partialCollect { case Owl(wisdom) => wisdom }
/* more animals */
case class Owl(wisdom: Double) extends Animal
case class Fly(isAnimal: Boolean) extends Animal
val animals: List[Animal] =
List(Cat("Bob"), Dog("Spot", 3), Cat("Sally"), Dog("Jim", 11), Owl(200), Fly(false))
Scastie
And if you still don't like that, you can always define a few more helper methods to reverse the tuple, add the extension on a List without requiring an EmptyTuple to begin with, etc.
//Add this to the ChainResult extension
def end: Reverse[List[R] *: P] = {
def revHelp[A <: Tuple, R <: Tuple](acc: A, rest: R): RevHelp[A, R] =
rest match {
case EmptyTuple => acc.asInstanceOf[RevHelp[A, R]]
case h *: t => revHelp(h *: acc, t).asInstanceOf[RevHelp[A, R]]
}
revHelp(EmptyTuple, chainRes._2 *: chainRes._1)
}
//Helpful types for safety
type Reverse[T <: Tuple] = RevHelp[EmptyTuple, T]
type RevHelp[A <: Tuple, R <: Tuple] <: Tuple = R match {
case EmptyTuple => A
case h *: t => RevHelp[h *: A, t]
}
And now you can do this:
val (cats, dogs, owls, rem) = (EmptyTuple, animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.partialCollect { case Owl(wisdom) => wisdom }
.end
Scastie
Since you mentioned cats, I would also add solution using foldMap:
sealed trait Animal
case class Cat(name: String) extends Animal
case class Dog(name: String) extends Animal
case class Snake(name: String) extends Animal
val animals: List[Animal] = List(Cat("Bob"), Dog("Spot"), Cat("Sally"), Dog("Jim"), Snake("Billy"))
val map = animals.foldMap{ //Map(other -> List(Snake(Billy)), cats -> List(Cat(Bob), Cat(Sally)), dogs -> List(Dog(Spot), Dog(Jim)))
case d: Dog => Map("dogs" -> List(d))
case c: Cat => Map("cats" -> List(c))
case o => Map("other" -> List(o))
}
val tuples = animals.foldMap{ //(List(Dog(Spot), Dog(Jim)),List(Cat(Bob), Cat(Sally)),List(Snake(Billy)))
case d: Dog => (List(d), Nil, Nil)
case c: Cat => (Nil, List(c), Nil)
case o => (Nil, Nil, List(o))
}
Arguably it's more succinct than fold version, but it has to combine partial results using monoids, so it won't be as performant.
This code is dividing a list into three sets, so the natural way to do this is to use partition twice:
val (cats, notCat) = animals.partitionMap{
case c: Cat => Left(c)
case x => Right(x)
}
val (dogAges, rem) = notCat.partitionMap {
case Dog(_, age) => Left(age)
case x => Right(x)
}
A helper method can simplify this
def partitionCollect[T, U](list: List[T])(pf: PartialFunction[T, U]): (List[U], List[T]) =
list.partitionMap {
case t if pf.isDefinedAt(t) => Left(pf(t))
case x => Right(x)
}
val (cats, notCat) = partitionCollect(animals) { case c: Cat => c }
val (dogAges, rem) = partitionCollect(notCat) { case Dog(_, age) => age }
This is clearly extensible to more categories, with the slight irritation of having to invent temporary variable names (which could be overcome by explicit n-way partition methods)
Let's say I have two case classes:
case class C1(a: Int, b: String)
case class C2(a: Int, b: String, c: Long = 0)
I want to convert instance of C1 to C2 and then set additional field c. I found out the following solution:
C1.unapply(C1(1, "s1")).map(v => C2(v._1, v._2, 7l))
But specifying parameters one by one is not applicable, because real case class will have at least 15 params. Any ideas how to solve it?
This solution can be done by doing something like this thread.
How to append or prepend an element to a tuple in Scala
implicit class TupOps2[A, B](val x: (A, B)) extends AnyVal {
def :+[C](y: C) = (x._1, x._2, y)
def +:[C](y: C) = (y, x._1, x._2)
}
Usage:
val c1 = new C1(1, "s1")
val longVal = 0L
scala> C1.unapply(c1).map(r => (r :+ longVal))
res0: Option[(Int, String, Long)] = Some((1,s1,0))
scala> C1.unapply(c1).map(r => (C2.apply _) tupled (r:+ longVal))
res45: Option[C2] = Some(C2(1,s1,0))
Hope it helps :)
I think what you need is closer to
https://github.com/scalalandio/chimney
case class MakeCoffee(id: Int, kind: String, addict: String)
case class CoffeeMade(id: Int, kind: String, forAddict: String, at: ZonedDateTime)
val command = MakeCoffee(id = Random.nextInt,
kind = "Espresso",
addict = "Piotr")
import io.scalaland.chimney.dsl._
val event = command.into[CoffeeMade]
.withFieldComputed(_.at, _ => ZonedDateTime.now)
.withFieldRenamed(_.addict, _.forAddict)
.transform
Let's consider a classification problem :
object Classify extends App {
type Tag = String
type Classifier[A] = A => Set[Tag]
case class Model(a: Int, b: String, c: String, d: String)
def aClassifier : Classifier[Int] = _ => Set("A", "a")
def bClassifier : Classifier[String] = _ => Set("B")
def cClassifier : Classifier[String] = _ => Set("C")
def modelClassifier : Classifier[Model] = {
m => aClassifier(m.a) ++ bClassifier(m.b) ++ cClassifier(m.c)
}
println(modelClassifier(Model(1,"b", "c", "d")))
}
Is there a smarter way to implement modelClassifier using scalaz ?
As an idea, consider this code:
for (i <- 0 until model.productArity) yield {
val fieldValue = model.productElement(i)
fieldValue match {
case x: Int => //use integer classifier
case s: String => //use string classifier
case _ =>
}
}
scalaz library hasn't any macro case class introspection by design, but shapeless has
Consider such definitions:
import shapeless._
import shapeless.tag._
import shapeless.labelled._
trait Omit
val omit = tag[Omit]
case class Model(a: Int, b: String, c: String, d: String ## Omit)
Let define following polymorphic function
object classifiers extends Poly1 {
implicit def stringClassifier[K <: Symbol](implicit witness: Witness.Aux[K]) =
at[FieldType[K, String]](value => Set(witness.value.name.toUpperCase))
implicit def intClassifier[K <: Symbol](implicit witness: Witness.Aux[K]) =
at[FieldType[K, Int]](value => {
val name = witness.value.name
Set(name.toUpperCase, name.toLowerCase)
})
implicit def omitClassifier[K, T] =
at[FieldType[K, T ## Omit]](_ => Set.empty[String])
}
Now your modelClassifier could be done as:
def modelClassifier: Classifier[Model] =
m => LabelledGeneric[Model].to(m).map(classifiers).toList.reduce(_ union _)
you can check it via
println(modelClassifier(Model(1, "b", "c", omit("d"))))
Note that Type ## Tag is subtype of Type so model.d still could be used as String everywhere
How do you intend to distinguish between bClassifier and cClassifier? By name? By order of declaration? That does not sound very "smart" or reliable. Consider encoding your intent explicitly instead. Something like this, perhaps:
case class Classifiable[T](data: T, classifier: Classifier[T])
object Classifiable {
def Null[T](data: T) = Classifiable(data, _ => Nil)
}
case class Model(a: Classifiable[Int], b: Classifiable[String], c: Classifiable[String], d: Classifiable[String])
object Model {
def apply(a: Int, b: String, c: String, d: String) =
Model(
Classifiable(a, aClassifier),
Classifiable(b, bClassifier),
Classifiable(c, cClassifier),
Classifiable.Null(d)
)
}
def modelClassifier(m: Model) = m
.productIterator
.collect { case x: Classifiable[_] =>
x.classifier()(x)
}
.reduce(_ ++ _)
println(modelClassifier(Model(1,"b", "c", "d")))
I need to sort the keys in an RDD, but there is no natural sorting order (not ascending or descending). I wouldn't even know how to write a Comparator to do it. Say I had a map of apples, pears, oranges, and grapes, I want to sort by oranges, apples, grapes, and pears.
Any ideas on how to do this in Spark/Scala? Thanks!
In Scala, you need to look for the Ordering[T] trait rather than the Comparator interface -- mostly a cosmetic difference so that the focus is on the attribute of the data rather than a thing which compares two instances of the data. Implementing the trait requires that the compare(T,T) method be defined. A very explicit version of the enumerated comparison could be:
object fruitOrdering extends Ordering[String] {
def compare(lhs: String, rhs: String): Int = (lhs, rhs) match {
case ("orange", "orange") => 0
case ("orange", _) => -1
case ("apple", "orange") => 1
case ("apple", "apple") => 0
case ("apple", _) => -1
case ("grape", "orange") => 1
case ("grape", "apple") => 1
case ("grape", "grape") => 0
case ("grape", _) => -1
case ("pear", "orange") => 1
case ("pear", "apple") => 1
case ("pear", "grape") => 1
case ("pear", "pear") => 0
case ("pear", _) => -1
case _ => 0
}
}
Or, to slightly adapt zero323's answer:
object fruitOrdering2 extends Ordering[String] {
private val values = Seq("orange", "apple", "grape", "pear")
// generate the map based off of indices so we don't have to worry about human error during updates
private val ordinalMap = values.zipWithIndex.toMap.withDefaultValue(Int.MaxValue)
def compare(lhs: String, rhs: String): Int = ordinalMap(lhs).compare(ordinalMap(rhs))
}
Now that you have an instance of Ordering[String], you need to inform the sortBy method use this ordering rather than the built-in one. If you look at the signature for RDD#sortBy you'll see the full signature is
def sortBy[K](f: (T) ⇒ K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
That implicit Ordering[K] in the second parameter list is normally looked up by the compiler for pre-defined orderings -- that's how it knows what the natural ordering should be. Any implicit parameter, however, can be given an explicit value instead. Note that if you supply one implicit value then you need to supply all, so in this case we also need to provide the ClassTag[K]. That's always generated by the compiler but can be easily explicitly generated using scala.reflect.classTag.
Specifying all of that, the invocation would look like:
import scala.reflect.classTag
rdd.sortBy { case (key, _) => key }(fruitOrdering, classOf[String])
That's still pretty messy, though, isn't it? Luckily we can use implicit classes to take away a lot of the cruft. Here's a snippet that I use fairly commonly:
package com.example.spark
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
package object implicits {
implicit class RichSortingRDD[A : ClassTag](underlying: RDD[A]) {
def sorted(implicit ord: Ordering[A]): RDD[A] =
underlying.sortBy(identity)(ord, implicitly[ClassTag[A]])
def sortWith(fn: (A, A) => Int): RDD[A] = {
val ord = new Ordering[A] { def compare(lhs: A, rhs: A): Int = fn(lhs, rhs) }
sorted(ord)
}
}
implicit class RichSortingPairRDD[K : ClassTag, V](underlying: RDD[(K, V)]) {
def sortByKey(implicit ord: Ordering[K]): RDD[(K, V)] =
underlying.sortBy { case (key, _) => key } (ord, implicitly[ClassTag[K]])
def sortByKeyWith(fn: (K, K) => Int): RDD[(K, V)] = {
val ord = new Ordering[K] { def compare(lhs: K, rhs: K): Int = fn(lhs, rhs) }
sortByKey(ord)
}
}
}
And in action:
import com.example.spark.implicits._
val rdd = sc.parallelize(Seq(("grape", 0.3), ("apple", 5.0), ("orange", 5.6)))
rdd.sortByKey(fruitOrdering).collect
// Array[(String, Double)] = Array((orange,5.6), (apple,5.0), (grape,0.3))
rdd.sortByKey.collect // Natural ordering by default
// Array[(String, Double)] = Array((apple,5.0), (grape,0.3), (orange,5.6))
rdd.sortWith(_._2 compare _._2).collect // sort by the value instead
// Array[(String, Double)] = Array((grape,0.3), (apple,5.0), (orange,5.6))
If the only way you can describe the order is enumeration then simply enumerate:
val order = Map("orange" -> 0L, "apple" -> 1L, "grape" -> 2L, "pear" -> 3L)
val rdd = sc.parallelize(Seq(("grape", 0.3), ("apple", 5.0), ("orange", 5.6)))
val sorted = rdd.sortBy{case (key, _) => order.getOrElse(key, Long.MaxValue)}
sorted.collect
// Array[(String, Double)] = Array((orange,5.6), (apple,5.0), (grape,0.3))
There is a sortBy method in Spark which allows you to define an arbitrary ordering and whether you want ascending or descending. E.g.
scala> val rdd = sc.parallelize(Seq ( ("a", 1), ("z", 7), ("p", 3), ("a", 13) ))
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[331] at parallelize at <console>:70
scala> rdd.sortBy( _._2, ascending = false) .collect.mkString("\n")
res34: String =
(a,13)
(z,7)
(p,3)
(a,1)
scala> rdd.sortBy( _._1, ascending = false) .collect.mkString("\n")
res35: String =
(z,7)
(p,3)
(a,1)
(a,13)
scala> rdd.sortBy
def sortBy[K](f: T => K, ascending: Boolean, numPartitions: Int)(implicit ord: scala.math.Ordering[K], ctag: scala.reflect.ClassTag[K]): RDD[T]
The last part tells you what the signature of sortBy is. The ordering used in previous examples is by the first and second part of the pair.
Edit: answered too quickly, without checking your question, sorry... Anyway, you would define your ordering like in your example:
def myord(fruit:String) = fruit match {
case "oranges" => 1 ;
case "apples" => 2;
case "grapes" =>3;
case "pears" => 4;
case _ => 5}
val rdd = sc.parallelize(Seq("apples", "oranges" , "pears", "grapes" , "other") )
Then, the result of ordering would be:
scala> rdd.sortBy[Int](myord, ascending = true).collect.mkString("\n")
res1: String =
oranges
apples
grapes
pears
other
I don't know about spark, but with pure Scala collections that would be
_.sortBy(_.fruitType)
For example,
val l: List[String] = List("the", "big", "bang")
val sortedByFirstLetter = l.sortBy(_.head)
// List(big, bang, the)
I have Scala code with some boilerplate, and I figure it's Scala, so I must be doing something wrong. I need some help figuring out how to remove the redundancies.
trait Number {
val x: Int
}
case class EvenNumber(x: Int) extends Number
object EvenNumber {
def unapply(s: String): Option[EvenNumber] = {
val x = s.toInt
if (x % 2 == 0) Some(EvenNumber(x))
else None
}
}
case class OddNumber(x: Int) extends Number
object OddNumber {
def unapply(s: String): Option[OddNumber] = {
val x = s.toInt
if (x % 2 == 1) Some(OddNumber(x))
else None
}
}
In this simple example there are even numbers and odd numbers which are subtypes of a general number type. Both even and odd numbers have extractors that allow them to be created from strings. This enables use cases like the following.
scala> "4" match {case EvenNumber(n) => n;case _ => None}
// returns EvenNumber(4)
scala> "5" match {case EvenNumber(n) => n;case _ => None}
// returns None
scala> "4" match {case OddNumber(n) => n;case _ => None}
// returns None
scala> "5" match {case OddNumber(n) => n;case _ => None}
// returns OddNumber(5)
The source code for the two extractors is identical except for the result of the x % 2 operation (0 or 1) and the extracted type (EvenNumber or OddNumber). I'd like to be able to write the source once and parameterize on these two values, but I can't figure out how. I've tried various type parameterizations to no avail.
The Stackoverflow question "How to use extractor in polymorphic unapply?" is related but different, because my implementing classes are not distinguished by the types they contain by rather by the string inputs they recognize.
Here is a revised version of the code incorporating comments I received in addition to the original post. (As is often the case, the first round of answers helped me figure out what my real question was.)
import scala.util.Try
trait Number {
val x: Int
}
object NumberParser {
def parse[N <: Number](s: String, remainder: Int, n: Int => N): Option[N] =
Try {s.toInt}.toOption.filter(_ % 2 == remainder).map(n(_))
}
case class EvenNumber(x: Int) extends Number
object EvenNumber {
def unapply(s: String): Option[EvenNumber] = NumberParser.parse(s, 0, EvenNumber(_))
}
case class OddNumber(x: Int) extends Number
object OddNumber {
def unapply(s: String): Option[OddNumber] = NumberParser.parse(s, 1, OddNumber(_))
}
Factoring out a static NumberParser.parse function is a reasonable solution. I would still like to have have syntactic sugar that obviated my repeating unapply lines in all of my case classes, since in a more complicated example that had more than two that could get ugly. Does anyone know of a way to do this?
More crucially, the use case I really want to support is the following.
scala> "5" match {case EvenNumber(n) =>n;case OddNumber(n) => n;case _ => None}
// returns OddNumber(5)
scala> "4" match {case EvenNumber(n) =>n;case OddNumber(n) => n;case _ => None}
// returns EvenNumber(4)
scala> "x" match {case EvenNumber(n) =>n;case OddNumber(n) => n;case _ => None}
// returns None
Again this is fine for two cases, but in a different application where there are more than two it can become unmanageable. I want to write a single case
s match {case Number(n) => n; case _ => None}
which returns OddNumber(5), EvenNumber(4), None as above.
I can't figure out how to write my Number supertype to support this. Is it possible in Scala?
Edit: Wrote a description of my final answer with additional commentary in "Runtime Polymorphism with Scala Extractors".
Why inherit?
object Mod2Number {
def parse[A <: Number](s: String, i: Int, n: Int => A) = {
val x = s.toInt
if (x % 2 == i) Some(n(x))
else None
}
}
case class EvenNumber(x: Int) extends Number
object EvenNumber {
def unapply(s: String) = Mod2Number.parse(s, 0, n => EvenNumber(n))
}
But if even that is too much noise you can go one step further:
trait Makes[A <: Number] {
def apply(i: Int): A
def mod: Int
def unapply(s: String): Option[A] = {
val x = s.toInt
if (x % 2 == mod) Some(apply(x))
else None
}
}
case class EvenNumber(x: Int) extends Number
object EvenNumber extends Makes[EvenNumber] { def mod = 0 }
I guess you should catch exceptions on toInt - exceptions in pattern matching is something strange.
object EvenNumber {
def unapply(s: String): Option[Int] = Try{s.toInt}.toOption.filter{_ % 2 == 0}
}
object OddNumber {
def unapply(s: String): Option[Int] = Try{s.toInt}.toOption.filter{_ % 2 == 1}
}
You could extract similar code, but I don't think it's useful here:
class IntFilter(f: Int => Boolean) {
def unapply(s: String): Option[Int] = Try{s.toInt}.toOption.filter(f)
}
object EvenNumber extend IntFilter(_ % 2 == 0)
object OddNumber extend IntFilter(_ % 2 == 1)
For edited question:
s match {case Number(n) => n; case _ => None}
You could create object Number like this:
object Number{
def unapply(s: String): Option[Number] = Try{s.toInt}.toOption.collect{
case i if i % 2 == 0 => EvenNumber(i)
case i if i % 2 == 1 => OddNumber(i)
}
}