Avoiding repetition using lenses whilst deep-copying into Map values - scala

I have an immutable data structure where I have nested values in Maps, like so:
case class TradingDay(syms: Map[String, SymDay] = Map.empty)
case class SymDay(sym: String, traders: Map[String, TraderSymDay] = Map.empty)
case class TraderSymDay(trader: String, sym: String, trades: List[Trade] = Nil)
Separately I have a list of all trades over the day, and I want to generate the TradingDay structure, where
case class Trade(sym: String, trader: String, qty: Int)
I am trying to figure out how I would update this structure with lenses (see appendix) by folding through my trades:
(TradingDay() /: trades) { (d, trd) =>
  def sym = trd.sym
  def trader = trd.trader
  import TradingDay._
  import SymDay._
  import TraderSymDay._
  val mod =
    for {
      _ <- (Syms member sym).mods(
             _ orElse some(SymDay(sym)))
      _ <- (Syms at sym andThen Traders member trader).mods(
             _ orElse some(TraderSymDay(trader, sym)))
      _ <- (Syms at sym andThen (Traders at trader) andThen Trades).mods(
             trd :: _)
      x <- init
    } yield x
  mod ! d
}
This works, but I'm wondering whether I could be less repetitive (in terms of adding to a map and then modifying the value at the key of that map). It doesn't seem much less annoying than the equivalent deep copy.
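For reference, the plain nested-copy update that the lenses are competing with might look roughly like this (a sketch assuming Scala 2.13's Map#updatedWith; the helper name addTrade is hypothetical):
def addTrade(day: TradingDay, trd: Trade): TradingDay =
  day.copy(syms = day.syms.updatedWith(trd.sym) { symDayOpt =>
    val symDay = symDayOpt.getOrElse(SymDay(trd.sym))
    Some(symDay.copy(traders = symDay.traders.updatedWith(trd.trader) { tsdOpt =>
      val tsd = tsdOpt.getOrElse(TraderSymDay(trd.trader, trd.sym))
      Some(tsd.copy(trades = trd :: tsd.trades))
    }))
  })
(TradingDay() /: trades)(addTrade) would then replace the lens-based fold.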
Appendix - the lenses
object TradingDay {
  val Syms = Lens[TradingDay, Map[String, SymDay]](_.syms, (d, s) => d.copy(syms = s))
}
object SymDay {
  val Traders = Lens[SymDay, Map[String, TraderSymDay]](_.traders, (d, t) => d.copy(traders = t))
}
object TraderSymDay {
  val Trades = Lens[TraderSymDay, List[Trade]](_.trades, (d, f) => d.copy(trades = f))
}

with
type #>[A,B] = Lens[A, B]
and by keeping this lens
val Syms : Lens[TradingDay, Map[String, SymDay]]
and defining those lenses:
val F : Map[String, SymDay] #> Option[SymDay] = ...
val G : Option[SymDay] #> Map[String, TraderSymDay] = ...
val H : Map[String, TraderSymDay] #> Option[TraderSymDay] = ...
val I : Option[TraderSymDay] #> List[Trade] = ...
val J: TradingDay #> List[Trade] = Syms >=> F >=> G >=> H >=> I
you could get this:
(TradingDay() /: trades) { (d, trd) => (J.map(trd :: _).flatMap(_ => init)) ! d }

Answer provided by Jordan West (@_jrwest)
It's only a slight change and involves introducing the following conversion:
implicit def myMapLens[S, K, V] = MyMapLens[S, K, V](_)

case class MyMapLens[S, K, V](lens: Lens[S, Map[K, V]]) {
  def putIfAbsent(k: K, v: => V) =
    lens.mods(m => m get k map (_ => m) getOrElse (m + (k -> v)))
}
Then we can use this as follows:
(TradingDay() /: trades) { (d, trade) =>
  def sym = trade.sym
  def trader = trade.trader
  def traders = Syms at sym andThen Traders
  def trades = Syms at sym andThen (Traders at trader) andThen Trades
  val upd =
    for {
      _ <- Syms putIfAbsent (sym, SymDay(sym))
      _ <- traders putIfAbsent (trader, TraderSymDay(trader, sym))
      _ <- trades.mods(trade :: _)
    } yield ()
  upd ~> d
}
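A quick sanity check for either version of the fold, using hypothetical sample data:
val trades = List(
  Trade("IBM", "alice", 100),
  Trade("IBM", "alice", -40),
  Trade("IBM", "bob", 25)
)
// Folding these should yield a TradingDay where
//   day.syms("IBM").traders("alice").trades == List(Trade("IBM", "alice", -40), Trade("IBM", "alice", 100))
//   day.syms("IBM").traders("bob").trades   == List(Trade("IBM", "bob", 25))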

Related

Faster implementation for reduceByKey on Seq of pairs possible?

The code below contains various single-threaded implementations of reduceByKeyXXX methods and a few helper methods to create input sets and measure execution times. (Feel free to run the main-method)
The main purpose of reduceByKey (as in Spark) is to reduce key-value-pairs with the same key. Example:
scala> val xs = Seq( "a" -> 2, "b" -> 3, "a" -> 5)
xs: Seq[(String, Int)] = List((a,2), (b,3), (a,5))
scala> ReduceByKeyComparison.reduceByKey(xs, (x:Int, y:Int) ⇒ x+y )
res8: Seq[(String, Int)] = ArrayBuffer((b,3), (a,7))
Code
import java.util.HashMap

object Util {
  def measure(body: => Unit): Long = {
    val now = System.currentTimeMillis
    body
    val nowAfter = System.currentTimeMillis
    nowAfter - now
  }

  def measureMultiple(body: => Unit, n: Int): String = {
    val executionTimes = (1 to n).toList.map( x => {
      print(".")
      measure(body)
    } )
    val avg = executionTimes.sum / executionTimes.size
    executionTimes.mkString("", "ms, ", "ms") + s" Average: ${avg}ms."
  }
}
object RandomUtil {
  val AB = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  val r = new java.util.Random()

  def randomString(len: Int): String = {
    val sb = new StringBuilder(len)
    for (i <- 0 to len - 1) {
      sb.append(AB.charAt(r.nextInt(AB.length())))
    }
    sb.toString()
  }

  def generateSeq(n: Int): Seq[(String, Int)] = {
    Seq.fill(n)( (randomString(2), r.nextInt(100)) )
  }
}
object ReduceByKeyComparison {
  def main(args: Array[String]): Unit = {
    implicit def iterableToPairedIterable[K, V](x: Iterable[(K, V)]) = new PairedIterable(x)

    val runs = 10
    val problemSize = 2000000
    val ss = RandomUtil.generateSeq(problemSize)

    println("ReduceByKey : " + Util.measureMultiple( reduceByKey(ss, (x: Int, y: Int) ⇒ x + y), runs ))
    println("ReduceByKey2: " + Util.measureMultiple( reduceByKey2(ss, (x: Int, y: Int) ⇒ x + y), runs ))
    println("ReduceByKey3: " + Util.measureMultiple( reduceByKey3(ss, (x: Int, y: Int) ⇒ x + y), runs ))
    println("ReduceByKeyPaired: " + Util.measureMultiple( ss.reduceByKey( (x: Int, y: Int) ⇒ x + y ), runs ))
    println("ReduceByKeyA: " + Util.measureMultiple( reduceByKeyA( ss, (x: Int, y: Int) ⇒ x + y ), runs ))
  }

  // =============================================================================
  // Different implementations
  // =============================================================================
  def reduceByKey[A, B](s: Seq[(A, B)], fnc: (B, B) ⇒ B): Seq[(A, B)] = {
    val t = s.groupBy(x => x._1)
    val u = t.map { case (k, v) => (k, v.map(_._2).reduce(fnc)) }
    u.toSeq
  }

  def reduceByKey2[A, B](s: Seq[(A, B)], fnc: (B, B) ⇒ B): Seq[(A, B)] = {
    val r = s.foldLeft( Map[A, B]() ){ (m, a) ⇒
      val k = a._1
      val v = a._2
      m.get(k) match {
        case Some(pv) ⇒ m + ((k, fnc(pv, v)))
        case None     ⇒ m + ((k, v))
      }
    }
    r.toSeq
  }

  def reduceByKey3[A, B](s: Seq[(A, B)], fnc: (B, B) ⇒ B): Seq[(A, B)] = {
    val m = scala.collection.mutable.Map[A, B]()
    s.foreach { e ⇒
      val k = e._1
      val v = e._2
      m.get(k) match {
        case Some(pv) ⇒ m(k) = fnc(pv, v)
        case None     ⇒ m(k) = v
      }
    }
    m.toSeq
  }
  /**
   * Method code from [[http://ideone.com/dyrkYM]]
   * All rights to Muhammad-Ali A'rabi according to [[https://issues.scala-lang.org/browse/SI-9064]]
   */
  def reduceByKeyA[A, B](s: Seq[(A, B)], fnc: (B, B) ⇒ B): Map[A, B] = {
    s.groupBy(_._1).map(l => (l._1, l._2.map(_._2).reduce(fnc)))
  }

  /**
   * Method code from [[http://ideone.com/dyrkYM]]
   * All rights to Muhammad-Ali A'rabi according to [[https://issues.scala-lang.org/browse/SI-9064]]
   */
  class PairedIterable[K, V](x: Iterable[(K, V)]) {
    def reduceByKey(func: (V, V) => V) = {
      val map = new HashMap[K, V]
      x.foreach { pair =>
        val old = map.get(pair._1)
        map.put(pair._1, if (old == null) pair._2 else func(old, pair._2))
      }
      map
    }
  }
}
yielding the following results on my machine
..........ReduceByKey : 723ms, 782ms, 761ms, 617ms, 640ms, 707ms, 634ms, 611ms, 380ms, 458ms Average: 631ms.
..........ReduceByKey2: 580ms, 458ms, 452ms, 463ms, 462ms, 470ms, 463ms, 465ms, 458ms, 462ms Average: 473ms.
..........ReduceByKey3: 489ms, 466ms, 461ms, 468ms, 555ms, 474ms, 469ms, 457ms, 461ms, 468ms Average: 476ms.
..........ReduceByKeyPaired: 140ms, 124ms, 124ms, 120ms, 122ms, 124ms, 118ms, 126ms, 121ms, 119ms Average: 123ms.
..........ReduceByKeyA: 628ms, 694ms, 666ms, 656ms, 616ms, 660ms, 594ms, 659ms, 445ms, 399ms Average: 601ms.
with ReduceByKeyPaired currently being the fastest.
Question / Task
Is there a faster single-threaded (Scala) implementation?
Rewriting the reduceByKey method of PairedIterable to use recursion gives around a 5-10% performance improvement.
That's all I was able to get.
I've also tried increasing the initial capacity of the HashMap, but it did not show any significant change.
import scala.annotation.tailrec

class PairedIterable[K, V](x: Iterable[(K, V)]) {
  def reduceByKey(func: (V, V) => V) = {
    val map = new HashMap[K, V]()

    @tailrec
    def reduce(it: Iterable[(K, V)]): HashMap[K, V] = {
      it match {
        case Nil => map
        case (k, v) :: tail =>
          val old = map.get(k)
          map.put(k, if (old == null) v else func(old, v))
          reduce(tail)
      }
    }

    val r = reduce(x)
    r
  }
}
In general, comparing the provided methods, they can be split into two categories.
The first set of reductions relies on sorting (grouping); as we can see, those methods add extra O(n log n) complexity and are not effective for this scenario.
The second set loops linearly over all entries of the Iterable. These methods perform extra get/put operations against a temporary map, but those gets/puts are not very time consuming: roughly O(n) * O(c).
Moreover, the need to work with Options in the Scala collection variants makes them less effective.
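One further variant worth trying (a sketch only, not benchmarked here): java.util.HashMap.merge collapses the separate get and put into a single lookup per pair. It assumes Scala 2.12+ so the lambda converts to java.util.function.BiFunction; the name reduceByKeyMerge is hypothetical:
import java.util.HashMap
import java.util.function.BiFunction

def reduceByKeyMerge[K, V](s: Seq[(K, V)], fnc: (V, V) => V): HashMap[K, V] = {
  val map = new HashMap[K, V]()
  val combine: BiFunction[V, V, V] = (a, b) => fnc(a, b)
  // merge stores v if the key is absent, otherwise stores fnc(old, v) -- one hash lookup per pair
  s.foreach { case (k, v) => map.merge(k, v, combine) }
  map
}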

Apply several transformation functions to string

Suppose I have 2 methods:
def a(s: String) = s + "..."
def b(s: String) = s + ",,,"
And I want to create 3rd method which will call both methods:
def c (s: String) = a(b(s))
How can I do this in an idiomatic Scala way?
I think it would be better to aggregate these functions into some List and then sequentially apply them:
List(a _, b _)
I think it would be better to aggregate these functions into some List and
then sequentially apply them.
You get some help by specifying an expected type:
scala> val fs: List[String => String] = List(a,b)
fs: List[String => String] = List(<function1>, <function1>)
scala> fs.foldLeft("something")((s,f) => f(s))
res0: String = something...,,,
Here is how you can combine a set of functions into one:
// a() and b() are as defined in the question
// the following is equivalent to newfunc(x) = b(a(x))
val newFunc: String => String = List( a _, b _).reduce( _ andThen _ )
You can even create a generic function to combine them:
def functionChaining[A]( functions: A => A *): A => A = functions.reduce( _ andThen _ )
or using foldLeft:
def functionChaining[A]( functions: A => A *): A => A = functions.foldLeft( (x:A) => x )( _ andThen _ )
Here is an example of how to use this on the REPL:
scala> val newFunc: String => String = functionChaining( (x:String) => x + "---", (x:String) => x * 4)
scala> newFunc("|")
res12: String = |---|---|---|---
Many answers use andThen, but that will give you
b(a(s))
Given that you want
a(b(s))
compose is the way to go (well, that or reversing the list, but what's the point?)
def c(s: String) = List[String => String](a, b).reduce(_ compose _)(s)
// or alternatively
def c(s: String) = List(a _, b _).reduce(_ compose _)(s)
As a result
c("foo") // foo,,,...
Now, speaking of what's idiomatic, I believe that
a(b(s))
is more idiomatic and readable than
List(a _, b _).reduce(_ compose _)(s)
This clearly depends on the number of functions you're composing. If you were to have
a(b(c(d(e(f(g(h(s))))))))
then
List[String => String](a, b, c, d, e, f, g, h).reduce(_ compose _)(s)
is probably neater and more idiomatic as well.
If you really think you need to do this:
val c = a _ andThen b
// (The signature is:)
val c:(String)=>String = a _ andThen b
or, more obviously:
def d(s:String) = a _ andThen b
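For completeness, the standard library already offers a combinator for composing a list of such functions: scala.Function.chain applies them in andThen order, so list them in the order they should run (a sketch):
val c: String => String = Function.chain(List(b _, a _))  // applies b first, then a
c("foo")  // foo,,,...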
If chained application is preferred, then the below works. Caveats: the implicit syntax is a bit ugly, and since this uses structural types it relies on reflection.
object string {
  implicit def aPimp(s: String) = new {
    def a = "(a- " + s + " -a)"
  }

  implicit def bPimp(s: String) = new {
    def b = "(b- " + s + " -b)"
  }
}
scala> import string._
scala> "xyz".a.b
res0: String = (b- (a- xyz -a) -b)
scala> "xyz".b.a
res1: String = (a- (b- xyz -b) -a)
In my opinion, if not for the ugly syntax, this would be idiomatic Scala.
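The reflection caveat can be avoided by using implicit value classes (Scala 2.10+) instead of structural types; a sketch:
object string {
  implicit class APimp(val s: String) extends AnyVal {
    def a: String = "(a- " + s + " -a)"
  }
  implicit class BPimp(val s: String) extends AnyVal {
    def b: String = "(b- " + s + " -b)"
  }
}
Usage is unchanged: import string._ and then "xyz".a.b.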

Sequence with Streams in Scala

Suppose there is a sequence a[i] = f(a[i-1], a[i-2], ..., a[i-k]). How would you code it using Streams in Scala?
It should be possible to generalize it for any k, using an array for a and an extra k parameter, and having, for instance, the function take a rest... (varargs) parameter.
def next(a1: Any, ..., ak: Any, f: (Any, ..., Any) => Any): Stream[Any] = {
  val n = f(a1, ..., ak)
  Stream.cons(n, next(a2, ..., n, f))
}
val myStream = next(init1, ..., initk)
in order to get the 1000th element, do myStream.drop(1000)
An Update to show how this could be done with varargs. Beware that there is no arity check for the passed function:
object Test extends App {
  def next(a: Seq[Long], f: (Long*) => Long): Stream[Long] = {
    val v = f(a: _*)
    Stream.cons(v, next(a.tail ++ Array(v), f))
  }

  def init(firsts: Seq[Long], rest: Seq[Long], f: (Long*) => Long): Stream[Long] = {
    rest match {
      case Nil => next(firsts, f)
      case x :: xs => Stream.cons(x, init(firsts, xs, f))
    }
  }

  def sum(a: Long*): Long = {
    a.sum
  }

  val myStream = init(Seq[Long](1, 1, 1), Seq[Long](1, 1, 1), sum)
  myStream.take(12).foreach(println)
}
Is this OK?
(I use a[i] = f(a[i-k], a[i-k+1], ..., a[i-1]) instead of a[i] = f(a[i-1], a[i-2], ..., a[i-k]), since I prefer it this way.)
/**
 * Generates a Stream[T] from the given first k items and a function mapping k items to the next one.
 */
def getStream[T](f: T => Any, a: T*): Stream[T] = {
  def invoke[T](fun: T => Any, es: T*): T = {
    if (es.size == 1) fun.asInstanceOf[T => T].apply(es.head)
    else invoke(fun(es.head).asInstanceOf[T => Any], es.tail: _*)
  }
  Stream.iterate(a) { es => es.tail :+ invoke(f, es: _*) }.map { _.head }
}
For example, the following code to generate Fibonacci sequence.
scala> val fn = (x: Int, y: Int) => x+y
fn: (Int, Int) => Int = <function2>
scala> val fib = getStream(fn.curried,1,1)
fib: Stream[Int] = Stream(1, ?)
scala> fib.take(10).toList
res0: List[Int] = List(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)
The following code can generate a sequence {an} where a1 = 1, a2 = 2, a3 = 3, a(n+3) = a(n) + 2a(n+1) + 3a(n+2).
scala> val gn = (x: Int, y: Int, z: Int) => x + 2*y + 3*z
gn: (Int, Int, Int) => Int = <function3>
scala> val seq = getStream(gn.curried,1,2,3)
seq: Stream[Int] = Stream(1, ?)
scala> seq.take(10).toList
res1: List[Int] = List(1, 2, 3, 14, 50, 181, 657, 2383, 8644, 31355)
The short answer, which is probably what you are looking for, is a pattern to define your Stream once you have fixed a chosen k for the arity of f (i.e. you have a fixed type for f). The following pattern gives you a Stream whose n-th element is the term a[n] of your sequence:
def recStreamK [A](f : A ⇒ A ⇒ ... A) (x1:A) ... (xk:A):Stream[A] =
x1 #:: recStreamK (f) (x2)(x3) ... (xk) (f(x1)(x2) ... (xk))
(credit : it is very close to the answer of andy petrella, except that the initial elements are set up correctly, and consequently the rank in the Stream matches that in the sequence)
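For instance, with k = 2 this pattern specializes to the familiar Fibonacci stream (a sketch; recStream2 is a hypothetical name):
def recStream2[A](f: A => A => A)(x1: A)(x2: A): Stream[A] =
  x1 #:: recStream2(f)(x2)(f(x1)(x2))

val fibs = recStream2((a: Int) => (b: Int) => a + b)(1)(1)
fibs.take(8).toList  // List(1, 1, 2, 3, 5, 8, 13, 21)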
If you want to generalize over k, this is possible in a type-safe manner (with arity checking) in Scala, using prioritized overlapping implicits. The code (~80 lines) is available as a gist here. I'm afraid I got a little carried away and explained it in a detailed (and overlong) blog post there.
Unfortunately, we cannot generalize over the arity and stay type-safe at the same time, so we'll have to do it all manually:
def seq2[T, U](initials: Tuple2[T, T]) = new {
  def apply(fun: Function2[T, T, T]): Stream[T] = {
    initials._1 #::
      initials._2 #::
      (apply(fun) zip apply(fun).tail).map {
        case (a, b) => fun(a, b)
      }
  }
}
And we get def fibonacci = seq2((1, 1))(_ + _).
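For instance:
fibonacci.take(10).toList  // List(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)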
def seq3[T, U](initials: Tuple3[T, T, T]) = new {
  def apply(fun: Function3[T, T, T, T]): Stream[T] = {
    initials._1 #::
      initials._2 #::
      initials._3 #::
      (apply(fun) zip apply(fun).tail zip apply(fun).tail.tail).map {
        case ((a, b), c) => fun(a, b, c)
      }
  }
}

def tribonacci = seq3((1, 1, 1))(_ + _ + _)
… and up to 22.
I hope the pattern is becoming clear. (We could of course improve this and exchange the initials tuple for separate arguments; that saves us a pair of parentheses later when we use it.) If, some day in the future, the Scala macro language arrives, this will hopefully be easier to define.

Replacing imperative PriorityQueue in my algorithm

I currently have a method that uses a scala.collection.mutable.PriorityQueue to combine elements in a certain order. For instance the code looks a bit like this:
def process[A : Ordering](as: Set[A], f: (A, A) => A): A = {
  val queue = new scala.collection.mutable.PriorityQueue[A]() ++ as
  while (queue.size > 1) {
    val a1 = queue.dequeue
    val a2 = queue.dequeue
    queue.enqueue(f(a1, a2))
  }
  queue.dequeue
}
The code works as written, but is necessarily pretty imperative. I thought of using a SortedSet instead of the PriorityQueue, but my attempts make the process look a lot messier. What is a more declarative, succinct way of doing what I want to do?
If f doesn't produce elements that are already in the Set, you can indeed use a SortedSet. (If it does, you need an immutable priority queue.) A declarative way to do this would be:
def process[A : Ordering](s: SortedSet[A], f: (A, A) => A): A = {
  if (s.size == 1) s.head else {
    val fst :: snd :: Nil = s.take(2).toList
    val newSet = s - fst - snd + f(fst, snd)
    process(newSet, f)
  }
}
I tried to improve @Kim Stebel's answer, but I think the imperative variant is still clearer.
import scala.annotation.tailrec

def process[A : Ordering](s: Set[A], f: (A, A) => A): A = {
  val ord = implicitly[Ordering[A]]

  @tailrec
  def loop(lst: List[A]): A = lst match {
    case result :: Nil => result
    case fst :: snd :: rest =>
      val insert = f(fst, snd)
      val (more, less) = rest.span(ord.gt(_, insert))
      loop(more ::: insert :: less)
  }

  loop(s.toList.sorted(ord.reverse))
}
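For example, with hypothetical data (note that, like the PriorityQueue version, this combines the largest elements first):
process(Set(5, 1, 3), (a: Int, b: Int) => a + b)  // combines 5 and 3 first, i.e. f(f(5, 3), 1) == 9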
Here's a solution with SortedSet and Stream:
import scala.collection.immutable.SortedSet

def process[A : Ordering](as: Set[A], f: (A, A) => A): A = {
  Stream.iterate(SortedSet.empty[A] ++ as)(ss =>
      ss.drop(2) + f(ss.head, ss.tail.head))
    .dropWhile(_.size > 1).head.head
}
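Note that a PriorityQueue dequeues its largest element first, whereas a SortedSet's head is its smallest. To keep the original semantics, the set can be built with the reverse ordering (a sketch, with the same caveat about f producing elements already in the set):
import scala.collection.immutable.SortedSet

def process[A](as: Set[A], f: (A, A) => A)(implicit ord: Ordering[A]): A =
  Iterator
    .iterate(SortedSet.empty[A](ord.reverse) ++ as)(ss => ss.drop(2) + f(ss.head, ss.tail.head))
    .dropWhile(_.size > 1)
    .next()
    .head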

Value assignment inside for-loop in Scala

Is there any difference between this code:
for(term <- term_array) {
val list = hashmap.get(term)
...
}
and:
for(term <- term_array; val list = hashmap.get(term)) {
...
}
Inside the loop I'm changing the hashmap with something like this
hashmap.put(term, string :: list)
When I check the head of list, it seems to be outdated somehow when using the second code snippet.
The difference between the two is that in the first snippet list is just a value defined inside the function literal (the loop body), whereas in the second it is a definition inside the for expression, which is desugared using pattern matching. See Programming in Scala, Section 23.1 For Expressions:
for {
  p <- persons            // a generator
  n = p.name              // a definition
  if (n startsWith "To")  // a filter
} yield n
You see the real difference when you compile sources with scalac -Xprint:typer <filename>.scala:
object X {
  val x1 = for (i <- (1 to 5); x = i * 2) yield x
  val x2 = for (i <- (1 to 5)) yield { val x = i * 2; x }
}
After code transforming by the compiler you will get something like this:
private[this] val x1: scala.collection.immutable.IndexedSeq[Int] =
  scala.this.Predef.intWrapper(1).to(5).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
    val x: Int = i.*(2);
    scala.Tuple2.apply[Int, Int](i, x)
  }))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).map[Int, scala.collection.immutable.IndexedSeq[Int]]((
    (x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
      case (_1: Int, _2: Int)(Int, Int)((i @ _), (x @ _)) => x
    }))(immutable.this.IndexedSeq.canBuildFrom[Int]);

private[this] val x2: scala.collection.immutable.IndexedSeq[Int] =
  scala.this.Predef.intWrapper(1).to(5).map[Int, scala.collection.immutable.IndexedSeq[Int]](((i: Int) => {
    val x: Int = i.*(2);
    x
  }))(immutable.this.IndexedSeq.canBuildFrom[Int]);
This can be simplified to:
val x1 = (1 to 5).map { i =>
  val x: Int = i * 2
  (i, x)
}.map {
  case (i, x) => x
}

val x2 = (1 to 5).map { i =>
  val x = i * 2
  x
}
Defining values inside for loops makes sense if you want to use that value in the for statement itself, like:
for (i <- is; a = something; if (a)) {
  ...
}
And the reason why your list is outdated is that this translates to a map followed by a foreach call, along the lines of:
term_array.map { term =>
  val list = hashmap.get(term)
  (term, list)
} foreach {
  ...
}
So by the time you reach ..., your hashmap has already been changed. The other example translates to:
term_array.foreach { term =>
  val list = hashmap.get(term)
  ...
}
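A minimal way to observe the difference with hypothetical data (note that recent Scala versions drop the val in the definition clause):
import scala.collection.mutable

val hashmap = mutable.Map("a" -> List.empty[String])
val term_array = Seq("a", "a")

// Definition in the for expression: both lookups happen in the map step
// before any body runs, so the second iteration still sees the empty list.
for (term <- term_array; list = hashmap(term)) {
  hashmap.put(term, s"run-${list.size}" :: list)
}
println(hashmap("a"))  // List(run-0): the first put was overwritten

// Lookup inside the body: each iteration sees the latest value.
hashmap("a") = Nil
for (term <- term_array) {
  val list = hashmap(term)
  hashmap.put(term, s"run-${list.size}" :: list)
}
println(hashmap("a"))  // List(run-1, run-0)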