Scala groupBy of a tuple to calculate stock basis

I am working on an exercise to calculate stock basis given a list of stock purchases in the form of 3-tuples (ticker, qty, stock_price). I've got it working, but I'd like to do the calculation part in a more functional way. Does anyone have an answer for this?
// input:
// List(("TSLA", 20, 200),
//      ("TSLA", 20, 100),
//      ("FB", 10, 100))
// output:
// List(("FB", (10, 100)),
//      ("TSLA", (40, 150)))
def generateBasis(trades: Iterable[(String, Int, Int)]) = {
  val bases = trades.groupBy(_._1).map {
    case (key, pairs) =>
      val quantity = pairs.map(_._2).toList
      val price = pairs.map(_._3).toList
      var totalPrice: Int = 0
      for (i <- quantity.indices) {
        totalPrice += quantity(i) * price(i)
      }
      key -> (quantity.sum, totalPrice / quantity.sum)
  }
  bases
}

This looks like it might work for you. (updated)
def generateBasis(trades: Iterable[(String, Int, Int)]) =
  trades.groupBy(_._1).mapValues {
    _.foldLeft((0, 0)) { case ((tq, tp), (_, q, p)) => (tq + q, tp + q * p) }
  }.map { case (k, (q, p)) => (k, q, p / q) } // turn the Map into tuples (triples)
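For reference, a quick check with the sample input from the question (the result passes through a Map, so element order may vary):

generateBasis(List(("TSLA", 20, 200), ("TSLA", 20, 100), ("FB", 10, 100)))
// => List((TSLA,40,150), (FB,10,100)) (element order may vary)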

I came up with the solution below. Thanks, everyone, for your input. I'd love to hear if anyone has a more elegant solution.
// input:
// List(("TSLA", 20, 200),
//      ("TSLA", 10, 100),
//      ("FB", 5, 50))
// output:
// List(("FB", 5, 50),
//      ("TSLA", 30, 166))
def generateBasis(trades: Iterable[(String, Int, Int)]) = {
  val groupedTrades = trades.groupBy(_._1).map {
    case (key, pairs) => key -> pairs.map(e => (e._2, e._3))
  } // Map(FB -> List((5,50)), TSLA -> List((20,200), (10,100)))
  val costBases = for {
    (ticker, qtyPrices) <- groupedTrades
    tradeCost = for ((qty, price) <- qtyPrices) yield qty * price // trade_qty * trade_price
    tradeQuantity = for ((qty, _) <- qtyPrices) yield qty
  } yield (ticker, tradeQuantity.sum, tradeCost.sum / tradeQuantity.sum)
  costBases.toList // List(("FB", 5, 50), ("TSLA", 30, 166))
}
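For what it's worth, on Scala 2.13+ the whole group-and-aggregate step collapses into groupMapReduce. A minimal sketch (not the asker's code; it keeps the integer average of the originals):

def generateBasis(trades: Iterable[(String, Int, Int)]): List[(String, Int, Int)] =
  trades
    .groupMapReduce(_._1)(t => (t._2, t._2 * t._3)) { // key by ticker; map each trade to (qty, qty * price)
      case ((q1, c1), (q2, c2)) => (q1 + q2, c1 + c2) // sum quantities and total costs
    }
    .map { case (ticker, (qty, cost)) => (ticker, qty, cost / qty) } // average price per share
    .toList

generateBasis(List(("TSLA", 20, 200), ("TSLA", 10, 100), ("FB", 5, 50)))
// => List((TSLA,30,166), (FB,5,50)) (element order may vary)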

Related

How to find percentage from value of a (key,value) pair?

I want to find a percentage from the value of a (key, value) pair which is stored in a map.
For example, given Map('a' -> 10, 'b' -> 20), I need to find the percentage occurrence of 'a' and 'b'.
Adding to Thilo's answer, you can try the code below. The final result will again be a Map[String, Double].
val map = Map("a" -> 10.0, "b" -> 20.0)
val total = map.values.sum
val mapWithPerc = map.mapValues(x => (x * 100) / total)
println(mapWithPerc)
//prints Map(a -> 33.333333333333336, b -> 66.66666666666667)
def mapToPercentage(key: String)(implicit map: Map[String, Double]) = {
  val valuesSum = map.values.sum
  (map(key) * 100) / valuesSum
}
implicit val m: Map[String, Double] = Map("a" -> 10, "b" -> 20, "c" -> 30)
println(mapToPercentage("a")) // 16.666666666666668
println(mapToPercentage("b")) // 33.333333333333336
println(mapToPercentage("c")) // 50
Note: there is absolutely no need to curry the function parameters or make the map implicit. I just think it looks nicer in this example. Something like def mapToPercentage(key: String, map: Map[String, Double]) = {...} and mapToPercentage("a", m) is also perfectly valid. That being said, if you want to get even fancier:
implicit class MapToPercentage(map: Map[String, Double]) {
  def getPercentage(key: String) = {
    val valuesSum = map.values.sum
    (map(key) * 100) / valuesSum
  }
}
val m: Map[String, Double] = Map("a" -> 10, "b" -> 20, "c" -> 30)
println(m.getPercentage("a")) // 16.666666666666668
println(m.getPercentage("b")) // 33.333333333333336
println(m.getPercentage("c")) // 50
Point being, the logic behind getting the percentage can be written a few ways:
(map(key) * 100) / valuesSum               // get the value for the given key, multiply by 100,
                                           // divide by the total sum of all values
                                           // - will throw if the key doesn't exist
(map.getOrElse(key, 0D) * 100) / valuesSum // safer than the above, will not fail
                                           // if the key doesn't exist
map.get(key).map(_ * 100 / valuesSum)      // returns None if the key doesn't exist
                                           // and Some(result) if it does
val map = Map('a' -> 10, 'b' -> 20)
val total = map.values.sum
map.get('a').map(_ * 100 / total) // gives Some(33)
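Note the integer arithmetic: with Int values, _ * 100 / total truncates, hence Some(33). Switching one operand to Double (a small tweak, not part of the answer above) keeps the fraction:

map.get('a').map(_ * 100.0 / total) // gives Some(33.333333333333336)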

How to get the limit bound for a specific value from an array in Scala?

I have an Array
val bins = Array(0,100, 250, 500, 1000, 2000, 3000)
and here is my piece of code:
private var cumulativeDelay: Map[String, Double] =
  linkIds.zip(freeFlowDelay).groupBy(_._1).mapValues(l => l.map(_._2).sum)
private var cumulativeCapacity: Map[String, Double] =
  linkIds.zip(linkCapacity).groupBy(_._1).mapValues(l => l.map(_._2).sum)
cumulativeCapacity foreach {
  case (linkId, capacity) =>
    val rangeToValue = bins.zip(bins.tail)
      .collectFirst { case (left, right) if capacity >= left && capacity <= right =>
        Map(s"$left-$right" -> cumulativeDelay.get(linkId))
      }
      .getOrElse(Map.empty[String, Double])
}
The value of rangeToValue comes out as Map(1000-2000 -> Some(625)), but I want rangeToValue: Map[String, Double] = Map(1000-2000 -> 625).
You should try something like this, but it doesn't work with values out of range:
val bins = Array(0, 100, 250, 500, 1000, 2000, 3000)
val effectiveValue = 625
val rangeToValue = bins.zip(bins.tail)
  .collectFirst { case (left, right) if effectiveValue >= left && effectiveValue <= right =>
    Map(s"$left-$right" -> effectiveValue)
  }
  .getOrElse(Map.empty[String, Int])
rangeToValue("500-1000")

Aggregating sum for RDD in Scala (Spark)

If I have a variable such as books: RDD[(String, Integer, Integer)], how do I merge keys with the same String (which could represent a title), and then sum the corresponding two integers (which could represent pages and price)?
ex:
[("book1", 20, 10),
("book2", 5, 10),
("book1", 100, 100)]
becomes
[("book1", 120, 110),
("book2", 5, 10)]
With an RDD you can use reduceByKey.
case class Book(name: String, i: Int, j: Int) {
  def +(b: Book) =
    if (name == b.name) Book(name, i + b.i, j + b.j)
    else throw new IllegalArgumentException("Cannot combine books with different names")
}
val rdd = sc.parallelize(Seq(
  Book("book1", 20, 10),
  Book("book2", 5, 10),
  Book("book1", 100, 100)))
val aggRdd = rdd.map(book => (book.name, book))
  .reduceByKey(_ + _) // reduce using our `+` defined above
  .map(_._2)          // we don't need the tuple anymore, just the Books
aggRdd.foreach(println)
// Book(book1,120,110)
// Book(book2,5,10)
Just use a DataFrame (Dataset[Row]):
val spark: SparkSession = SparkSession.builder.getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq(
  ("book1", 20, 10), ("book2", 5, 10), ("book1", 100, 100)
))
spark.createDataFrame(rdd).groupBy("_1").sum().show()
// +-----+-------+-------+
// |   _1|sum(_2)|sum(_3)|
// +-----+-------+-------+
// |book1|    120|    110|
// |book2|      5|     10|
// +-----+-------+-------+
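If named columns read better than _1/_2/_3, toDF with hypothetical column names does the same aggregation (a cosmetic variation, not part of the answer above):

import spark.implicits._
rdd.toDF("title", "pages", "price").groupBy("title").sum("pages", "price").show()
// prints the same sums under sum(pages) and sum(price) headers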
Try converting it first to a key-tuple RDD and then performing a reduceByKey:
yourRDD.map(t => (t._1, (t._2, t._3)))
  .reduceByKey((acc, elem) => (acc._1 + elem._1, acc._2 + elem._2))
Output:
(book2,(5,10))
(book1,(120,110))
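If you need plain triples back rather than key-tuple pairs, one more map (a small addition to the above) flattens the result:

yourRDD.map(t => (t._1, (t._2, t._3)))
  .reduceByKey((acc, elem) => (acc._1 + elem._1, acc._2 + elem._2))
  .map { case (title, (pages, price)) => (title, pages, price) }
// (book1,120,110)
// (book2,5,10)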

Avoid duplicated computation in guards

Here is the code:
import java.util.{Calendar, Date, GregorianCalendar}
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.commons.conversions.scala._

case class Quota(date: Date, used: Int)

object MongoDateDemo extends App {
  val client = InsertUsers.getClient
  val db = client("github")
  val quota = db("quota")
  val rand = scala.util.Random

  // quota.drop()
  // (1 to 100).foreach { _ =>
  //   quota += DBObject("date" -> new Date(), "used" -> rand.nextInt(10))
  //   Thread.sleep(1000)
  // }

  val minuteInMilliseconds = 60 * 1000
  def thresholdDate(minute: Int) =
    new Date(new Date().getTime - minuteInMilliseconds * minute) // `minute` minutes ago

  val fields = DBObject("_id" -> 0, "used" -> 1)
  val x = quota.find("date" $gte thresholdDate(28), fields).collect {
    case x if x.getAs[Int]("used").isDefined => x.getAs[Int]("used").get
  }
  println(x.toList.sum)

  // val y = x.map {
  //   case dbo: DBObject => Quota(dbo.getAs[Date]("date").getOrElse(new Date(0)), dbo.getAs[Int]("used").getOrElse(0))
  // }
}
It reads documents from a collection, filters out those that don't have "used" defined, then sums up the numbers.
The x.getAs[Int]("used") part is computed twice; how can I avoid that duplication?
Not much of a Scala programmer, but isn't that what flatMap is for?
quota
  .find("date" $gte thresholdDate(38), fields)
  .flatMap(_.getAs[Int]("used").toList)
Since this is not possible to avoid in a single collect, I had to do it in two steps: map into Options, then collect. I used the view method so that the collection is not traversed twice:
import java.util.{Calendar, Date, GregorianCalendar}
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.commons.conversions.scala._

case class Quota(date: Date, used: Int)

object MongoDateDemo extends App {
  val client = InsertUsers.getClient
  val db = client("github")
  val quota = db("quota")
  val rand = scala.util.Random

  // quota.drop()
  // (1 to 100).foreach { _ =>
  //   quota += DBObject("date" -> new Date(), "used" -> rand.nextInt(10))
  //   Thread.sleep(1000)
  // }

  val minuteInMilliseconds = 60 * 1000
  def thresholdDate(minute: Int) =
    new Date(new Date().getTime - minuteInMilliseconds * minute) // `minute` minutes ago

  val fields = DBObject("_id" -> 0, "used" -> 1)
  val usedNumbers = quota.find("date" $gte thresholdDate(38), fields).toList.view.map {
    _.getAs[Int]("used")
  }.collect {
    case Some(i) => i
  }.force
  println(usedNumbers.sum)

  // val y = x.map {
  //   case dbo: DBObject => Quota(dbo.getAs[Date]("date").getOrElse(new Date(0)), dbo.getAs[Int]("used").getOrElse(0))
  // }
}
Assuming getAs returns an Option, this should do what you want:
val x = quota.find("date" $gte thresholdDate(28), fields).flatMap { _.getAs[Int]("used") }
This is similar to doing this:
scala> List(Some(1), Some(2), None, Some(4)).flatMap(x => x)
res: List[Int] = List(1, 2, 4)
Or this:
scala> (1 to 20).flatMap(x => if(x%2 == 0) Some(x) else None)
res: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

Making a scala function tail-recursive when using yield?

I've got a function that's working;
def iterate(start: Seq[TestObj], sources: Seq[TestObj]): Seq[TestSeq] = {
  val currentTotal = start.foldLeft(TestObj(0, 0, 0))(_ + _)
  (for (s: TestObj <- sources) yield {
    val newTotal = currentTotal + s
    if (newTotal < target) {
      iterate(start :+ s, sources)
    } else {
      Seq(TestSeq(newTotal, target delta newTotal, (start :+ s).sortBy(e => (e.x, e.y, e.z))))
    }
  }).flatten
}
But I'd like to make this tail-recursive. I cannot for the life of me figure out the pattern for doing tail recursion while still maintaining my for/yield functionality. Can one of the FP wizards around here point me in the right direction?
UPDATE: Thanks to the comments below, I've made some headway. My new test harness is below, but it still bombs out. Any hints to optimize are welcome; I'm pretty stumped.
case class TestObj(x: Int, y: Int, z: Int) {
  def +(that: TestObj): TestObj =
    TestObj(this.x + that.x, this.y + that.y, this.z + that.z)
  def <(that: TestObj): Boolean =
    this.x < that.x && this.y < that.y && this.z < that.z
  def delta(that: TestObj) =
    math.abs(this.x - that.x) + math.abs(this.y - that.y) + math.abs(this.z - that.z)
}

case class TestSeq(score: TestObj, delta: Int, sequence: Seq[TestObj])

val sources = Seq(
  TestObj(1, 2, 3), TestObj(2, 3, 4), TestObj(3, 4, 5),
  TestObj(2, 3, 4), TestObj(3, 4, 5), TestObj(4, 5, 6),
  TestObj(3, 4, 5), TestObj(4, 5, 6), TestObj(5, 6, 7))

val target = TestObj(50, 60, 70)

def iterate(start: Seq[TestObj], maxItems: Int, sources: Seq[TestObj]): Set[TestSeq] = {
  val currentTotal = start.foldLeft(TestObj(0, 0, 0))(_ + _)
  if (maxItems == 0) {
    return Set(TestSeq(currentTotal, target delta currentTotal, start.sortBy(f => (f.x, f.y, f.z))))
  }
  sources.flatMap { s =>
    val newTotal = currentTotal + s
    if (newTotal < target) {
      iterate(start :+ s, maxItems - 1, sources)
    } else {
      Set(TestSeq(newTotal, target delta newTotal, start :+ s))
    }
  }.toSet
}

val possibleCombinations = iterate(Seq(TestObj(0, 0, 0)), 10, sources)
val usableCombinations = possibleCombinations.map(c => TestSeq(c.score, c.delta, c.sequence.drop(1)))

println(usableCombinations)
usableCombinations.toSeq.sortBy(f => f.delta).foreach { uc =>
  println(s"${uc.delta}: ${uc.sequence}")
}
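Recursion inside flatMap can never be a tail call, since the flattening still has work to do after each inner call returns. One way around it (a sketch, not the asker's code) is to manage an explicit worklist and accumulator, so the only recursive call is the loop itself:

import scala.annotation.tailrec

def iterate(start: Seq[TestObj], maxItems: Int, sources: Seq[TestObj]): Set[TestSeq] = {
  @tailrec
  def loop(work: List[(Seq[TestObj], Int)], acc: Set[TestSeq]): Set[TestSeq] = work match {
    case Nil => acc
    case (current, remaining) :: rest =>
      val currentTotal = current.foldLeft(TestObj(0, 0, 0))(_ + _)
      if (remaining == 0)
        loop(rest, acc + TestSeq(currentTotal, target delta currentTotal,
          current.sortBy(f => (f.x, f.y, f.z))))
      else {
        // Finished branches go straight to the accumulator; unfinished ones
        // go back on the worklist with one fewer item allowed.
        val (done, pending) = sources.partition(s => !((currentTotal + s) < target))
        val newAcc = acc ++ done.map { s =>
          val t = currentTotal + s
          TestSeq(t, target delta t, current :+ s)
        }
        loop(pending.map(s => (current :+ s, remaining - 1)).toList ::: rest, newAcc)
      }
  }
  loop(List((start, maxItems)), Set.empty)
}

This traverses the same search tree as the update above, but the stack depth stays constant. Note it still does exponential work, which may be the real reason it bombs out.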