How to combine two objects in a List by summing a member - scala

Given this case class:
case class Categories(fruit: String, amount: Double, mappedTo: String)
I have a list containing the following:
List(
Categories("Others",22.38394964594807,"Others"),
Categories("Others",77.6160503540519,"Others")
)
I want to combine two elements in the list by summing up their amount if they are in the same category, so that the end result in this case would be:
List(Categories("Others",99.99999999999997,"Others"))
How can I do that?

Since groupMapReduce was only introduced in Scala 2.13, I'll try to provide another approach alongside Martinjn's great answer, based on groupBy.
Assuming we have:
case class Categories(Fruit: String, amount: Double, mappedTo: String)
val categories = List(
Categories("Apple",22.38394964594807,"Others"),
Categories("Apple",77.6160503540519,"Others")
)
If you want to aggregate by both mappedTo and Fruit:
val result = categories.groupBy(c => (c.Fruit, c.mappedTo)).map {
case ((fruit, mappedTo), categories) => Categories(fruit, categories.map(_.amount).sum, mappedTo)
}
Code run can be found at Scastie.
If you want to aggregate only by mappedTo and keep an arbitrary Fruit (here, the one from the first element of each group), you can do:
val result = categories.groupBy(c => c.mappedTo).map {
case (mappedTo, categories) => Categories(categories.head.Fruit, categories.map(_.amount).sum, mappedTo)
}
Code run can be found at Scastie
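To make the difference between the two grouping keys concrete, here is a minimal sketch on made-up data. The sample amounts and the lowercase field name fruit are assumptions chosen for illustration, not taken from the question.

```scala
final case class Categories(fruit: String, amount: Double, mappedTo: String)

// Made-up sample data with two different fruits mapped to the same target.
val categories = List(
  Categories("Apple", 1.5, "Others"),
  Categories("Pear", 2.5, "Others"),
  Categories("Apple", 3.0, "Others")
)

// Key on (fruit, mappedTo): apples and pears stay separate.
val byFruitAndTarget: List[Categories] =
  categories
    .groupBy(c => (c.fruit, c.mappedTo))
    .map { case ((fruit, mappedTo), group) =>
      Categories(fruit, group.map(_.amount).sum, mappedTo)
    }
    .toList
    .sortBy(_.fruit) // Map iteration order is unspecified, so sort for a stable result

// Key on mappedTo only: everything collapses into one element,
// labelled with the fruit of the first element of the group.
val byTarget: List[Categories] =
  categories
    .groupBy(_.mappedTo)
    .map { case (mappedTo, group) =>
      Categories(group.head.fruit, group.map(_.amount).sum, mappedTo)
    }
    .toList
```

With the tuple key the result has one element per fruit; with the mappedTo key the pear's amount is folded into a single element labelled "Apple".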

You want to group your list entries by category, and reduce them to a single value. There is groupMapReduce for that, which groups entries, and then maps the group (you don't need this) and reduces the group to a single value.
Given:
case class Category(category: String, amount: Double)
If you have a val myList: List[Category], then you want to group on Category#category and reduce each group by merging its members, summing up the amounts.
That gives:
myList.groupMapReduce(_.category)(identity) { // group by category; identity as the map step, since no mapping is needed
  case (Category(name, amount1), Category(_, amount2)) =>
    Category(name, amount1 + amount2) // reduce: keep the name and sum the amounts
}
// The result is a Map[String, Category]; call .values.toList to get a List[Category] back.
In theory just a groupReduce would have been enough, but that doesn't exist, so we're stuck with the identity here.
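As a variation on the same idea, the map step can do real work: a sketch, assuming it is acceptable to map each element to its amount, sum plain Doubles, and rebuild the Category values afterwards. The sample values are made up.

```scala
final case class Category(category: String, amount: Double)

// Made-up sample data.
val myList = List(
  Category("Others", 22.5),
  Category("Others", 77.5),
  Category("Fruit", 1.0)
)

// Group by name, map each element to its amount, reduce by addition.
val totals: Map[String, Double] =
  myList.groupMapReduce(_.category)(_.amount)(_ + _)

// Turn the (name -> total) pairs back into Category values.
// Sorting only makes the order deterministic; a Map has no guaranteed order.
val combined: List[Category] =
  totals.toList.map { case (name, total) => Category(name, total) }.sortBy(_.category)
```

This avoids pattern matching on pairs of Category values, at the cost of one extra step to rebuild the case class.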

Related

How can I group by the individual elements of a list of elements in Scala

Forgive me if I'm not naming things by their actual name, I've just started to learn Scala. I've been looking around for a while, but can not find a clear answer to my question.
Suppose I have a list of objects, each object has two fields: x: Int and l: List[String], where the Strings, in my case, represent categories.
The l lists can be of arbitrary length, so an object can belong to multiple categories. Furthermore, various objects can belong to the same category. My goal is to group the objects by the individual categories, where the categories are the keys. This means that if an object is linked to say "N" categories, it will occur in "N" of the key-value pairs.
So far I managed to groupBy the lists of categories through:
objectList.groupBy(x => x.l)
However, this obviously groups the objects by list of categories rather than by categories.
I'm trying to do this with immutable collections avoiding loops etc.
If anyone has some ideas that would be much appreciated!
Thanks
EDIT:
By request the actual case class and what I am trying.
case class Car(make: String, model: String, fuelCapacity: Option[Int], category:Option[List[String]])
Once again, a car can belong to multiple categories. Let's say List("SUV", "offroad", "family").
I want to group by category elements rather than by the whole list of categories, and have the fuelCapacity as the values, in order to be able to extract average fuelCapacity per category amongst other metrics.
Using your EDIT as a guide.
case class Car( make: String
, model: String
, fuelCapacity: Option[Int]
, category:Option[List[String]] )
val cars: List[Car] = ???
//all currently known category strings
val cats: Set[String] = cars.flatMap(_.category).flatten.toSet
//category -> list of cars in this category
val catMap: Map[String,List[Car]] =
cats.map(cat => (cat, cars.filter(_.category.exists(_.contains(cat))))).toMap
//category -> average fuel capacity for cars in this category
val fcAvg: Map[String,Double] =
catMap.map{case (cat, cars) =>
val fcaps: List[Int] = cars.flatMap(_.fuelCapacity)
if (fcaps.lengthIs < 1) (cat, -1d)
else (cat, fcaps.sum.toDouble / fcaps.length)
}
Something like the following?
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupBy { case (c,_) => c } // Map[String,Seq[(String,YourType)]]
.mapValues { items => items.map { case (_, o) => o } } // Map[String, Seq[YourType]]
(Deliberately "heavy" to help you understand the idea behind it)
EDIT, or as of Scala 2.13 thanks to groupMap:
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupMap { case (c,_) => c } { case (_, o) => o } // Map[String,Seq[YourType]]
You are very close, you just need to split each individual element in the list before the group so try with something like this:
// I used a Set instead of a List,
// since I don't think the order of categories matters,
// and having the same category twice doesn't seem to make sense.
final case class MyObject(x: Int, categories: Set[String] = Set.empty) {
def addCategory(category: String): MyObject =
this.copy(categories = this.categories + category)
}
def groupByCategories(data: List[MyObject]): Map[String, List[Int]] =
data
.flatMap(o => o.categories.map(c => c -> o.x))
.groupMap(_._1)(_._2)
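Applied to the Car case class from the question's EDIT, the flatMap-then-groupMap idea might look like the sketch below. The sample cars are invented for illustration, and unlike the -1 sentinel used in an earlier answer, a category with no known capacities is simply absent from the result here.

```scala
final case class Car(
  make: String,
  model: String,
  fuelCapacity: Option[Int],
  category: Option[List[String]]
)

// Made-up sample data, only for illustration.
val cars = List(
  Car("A", "one", Some(60), Some(List("SUV", "family"))),
  Car("B", "two", Some(40), Some(List("family"))),
  Car("C", "three", None, Some(List("SUV"))),
  Car("D", "four", Some(50), None)
)

// One (category, capacity) pair per category a car belongs to.
// Cars without categories or without a known capacity contribute nothing.
val pairs: List[(String, Int)] =
  for {
    car      <- cars
    cap      <- car.fuelCapacity.toList
    category <- car.category.getOrElse(Nil)
  } yield (category, cap)

// category -> capacities of the cars in that category (Scala 2.13+)
val capsByCategory: Map[String, List[Int]] = pairs.groupMap(_._1)(_._2)

// category -> average capacity
val avgByCategory: Map[String, Double] =
  capsByCategory.map { case (cat, caps) => cat -> (caps.sum.toDouble / caps.size) }
```

Because the pairs are built in a single traversal, this does not need to filter the full car list once per category.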

How to use result of scala's List::groupBy?

I have this code. PartnerReader.fetchPartners() returns a List[Partner]. Partners have a country attribute that is a String, and a dates attribute that is a list of Calendar objects. I group all Partners by their country. I expect partnersByCountry is a Map[String, List[Partner]].
I then want to associate countries with all the dates from all their Partners. getAllDatesForPartners() returns a List[Calendar] resulting (hopefully) in a Map[String, List[Calendar]]. I attempt to do this with the assignment to allDatesForCountry, but this fails on the call to map with the error Cannot resolve overloaded method 'map'.
Why does this code not work and what's the correct way to do the transformation?
val partnersByCountry = PartnerReader.fetchPartners()
.groupBy(_.country)
val allDatesForCountry = partnersByCountry
.map((country: String, partners: List[Partner]) => {
country -> getAllDatesForPartners(partners)
})
def getAllDatesForPartners(partners: List[Partner]): List[Calendar] = ???
When you .map() over a Map[?,?], your function receives each entry as a single tuple argument, not as two separate arguments. To break each tuple into its constituent parts you need pattern matching.
.map{case (country: String, partners: List[Partner]) =>
country -> getAllDatesForPartners(partners)
}
Which is the long (documented) way to write...
.map(tup => (tup._1, getAllDatesForPartners(tup._2)))
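Here is a self-contained sketch of the variants, assuming a simplified Partner that holds java.time.LocalDate values instead of Calendar, and a getAllDatesForPartners that just concatenates the dates. Both are stand-ins, since the question does not show the real definitions.

```scala
import java.time.LocalDate

// Simplified stand-in for the question's Partner; LocalDate replaces Calendar for brevity.
final case class Partner(country: String, dates: List[LocalDate])

// Assumed implementation: concatenate the dates of all partners.
def getAllDatesForPartners(partners: List[Partner]): List[LocalDate] =
  partners.flatMap(_.dates)

val partners = List(
  Partner("NL", List(LocalDate.of(2020, 1, 1))),
  Partner("NL", List(LocalDate.of(2020, 2, 1))),
  Partner("BE", List(LocalDate.of(2020, 3, 1)))
)

val partnersByCountry: Map[String, List[Partner]] = partners.groupBy(_.country)

// Pattern-matching function literal: the single tuple argument is destructured.
val viaCase: Map[String, List[LocalDate]] =
  partnersByCountry.map { case (country, ps) => country -> getAllDatesForPartners(ps) }

// Same result via tuple accessors.
val viaAccessors: Map[String, List[LocalDate]] =
  partnersByCountry.map(t => (t._1, getAllDatesForPartners(t._2)))

// Scala 2.13+: group, extract the dates, and concatenate them in one step.
val viaGroupMapReduce: Map[String, List[LocalDate]] =
  partners.groupMapReduce(_.country)(_.dates)(_ ++ _)
```

All three produce the same Map; the groupMapReduce form skips the intermediate Map[String, List[Partner]] when only the dates are needed.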

zipping lists with an optional list to construct a list of object in Scala

I have a case class like this:
case class Metric(name: String, value: Double, timeStamp: Int)
I receive individual components to build metrics in separate lists and zip them to create a list of Metric objects.
def buildMetric(names: Seq[String], values: Seq[Double], ts: Seq[Int]): Seq[Metric] = {
(names, values, ts).zipped.toList map {
case (name, value, time) => Metric(name, value, time)
}
}
Now I need to add an optional parameter to both buildMetric function and Metric class.
case class Metric(name: String, value: Double, timeStamp: Int, `type`: Option[Type]) // type is a reserved word, hence the backticks
&
def buildMetric(names: Seq[String], values: Seq[Double], ts: Seq[Int], types: Option[Seq[Type]]): Seq[Metric]
The idea is that we sometimes receive a sequence of types which, if present, matches the length of the names and values lists. I am not sure how to modify the body of the buildMetric function to create the Metric objects with type information idiomatically. I can think of a couple of approaches.
Do an if-else on types.isDefined, zipping types.get with the other lists in one branch and leaving the code as above in the other. This makes me write the same code twice.
The other option is to simply use a while loop and create each Metric object with types.map(_(i)) passed as the last parameter.
So far I am using the second option, but I wonder if there is a more functional way of handling this problem.
The first option can't be done because zipped only works with tuples of 3 or fewer elements.
The second version might look like this:
def buildMetric(names: Seq[String], values: Seq[Double], ts: Seq[Int], types: Option[Seq[Type]]): Seq[Metric] =
for {
(name, i) <- names.zipWithIndex
value <- values.lift(i)
time <- ts.lift(i)
optType = types.flatMap(_.lift(i))
} yield {
Metric(name, value, time, optType)
}
One more option from my point of view, if you would like to keep this zipped approach - convert types from Option[Seq[Type]] to Seq[Option[Type]] with same length as names filled with None values in case if types is None as well:
val optionTypes: Seq[Option[Type]] = types.fold(Seq.fill(names.length)(None: Option[Type]))(_.map(Some(_)))
// Sorry, Did not find `zipped` for Tuple4 case
names.zip(values).zip(ts).zip(optionTypes).toList.map {
case (((name, value), time), optionType) => Metric(name, value, time, optionType)
}
Hope this helps!
You could just use pattern matching on types:
def buildMetric(names: Seq[String], values: Seq[Double], ts: Seq[Int], types: Option[Seq[Type]]): Seq[Metric] = {
types match {
case Some(types) => names.zip(values).zip(ts).zip(types).map {
case (((name, value), ts), t) => Metric(name, value, ts, Some(t))
}
case None => (names, values, ts).zipped.map(Metric(_, _, _, None))
}
}
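On Scala 2.13 and later, lazyZip can be chained across up to four collections, which may avoid both the nested tuples and the duplicated branch. The sketch below assumes Type is simply String and names the field metricType to sidestep the reserved word; both choices are illustrative, not from the question.

```scala
// metricType and the use of String for the type are assumptions for this sketch.
final case class Metric(name: String, value: Double, timeStamp: Int, metricType: Option[String])

def buildMetric(
    names: Seq[String],
    values: Seq[Double],
    ts: Seq[Int],
    types: Option[Seq[String]]
): Seq[Metric] = {
  // Option[Seq[T]] -> Seq[Option[T]], filled with None when no types were supplied
  val optTypes: Seq[Option[String]] =
    types.fold(Seq.fill(names.length)(Option.empty[String]))(_.map(Option(_)))

  // lazyZip walks the four sequences in lockstep and stops at the shortest one
  names
    .lazyZip(values)
    .lazyZip(ts)
    .lazyZip(optTypes)
    .map((name, value, time, tpe) => Metric(name, value, time, tpe))
}
```

For example, buildMetric(Seq("a"), Seq(1.0), Seq(10), None) yields a single Metric whose metricType is None.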

Add value with groupByKey

I have some troubles with groupByKey in scala and Spark.
I have 2 case classes :
case class Employee(id_employee: Long, name_emp: String, salary: String)
For the moment I use this 2nd case class:
case class Company(id_company: Long, employee:Seq[Employee])
However, I want to replace it with this new one:
case class Company(id_company: Long, name_comp: String, employee:Seq[Employee])
There is a parent DataSet (df1) that I use with groupByKey to create Company objects :
val companies = df1.groupByKey(v => v.id_company)
.mapGroups(
{
case(k,iter) => Company(k, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
}
).collect()
This code works, it returns objects like this one :
Company(1234,List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))
But I can't figure out how to add the company's name_comp to those objects (this field exists in df1), in order to get objects like this (using the new case class):
Company(1234, NYTimes, List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))
Since you want both the company id and name, what you can do is to use a tuple as the key when you group your data. This will make both values easily available when constructing the Company class:
df1.groupByKey(v => (v.id_company, v.name_comp))
.mapGroups{ case((id, name), iter) =>
Company(id, name, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)}
.collect()
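The tuple-key idea does not depend on Spark, so it can be tried out on plain Scala collections first. The sketch below is only an analogue of the Dataset code: FlatRecord is a hypothetical row type standing in for the rows of df1, and groupMap (Scala 2.13+) takes the place of groupByKey plus mapGroups.

```scala
// Hypothetical flat row type standing in for the rows of df1.
final case class FlatRecord(
  id_company: Long,
  name_comp: String,
  id_employee: Long,
  name_emp: String,
  salary: String
)
final case class Employee(id_employee: Long, name_emp: String, salary: String)
final case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])

// Made-up rows mirroring the example output in the question.
val rows = List(
  FlatRecord(1234L, "NYTimes", 987L, "John", "30000"),
  FlatRecord(1234L, "NYTimes", 4567L, "Bob", "50000")
)

// Group on the (id, name) tuple so both values are available when building Company.
val companies: List[Company] =
  rows
    .groupMap(r => (r.id_company, r.name_comp))(r => Employee(r.id_employee, r.name_emp, r.salary))
    .map { case ((id, name), employees) => Company(id, name, employees) }
    .toList
```

This relies on every row of a company carrying the same name_comp; if names can differ for one id, the tuple key would split that company into several groups.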

Scala: More Efficient Way to Filter a List and Create a Sequence of Futures

Given a list of Order objects...
case class Order(val id: String, val orderType: Option[String])
case class Transaction (val id: String, ...)
val orders = List(Order("1", Some("sell")), Order("2", None), ...)
... I need to create a sequence of Futures for all those orders that have a type (i.e. orderType is defined):
val transactions: Seq[Future[Transaction]] = orders
  .filter(_.orderType.isDefined)
  .map { order =>
    trxService.findTransactions(order.id) // this returns a Future[Transaction]
  }
The code above first invokes filter, which creates a new List containing only orders with orderType set to Some, and then creates a sequence of Futures out of it. Is there a more efficient way to accomplish this?
You can combine filter and map into a single pass using collect, which avoids building the intermediate list:
val transactions: Seq[Future[Transaction]] = orders.collect {
case order if order.orderType.isDefined => trxService.findTransactions(order.id)
}
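A runnable sketch of the collect approach follows. It assumes a stub findTransactions that returns an already-completed Future, as a stand-in for trxService.findTransactions, and a Transaction with only an id field, since the question elides the rest.

```scala
import scala.concurrent.Future

final case class Order(id: String, orderType: Option[String])
final case class Transaction(id: String) // remaining fields elided in the question

// Hypothetical stand-in for trxService.findTransactions.
def findTransactions(orderId: String): Future[Transaction] =
  Future.successful(Transaction(orderId))

val orders = List(
  Order("1", Some("sell")),
  Order("2", None),
  Order("3", Some("buy"))
)

// collect applies the partial function only where it is defined,
// so orders without a type are skipped in the same pass.
val transactions: Seq[Future[Transaction]] = orders.collect {
  case Order(id, Some(_)) => findTransactions(id)
}
```

Matching on Some(_) in the pattern replaces the isDefined guard; both forms select the same orders.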