Zip two lists based on specific condition in Scala

I have two defined case classes and two lists like the following code.
case class Person(name: String, company: String, rank: Int, id: Long)
case class Employee(company: String, rank: Int, id: Long)
val persons = List(Person("Tom", "CompanyA", 1, null), Person("Jenny", "CompanyB", 1, null), Person("James", "CompanyA", 2, null))
val employees = List(Employee("CompanyA", 1, 1001), Employee("CompanyB", 1, 1002), Employee("CompanyA", 2, 1003))
Since the combination of company and rank is unique, I want to use the information in employees to combine the two lists into the following one (a list of Person with the id filled in).
[Person("Tom", "CompanyA", 1, 1001), Person("Jenny", "CompanyB", 1, 1002), Person("James", "CompanyA", 2, 1003)]
I tried to implement it as this:
zipBasedOnCondition(persons, employees, (person, employee) => person.company == employee.company && person.rank == employee.rank)
However, I failed to come up with a solution to implement the zipBasedOnCondition function
Is there any solution to combine the two lists?

What you want can be achieved by:
for {
person <- persons
employee <- employees
if person.company == employee.company && person.rank == employee.rank
} yield person.copy(id = employee.id)
It has time complexity O(persons.size * employees.size). Since List gives no guarantees about its elements being sorted (let alone sorted by the fields you compare on), you cannot do better by scanning the two lists directly; improving on this requires indexing employees by (company, rank) first.
If you want, you could modify it so that it takes only the first of several possible pairs, though how to do that is beyond the scope of "zip with condition".
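If the quadratic scan matters, one option is to index employees by (company, rank) in a Map and do a single lookup per person. This is only a minimal sketch: it assumes the key really is unique, as the question states, and the 0L placeholder id and the IndexedJoin wrapper object are illustrative choices, not part of the original code.

```scala
object IndexedJoin {
  // A Long cannot hold null, so 0L is used as a placeholder id (assumption).
  case class Person(name: String, company: String, rank: Int, id: Long = 0L)
  case class Employee(company: String, rank: Int, id: Long)

  val persons = List(
    Person("Tom", "CompanyA", 1),
    Person("Jenny", "CompanyB", 1),
    Person("James", "CompanyA", 2))
  val employees = List(
    Employee("CompanyA", 1, 1001L),
    Employee("CompanyB", 1, 1002L),
    Employee("CompanyA", 2, 1003L))

  // Build the lookup table once: (company, rank) -> id.
  val idByKey: Map[(String, Int), Long] =
    employees.map(e => (e.company, e.rank) -> e.id).toMap

  // One hash lookup per person; persons without a matching employee are dropped.
  val combined: List[Person] =
    persons.flatMap(p => idByKey.get((p.company, p.rank)).map(id => p.copy(id = id)))
}
```

Building the map costs one pass over employees, so the whole join is roughly linear in the size of both lists.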

I'm still thinking of better solutions... but this should work:
// match on both company and rank, since only the combination is unique
def f(p: Person): Option[Employee] = employees.find(e => e.company == p.company && e.rank == p.rank)
for {
  p <- persons
  e <- f(p)
} yield {
  p.copy(id = e.id)
}

This would be a pretty generic approach:
def zipBasedOnCondition[A, B, C](as: List[A], bs: List[B], pred: (A, B) => Boolean, f: (A, B) => C): List[C] = {
as.map(a => f(a, bs.filter(b => pred(a, b)).head))
}
then you could call it like this:
zipBasedOnCondition[Person, Employee, Person](persons, employees, (p, e) => p.company == e.company && p.rank == e.rank, (a, b) => a.copy(id = b.id))
The implementation of zipBasedOnCondition would need improvement since it assumes that for every person object there is a corresponding employee object.
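One way to lift that assumption is to use find instead of filter(...).head, so that elements without a partner are skipped rather than causing an exception. The following is a sketch, not the answer's original code: the curried parameter lists are my own choice to help type inference, and the SafeZip wrapper and sample values are hypothetical.

```scala
object SafeZip {
  // Pairs each `a` with the first `b` satisfying `pred`;
  // an `a` with no matching `b` is silently skipped.
  def zipBasedOnCondition[A, B, C](as: List[A], bs: List[B])(pred: (A, B) => Boolean)(f: (A, B) => C): List[C] =
    as.flatMap(a => bs.find(b => pred(a, b)).map(b => f(a, b)))

  // 1 has no partner in the second list, so it does not appear in the result.
  val result: List[String] =
    zipBasedOnCondition(List(1, 2, 3), List("b" -> 2, "c" -> 3))((n, pair) => n == pair._2)((n, pair) => s"$n${pair._1}")
}
```

If unmatched elements should be kept instead, the same shape works with a result type of List[(A, Option[B])].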

You declare id as a mandatory Long field in the Person class but pass null for it in the persons list, which will not compile.
First let's correct your Person class.
case class Person(name: String, company: String, rank: Int, id: Long = 0)
Now, the solution to your question:
def combineList(list1: List[Person], list2: List[Employee]): List[Person] =
  for {
    a <- list1
    b <- list2
    if a.company == b.company && a.rank == b.rank
  } yield a.copy(id = b.id)
Output
List(Person(Tom,CompanyA,1,1001), Person(Jenny,CompanyB,1,1002), Person(James,CompanyA,2,1003))

Related

How can I group by the individual elements of a list of elements in Scala

Forgive me if I'm not naming things by their actual name, I've just started to learn Scala. I've been looking around for a while, but can not find a clear answer to my question.
Suppose I have a list of objects, each object has two fields: x: Int and l: List[String], where the Strings, in my case, represent categories.
The l lists can be of arbitrary length, so an object can belong to multiple categories. Furthermore, various objects can belong to the same category. My goal is to group the objects by the individual categories, where the categories are the keys. This means that if an object is linked to say "N" categories, it will occur in "N" of the key-value pairs.
So far I managed to groupBy the lists of categories through:
objectList.groupBy(x => x.l)
However, this obviously groups the objects by list of categories rather than by categories.
I'm trying to do this with immutable collections avoiding loops etc.
If anyone has some ideas that would be much appreciated!
Thanks
EDIT:
By request the actual case class and what I am trying.
case class Car(make: String, model: String, fuelCapacity: Option[Int], category:Option[List[String]])
Once again, a car can belong to multiple categories. Let's say List("SUV", "offroad", "family").
I want to group by category elements rather than by the whole list of categories, and have the fuelCapacity as the values, in order to be able to extract average fuelCapacity per category amongst other metrics.
Using your EDIT as a guide.
case class Car( make: String
, model: String
, fuelCapacity: Option[Int]
, category:Option[List[String]] )
val cars: List[Car] = ???
//all currently known category strings
val cats: Set[String] = cars.flatMap(_.category).flatten.toSet
//category -> list of cars in this category
val catMap: Map[String,List[Car]] =
cats.map(cat => (cat, cars.filter(_.category.exists(_.contains(cat))))).toMap
//category -> average fuel capacity for cars in this category
val fcAvg: Map[String,Double] =
catMap.map{case (cat, cars) =>
val fcaps: List[Int] = cars.flatMap(_.fuelCapacity)
if (fcaps.lengthIs < 1) (cat, -1d)
else (cat, fcaps.sum.toDouble / fcaps.length)
}
Something like the following?
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupBy { case (c,_) => c } // Map[String,Seq[(String,YourType)]]
.mapValues { items => items.map { case (_, o) => o } } // Map[String, Seq[YourType]]
(Deliberately "heavy" to help you understand the idea behind it)
EDIT, or as of Scala 2.13 thanks to groupMap:
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupMap { case (c,_) => c } { case (_, o) => o } // Map[String,Seq[YourType]]
You are very close, you just need to split each individual element in the list before the group so try with something like this:
// I used a Set instead of a List,
// since I don't think the order of categories matters
// as well I would think having two times the same category doesn't make sense.
final case class MyObject(x: Int, categories: Set[String] = Set.empty) {
def addCategory(category: String): MyObject =
this.copy(categories = this.categories + category)
}
def groupByCategories(data: List[MyObject]): Map[String, List[Int]] =
data
.flatMap(o => o.categories.map(c => c -> o.x))
.groupMap(_._1)(_._2)
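Applied to the Car class from the question's EDIT, the same flatMap-then-groupMap idea can produce the average fuelCapacity per category. This is a sketch under assumptions: it needs Scala 2.13 for groupMap, the sample cars are made up, and cars without a fuelCapacity or without categories are simply left out of the averages.

```scala
object FuelByCategory {
  case class Car(make: String, model: String, fuelCapacity: Option[Int], category: Option[List[String]])

  // Hypothetical sample data, for illustration only.
  val cars = List(
    Car("A", "a1", Some(60), Some(List("SUV", "family"))),
    Car("B", "b1", Some(40), Some(List("family"))),
    Car("C", "c1", None, Some(List("SUV"))),
    Car("D", "d1", Some(50), None))

  // One (category, capacity) pair for every category a car belongs to,
  // then grouped by category.
  val capacitiesByCategory: Map[String, List[Int]] =
    (for {
      car <- cars
      cap <- car.fuelCapacity.toList
      cat <- car.category.getOrElse(Nil)
    } yield cat -> cap).groupMap(_._1)(_._2)

  // Every group is non-empty, so the division is safe.
  val averageByCategory: Map[String, Double] =
    capacitiesByCategory.map { case (cat, caps) => cat -> caps.sum.toDouble / caps.size }
}
```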

Lazy filter + flatMap + concat

What's the best way to do lazy transformations (without creating intermediate collections) when (a) doing a flatMap with filtering done both before and after the flatMap, and (b) concatenating collections?
Usually I use withFilter for such lazy filtering, but it doesn't quite work in the more complicated use cases.
Filter + flatMap
1) Naive approach
case class Item(size: Int, color: String)
// Assume an order can have a lot of items
case class Orders(price: Int, country: String, items: Seq[Item])
val orders: Seq[Orders] = ???
val ca = orders.filter(_.country == "CA").flatMap(_.items).filter(_.size > 4)
val rest = orders.filter(_.country != "CA").flatMap(_.items).filter(_.size > 6)
val res = (ca ++ rest).filter(_.color == "red").take(100)
2) Single path, but intermediate collections of items are created for each order. And I think flatMap also produces a collection
orders.flatMap {
case order if order.country == "CA" => order.items.filter(_.size > 4)
case order => order.items
}.withFilter(_.color == "red").take(100)
3) Iterators. But I am not 100% sure how exactly this is going to get executed
orders.iterator.flatMap {
case order if order.country == "CA" => order.items.iterator.filter(_.size > 4)
case order => order.items.iterator
}.filter(_.color == "red").take(100)
4) Stream.
orders.toStream.flatMap {
case order if order.country == "CA" => order.items.toStream.filter(_.size > 4)
case order => order.items.toStream
}.filter(_.color == "red").take(100)
5) Views: Not sure if an intermediate collection will be created for the items in each order (I think it will), and also in general I am not a fan of views (forgetting "force" can lead to bugs)
orders.view.flatMap {
case order if order.country == "CA" => order.items.filter(_.size > 4)
case order => order.items
}.filter(_.color == "red").take(100)
Concat
Similar options, but for
val items1 = items.filter(filter1)
val items2 = items.filter(filter2)
val items3 = items.filter(filter3)
val res = (items1 ++ items2 ++ items3).filter(_.color == "Red").take(100)
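To make the iterator variant's execution more concrete, here is a small sketch that counts how many items get inspected. Everything in it is an assumption for illustration: the class is named Order rather than Orders, the counter and sample data are made up, and non-CA orders use the size > 6 threshold from the naive version. The same idea carries over to the concat case by chaining iterators, e.g. items.iterator.filter(filter1) ++ items.iterator.filter(filter2).

```scala
object LazyPipeline {
  case class Item(size: Int, color: String)
  case class Order(price: Int, country: String, items: Seq[Item])

  // Counts how many items the size filter looks at, to make the laziness visible.
  var inspected = 0

  def firstRed(orders: Seq[Order], n: Int): List[Item] =
    orders.iterator
      .flatMap { order =>
        val minSize = if (order.country == "CA") 4 else 6
        order.items.iterator.filter { item =>
          inspected += 1
          item.size > minSize
        }
      }
      .filter(_.color == "red")
      .take(n)
      .toList // elements are pulled one at a time, only here

  val orders = Seq(
    Order(10, "CA", Seq(Item(5, "red"), Item(3, "red"), Item(9, "blue"))),
    Order(20, "US", Seq(Item(7, "red"), Item(8, "red"))),
    Order(30, "CA", Seq(Item(6, "red"))))

  val result = firstRed(orders, 2)
}
```

Because each stage only asks its upstream iterator for the next element on demand, no per-order collection is built and the pipeline can stop pulling once take(n) is satisfied, which is why the counter stays below the total number of items in this example.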
Is the solution just to normalize the classes? In your example you essentially have a nested sequence, and to go from a nested sequence to a flat one you need to flatten at some point.
A solution could be a List[Tuple2[Order, Item]] with the items removed from Order.
That way you can do one collect.
For example:
object Main {
case class Item(size: Int, color: String)
case class Order(price:Int, country: String)
def main(args: Array[String]): Unit = {
val orders: Seq[Tuple2[Order, Item]] = Seq(Order(0, "CA") -> Item(5, "red"), Order(0, "GB")-> Item(1, "blue"))
val filtered: Seq[Item] = orders.collect {
case (order, item) if order.country == "CA" && item.size > 4 && item.color == "red" => item
}
println(orders)
println(filtered)
}
}
// Output:
// List((Order(0,CA),Item(5,red)), (Order(0,GB),Item(1,blue)))
// List(Item(5,red))

How to map sequence only if a condition applies with scala using an immutable approach?

Given a sequence of Price objects, I want to map the applyPromo function over it if a condition applies (i.e. promo == "FOO"), and otherwise return the sequence as is.
This is my applyPromo:
val applyPromo: Price => Price = price => price.copy(amount = price.amount - someDiscount)
In a mutable way I probably would write it like this:
var prices: Seq[Price] = Seq(price1, price2, ...)
.map(doStuff)
.map(doSomeOtherStuff)
if (promo == "FOO") {
prices = prices.map(applyPromo)
}
prices
I was wondering if I could do something similar like this while keeping the immutable approach of scala. Instead of creating a temp var, I prefer to keep the chain.
Pseudo-code:
val prices = Seq(price1, price2, ...)
prices
.map(doStuff)
.map(doSomeOtherStuff)
.mapIf(promo == "FOO", applyPromo)
I don't want to check the condition within the map function in this case, as it applies for all elements:
prices.map(price => {
if (promo == "FOO") {
applyPromo(price)
} else
price
}
)
You just need to use else to make it functional (and you can create an implicit class to add the mapIf method if you prefer):
val prices: Seq[Price] = Seq(price1, price2,...).map(doStuff).map(doSomeOtherStuff)
/* val resultPrices = */ if (promo == "FOO") {
prices.map(price => {
price.copy(amount = price.amount - someDiscount)
})
} else prices
Something like this:
implicit class ConditionalMap[T](val seq: Seq[T]) extends AnyVal {
// f must be T => T: the else branch returns the original Seq[T] unchanged
def mapIf(cond: => Boolean, f: T => T): Seq[T] = if (cond) seq.map(f) else seq
}
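For completeness, here is how such an extension might be wired into the chain from the question. This is a self-contained sketch: the Price class, the someDiscount value and the MapIfSketch wrapper are assumptions made so the example runs on its own.

```scala
object MapIfSketch {
  case class Price(amount: Double)

  implicit class ConditionalMap[T](private val seq: Seq[T]) extends AnyVal {
    // Applies f to every element only when cond holds; otherwise returns seq unchanged.
    def mapIf(cond: => Boolean, f: T => T): Seq[T] = if (cond) seq.map(f) else seq
  }

  val someDiscount = 1.0 // hypothetical discount value
  val applyPromo: Price => Price = p => p.copy(amount = p.amount - someDiscount)

  // The condition is checked once per call, not once per element.
  def priced(promo: String): Seq[Price] =
    Seq(Price(10), Price(20))
      .map(identity) // stands in for doStuff / doSomeOtherStuff
      .mapIf(promo == "FOO", applyPromo)
}
```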
You can also map(x => x) in the else case:
val discountFunction = if (promo == "FOO") (price: Price) =>
price.copy(amount = price.amount - someDiscount) else (x: Price) => x
val prices: Seq[Price] = Seq(price1, price2,...).
map(doStuff).
map(doSomeOtherStuff).
map(discountFunction)
I'd do it like this:
val maybePromo: (Price => Price) =
if(promo == "FOO") applyPromo else identity _
prices.map(maybePromo)
Or you can inline it within map itself:
prices.map(if(promo == "FOO") applyPromo else identity)
In scalaz, a function A => A is called an endomorphism and is a Monoid whose associative binary operation is function composition and whose identity is the identity function. This is useful because there is a bunch of syntax available where monoids are concerned. For example, scalaz adds the ?? operation to boolean along these lines:
def ??[A: Monoid](a: A) = if (self) a else Monoid[A].zero
Thus:
prices
.map(doStuff)
.map(doSomeOtherStuff)
.map(((promo === "FOO") ?? deductDiscount).run)
Where:
val deductDiscount: Endo[Price] = Endo(px => px.copy(amount = px.amount - someDiscount))
The above all requires
import scalaz._
import Scalaz._
Notes
=== is typesafe equals syntax
?? is boolean syntax
oxbow_lakes has an interesting answer.
To me, an easy way to solve this is to wrap the Seq in an Option context.
scala> case class Price(amount: Double)
defined class Price
when condition matches,
scala> val promo = "FOO"
promo: String = FOO
scala> Some(Seq(Price(1), Price(2), Price(3))).collect{
case prices if promo == "FOO" => prices.map { p => p.copy(p.amount - 1 )}
case prices => prices}
res6: Option[Seq[Price]] = Some(List(Price(0.0), Price(1.0), Price(2.0)))
when condition does not match
scala> val promo = "NOT-FOO"
promo: String = NOT-FOO
scala> Some(Seq(Price(1), Price(2), Price(3))).collect{
case prices if promo == "FOO" => prices.map { p => p.copy(p.amount - 1 )}
case prices => prices}
res7: Option[Seq[Price]] = Some(List(Price(1.0), Price(2.0), Price(3.0)))

Working scala code using a var in a pure function. Is this possible without a var?

Is it possible (or even worthwhile) to try to write the below code block without a var? It works with a var. This is not for an interview, it's my first attempt at scala (came from java).
The problem: Fit people as close to the front of a theatre as possible, while keeping each request (eg. Jones, 4 tickets) in a single theatre section. The theatre sections, starting at the front, are sized 6, 6, 3, 5, 5... and so on. I'm trying to accomplish this by putting together all of the potential groups of ticket requests, and then choosing the best fitting group per section.
Here are the classes. A SeatingCombination is one possible combination of SeatingRequest (just the IDs) and the sum of their ticketCount(s):
class SeatingCombination(val idList: List[Int], val seatCount: Int){}
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int){}
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
def id: String = rowNumber.toString + "_"+ sectionNumber.toString;
}
By the time we get to the below function...
1.) all of the possible combinations of SeatingRequest are in a list of SeatingCombination and ordered by descending size.
2.) all of the TheatreSection are listed in order.
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): HashMap[Int, TheatreSection] = {
var seatedMap = new HashMap[Int, TheatreSection]
for (sect <- sections) {
val bestFitOpt = groups.find(g => { g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList) })
bestFitOpt.filter(_.idList.size > 0).foreach(_.idList.foreach(seatedMap.update(_, sect)))
}
seatedMap
}
def isAnyListIdInMap(map: HashMap[Int, TheatreSection], list: List[Int]): Boolean = {
(for (id <- list) yield !map.get(id).isEmpty).reduce(_ || _)
}
I wrote the rest of the program without a var, but in this iterative section it seems impossible. Maybe with my implementation strategy it's impossible. From what else I've read, a var in a pure function is still functional. But it's been bothering me I can't think of how to remove the var, because my textbook told me to try to avoid them, and I don't know what I don't know.
You can use foldLeft to iterate on sections with a running state (and again, inside, on your state to add iteratively all the ids in a section):
sections.foldLeft(Map.empty[Int, TheatreSection]) {
  case (seatedMap, sect) =>
    val bestFitOpt = groups.find(g => g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList))
    bestFitOpt.
      filter(_.idList.nonEmpty).toList. // convert option to list
      flatMap(_.idList). // flatten list from option and idList
      foldLeft(seatedMap)((m, id) => m + (id -> sect)) // add all ids to the map with sect as value
}
By the way, you can simplify the second method using exists and map.contains:
def isAnyListIdInMap(map: Map[Int, TheatreSection], list: List[Int]): Boolean = {
list.exists(id => map.contains(id))
}
list.exists(predicate: Int => Boolean) is a Boolean which is true if the predicate is true for any element in list.
map.contains(key) checks if map is defined at key.
If you want to be even more concise, you don't need to give a name to the argument of the predicate:
list.exists(map.contains)
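Putting the pieces together, a compact version of the whole function could look roughly like this. It is a sketch rather than a drop-in replacement: the non-empty check is moved inside find (my reading of the intent), and the sample groups and sections are invented for the example.

```scala
object SeatingSketch {
  class SeatingCombination(val idList: List[Int], val seatCount: Int)
  class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
    def id: String = s"${rowNumber}_$sectionNumber"
  }

  def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): Map[Int, TheatreSection] =
    sections.foldLeft(Map.empty[Int, TheatreSection]) { (seated, sect) =>
      groups
        // first non-empty combination that fits and has no member seated yet
        .find(g => g.idList.nonEmpty && g.seatCount <= sect.sectionSize && !g.idList.exists(seated.contains))
        // no fit: keep the map as is; otherwise add every id of the combination
        .fold(seated)(g => seated ++ g.idList.map(_ -> sect))
    }

  // Hypothetical requests: id 1 wants 4 seats, id 2 wants 3, id 3 wants 2.
  val groups = List(
    new SeatingCombination(List(1, 3), 6),
    new SeatingCombination(List(2, 3), 5),
    new SeatingCombination(List(1), 4),
    new SeatingCombination(List(2), 3),
    new SeatingCombination(List(3), 2))
  val sections = List(new TheatreSection(6, 1, 1), new TheatreSection(3, 1, 2))

  // Request id -> section id, easier to inspect than the section objects.
  val seatIds: Map[Int, String] =
    getSeatingMap(groups, sections).map { case (requestId, s) => requestId -> s.id }
}
```

The accumulator passed through foldLeft plays the role of the var: each step returns a new immutable Map instead of updating one in place.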
Simply changing var to val should do it :)
I think, you may be asking about getting rid of the mutable map, not of the var (it doesn't need to be var in your code).
Things like this are usually written recursively in scala or using foldLeft, like other answers suggest. Here is a recursive version:
import scala.annotation.tailrec
@tailrec
def getSeatingMap(
  groups: List[SeatingCombination],
  sections: List[TheatreSection],
  result: Map[Int, TheatreSection] = Map.empty): Map[Int, TheatreSection] = sections match {
  case Nil => result
  case head :: tail =>
    // first non-empty, not-yet-seated group that fits into this section
    val seated = groups
      .iterator
      .filter(_.idList.nonEmpty)
      .filterNot(_.idList.exists(result.contains))
      .find(_.seatCount <= head.sectionSize)
      .fold(List.empty[(Int, TheatreSection)])(_.idList.map(id => id -> head))
    getSeatingMap(groups, tail, result ++ seated)
}
btw, I don't think you need to test every id in list for presence in the map - should suffice to just look at the first one. You could also make it a bit more efficient, probably, if instead of checking the map every time to see if the group is already seated, you'd just drop it from the input list as soon as the section is assigned.
@tailrec
def selectGroup(
  sect: TheatreSection,
  groups: List[SeatingCombination],
  result: List[SeatingCombination] = Nil
): (List[(Int, TheatreSection)], List[SeatingCombination]) = groups match {
  // nothing fits: return the groups in their original order
  case Nil => (Nil, result.reverse)
  case head :: tail
    if head.idList.nonEmpty && head.seatCount <= sect.sectionSize => (head.idList.map(_ -> sect), result.reverse ++ tail)
  case head :: tail => selectGroup(sect, tail, head :: result)
}
and then in getSeatingMap:
...
case head :: tail =>
val (seated, remaining) = selectGroup(head, groups)
getSeatingMap(remaining, tail, result ++ seated)
Here is how I was able to achieve this without the mutable.HashMap, using foldLeft as suggested in the comments:
class SeatingCombination(val idList: List[Int], val seatCount: Int){}
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int){}
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
def id: String = rowNumber.toString + "_"+ sectionNumber.toString;
}
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): Map[Int, TheatreSection] = {
sections.foldLeft(Map.empty[Int, TheatreSection]) { (m, sect) =>
val bestFitOpt = groups.find(g => {
g.seatCount <= sect.sectionSize && !isAnyListIdInMap(m, g.idList)
}).filter(_.idList.nonEmpty)
val newEntries = bestFitOpt.map(_.idList.map(_ -> sect)).getOrElse(List.empty)
m ++ newEntries
}
}
def isAnyListIdInMap(map: Map[Int, TheatreSection], list: List[Int]): Boolean = {
list.exists(map.contains) // unlike reduce, exists is safe on an empty list
}

Slick one to many and grouping

I'm trying to model the following with Slick 3.1.0;
case class Review(txt: String, userId: Long, id: Long)
case class User(name: String, id: Long)
case class ReviewEvent(event: String, reviewId: Long)
I need to populate a class called a FullReview, which looks like;
case class FullReview(r: Review, user: User, evts: Seq[ReviewEvent])
Assuming I have the right tables for each of the models, I'm trying to fetch a FullReview using a combination of join and group by, like so:
val withUser = for {
(r, u) <- RTable join UTable on (_.userId === _.id)
} yield (r, u)
val withUAndEvts = (for {
((r, user), evts) <- withUser joinLeft ETable on {
case ((r, _), ev) => r.id === ev.reviewId
}
} yield (r, user, evts)).groupBy(_._1.id)
This seems to yield a nested Query type, from what I can see. What am I doing wrong here?
If I understand you correctly, you can use following example:
val users = TableQuery[Users]
val reviews = TableQuery[Reviews]
val events = TableQuery[ReviewEvents]
override def findAllReviews(): Future[Seq[FullReview]] = {
val query = reviews
.join(users).on(_.userId === _.id)
.joinLeft(events).on(_._1.id === _.reviewId)
db.run(query.result).map { a =>
a.groupBy(_._1._1.id).map { case (_, tuples) =>
val ((review, user), _) = tuples.head
val reviewEvents = tuples.flatMap(_._2)
FullReview(review, user, reviewEvents)
}.toSeq
}
}
If you want to add pagination to this request, I have already answered that elsewhere, with a full example.
From some tinkering around, I figured it would just be better to do the aggregation on the client. What that would mean, indirectly, is that if 100 rows on the table ETable would match a single row on the RTable, you would get multiple rows on the client. The client then has to implement its own aggregation to group all the ReviewEvent by Review.
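The client-side aggregation itself needs no Slick at all. The sketch below works on plain Scala values shaped like the rows a join plus joinLeft would return; the Row alias, the sample data and the sort by review id are assumptions added for the example.

```scala
object ClientSideGrouping {
  case class Review(txt: String, userId: Long, id: Long)
  case class User(name: String, id: Long)
  case class ReviewEvent(event: String, reviewId: Long)
  case class FullReview(r: Review, user: User, evts: Seq[ReviewEvent])

  // One row per (review, event) pair; None when a review has no events.
  type Row = ((Review, User), Option[ReviewEvent])

  def toFullReviews(rows: Seq[Row]): Seq[FullReview] =
    rows
      .groupBy { case ((review, _), _) => review.id }
      .values
      .map { group =>
        val ((review, user), _) = group.head // groupBy never produces an empty group
        FullReview(review, user, group.flatMap(_._2))
      }
      .toSeq
      .sortBy(_.r.id) // Map iteration order is unspecified, so sort for a stable result

  // Hypothetical sample rows.
  val alice = User("alice", 1L)
  val r1 = Review("great", 1L, 10L)
  val r2 = Review("meh", 1L, 11L)
  val rows: Seq[Row] = Seq(
    ((r1, alice), Some(ReviewEvent("created", 10L))),
    ((r1, alice), Some(ReviewEvent("edited", 10L))),
    ((r2, alice), None))

  val fullReviews: Seq[FullReview] = toFullReviews(rows)
}
```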
As far as pagination is concerned, you may do something like;
def withUser(page: Int, pageSize: Int) = for {
(r, u) <- RTable.drop(page * pageSize).take(pageSize) join UTable on (_.userId === _.id)
} yield (r, u)
I guess this is elegant enough for now. If someone has a better answer, I'd be happy to hear it.