Scala Tuple of seq to seq of object

I have a tuple of the form (DBIO[Seq[Person]], DBIO[Seq[Address]]) with a one-to-one mapping between the elements. Person and Address are separate tables in the RDBMS. The Profile definition is Profile(person: Person, address: Address). Now I want to convert the former into DBIO[Seq[Profile]]. The following is the code snippet showing how I obtained the (DBIO[Seq[Person]], DBIO[Seq[Address]]) pair:
for {
  person <- personQuery if person.personId === personId
  address <- addressQuery if address.addressId === profile.addressId
} yield (person.result, address.result)
Need help with this transformation to DBIO[Seq[Profile]].

Assuming you can't use a join and you need to work with two actions (two DBIOs), what you can do is combine the two actions into a single one:
// Combine two actions into a single action
val pairs: DBIO[(Seq[Person], Seq[Address])] =
  (person.result).zip(address.result)
(zip is just one of many combinators you can use to manipulate DBIO).
From there you can use DBIO.map to convert the pair into the data structure you want.
For example:
// Use Slick's DBIO.map to map the DBIO value into a sequence of profiles:
val profiles: DBIO[Seq[Profile]] = pairs.map { case (ppl, places) =>
  // We now use a regular Scala `zip` on two sequences:
  ppl.zip(places).map { case (person, place) => Profile(person, place) }
}
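Note that the element-wise zip assumes the two sequences line up one-to-one in the same order (and have the same length), which matches the one-to-one mapping described in the question. To actually execute the combined action, here is a minimal sketch, assuming Slick 3 and a db: Database handle in scope:
// Materialize the action; db.run turns a DBIO into a Future
val result: Future[Seq[Profile]] = db.run(profiles)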

I am unfamiliar with whatever DBIO is. Assuming it is a case class of some type T:
val (DBIO(people), DBIO(addresses)) = for {
  person <- personQuery if person.personId === personId
  address <- addressQuery if address.addressId === profile.addressId
} yield (person.result, address.result)
val profiles = DBIO(people.zip(addresses).map { case (person, address) => Profile(person, address) })


Create Map from Elements from List of case class

case class Student(id:String, name:String, teacher:String )
val myList = List( Student("1","Ramesh","Isabela"), Student("2","Elena","Mark"),Student("3","invalidKey","Someteacher"))
val a = myList.foreach( i=> (i.name -> i.teacher)).toMap.filter(i.name != "invalidKey")
I have a list of Student case class instances and I want to build a map of student name to teacher, where the name (the key of the map) will always be unique. Preferably, the map can also filter out a certain name.
You're using foreach, which returns Unit as the result.
I would suggest either of the two approaches below. The first is the one Luis Miguel mentioned:
val myMap = myList.collect {
  case student if student.name != "invalidKey" => student.name -> student.teacher
}.toMap
Or:
val myMap2 = myList.foldLeft[Map[String, String]](Map.empty) {
  case (elementsMap, newElement) if newElement.name != "invalidKey" =>
    elementsMap + (newElement.name -> newElement.teacher)
  case (elementsMap, _) => elementsMap
}
Differences: the first approach is shorter and much easier to read (though being shorter is not an advantage in itself :D). The second performs fewer iterations, since the first approach needs an extra pass to convert the intermediate collection to a Map.
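For the sample myList above, both approaches produce the same result; a quick sanity check:
// Both maps contain the two valid students and drop "invalidKey":
assert(myMap == Map("Ramesh" -> "Isabela", "Elena" -> "Mark"))
assert(myMap == myMap2)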

Scala broadcast join with "one to many" relationship

I am fairly new to Scala and RDDs.
I have a very simple scenario yet it seems very hard to implement with RDDs.
Scenario:
I have two tables. One large and one small. I broadcast the smaller table.
I then want to join the tables and finally aggregate the values after the join into a final total.
Here is an example of the code:
val bigRDD = sc.parallelize(List(("A",1,"1Jan2000"),("B",2,"1Jan2000"),("C",3,"1Jan2000"),("D",3,"1Jan2000"),("E",3,"1Jan2000")))
val smallRDD = sc.parallelize(List(("A","Fruit","Apples"),("A","ZipCode","1234"),("B","Fruit","Apples"),("B","ZipCode","456")))
val broadcastVar = sc.broadcast(
  smallRDD.keyBy(a => (a._1, a._2)) // turn into a pair RDD
          .collectAsMap()           // collect as a Map
)
//first join
val joinedRDD = bigRDD.map( accs => {
  // get list of groups
  val groups = List("Fruit", "ZipCode")
  val i = "Fruit"
  // for each group
  //for(i <- groups) {
  if (broadcastVar.value.get(accs._1, i) != None) {
    ( broadcastVar.value.get(accs._1, i).get._1,
      broadcastVar.value.get(accs._1, i).get._2,
      accs._2, accs._3 )
  } else {
    None
  }
  //}
})
//expected after this
//("A","Fruit","Apples",1, "1Jan2000"),("B","Fruit","Apples",2, "1Jan2000"),
//("A","ZipCode","1234", 1,"1Jan2000"),("B","ZipCode","456", 2,"1Jan2000")
//then group and sum
//cannot do anything with the joinedRDD!!!
//error == value copy is not a member of Product with Serializable
// Final Expected Result
//("Fruit","Apples",3, "1Jan2000"),("ZipCode","1234", 1,"1Jan2000"),("ZipCode","456", 2,"1Jan2000")
My questions:
First of all, is this the best approach with RDDs?
Disclaimer: I have done this whole task successfully using DataFrames. The idea is to create another version using only RDDs to compare performance.
Why is the type of my joinedRDD not recognised after it was created, so that I can continue to use functions like copy on it?
How can I get away with not doing a .collectAsMap() when broadcasting the variable? I currently have to include the first two items to enforce uniqueness and avoid dropping any values.
Thanks for the help in advance!
Final solution for anyone interested
case class dt (group:String, group_key:String, count:Long, date:String)
val bigRDD = sc.parallelize(List(("A",1,"1Jan2000"),("B",2,"1Jan2000"),("C",3,"1Jan2000"),("D",3,"1Jan2000"),("E",3,"1Jan2000")))
val smallRDD = sc.parallelize(List(("A","Fruit","Apples"),("A","ZipCode","1234"),("B","Fruit","Apples"),("B","ZipCode","456")))
val broadcastVar = sc.broadcast(
  smallRDD.keyBy(a => a._1)  // turn into a pair RDD
          .groupByKey()      // so as not to lose any data
          .collectAsMap()    // collect as a Map
)
//first join
val joinedRDD = bigRDD.flatMap( accs => {
  if (broadcastVar.value.get(accs._1) != None) {
    val bc = broadcastVar.value.get(accs._1).get
    bc.map(p => dt(p._2, p._3, accs._2, accs._3))
  } else {
    None
  }
})
//expected after this
//("Fruit","Apples",1, "1Jan2000"),("Fruit","Apples",2, "1Jan2000"),
//("ZipCode","1234", 1,"1Jan2000"),("ZipCode","456", 2,"1Jan2000")
//then group and sum
var finalRDD = joinedRDD
  .map(s => (s.copy(count = 0), s.count)) // trick to keep the code to a minimum (count = 0)
  .reduceByKey(_ + _)
  .map(pair => pair._1.copy(count = pair._2))
In your map statement you return either a tuple or None, depending on the if condition. These types do not match, so you fall back to a common supertype, and joinedRDD becomes an RDD[Product with Serializable], which is not what you want at all (it's basically RDD[Any]). You need to make sure all paths return the same type. In this case, you probably want an Option[(String, String, Int, String)]. All you need to do is wrap the tuple result in a Some:
if (broadcastVar.value.get(accs._1, i) != None) {
  Some(( broadcastVar.value.get(accs._1, i).get._1,
         broadcastVar.value.get(accs._1, i).get._2,
         accs._2, accs._3 ))
} else {
  None
}
And now your types will match up. This makes joinedRDD an RDD[Option[(String, String, Int, String)]]. Now that the type is correct the data is usable; however, it means that you will need to map over the Option to work with the tuples. If you don't need the None values in the final result, you can use flatMap instead of map to create joinedRDD, which will unwrap the Options for you, filtering out all the Nones.
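A minimal sketch of that flatMap variant, assuming the same broadcastVar as in the question and keeping "Fruit" as the single group. Map.get already returns an Option, so flatMap flattens the Nones away via the implicit Option-to-Iterable conversion:
val joinedRDD = bigRDD.flatMap { accs =>
  // a missed lookup yields None, which flatMap simply drops
  broadcastVar.value.get((accs._1, "Fruit")).map { bc =>
    (bc._1, bc._2, accs._2, accs._3)
  }
}
// joinedRDD: RDD[(String, String, Int, String)] -- no Options left to unwrap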
collectAsMap is the correct way to turn an RDD into a HashMap, but here you need multiple values for a single key. Before using collectAsMap, and after mapping the smallRDD into a key/value pair, use groupByKey to group all of the values for a single key together. Then, when you look up a key in your HashMap, you can map over the values, creating a new record for each one.

Consolidate a list of Futures into a Map in Scala

I have two case classes P(id: String, ...) and Q(id: String, ...), and two functions returning futures:
One that retrieves a list of objects given a list of id-s:
def retrieve(ids: Seq[String]): Future[Seq[P]] = Future { ... }
The length of the result might be shorter than the input, if not all id-s were found.
One that further transforms P to some other type Q:
def transform(p: P): Future[Q] = Future { ... }
What I would like in the end is the following: given ids: Seq[String], calculate a Future[Map[String, Option[Q]]].
Every id from ids should be a key in the map, with id -> Some(q) when it was both retrieved successfully (i.e. present in the result of retrieve) and transformed successfully. Otherwise, the map should contain id -> None or no entry for that id.
How can I achieve this?
Is there an .id property on P or Q? You would need one to create the map. Something like this?
for {
  ps <- retrieve(ids)
  qs <- Future.sequence(ps.map(p => transform(p)))
} yield ids.map(id => id -> qs.find(_.id == id)).toMap
Keep in mind that a Map[String, Option[X]] is usually not necessary, since if you have a Map[String, X] the .get method on the map will already give you an Option[X].
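For instance, with a hypothetical qsById: Map[String, Q]:
qsById.get(id) // Some(q) if the key is present, None otherwise
whereas a Map[String, Option[Q]] makes a missing key and a stored None two different kinds of absence.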
Edit: Now assumes that P has a member id that equals the original id-String, otherwise the connection between ids and ps gets lost after retrieve.
def consolidatedMap(ids: Seq[String]): Future[Map[String, Option[Q]]] = {
  for {
    ps <- retrieve(ids)
    qOpts <- Future.traverse(ps) { p =>
      transform(p).map(Option(_)).recover {
        // TODO: don't sweep `Throwable` under the
        // rug in your real code
        case t: Throwable => None
      }
    }
  } yield {
    val qMap = (ps.map(_.id) zip qOpts).toMap
    ids.map { id => (id, qMap.getOrElse(id, None)) }.toMap
  }
}
This builds an intermediate map from the retrieved Ps to the transformed Qs, so that building the final ids-to-Q-option map works in linear time.
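To illustrate the behaviour, here is a hedged sketch exercising consolidatedMap with simplified one-field stand-ins for P and Q and stub implementations of retrieve and transform (all of these stubs are hypothetical, just to show the shape of the result):
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

case class P(id: String)
case class Q(id: String)
// pretend the id "missing" is not found in the database
def retrieve(ids: Seq[String]): Future[Seq[P]] = Future(ids.filter(_ != "missing").map(P(_)))
def transform(p: P): Future[Q] = Future(Q(p.id))

Await.result(consolidatedMap(Seq("a", "missing")), 1.second)
// Map("a" -> Some(Q("a")), "missing" -> None)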

Filter a list based on a parameter

I want to filter the employees based on name and return the id of each employee
case class Company(emp:List[Employee])
case class Employee(id:String,name:String)
val emp1=Employee("1","abc")
val emp2=Employee("2","def")
val cmpy= Company(List(emp1,emp2))
val a = cmpy.emp.find(_.name == "abc")
val b = a.map(_.id)
val c = cmpy.emp.find(_.name == "def")
val d = c.map(_.id)
println(b)
println(d)
I want to create a generic function that contains the filter logic, so that I can have different kinds of lists and filter parameters for those lists.
For example, an employeeIdByName function which takes the following parameters:
the criteria for the filter, e.g. _.name and id
the list to filter, e.g. cmpy.emp
the value for comparison, e.g. "abc"/"def"
Is there a better way to achieve the result? I have used map and find.
If you really want a "generic" filter function that can filter any list of elements by any property of those elements, based on a closed set of "allowed" values, while mapping results to some other property, it would look something like this:
def filter[T, P, R](
  list: List[T],          // input list of elements of type T (in our case: Employee)
  propertyGetter: T => P, // function extracting the value to compare, in our case from Employee to String
  values: List[P],        // "allowed" values for the result of propertyGetter
  resultMapper: T => R    // function extracting the result from each item, in our case from Employee to String
): List[R] = {
  list
    // first we keep only the items for which the result of
    // applying "propertyGetter" is one of the "allowed" values:
    .filter(item => values.contains(propertyGetter(item)))
    // then we map the remaining items to the result using "resultMapper":
    .map(resultMapper)
}
// for example, we can use it to filter by name and return id:
filter(
  List(emp1, emp2),
  (emp: Employee) => emp.name, // function that takes an Employee and returns its name
  List("abc"),
  (emp: Employee) => emp.id    // function that takes an Employee and returns its id
)
// List(1)
However, this is a ton of noise around a very simple Scala operation: filtering and mapping a list. This specific use case can be written as:
val goodNames = List("abc")
val input = List(emp1, emp2)
val result = input.filter(emp => goodNames.contains(emp.name)).map(_.id)
Or even:
val result = input.collect {
  case Employee(id, name) if goodNames.contains(name) => id
}
Scala's built-in map, filter, collect functions are already "generic" in the sense that they can filter/map by any function that applies to the elements in the collection.
You can use Shapeless. If you have an employees: List[Employee], you can use
import shapeless._
import shapeless.record._
employees.map(LabelledGeneric[Employee].to(_).toMap)
to convert each Employee to a map from field keys to field values. Then you can apply the filters on those maps.
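For completeness, a sketch of what that filtering might look like; this assumes shapeless 2.3+ record syntax, where toMap produces Symbol keys and Any values (the Symbol("name")/Symbol("id") keys below are an assumption about that representation):
val empMaps = employees.map(LabelledGeneric[Employee].to(_).toMap)
val ids = empMaps
  .filter(_.get(Symbol("name")).contains("abc")) // keep employees named "abc"
  .flatMap(_.get(Symbol("id")))                  // extract their ids (typed as Any)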

scala slick one-to-many collections

I have a database that contains activities with a one-to-many registrations relation.
The goal is to get all activities, with a list of their registrations.
By creating a cartesian product of activities with registrations, all the data necessary to get that out is there.
But I can't seem to find a nice way to get it into a Scala collection properly; let's say of type Seq[(Activity, Seq[Registration])].
case class Registration(
  id: Option[Int],
  user: Int,
  activity: Int
)

case class Activity(
  id: Option[Int],
  what: String,
  when: DateTime,
  where: String,
  description: String,
  price: Double
)
Assuming the appropriate Slick tables and table queries exist, I would write:
val acts_regs = (for {
    a <- Activities
    r <- Registrations if r.activityId === a.id
  } yield (a, r))
  .groupBy(_._1.id)
  .map { case (actid, acts) => ??? }
But I cannot seem to make the appropriate mapping. What is the idiomatic way of doing this? I hope it's better than working with a raw cartesian product...
In Scala
In Scala code it's easy enough; it would look something like this:
val activities = db withSession { implicit sess =>
  (for {
    a <- Activities leftJoin Registrations on (_.id === _.activityId)
  } yield a).list
}
activities
  .groupBy(_._1.id)
  .map { case (id, set) => (set(0)._1, set.map(_._2)) }
But this seems rather inefficient due to the unnecessary instantiations of Activity which the table mapper will create for you.
Neither does it look really elegant...
Getting a count of registrations
The in-Scala method is even worse when one is only interested in a count of registrations, like so:
val result: Seq[(Activity, Int)] = ???
In Slick
My best attempt in slick would look like this:
val activities = db withSession { implicit sess =>
  (for {
    a <- Activities leftJoin Registrations on (_.id === _.activityId)
  } yield a)
    .groupBy(_._1.id)
    .map { case (id, results) => (results.map(_._1), results.length) }
}
But this results in an error saying that Slick cannot map the given types in the "map" line.
I would suggest:
val activities = db withSession { implicit sess =>
  (for {
    a <- Activities leftJoin Registrations on (_.id === _.activityId)
  } yield a)
    .groupBy(_._1)
    .map { case (activity, results) => (activity, results.length) }
}
The problem with
val activities = db withSession { implicit sess =>
  (for {
    a <- Activities leftJoin Registrations on (_.id === _.activityId)
  } yield a)
    .groupBy(_._1.id)
    .map { case (id, results) => (results.map(_._1), results.length) }
}
is that you can't produce nested results in a group by. results.map(_._1) is a collection of items. SQL does implicit conversions from collections to single rows in some cases, but Slick, being type-safe, doesn't. What you would like to do in Slick is something like results.map(_._1).head, but that is currently not supported. The closest you could get is something like (results.map(_.id).max, results.map(_.what).max, ...), which is pretty tedious. So grouping by the whole activities row is probably the most feasible workaround right now.
A solution for getting all registrations per activity:
// list of all activities
val activities = Activities
// map of registrations belonging to those activities
val registrations = Registrations
  .filter(_.activityId in activities.map(_.id))
  .list
  .groupBy(_.activity) // the case-class field, giving Map[Int, List[Registration]]
// combine them
activities
  .list
  .map { a => (a, registrations.getOrElse(a.id.get, List())) }
This gets the job done in two queries. It should be doable to abstract this type of "grouping" function into a Scala function.
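A sketch of what such an abstraction could look like in plain Scala collections (the names are illustrative, not part of any library):
// Group children under their parents, given a key extractor for each side:
def groupChildren[P, C, K](parents: Seq[P], children: Seq[C])
                          (parentKey: P => K, childKey: C => K): Seq[(P, Seq[C])] = {
  val byKey = children.groupBy(childKey)
  parents.map(p => p -> byKey.getOrElse(parentKey(p), Seq.empty))
}
// e.g. groupChildren(Activities.list, Registrations.list)(_.id.get, _.activity)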