Scala: retrieve aggregate pattern inside flatMap

I have a scenario where I'm trying to aggregate the results of a few methods.
I start with a Future of an object that contains a list of objects.
I then flatMap the Future to a list of strings,
and I would like to iterate over that list,
invoke a couple of methods (asynchronously if possible), wait until all of them have finished, merge the results into one object,
and send the result to a database.
This is where I'm stuck...
UPDATE
I edited the method as suggested in the comments; now I'm getting a type mismatch error:
expected List[postMd.PostMD]....
def getComplatePost(url: String): Unit = {
  val postMd = new PostMetaData
  val com = new Comments
  val post = new Post
  val fullPost = new CompletePost
  val postMdList: Future[List[postMd.PostMD]] = postMd.getPostMetaData(url, "396697410351933") // get the list of ids
  postMdList.flatMap(x => {
    val fromid = x.map(_.fromID) // extract the Future to a list of strings
    for {
      id <- fromid
      val c = com.getComments(id)
      val p = post.getPost(id)
    } yield (c, p)
  })
}
thanks
miki

You can use a 'for - yield' in this case. 'for' executes the yield operation after each of its statements completes.
This mainly holds because 'for' is syntactic sugar that gets translated into nested flatMap, map, or foreach calls, depending on the operations being performed.
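For example, a two-generator comprehension desugars roughly like this (a generic illustration, not code from the question):
// the comprehension
for { x <- xs; y <- ys } yield (x, y)
// is translated by the compiler into
xs.flatMap(x => ys.map(y => (x, y)))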
Below is a modified snippet from your code:
for {
  id <- fromid
  // invoke methods
  c = com.getComments(id)
  p = post.getPost(id)
  k = post.getPostLikes(id)
} yield (c, p, k) // merge results and send data to DB
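If getComments, getPost and getPostLikes return Futures, the tuple yielded above is a tuple of Futures, so you still have to wait for them before writing to the database. Below is a minimal sketch of the whole flow under that assumption; the case classes and helper names are placeholders, not the poster's real types:
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// placeholder result types standing in for the real Comments/Post results
case class CommentData(fromId: String)
case class PostData(fromId: String)
case class CompletePostData(comments: CommentData, post: PostData)

// placeholder async calls
def getComments(id: String): Future[CommentData] = Future(CommentData(id))
def getPost(id: String): Future[PostData] = Future(PostData(id))

def getCompletePosts(idsF: Future[List[String]]): Future[List[CompletePostData]] =
  idsF.flatMap { ids =>
    Future.sequence(ids.map { id =>
      // start both calls before the for comprehension so they run concurrently
      val cF = getComments(id)
      val pF = getPost(id)
      for {
        c <- cF
        p <- pF
      } yield CompletePostData(c, p)
    })
  }
The resulting Future completes only when every comment/post pair has been fetched, so the database write can go in a single map or onComplete at the end.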

Related

Combine Scala Future[Seq[X]] with Seq[Future[Y]] to produce Future[(X,Seq[Y])]

I have the below relationship between my entity classes:
Customer -> * Invoice
Now I have to implement a method which returns customers with their invoices:
type CustomerWithInvoices = (Customer, Seq[Invoice])
def findCustomerWitnInvoices: Future[Seq[CustomerWithInvoices]] = {
  for {
    customers <- findCustomers
    eventualInvoices: Seq[Future[Seq[Invoice]]] = customers.map(customer => findInvoicesByCustomer(customer))
  } yield ???
}
using the existing repository methods below:
def findCustomers: Future[Seq[Customer]] = {...}
def findInvoicesByCustomer(customer: Customer): Future[Seq[Invoice]] = {...}
I tried to use a for expression as above, but I can't figure out the proper way to do it. As I'm fairly new to Scala, I'd highly appreciate any help.
I would use Future.sequence. Its simplified contract is:
sequence takes an M[Future[A]] and returns a Future[M[A]]
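Roughly, dropping the implicit ExecutionContext and collection-builder parameters, the pre-2.13 signature looks like this (the exact shape varies by Scala version):
def sequence[A, M[X] <: TraversableOnce[X]](in: M[Future[A]]): Future[M[A]]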
That is exactly what we need to solve your problem; here's the code I would write:
val eventualCustomersWithInvoices: Future[Seq[(Customer, Seq[Invoice])]] = for {
  customers <- findCustomers
  invoices  <- Future.sequence(customers.map(customer => findInvoicesByCustomer(customer)))
} yield customers.zip(invoices)
Note that customers.zip(invoices) has type Seq[(Customer, Seq[Invoice])], hence the whole expression is a Future[Seq[CustomerWithInvoices]].

Gatling feeder through the iteration over a map

I wish to create a custom feeder in Gatling (Scala) that fills the postId parameter by iterating over a collection.
I have the following code snippet:
val idPostFeeder = Iterator.continually(
  Map("postId" -> getValues())
)
getValues returns a collection of String elements.
I also tried the following way:
val idPostFeeder = (for (i <- getFile().get(l.get(b))) yield {
  Map("postId" -> s"$i")
})

val l = getFile().keysIterator.toList
var b = l.indexOf() until l.indexOf(mapLenght)
getFile returns a Map[String, String], from which I need the values passed to the feeder.
Is there a way to fill a feeder by iterating over a Collection or a Map?
Thank you!
For a collection of Strings, you just need to convert each element to a Map:
getValues.map(s => Map("postId" -> s)).toIterator
You now have an iterator that maps "postId" to each value of your collection.
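For completeness, here is a minimal sketch of how such a feeder could be wired into a scenario, assuming the standard Gatling DSL imports and a hypothetical /posts/{postId} endpoint:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// build the feeder from the collection of ids
val idPostFeeder: Iterator[Map[String, String]] =
  getValues.map(s => Map("postId" -> s)).toIterator

// each virtual user takes the next record and uses ${postId} in the request
val scn = scenario("Post lookup")
  .feed(idPostFeeder)
  .exec(http("get post").get("/posts/${postId}"))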

How to define a function in scala for flatMap

I'm new to Scala, and I want to rewrite some code so that flatMap calls a named function instead of having the whole body written inline inside the parentheses.
The original code is:
val longForm = summary.flatMap(row => {
  /* This is the code I want to replace with a function */
  val metric = row.getString(0)
  (1 until row.size).map { i =>
    (metric, schema(i).name, row.getString(i).toDouble)
  }
} /* End of function */)
The function I wrote is:
def tfunc(line: Row): List[Any] = {
  val metric = line.getString(0)
  var res = List[Any]
  for (i <- 1 to line.size) {
    /* Save each iteration's result as a List[tuple], then append it to the res List. */
    val tup = (metric, schema(i).name, line.getString(i).toDouble)
    val tempList = List(tup)
    res = res :: tempList
  }
  res
}
The function does not pass compilation; it fails with the following error:
error: missing argument list for method apply in object List
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing apply _ or apply(_) instead of apply.
var res = List[Any]
What is wrong with this function?
And for flatMap, is it the right way to return the result as a List?
You haven't explained why you want to replace that code block. Is there a particular goal you're after? There are many, many different ways that block could be rewritten, so how can we know which would best meet your requirements?
Here's one approach.
def tfunc(line: Row): List[(String, String, Double)] = {
  val metric = line.getString(0)
  // one tuple per column after the first one
  List.tabulate(line.size - 1) { idx =>
    (metric, schema(idx + 1).name, line.getString(idx + 1).toDouble)
  }
}
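Assuming summary is the same collection of Rows as in the question (an RDD or a plain Scala collection), the function can then be passed straight to flatMap:
val longForm = summary.flatMap(tfunc)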

Scala Spark not returning value outside loop [duplicate]

I am new to Scala and Spark and would like some help understanding why the code below isn't producing my desired outcome.
I am comparing two tables.
My desired output schema is:
case class DiscrepancyData(fieldKey: String, fieldName: String, val1: String, val2: String, valExpected: String)
When I run the code below step by step manually, I actually end up with my desired outcome: a List[DiscrepancyData] completely populated with my desired output. However, I must be missing something in the code below, because it returns an empty list (before this code gets called, other code reads the tables from Hive and does the mapping, grouping, filtering, etc.):
val compareCols = Set(year, nominal, adjusted_for_inflation, average_private_nonsupervisory_wage)
val key = "year"
def compare(table: RDD[(String, Iterable[Row])]): List[DiscrepancyData] = {
  var discs: ListBuffer[DiscrepancyData] = ListBuffer()

  def compareFields(fieldOne: String, fieldTwo: String, colName: String, row1: Row, row2: Row): DiscrepancyData = {
    if (fieldOne != fieldTwo) {
      DiscrepancyData(
        row1.getAs(key).toString,     // fieldKey
        colName,                      // fieldName
        row1.getAs(colName).toString, // table1Value
        row2.getAs(colName).toString, // table2Value
        row2.getAs(colName).toString) // expectedValue
    }
    else null
  }

  def comparison() {
    for (row <- table) {
      var elem1 = row._2.head      // gets the first element in the iterable
      var elem2 = row._2.tail.head // gets the second element in the iterable
      for (col <- compareCols) {
        var value1 = elem1.getAs(col).toString
        var value2 = elem2.getAs(col).toString
        var disc = compareFields(value1, value2, col, elem1, elem2)
        if (disc != null) discs += disc
      }
    }
  }

  comparison()
  discs.toList
}
I'm calling the above function like this:
var outcome = compare(groupedFiltered)
Here is the data in groupedFiltered:
(1991,CompactBuffer([1991,7.14,5.72,39%], [1991,4.14,5.72,39%]))
(1997,CompactBuffer([1997,4.88,5.86,39%], [1997,3.88,5.86,39%]))
(1999,CompactBuffer([1999,5.15,5.96,39%], [1999,5.15,5.97,38%]))
(1947,CompactBuffer([1947,0.9,2.94,35%], [1947,0.4,2.94,35%]))
(1980,CompactBuffer([1980,3.1,6.88,45%], [1980,3.1,6.88,48%]))
(1981,CompactBuffer([1981,3.15,6.8,45%], [1981,3.35,6.8,45%]))
The table schema for groupedFiltered:
(year String,
 nominal Double,
 adjusted_for_inflation Double,
 average_private_nonsupervisory_wage String)
Spark is a distributed computing engine. On top of the "what is the code doing" of classic single-node computing, with Spark we also need to consider "where is the code running".
Let's inspect a simplified version of the expression above:
val records: RDD[List[String]] = ??? // whatever data
val list = mutable.ListBuffer[String]()
for {
  record <- records
  entry  <- record
} {
  list += entry
}
The Scala for comprehension makes this expression look like a natural local computation, but in reality the RDD operations are serialized and "shipped" to the executors, where the inner operation is executed locally. We can rewrite the above like this:
records.foreach { record => // RDD.foreach => serializes the closure and executes it remotely
  record.foreach { entry => // record.foreach => local operation on the record collection
    list += entry // this mutable list is updated in each executor but never sent back to the driver; all updates are lost
  }
}
Mutable objects are in general a no-go in distributed computing. Imagine that one executor adds a record and another one removes it: what's the correct result? Or that each executor ends up with a different value: which is the right one?
To implement the operation above, we need to transform the data into our desired result.
I'd start by applying another best practice: do not use null as a return value. I also moved the row access into the function. Let's rewrite the comparison operation with this in mind:
def compareFields(colName: String, row1: Row, row2: Row): Option[DiscrepancyData] = {
  val key = "year"
  val v1 = row1.getAs(colName).toString
  val v2 = row2.getAs(colName).toString
  if (v1 != v2) {
    Some(DiscrepancyData(
      row1.getAs(key).toString, // fieldKey
      colName,                  // fieldName
      v1,                       // table1Value
      v2,                       // table2Value
      v2))                      // expectedValue
  } else None
}
Now, we can rewrite the computation of discrepancies as a transformation of the initial table data:
val discrepancies = table.flatMap { case (str, rows) =>
  val r1 = rows.head
  val r2 = rows.tail.head
  compareCols.flatMap(col => compareFields(col, r1, r2))
}
We can also use the for-comprehension notation, now that we understand where things are running:
val discrepancies = for {
  (str, rows) <- table
  col         <- compareCols
  dis         <- compareFields(col, rows.head, rows.tail.head)
} yield dis
Note that discrepancies is of type RDD[DiscrepancyData]. If we want to get the actual values to the driver we need to:
val materializedDiscrepancies = discrepancies.collect()
Iterating through an RDD and updating a mutable structure defined outside the loop is a Spark anti-pattern.
Imagine this RDD being spread over 200 machines. How could these machines all be updating the same Buffer? They cannot. Each JVM sees its own discs: ListBuffer[DiscrepancyData]. At the end, your result will not be what you expect.
To conclude, this is perfectly valid (though not idiomatic) Scala code, but it is not valid Spark code. If you replace the RDD with an Array it will work as expected.
Try to have a more functional implementation along these lines:
val finalRDD: RDD[DiscrepancyData] = table.map(???).filter(???)
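As a rough sketch of that shape (reusing the question's compareCols and its compareFields, which returns a DiscrepancyData or null, and assuming compareCols holds the column names as strings), this could look like:
val finalRDD: RDD[DiscrepancyData] = table
  .flatMap { case (_, rows) =>
    val r1 = rows.head      // first table's row for this key
    val r2 = rows.tail.head // second table's row for this key
    compareCols.map(col =>
      compareFields(r1.getAs(col).toString, r2.getAs(col).toString, col, r1, r2))
  }
  .filter(_ != null) // drop the "no discrepancy" markers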

Slick 3.0: how to update a variable list of columns whose number is known only at runtime

Is it possible with Slick 3.0 to update a variable list of columns whose number is known only at runtime?
Below is an example of what I want to do (it won't compile):
var q: Query[UserTable, UserTable#TableElementType, Seq] = userTable
var columns = List[Any]()
var values = List[Any]()
if (updateCommands.name.isDefined) {
  columns = q.name :: columns
  values = updateCommands.name.get :: values
}
if (updateCommands.surname.isDefined) {
  columns = q.surname :: columns
  values = updateCommands.surname.get :: values
}
q = q.filter(_.id === updateCommands.id).map(columns).update(values)
Here is what I've done in Slick 3.1. I wasn't sure which was worse, building a plain SQL statement or issuing multiple queries, so I decided to go with the latter, assuming the Postgres optimizer would see the same WHERE clause across the update queries of a single transaction. My update method looks like this:
def updateUser(user: User, obj: UserUpdate): Future[Unit] = {
  val actions = mutable.ArrayBuffer[DBIOAction[Int, NoStream, Write with Transactional]]()
  val query = users.withFilter(_.id === user.id)
  obj.name.foreach(v => actions += query.map(_.name).update(v))
  obj.email.foreach(v => actions += query.map(_.email).update(Option(v)))
  obj.password.foreach(v => actions += query.map(_.pwdHash).update(Option(encryptPassword(v))))
  slickDb.run(DBIO.seq(actions.map(_.transactionally): _*))
}
In Slick 3.0 they adopted a slightly different approach: instead of having updateAll-style methods, as far as I understand, a path of combinators was adopted.
So the main idea is to define some actions on the data and then combine them, so the database can execute them in a single run.
Example:
// let's assume that you have some table classes defined somewhere
// then let's define some actions; they might be really different
val action = YourTable.filter(_.id === idToAssert).result
val anotherAction = AnotherTable.filter(_.pets === "fun").result
// and then we can combine them and run the combined action on the database
val combinedAction = for {
  someResult    <- action
  anotherResult <- anotherAction
} yield (someResult, anotherResult)
db.run(combinedAction) // returns an actual Future of the result type
In the same way you can deal with lists and sequences; for that, please take a look here: http://slick.typesafe.com/doc/3.1.0-M1/dbio.html
DBIO has functions that allow you to combine a list of actions into one action.
I hope the idea is clear; if you have questions, you are welcome to ask in the comments.
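As a minimal sketch of that combination step (borrowing the users table, user value and db names from the snippets above as assumptions), a runtime-sized list of update actions can be folded into one transactional action with DBIO.seq:
// build however many single-column updates the input requires
val query = users.filter(_.id === user.id)
val actions: Seq[DBIO[Int]] = Seq(
  query.map(_.name).update("new name"),
  query.map(_.email).update(Some("new@example.com"))
)
// DBIO.seq discards the individual results and yields a single DBIO[Unit]
db.run(DBIO.seq(actions: _*).transactionally)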
To update a variable number of columns, you can use the approach below, which I used with Slick 3:
def update(id: Long, schedule: Schedule, fieldNames: Seq[String]): Future[_] = {
  val columns = schedules.baseTableRow.create_*.map(_.name).toSeq.filter(fieldNames.map(_.toUpperCase).contains)
  val toBeStored = schedule.withDefaults
  val actions = mutable.ArrayBuffer[DBIOAction[Int, NoStream, Write with Transactional]]()
  val query = schedules.withFilter(_.id === id)
  // this is because of limitations in Slick: updating multiple arbitrary columns in one statement is not possible
  columns.find("NAME".equalsIgnoreCase).foreach(x => actions += query.map(_.name).update(toBeStored.name))
  columns.find("NAMESPACE".equalsIgnoreCase).foreach(x => actions += query.map(_.namespace).update(toBeStored.namespace))
  columns.find("URL".equalsIgnoreCase).foreach(x => actions += query.map(_.url).update(toBeStored.url))
  db.run(DBIO.seq(actions: _*).transactionally.withPinnedSession)
}