Run a group task from previous task list result - celery

I have been trying to execute a pipeline using celery. Initial task should create a list of items to process and I would use group to further parallelize each item processing. Finally I should collect results from group task.
#app.task()
def prepare():
return [item1, item2, item3]
#app.task()
def parallel_process(items, additional_param):
return group(process.s(i, additional_param) for i in items)() # I get an error kombu.exceptions.EncodeError: Object of type GroupResult is not JSON serializable
#app.task()
def process(i, param):
return mapping_func(item, param)
#app.task()
def collect(results):
print(results)
pipeline = prepare.s() | parallel_process.s(param) | collect.s()
pipeline.apply_async()
I get an error kombu.exceptions.EncodeError: Object of type GroupResult is not JSON serializable
Process task gets called, but collect task does not. Final result never comes. Is there any other way of doing this? Could not find appropriate example online.

According to your error message, you should change CELERY_RESULT_SERIALIZER to pickle, because GroupResult type is not JSON serializable.
Reference: https://docs.celeryproject.org/en/stable/userguide/configuration.html#result-serializer

Related

A good way to validate Items in a List Scala

Hi i have a question regarding validating a List in Scala. I have a method that looks somewhat like this:
def validate(item: Item): Try[Unit] = {
if (item.isValid) Success(())
else Failure(new isNotValidException("not valid"))
}
Now i am using this method to validate a itemList: List[Item] as follow:
def listIsValid(list: List[Item]): Try[Unit] = {
list
.map(validate(_))
.collectFirst{ case f # Failure(_: Exception) => f }
.getOrElse(Success(()))
}
Which i ultimately want to resolve to a single Try[Unit] (either Success if all items are valid, or a Failure if at least one item is not valid.
Is this a good way to validate all items in a List? Or is there a better way to validate items in lists? It should fail fast and if one item fails i don't need to know if other items are invalid at the time. The list validation is used in a for comprehension and ultimately needs to resolve to a single Try[Unit] again
i would use the method exists on Lists, and write this method as follows :
def listIsValid(list: List[Item]): Boolean = {
list.exists(!_.isValid)
}
If you want to check if every element in a sequence fulfills a predicate you use forall:
list.forall(_.isValid)
For more details check the forall documentation

Evaluating a Scala List[Either] into a Boolean

I am using scanamo to query a dynamodb and all i want to do is check that the db actually exists. I'm not really concerned with the record I get back, just that there were no errors. For the query part I'm using this:
trait DynamoTestTrait extends AbstractDynamoConfig { def test(): Future[List[Either[DynamoReadError, T]]] = ScanamoAsync.exec(client)table.consistently.limit(1).scan())}
that returns the Future List. I want to evaluate the first? item in the list and just return true if it is not a read error.
I thought this would work but it doesn't:
val result = test() match {
case r: DynamoReadError => Future.successful(false)
case r: Registration => Future.successful(true)
}
I'm new to scala so struggling with return types and things. This is a Play api call so i need to evaluate that boolen future at some point. With something like this:
def health = Action {
val isHealthy = h.testDynamo()
val b: Boolean = Await.result(isHealthy, scala.concurrent.duration.Duration(5, "seconds"))
Ok(Json.toJson(TestResponse(b.toString)))
}
I think this is probably wrong also as i don't want to use Await but i can't get async to work either.
Sorry, i'm kind of lost.
When i try to evaluate result i only get a message about the Future:
{
"status": 500,
"message": "Future(<not completed>) (of class scala.concurrent.impl.Promise$DefaultPromise)"
}
The result is a Future so you can't test the result without doing something like Await.result (as you do later). What you can do is modify the result returned by the Future to be the result you need.
In your case you can do this:
test().map(_.headOption.forall(_.isRight))
This will return Future[Boolean] which you can then use in your Await.result call.
Here is how it works:
map calls a function on the result of the Future, which is type List[Either[DynamoReadError, T]] and returns a new Future that gives the result of that function call.
_.headOption takes the head of the list and returns an Option[Either[DynamoReadError, T]]. This is Some(...) if there are one or more elements in the list, or None if the list is empty.
forall checks the contents of the Option and returns the result of the test on that option. If the Option is None then it returns true.
_.isRight tests the value Either[...] and returns true if the value is Right[...] and false if it is Left[...].
This does what you specified, but perhaps it would be better to check if any of the results failed, rather than just the first one? If so, it is actually a bit simpler:
test().map(_.forall(_.isRight))
This checks that all the entries in the List are Right, and fails as soon as a Left is found.
The problem with returning this from Play is a separate issue and should probably be in a separate question.

How can I get a return value from ScalaTest indicating test suite failure?

I'm running a ScalaTest (FlatSpec) suite programmatically, like so:
new MyAwesomeSpec().execute()
Is there some way I can figure out if all tests passed? Suite#execute() returns Unit here, so does not help. Ideally, I'd like to run the whole suite and then get a return value indicating whether any tests failed; an alternative would be to fail/return immediately on any failed test.
I can probably achieve this by writing a new FlatSpec subclass that overrides the Scalatest Suite#execute() method to return a value, but is there a better way to do what I want here?
org.scalatest.Suite also has run function, which returns the status of a single executed test.
With a few tweaking, we can access the execution results of each test. To run a test, we need to provide a Reporter instance. An ad-hoc empty reporter will be enough in our simple case:
val reporter = new Reporter() {
override def apply(e: Event) = {}
}
So, let's execute them:
import org.scalatest.events.Event
import org.scalatest.{Args, Reporter}
val testSuite = new MyAwesomeSpec()
val testNames = testSuite.testNames
testNames.foreach(test => {
val result = testSuite.run(Some(test), Args(reporter))
val status = if (result.succeeds()) "OK" else "FAILURE!"
println(s"Test: '$test'\n\tStatus=$status")
})
This will produce output similar to following:
Test: 'This test should pass'
Status=OK
Test: 'Another test should fail'
Status=FAILURE!
Having access to each test case name and its respective execution result, you should have enough data to achieve your goal.

How to save & return data from within Future callback

I've been facing an issue the past few days regarding saving & handling data from Futures in Scala. I'm new to the language and the concept of both. Lagom's documentation on Cassandra says to implement roughly 9 files of code and I want to ensure my database code works before spreading it out over that much code.
Specifically, I'm currently trying to implement a proof of concept to send data to/from the cassandra database that lagom implements for you. So far I'm able to send and retrieve data to/from the database, but I'm having trouble returning that data since this all runs asynchronously, and also returning that the data returned successfully.
I've been playing around for a while; The retrieval code looks like this:
override def getBucket(logicalBucket: String) = ServiceCall[NotUsed, String] {
request => {
val returnList = ListBuffer[String]()
println("Retrieving Bucket " + logicalBucket)
val readingFromTable = "SELECT * FROM data_access_service_impl.s3buckets;"
//DB query
var rowsFuture: Future[Seq[Row]] = cassandraSession.selectAll(readingFromTable)
println(rowsFuture)
Await.result(rowsFuture, 10 seconds)
rowsFuture onSuccess {
case rows => {
println(rows)
for (row <- rows) println(row.getString("name"))
for (row <- rows) returnList += row.getString("name")
println("ReturnList: " + returnList.mkString)
}
}
rowsFuture onFailure {
case e => println("An error has occured: " + e.getMessage)
Future {"An error has occured: " + e.getMessage}
}
Future.successful("ReturnList: " + returnList.mkString)
}
}
When this runs, I get the expected database values to 'println' in the onSuccess callback. However, that same variable, which I use in the return statement, outside of the callback prints as empty (and returns empty data as well). This also happens in the 'insertion' function I use, where it doesn't always return variables I set within callback functions.
If I try to put the statement within the callback function, I'm given an error of 'returns Unit, expects Future[String]'. So I'm stuck where I can't return from within the callback functions, so I can't guarantee I'm returning data).
The goal for me is to return a string to the API so that it shows a list of all the s3 bucket names within the DB. That would mean iterating through the Future[Seq[Row]] datatype, and saving the data into a concatenated string. If somebody could help with that, they'll solve 2 weeks of problems I've had reading through Lagom, Akka, Datastax, and Cassandra documentation. I'm flabbergasted at this point (information overload) and there's no clearcut guide I've found on this.
For reference, here's the cassandraSession documentation:
LagomTutorial/Documentation Style Information with their only cassandra-query example
CassandraSession.scala code
The key thing to understand about Future, (and Option, and Either, and Try) is that you do not (in general) get values out of them, you bring computations into them. The most common way to do that is with the map and flatMap methods.
In your case, you want to take a Seq[Row] and transform it into a String. However, your Seq[Row] is wrapped in this opaque data structure called Future, so you can't just rows.mkString as you would if you actually had a Seq[Row]. So, instead of getting the value and performing computation on it, bring your computation rows.mkString to the data:
//DB query
val rowsFuture: Future[Seq[Row]] = cassandraSession.selectAll(readingFromTable)
val rowToString = (row: Row) => row.getString("name")
val computation = (rows: Seq[Row]) => rows.map(rowToString).mkString
// Computation to the data, rather than the other way around
val resultFuture = rowsFuture.map(computation)
Now, when rowsFuture is completed, the new future that you created by calling rowsFuture.map will be fulfilled with the result of calling computation on the Seq[Row] that you actually care about.
At that point you can just return resultFuture and everything will work as anticipated, because the code that calls getBucket is expecting a Future and will handle it as is appropriate.
Why is Future opaque?
The simple reason is because it represents a value that may not currently exist. You can only get the value when the value is there, but when you start your call it isn't there. Rather than have you poll some isComplete field yourself, the code lets you register computations (callbacks, like onSuccess and onFailure) or create new derived future values using map and flatMap.
The deeper reason is because Future is a Monad and monads encompass computation, but do not have an operation to extract that computation out of them
Replace the select for your specific select and the field that you want to obtain for your specific field.The example is only for test, is not a architecture propose.
package ldg.com.dbmodule.model
/**
* Created by ldipotet on 05/11/17.
*/
import com.datastax.driver.core.{Cluster, ResultSet, ResultSetFuture}
import scala.util.{Failure, Success, Try}
import java.util.concurrent.TimeUnit
import scala.collection.JavaConversions._
//Use Implicit Global Context is strongly discouraged! we must create
//our OWN execution CONTEXT !
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Future, _}
import scala.concurrent.duration._
object CassandraDataStaxClient {
//We create here a CallBack in Scala with the DataStax api
implicit def resultSetFutureToScala(rf: ResultSetFuture):
Future[ResultSet] = {
val promiseResult = Promise[ResultSet]()
val producer = Future {
getResultSet(rf) match {
//we write a promise with an specific value
case Success(resultset) => promiseResult success resultset
case Failure(e) => promiseResult failure (new
IllegalStateException)
}
}
promiseResult.future
}
def getResultSet: ResultSetFuture => Try[ResultSet] = rsetFuture => {
Try(
// Other choice can be:
// getUninterruptibly(long timeout, TimeUnit unit) throws
TimeoutException
// for an specific time
//can deal an IOException
rsetFuture.getUninterruptibly
)
}
def main(args: Array[String]) {
def defaultFutureUnit() = TimeUnit.SECONDS
val time = 20 seconds
//Better use addContactPoints and adds more tha one contact point
val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("myOwnKeySpace")
//session.executeAsync es asyncronous so we'll have here a
//ResultSetFuture
//converted to a resulset due to Implicitconversion
val future: Future[ResultSet] = session.executeAsync("SELECT * FROM
myOwnTable")
//blocking on a future is strongly discouraged!! next is an specific
//case
//to make sure that all of the futures have been completed
val results = Await.result(future,time).all()
results.foreach(row=>println(row.getString("any_String_Field"))
session.close()
cluster.close()
}
}

Share an object over multiple processes

Goal
I want to share an object with multiple processes. The Object has a list with method references. Each process should check for work in the List and process it.
Problem
As soon as I share the object the list gets empty
Code
###manager.BlockNet.py:###
import manager.BlockNetProcess
from multiprocessing import queues
class blockmgr():
WAITLIST=[]
kill=False
def __init__(self,):
pass
def adJob(self,job):
if not job in self.WAITLIST:
self.WAITLIST.append(job)
def getJob(self):
j=self.WAITLIST.pop(0)
return j
def __empty(self,id):
print('empty')
def startProcesses(self,mgr,number=8):
self.qeue=queues.Queue()
self.queue.put(mgr)
result=manager.BlockNetProcess.start(q,number)
return result
mgr is the object I want to share between all processes
###manager.BlockNetProcess.py:###
from multiprocessing import pool,queues
def worker(q):
'''this is the method each process executes'''
print('start Worker')
mgr=q.get()
while not mgr.kill:
mgr=q.get()
job=mgr.getJob()
print(job)
#job[0](job[1])
return []
def start(q,number):
for i in range(number):
p = pool.Pool(processes=1)
result=p.apply_async(worker,[q])
return result
After starting the processes I get an error because I cann not pop from a list if it's empty. But I've added thing to the list before calling the startProcesses Method.
Where is my mistake? Or is there a better Method to exchange Objects between processes?