Celery - error handling and data storage - celery

I'm trying to better understand common strategies regarding results and errors in Celery.
I see that results have statuses/states and stores results if requested -- when would I use this data? Should error handling and data storage be contained within the task?
Here is a sample scenario, in case it helps better understand my objective:
I have a geocoding task that goeocodes user addresses. If the task fails or succeeds, I'd like to update a field in the database letting the user know. (Error handling) On success I'd like the geocoded data to be inserted into the database (Data storage)
What approach should take?

Let me preface this by saying that I'm still getting a feel for Celery myself. That being said, I have some general inclinations about how I'd go about tackling this, and since no one else has responded, I'll give it a shot.
Based on what you've written, a relatively simple (though I suspect non-optimized) solution is to follow the broad contours of the blog comment spam task example from the documentation.
app.models.py
class Address(models.Model):
GEOCODE_STATUS_CHOICES = (
('pr', 'pre-check'),
('su', 'success'),
('fl', 'failed'),
)
address = models.TextField()
...
geocode = models.TextField()
geocode_status = models.CharField(max_length=2,
choices=GEOCODE_STATUS_CHOICES,
default='pr')
class AppUser(models.Model):
name = models.CharField(max_length=100)
...
address = models.ForeignKey(Address)
app.tasks.py
from celery import task
from app.models import Address, AppUser
from some_module import geocode_function #assuming this returns a string
#task()
def get_geocode(appuser_pk):
user = AppUser.objects.get(pk=appuser_pk)
address = user.address
try:
result = geocode_function(address.address)
address.geocode = result
address.geocode_status = 'su' #set address object as successful
address.save()
return address.geocode #this is optional -- your task doesn't have to return anything
on the other hand, you could also choose to decouple the geo-
code function from the database update for the object instance.
Also, if you're thinking about chaining tasks together, you
might think about if it's advantageous to pass a parameter as
an input or partial input into the child task.
except Exception as e:
address.geocode_status = 'fl' #address object fails
address.save()
#do something_else()
raise #re-raise the error, in case you want to trigger retries, etc
app.views.py
from app.tasks import *
from app.models import *
from django.shortcuts import get_object_or_404
def geocode_for_address(request, app_user_pk):
app_user = get_object_or_404(AppUser, pk=app_user_pk)
...etc.etc. --- **somewhere calling your tasks with appropriate args/kwargs
I believe this meets the minimal requirements you've outlined above. I've intentionally left the view undeveloped since I don't have a sense of how exactly you want to trigger it. It sounds like you also may want some sort of user notification when their address can't be geocoded ("I'd like to update a field in a database letting a user know"). Without knowing more about the specifics of this requirement, I would it sounds like something that might be best accomplished in your html templates (if instance.attribute value is X, display q in template) or by using a django.signals (set up a signal for when a user.address.geocode_status switches to failure -- say, by emailing the user to let them know, etc.).
In the comments to the code above, I mentioned the possibility of decoupling and chaining the component parts of the get_geocode task above. You could also think about decoupling the exception handling from the get_geocode task, by writing a custom error handler task, and using the link_error parameter (for instance., add.apply_async((2, 2), link_error=error_handler.s(), where error_handler has been defined as a task in app.tasks.py ). Also, whether you choose to handle errors via the main task (get_geocode) or via a linked error handler, I would think that you would want to get much more specific about how to handle different sorts of errors (e.g., do something with connection errors different than with address data being incorrectly formatted).
I suspect there are better approaches, and I'm just beginning to understand how inventive you can get by chaining tasks, using groups and chords, etc. Hope this helps at least get you thinking about some of the possibilities. I'll leave it to others to recommend best practices.

Related

How to send message by message to Kafka

I'm new to reactive programming and I try to implement a very basic scenario.
I want to send a message to kafka each time a file is dropped to a specific folder.
I think that I don't understand well the basics things... so please could you help me?
So I have a few questions :
What is the difference between smallrye-reactive-messaging and smallrye-reactive-streams-operators ?
I have this simple code :
#Outgoing( "my-topic" )
public PublisherBuilder<Message<MessageWrapper>> generate() {
if(Objects.isNull(currentMessage)){
//currentMessage is an instance variable which is null when I start the application
return ReactiveStreams.of(new MessageWrapper()).map(Message::of);
}
else {
//currentMessage has been correctly set with the file information
LOGGER.info(currentMessage);
return ReactiveStreams.of(currentMessage).map(Message::of);
}
}
When the code goes in the if statement, everything is ok and I got a JSON serialization of my object will null values. However I don't understand why when my code goes to the else statement, nothing goes to the topic? It seems that the .of instructions of the if statement has broke the streams or something like that...
How to keep a continuous streams that 'react' to the new dropped files ? (or other events like HTTP GET request or something like that) ...
If I don't return an instance of PublisherBuilder but an Integer for example, then my kafka topic will be populated by a very huge stream of Integer value. This is why examples are using some intervals when sending messages...
Should I use some CompletationStage or CompletableFuture ? RxJAva2? It's a bit confusing which lib to use (vertx, smallrye, rxjava2, microprofile, ...)
What are the differences between :
ReactiveStreams.fromCompletionStage
ReactiveStreams.fromProcessor
ReactiveStreams.fromPublisher
ReactiveStreams.fromSubscriber
Which one to use on which scenario ?
Thank you very much !
Let's start with the difference between smallrye-reactive-messaging & smallrye-reactive-streams-operators: smallrye-reactive-streams-operators is the same as smallrye-reactive-messaging but in addition it has a support to MicroProfile-context-propagation. Since most reactive-messaging providers use Vert.x behind the scene, it will process your message in an event-loop style, which means it will run in separate thread. Sometimes you need to propagate some ctx from your base thread into the new thread (ex: populating CDI and Tx context to execute some JPA Entity manager logic). Here where ctx propagation help.
For method signatures. You can take a look at the official documentation of SmallRye-reactive-streams sections 3,4 & 5. Each one has a different use case. It is up to you which flavor do you want to use.
When to use what ? If you are not running within reactive context, you can use the below to send messages.
#Inject
#Channel("my-channel")
Emitter emitter;
For Message consumption you can use method signature like this :
#Incoming("channel-2")
public CompletionStage doSomething(Message anEvent)
Or
#Incoming("channel-2")
public void doSomething(String anEvent)
Hope that helps.

How to use delta trigger in flink?

I want to use the deltatrigger in apache flink (flink 1.3) but I have some trouble with this code :
.trigger(DeltaTrigger.of(100, new DeltaFunction[uniqStruct] {
override def getDelta(oldFp: uniqStruct, newFp: uniqStruct): Double = newFp.time - oldFp.time
}, TypeInformation[uniqStruct]))
And I have this error:
error: object org.apache.flink.api.common.typeinfo.TypeInformation is not a value [ERROR] }, TypeInformation[uniqStruct]))
I don't understand why DeltaTrigger need TypeSerializer[T]
and I don't know what to do to remove this error.
Thanks a lot everyone.
I would read into this a bit https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html sounds like you can create a serializer using typeInfo.createSerializer(config) on your type info. Note what you're passing in currently is a type itself and NOT the type info which is why you're getting the error you are.
You would need to do something more like
val uniqStructTypeInfo: TypeInformation[uniqStruct] = createTypeInformation[uniqStruct]
val uniqStrictTypeSerializer = typeInfo.createSerializer(config)
To quote the page above regarding the config param you need to pass to create serializer
The config parameter is of type ExecutionConfig and holds the
information about the program’s registered custom serializers. Where
ever possibly, try to pass the programs proper ExecutionConfig. You
can usually obtain it from DataStream or DataSet via calling
getExecutionConfig(). Inside functions (like MapFunction), you can get
it by making the function a Rich Function and calling
getRuntimeContext().getExecutionConfig().
DeltaTrigger needs a TypeSerializer because it uses Flink's managed state mechanism to store each element for later comparison with the next one (it just keeps one element, the last one, which is updated as new elements arrive).
You will find an example (in Java) here.
But if all you need is a window that triggers every 100msec, then it'll be easier to just use a TimeWindow, such as
input
.keyBy(<key selector>)
.timeWindow(Time.milliseconds(100)))
.apply(<window function>)
Updated:
To have hour-long windows that trigger every 100msec, you could use sliding windows. However, you would have 10 * 60 * 60 windows, and every event would be placed into each of these 36000 windows. So that's not a great idea.
If you use a GlobalWindow with a DeltaTrigger, then the window will be triggered only when events are more than 100msec apart, which isn't what you've said you want.
I suggest you look at ProcessFunction. It should be straightforward to get what you want that way.

Any way to ensure frisby.js test API calls go in sequential order?

I'm trying a simple sequence of tests on an API:
Create a user resource with a POST
Request the user resource with a GET
Delete the user resource with a DELETE
I've a single frisby test spec file mytest_spec.js. I've broken the test into 3 discrete steps, each with their own toss() like:
f1 = frisby.create("Create");
f1.post(post_url, {user_id: 1});
f1.expectStatus(201);
f1.toss();
// stuff...
f2 = frisby.create("Get");
f2.get(get_url);
f2.expectStatus(200);
f2.toss();
//Stuff...
f3 = frisby.create("delete");
f3.get(delete_url);
f3.expectStatus(200);
f3.toss();
Pretty basic stuff, right. However, there is no guarantee they'll execute in order as far as I can tell as they're asynchronous, so I might get a 404 on test 2 or 3 if the user doesn't exist by the time they run.
Does anyone know the correct way to create sequential tests in Frisby?
As you correctly pointed out, Frisby.js is asynchronous. There are several approaches to force it to run more synchronously. The easiest but not the cleanest one is to use .after(() -> ... you can find more about after() in Fisby.js docs.

CQ5 / AEM5.6 Workflow: Access workflow instance properties from inside OR Split

TL;DR version:
In CQ workflows, is there a difference between what's available to the OR Split compared to the Process Step?
Is it possible to access the /history/ nodes of a workflow instance from within an OR Split?
How?!
The whole story:
I'm working on a workflow in CQ5 / AEM5.6.
In this workflow I have a custom dialog, which stores a couple of properties on the workflow instance.
The path to the property I'm having trouble with is: /workflow/instances/[this instance]/history/[workItem id]/workItem/metaData and I've called the property "reject-or-approve".
The dialog sets the property fine (via a dropdown that lets you set it to "reject" or "approve"), and I can access other properties on this node via a process step (in ecma script) using:
var actionReason;
var history = workflowSession.getHistory(workItem.getWorkflow());
// loop backwards through workItems
// and as soon as we find a Action Reason that is not empty
// store that as 'actionReason' and break.
for (var index = history.size() - 1; index >= 0; index--) {
var previous = history.get(index);
var tempActionReason = previous.getWorkItem().getMetaDataMap().get('action-message');
if ((tempActionReason != '')&&(tempActionReason != null)) {
actionReason = tempActionReason;
break;
}
}
The process step is not the problem though. Where I'm having trouble is when I try to do the same thing from inside an OR Split.
When I try the same workflowSession.getHistory(workItem.getWorkflow()) in an OR Split, it throws an error saying workItem is not defined.
I've tried storing this property on the payload instead (i.e. storing it under the page's jcr:content), and in that case the property does seem to be available to the OR Split, but my problems with that are:
This reject-or-approve property is only relevant to the current workflow instance, so storing it on the page's jcr:content doesn't really make sense. jcr:content properties will persist after the workflow is closed, and will be accessible to future workflow instances. I could work around this (i.e. don't let workflows do anything based on the property unless I'm sure this instance has written to the property already), but this doesn't feel right and is probably error-prone.
For some reason, when running through the custom dialog in my workflow, only the Admin user group seems to be able to write to the jcr:content property. When I use the dialog as any other user group (which I need to do for this workflow design), the dialog looks as though it's working, but never actually writes to the jcr:content property.
So for a couple of different reasons I'd rather keep this property local to the workflow instance instead of storing it on the page's jcr:content -- however, if anyone can think of a reason why my dialog isn't setting the property on the jcr:content when I use any group other than admin, that would give me a workaround even if it's not exactly the solution I'm looking for.
Thanks in advance if anyone can help! I know this is kind of obscure, but I've been stuck on it for ages.
a couple of days ago i ran into the same issue. The issue here is that you don't have the workItem object, because you don't really have an existing workItem. Imagine the following: you are going through the workflow, you got a couple of workItems, with means, either process step, either inbox item. When you are in an or split, you don't have existing workItems, you can ensure by visiting the /workItems node of the workflow instance. Your workaround seems to be the only way to go through this "issue".
I've solved it. It's not all that elegant looking, but it seems to be a pretty solid solution.
Here's some background:
Dialogs seem to reliably let you store properties either on:
the payload's jcr:content node (which wasn't practical for me, because the payload is locked during the workflow, and doesn't let non-admins write to its jcr:content)
the workItem/metaData for the current workflow step
However, Split steps don't have access to workItem. I found a fairly un-helpful confirmation of that here: http://blogs.adobe.com/dmcmahon/2013/03/26/cq5-failure-running-script-etcworkflowscriptscaworkitem-ecma-referenceerror-workitem-is-not-defined/
So basically the issue was, the Dialog step could store the property, but the OR Split couldn't access it.
My workaround was to add a Process step straight after the Dialog in my workflow. Process steps do have access to workItem, so they can read the property set by the Dialog. I never particularly wanted to store this data on the payload's jcr:content, so I looked for another location. It turns out the workflow metaData (at the top level of the workflow instance node, rather than workItem/metaData, which is inside the /history sub-node) is accessible to both the Process step and the OR Split. So, my Process step now reads the workItem's approveReject property (set by the Dialog), and then writes it to the workflow's metaData node. Then, the OR Split reads the property from its new location, and does its magic.
The way you access the workflow metaData from the Process step and the OR Split is not consistent, but you can get there from both.
Here's some code: (complete with comments. You're welcome)
In the dialog where you choose to approve or reject, the name of the field is set to rejectApprove. There's no ./ or anything before it. This tells it to store the property on the workItem/metaData node for the current workflow step under /history/.
Straight after the dialog, a Process step runs this:
var rejectApprove;
var history = workflowSession.getHistory(workItem.getWorkflow());
// loop backwards through workItems
// and as soon as we find a rejectApprove that is not empty
// store that as 'rejectApprove' and break.
for (var index = history.size() - 1; index >= 0; index--) {
var previous = history.get(index);
var tempRejectApprove = previous.getWorkItem().getMetaDataMap().get('rejectApprove');
if ((tempRejectApprove != '')&&(tempRejectApprove != null)) {
rejectApprove = tempRejectApprove;
break;
}
}
// steps up from the workflow step into the workflow metaData,
// and stores the rejectApprove property there
// (where it can be accessed by an OR Split)
workItem.getWorkflowData().getMetaData().put('rejectApprove', rejectApprove);
Then after the Process step, the OR Split has the following in its tabs:
function check() {
var match = 'approve';
if (workflowData.getMetaData().get('rejectApprove') == match) {
return true;
} else {
return false;
}
}
Note: use this for the tab for the "approve" path, then copy it and replace var match = 'approve' with var match = 'reject'
So the key here is that from a Process step:
workItem.getWorkflowData().getMetaData().put('rejectApprove', rejectApprove);
writes to the same property that:
workflowData.getMetaData().get('rejectApprove') reads from when you execute it in an OR Split.
To suit our business requirements, there's more to the workflow I've implemented than just this, but the method above seems to be a pretty reliable way to get values that are entered in a dialog, and access them from within an OR Split.
It seems pretty silly that the OR Split can't access the workItem directly, and I'd be interested to know if there's a less roundabout way of doing this, but for now this has solved my problem.
I really hope someone else has this same problem, and finds this useful, because it took me waaay to long to figure out, to only apply it once!

Django Tastypie, Remove Elements From ManyToMany Fields

I am using Tastypie, Django for my project.
To Update a many to many field I have used save_m2m hook.
def save_m2m(self, bundle):
for field_name, field_object in self.fields.items():
if not getattr(field_object, 'is_m2m', False):
continue
if not field_object.attribute:
continue
if field_object.readonly:
continue
related_mngr = getattr(bundle.obj, field_object.attribute)
related_objs = []
print bundle.data[field_name]
for related_bundle in bundle.data[field_name]:
try:
stock = Stock.objects.get(nse_symbol = related_bundle.obj.nse_symbol)
print stock.__dict__
except Stock.DoesNotExist as e:
dataa = {"error_message": e}
raise ImmediateHttpResponse(response=HttpBadRequest(content=json.dumps(dataa), content_type="application/json; charset=UTF-8"))
related_objs.append(stock)
related_mngr.add(*related_objs)
Now I want to remove elements from the same many to many field.
How should I achieve this. Do I have to send a patch request or delete request and how to handle this.
I am begineer in tastypie. I googled it some time and I couldn't find a proper way. Please guide me how to complete this.
Thanks.
I've thought a lot about handing m2m relationships, since most of our app depends on m2m links.
I've settled for the approach of an update method. Pass in the all the references of the relationships you want changed (add and remove), then update the db accordingly. We only pass in the changed values, since if you have a paginated list, you only want to update the items the user has identified. Generally I use a custom hook for this defined in override_urls.
I used to have a separate add and remove method, which worked well until we changed the gui and allowed users simply to change checkboxes. In that approach having an update method was much more useful. You'll have to decide on which method suits your application the best.