SparkJobServer - is validate() always called before runJob() - spark-jobserver

According to the SparkJobServer documentation:
validate allows for an initial validation of the context and any
provided configuration. If the context and configuration are OK to run the job, returning spark.jobserver.SparkJobValid will let the job execute, otherwise
returning spark.jobserver.SparkJobInvalid(reason) prevents the job from running and provides means to convey the reason of failure. In this case, the call immediately returns an HTTP/1.1 400 Bad Request status code.
validate helps you preventing running jobs that will eventually fail due to missing or wrong configuration and save both time and resources.
Can I therefore assume that validate() would always be called before runJob()?
If I load and verify the job configuration in validate(), can my runJob() assume it was loaded correctly and is available where validate() left it?

Yes, your assumption is correct. See https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/src/spark.jobserver/JobManagerActor.scala#L268

Related

Should a wrong parameter passed via REST call throw an error?

I was accessing REST calls, when I passed wrong parameter to GET request it does not throw any http error. Should the design be changed to throw a http error or wrong parameter can be passed to REST call.
Example 1:(parameters are optional)
https://example.com/api/fruits?fruit=apple
Give list of all apple elements
Example 2:
https://example.com/api/fruits?abc=asb
Give list of all fruits
My question is related to example 2, should example 2 throw an error or is it behaving properly?
It's pretty common to ignore parameters that you aren't necessarily expecting. I think example 2 is behaving as it should.
I know that depending on the browser I would sometimes append an extra variable with a timestamp to make sure that the rest call wouldn't be cached. Something like:
https://example.com/api/fruits?ihateie=2342342342
If you're not explicitly doing anything with the extra parameter then I can't see the harm in allowing it.
For a GET request, the request-line is defined as follows
request-line = 'GET' SP request-target SP HTTP-version CRLF
where request-target "...identifies the target resource upon which to apply the request".
That means that the path /api/fruits, the question-mark ? and the query abc=asb are all part of the identifier.
The fact that your implementation happens to use the path to route the request to a handler, and the query to provide arguments, is an accident of your current implementation.
That leaves you with the freedom to decide that
/api/fruits?abc=asb does exist, and its current state is a list of all fruits
/api/fruits?abc=asb does exist, and its current state is an empty list
/api/fruits?abc=asb does exist, and its current state is something else
/api/fruits?abc=asb does not exist, and attempting to access its current state is an error.
My question is related to example 2, should example 2 throw an error or is it behaving properly?
If abc=asb indicates that there is some sort of error in the client, then you should return a 4xx status to indicate that.
Another way of thinking about the parameter handling is in terms of Must Ignore vs Must Understand.
As a practical matter, if I'm a consumer expecting that my filter is going to result in a small result set, and instead I end up drinking a billion unfiltered records out of a fire hose, I'm not going to be happy.
I'd recommend that in the case of a bad input you find a way to fail safely. On the web, that would probably mean a 404, with an HTML representation explaining the problem, enumerating recognized filters, maybe including a web form that helps resend the query, etc. Translate that into your API in whatever way makes sense.
But choosing to treat that as a successful request and return some representation also works, it's still REST, the web is going to web. If doing it that way gives you consumers a better experience, thereby increasing adoption and making your api more successful, then the answer is easy.

RESTful APIs: what to return when updating an entity produces side-effects

One of our APIs has a tasks resource. Consumers of the API might create, delete and update a given task as they wish.
If a task is completed (i.e., its status is changed via PUT /tasks/<id>), a new task might be created automatically as a result.
We are trying to keep it RESTful. What would be the correct way to tell the calling user that a new task has been created? The following solutions came to my mind, but all of them have weaknesses in my opinion:
Include an additional field on the PUT response which contains information about an eventual new task.
Return only the updated task, and expect the user to call GET /tasks in order to check if any new tasks have been created.
Option 1 breaks the RESTful-ness in my opinion, since the API is expected to return only information regarding the updated entity. Option 2 expects the user to do stuff, but if he doesn't then no one will realize that a new task was created.
Thank you.
UPDATE: the PUT call returns an HTTP 200 code along the full JSON representation of the updated task.
#tophallen suggests having a task tree so that (if I got it right) the returned entity in option 2 contains the new task as a direct child.
You really have 2 options with a 200 status PUT, you can do headers (which if you do, check out this post). Certainly not a bad option, but you would want to make sure it was normalized site-wide, well documented, and that you didn't have anything such as firewalls/F5's/etc/ re-writing your headers.
Something like this would be a fair option though:
HTTP/1.1 200 OK
Related-Tasks: /tasks/11;/tasks/12
{ ...task response... }
Or you have to give some indication to the client in the response body. You could have a task structure that supports child tasks being on it, or you could normalize all responses to include room for "meta" stuff, i.e.
HTTP/1.1 200 OK
{
"data": { ...the task },
"related_tasks": [],
"aggregate_status": "PartiallyComplete"
}
Something like this used everywhere (a bit of work as it sounds like you aren't just starting this project) can be very useful, as you can also use it for scenarios like paging.
Personally, I think if you made the related_tasks property just contain either routes to call for the child tasks, or id's to call, that might be best, lighter responses, since the client might not always care to call to check on said child-task immediately anyways.
EDIT:
Actually, the more I think about it - the more headers would make sense in your situation - as a client can update a task at any point during the task processing, there may or may not be a child task in play - so modifying the data structure for the off-chance the client calls to update a task when a child task has started seems more work than benefit. A header would allow you to easily add a child task and notify the user at any point - you could apply the same thing for a POST of a task that happens to immediately finish and kicks off a child task, etc. It can easily support more than one task. I think this as well keeps it the most restful and reduces server calls, a client would always be able to know what is going on in the process chain. The details of the header could define, but I believe it is more traditional in a scenario like this to have it point to a resource, rather than a key within a resource.
If there are other options though, I'm very interested to hear them.
It looks like you're very concerned about being RESTful, but you're not using HATEOAS, which is contradictory. If you use HATEOAS, the related entity is just another link and the client can follow them as they please. What you have is a non-problem in REST. If this sounds new to you, read this: http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
Option 1 breaks the RESTful-ness in my opinion, since the API is
expected to return only information regarding the updated entity.
This is not true. The API is expected to return whatever is documented as the information available for that media-type. If you documented that a task has a field for related side-effects tasks, there's nothing wrong with it.

Execute batch job (JSR 352) on startup

I have a batchjob and need to run it up the application. He makes the call for the job, but the job does not reach the method.
BatchRuntime.getJobOperator().start(JOB_NAME, new Properties());
Throws no errors. So it seems that he is looking for the resource that indicates which class Implementing this job, but not yet loaded. Any idea?
The start() method is asynch so the caller isn't going to always see exceptions on failure.
Is the XML corresponding to JOB_NAME found? Any errors in the logs?

Accumulated methods invocation order in RequestContext#fire()?

Javadoc for RequestContext#fire() says only:
Send the accumulated changes and method invocations associated with the RequestContext.
GWT Moving Parts wiki entry under Flow section says only:
All accumulated operations will be applied to the domain objects by traversing properties of the proxies.
All method invocations in the payload are executed.
But will these methods be executed on the server side in the same order they were "executed" (accumulated) on ReqestContext instance on client side?
For my situation, if I execute on client side:
context.persist().using(proxy)
context.find(proxy.stableId().to(updatingReceiver))
context.fire()
Then may I be sure that on server side find() will be invoked after persist() so my updatingReceiver will get proxy of updated (persist()'ed) entity as an argument?
EDIT:
Going further, may I be sure that back on client after response Recievers will be invoked in exactly the same order in which their corresponding methods were accumulated?
Finally, is there a way to add some action that will be invoked at the end of response handling, after all Receivers' actions?
I thought something like this may work:
requestContext.fire(new Receiver<Void>() {
#Override
public void onSuccess(Void response) {
//Things to do after all receivers
});
And it really seems to work as I expected but because all that Javadoc is telling me about RequestContext.fire(Receiver) method is:
For receiving errors or validation failures only.
I'm not 100% sure whether my assumption is correct.
Yes, order of method invocations is preserved, both on the server-side and then back on the client side when calling Recievers.
The queue is a simple ArrayList in which invocation objects are appended. On the server-side, they're processed in the order they're received.
The Request-Context-level Receiver is always called after the ones for invocations. Its onSuccess is always called, whatever the result of the invocations (even if they all fail), to signal that the batch of invocations was processed successfully. Its onFailure is only called in case of a general failure, i.e. a network error, or an error when (de)serializing requests/responses on the server-side.
See http://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/web/bindery/requestfactory/shared/impl/AbstractRequestContext.java?r=10835#345

Why my custom activity never returns?

I'm a newbie to WF and rather lost. Here's what I have so far:
I've created a workflow service app (xamlx), added needed variables
I've created a custom NativeActivity where I'm calling CreateBookmark from within Execute, which is between the Receive & Send activity for the service. (Ultimately this will actually do something besides creating the bookmark).
The bookmark gets created just fine, but after stepping out of the Execute method, nothing happens for one minute until the service times out, giving me that message "The request channel timed out while waiting for a reply after 00:00:59.9699970. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout." (I tried posting an image of the xamlx, but as a newbie it won't let me; suffice it to say I'm getting from my Receive, into my custom native activity, but never getting as far as the SendReply).
I assume I'm missing something rather fundatmental, but I can't see what. I've originally tried using NativeActivity<T> to return what I want, but that behaves the same.
Found out what I was doing wrong: needed to use overload of CreateBookmark that has BookmarkOptions parameter and set it to BookmarkOptions.NonBlocking.
Strangely, I did not find one example anywhere that mentioned this.