Step/Tasklet always executed even if job fails - spring-batch

I designed a tasklet that sends an email based on reading a "status" execution context parameter that I store myself.
|STEP 1| - |STEP 2| - |STEP 3| - |STEP 4| (Happy Path)
    |          |          |
On failure, interrupt the flow and go straight to STEP 4 (Mail Sender)
Any step in the flow must route to that tasklet in case something fails, which implies I must deal with JobConfiguration flows. Somehow I feel my design is a bit clumsy, but the alternative does not convince me either.
Is the aforementioned approach preferable to having all this code as a side effect on each step by means of ExecutionListeners + a MailService?
Is there a cleaner alternative approach? What do spring-batch experts usually do?
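For reference, the flow wiring I have in mind looks roughly like this (a hedged sketch against Spring Batch's Java flow DSL, here in Scala; step1..step3 and mailStep are hypothetical Step beans, and the mail tasklet reads my "status" value from the execution context):
import org.springframework.batch.core.{Job, Step}
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory

def job(jobs: JobBuilderFactory,
        step1: Step, step2: Step, step3: Step, mailStep: Step): Job =
  jobs.get("jobWithMailTasklet")
    .start(step1).on("FAILED").to(mailStep) // short-circuit to the mail sender
    .from(step1).on("*").to(step2)
    .from(step2).on("FAILED").to(mailStep)
    .from(step2).on("*").to(step3)
    .from(step3).on("*").to(mailStep)       // the happy path also ends at STEP 4
    .end()
    .build()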

Related

What is the benefit of an effect system (e.g. ZIO)?

I'm having a hard time understanding what value effect systems, like ZIO or Cats Effect, actually add.
They do not make code more readable, e.g.:
val wrappedB = for {
  a <- getA() // : ZIO[R, E, A]
  b <- getB(a) // : ZIO[R, E, B]
} yield b
is no more readable to me than:
val a = getA() // : A
val b = getB(a) // : B
I could even argue that the latter is more straightforward, because calling a function executes it, instead of just creating an effect or execution pipeline.
Delayed execution does not sound convincing, because all the examples I've encountered so far just execute the pipeline right away anyway. Being able to execute effects in parallel or multiple times can be achieved in simpler ways IMHO, e.g. C# has Parallel.ForEach.
Composability. Functions can be composed without using effects, e.g. by plain composition.
Pure functional methods. In the end the pure instructions will be executed, so it seems like it's just pretending DB access is pure. It does not help to reason, because while construction of the instructions is pure, executing them is not.
I may be missing something, or just downplaying the benefits above, or maybe the benefits are bigger in certain situations (e.g. a complex domain).
What are the biggest selling points to use effect systems?
Because it makes it easy to deal with side effects. From your example:
a <- getA() // ZIO[R, E, A] (doesn't have to be ZIO btw)
val a = getA(): A
The first getA accounts for the effect and the possibility of returning an error, i.e. a side effect. This would be like getting an A from some DB where the A may not exist or where you lack permission to access it. The second getA would be like a simple def getA = "A".
How do we put these methods together? What if one throws an error? Should we still proceed to the next method or just quit? What if one blocks your thread?
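For example, putting the two together and handling either failure in one place could look like this (a minimal sketch, assuming ZIO 2.x; getA and getB are hypothetical):
import zio._

def getA: ZIO[Any, String, Int]         = ZIO.succeed(1)
def getB(a: Int): ZIO[Any, String, Int] = ZIO.fail("no B for " + a)

val b: ZIO[Any, Nothing, Int] =
  getA
    .flatMap(getB)                 // getB never runs if getA failed
    .catchAll(_ => ZIO.succeed(0)) // one recovery point for both failures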
Hopefully that addresses your second point about composability. To quickly address the rest:
Delayed execution. There are probably two reasons for this. The first is that you don't want to accidentally start an execution just because you wrote the expression down; that would break what the cool guys refer to as referential transparency. The second is that concurrent execution requires a thread pool or execution context, and normally we want a centralized place where we can fine-tune it for the whole app. When building a library we can't provide it ourselves; it's the users who provide it. By deferring the effect, all we do is define how it should behave, and the users can run it with ZIO, Monix, etc. - it's totally up to them.
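A sketch of that point (assuming ZIO 2.x): constructing an effect runs nothing, and only the runtime the user supplies ever executes it.
import zio._

val hello: UIO[Unit] = ZIO.succeed(println("side effect")) // nothing printed yet
val twice: UIO[Unit] = hello *> hello                      // still nothing

object Main extends ZIOAppDefault {
  def run = twice // prints twice, only when the runtime interprets the value
}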
Purity. Technically speaking, wrapping a process in a pure effect doesn't necessarily mean the underlying process is pure; only the implementation knows whether it really is. What we can do is lift it to make it compatible with the composition.
What makes programming with ZIO or Cats great is concurrent programming. There are other reasons, but this one is IMHO where I had the "Aha! Now I get it" moment.
Try to write a program that monitors the content of several folders and parses each file added to them, but never more than 4 files at the same time (like the example in the video "What Java developers could learn from ZIO" by Adam Fraser on YouTube: https://www.youtube.com/watch?v=wxpkMojvz24).
I mean, this is really easy to write in ZIO :)
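A hedged sketch of that challenge with zio-streams 2.x (watchFolder and parseFile are hypothetical stand-ins for the real file-system plumbing):
import zio._
import zio.stream._

def watchFolder(path: String): ZStream[Any, Throwable, String] = ZStream.empty
def parseFile(file: String): Task[Unit]                        = ZIO.unit

val program: Task[Unit] =
  ZStream
    .mergeAllUnbounded()(watchFolder("/a"), watchFolder("/b"))
    .mapZIOParUnordered(4)(parseFile) // never more than 4 parses in flight
    .runDrain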
The whole idea, that you combine data structures (a ZIO is a data structure) in order to make bigger data structures, is so easy to understand that I would not want to code without it for complex problems :)
The two examples are not comparable: in the first form, an error in the first statement marks the value of the reified sequence as faulty, while in the second form it halts the whole program. To properly encapsulate the two statements, the second form would have to become a function definition, followed by an assignment of the result of its call.
But more than that, in order to completely mimic the first form, additional code has to be written to catch exceptions and build a proper faulty result, while ZIO gives you all of this for free...
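Mimicking the first form by hand needs something like this (a sketch; A, B, getA and getB are hypothetical throwing functions):
import scala.util.control.NonFatal

final case class A()
final case class B()
def getA(): A     = A()
def getB(a: A): B = throw new RuntimeException("no B")

def wrappedB: Either[Throwable, B] =
  try Right(getB(getA()))               // either call may throw
  catch { case NonFatal(e) => Left(e) } // the faulty result, built by hand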
I think that the ability to cleanly propagate the error state between successive statements is the real value of the ZIO approach. Any composite ZIO program fragment is then fully composable itself.
That's the main benefit of any workflow-based approach, anyway.
It is this modularity which gives to effect handling its real value.
Since an effect is an action which structurally may produce errors, handling effects like this is an excellent way to handle errors in a composable way. In fact, handling effects consists of handling errors!

ReactiveX Retry with Multiple Consumers

Quick question, because I feel like I must be missing something.
I'm using rxjs here because it's what I've got in front of me; this is a general ReactiveX question, I believe.
Let's say I have a set of Observables like so:
network_request = some_thing // An observable that produces the result of a network call
event_stream = network_request.flatMapLatest(function(v) {
  return connectToThing(v) // This is another observable that needs v
}) // This uses the result of the network call to form a long-term event-based connection
So, this works ok.
Problem, though.
Sometimes the connection thing fails.
So, if I do event_stream.retry() it works great. When it fails, it redoes the network call and gets a new v to use to make a new connection.
Problem
What happens if I want two things chained off of my network_request?
Maybe I want the UI to do something every time the network call completes, like show something about v in the UI?
I can do:
shared = network_request.share() // Other implementations call this refCount
event_stream = shared.flatMapLatest(...) // same as above
ui_stream = shared.flatMapLatest(...) // Other transformation on network response
If I didn't use share it would have made two requests, which isn't what I want. But with share, when event_stream later has an error, it doesn't retry the network request: the refcount is still at 1 (due to ui_stream), so the retry immediately returns completed.
What I want
This is obviously a small example I've made up to explain my confusion.
What I want is that every time the result of event_stream (that long-term connection) has an error, all of the following happens:
the network request is made again
the new response of that request is used to build a new connection and event_stream goes on with new events like nothing happened
that same response is also emitted in ui_stream to lead to further processing
This doesn't feel like a complicated thing, so I must just be misunderstanding something fundamental when it comes to splitting / fanning out RX things.
Workarounds I think I could do but would like to avoid
I'm looking to export these observables, so I can't just build them again and then say "Hey, here's the new thing". I want event_stream and all the downstream processing to not know there's been a disconnection.
Same for ui_stream. It just got a new value.
I could probably work something out using a Subject as a generation counter that I ping every time I want everything to restart, and put the network_request into a flatMap based on that, so that I can break the share...
But that feels like a really hacky solution, so I feel there has to be a better way than that.
What have I fundamentally misunderstood?
As I've been thinking about this more I've come to the same realization as ionoy, which is that retry just disconnects and reconnects, and upstream doesn't know it was due to an error.
When I thought about what I wanted, I realized I really wanted something like a chain, and also a spectator, so I've got this for now:
network_request = some_thing
network_shadow = new Rx.Subject()
event_stream = network_request.do(network_shadow).flatMapLatest(...)
ui_stream = network_shadow.whatever
This has the property that a retry in event_stream or downstream will cause the whole thing to restart, whereas ui_stream is its own thing.
Any errors over there don't do anything, since network_shadow isn't actually a subscriber to event_stream, but it does peel the values off as long as the main event chain is running.
I feel like this isn't ideal, but it is better than what I was concerned I would have to do, which is have a restartEverything.onNext() in a doOnError, which would have been gross.
I'm going to work with this for now, and we'll see where it bites me...
You need to make your cold observable hot by using Publish. Read http://www.introtorx.com/Content/v1.0.10621.0/14_HotAndColdObservables.html#HotAndCold for a great explanation.

Are Futures in Scala really functional?

I am reading this blog post that claims Futures are not "functional" since they are just wrappers around side-effectful computations. For instance, they contain RPC calls, HTTP requests, etc. Is that correct?
The blog post gives the following example:
def twoUsersFeed(a: UserHandle, b: UserHandle)
                (implicit ec: ExecutionContext): Future[Html] =
  for {
    feedA <- usersFeed(a)
    feedB <- usersFeed(b)
  } yield feedA ++ feedB
you lose the desired property: consistent results (the referential transparency). Also you lose the property of making as few requests as possible. It is difficult to use multi-valued requests and have composable code.
I am afraid I don't get it. Could you explain how we lose consistent results in this case?
The blog post fails to draw a proper distinction between Future itself and the way it's commonly used, IMO. You could write pure-functional code with Future, if you only ever wrote Futures that called pure, total functions; such code would be referentially transparent and "functional" in every remotely reasonable sense of the word.
What is true is that Futures give you limited control of side effects, if you use them with methods that have side effects. If you create a Future wrapping webClient.get, then creating that Future will send an HTTP call. But that's not a fact about Future, that's a fact about webClient.get!
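To illustrate, Future is eager, so merely constructing one fires the side effect (a small sketch; sendRequest is hypothetical):
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

def sendRequest(): String = { println("HTTP call fired"); "response" }

val f: Future[String] = Future(sendRequest()) // the call starts right here
// Substituting f for its right-hand side elsewhere changes when and how often
// the request runs - which is exactly the referential-transparency complaint.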
There is a grain of truth in this blog post. Separating expressing your computation from executing it, completely, via e.g. the Free monad, can result in more efficient and more testable code. E.g. you can create a "query language", where you express an operation like "fetch the profile photos of all the mutual friends of A and B" without actually running it. This makes it easier to test if your logic is correct (because it's very easy to make e.g. a test implementation that can "run" the same queries - or even just inspect the "query object" directly) and, as I think the blog post is trying to suggest, means you could e.g. combine multiple requests to fetch the same profile. (This isn't even purely a functional-programming concern - some OO books have the idea of a "command pattern" - though IME functional programming tools like for/yield syntax make it much easier to work in this way). Whereas if all you have is a fetchProfile method that, when run, immediately fires off a HTTP request, then if your code logic requests the same profile twice, there's no way to avoid fetching the same profile twice.
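A toy version of such a query language might look like this (a sketch; all names are hypothetical):
final case class Profile(name: String)

sealed trait Query[A]
final case class FetchProfile(user: String)                 extends Query[Profile]
final case class Zip[A, B](left: Query[A], right: Query[B]) extends Query[(A, B)]

// The program is just data; an interpreter can inspect it before running it,
// e.g. to deduplicate the two identical fetches below into one HTTP request.
val program: Query[(Profile, Profile)] =
  Zip(FetchProfile("alice"), FetchProfile("alice"))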
But that isn't really about Future per se, and IMO this blog post is more confusing than helpful.

What to choose between require and assert in Scala

Both require and assert are used to perform certain checks during runtime to verify certain conditions.
So what is the basic difference between them?
The only one I see is that require throws IllegalArgumentException and assert throws AssertionError.
How do I choose which one to use?
As Kigyo mentioned, there is a semantic difference:
assert means that your program has reached an inconsistent state; this might be a problem with the current method/function (I like to think of it a bit like HTTP 500 InternalServerError).
require means that the caller of the method is at fault and should fix its call (I like to think of it a bit like HTTP 400 BadRequest).
There is also a major technical difference:
assert is annotated with @elidable(ASSERTION),
meaning you can compile your program with -Xelide-below ASSERTION or with -Xdisable-assertions and the compiler will not generate the bytecode for the assertions. This can significantly reduce bytecode size and improve performance if you have a large number of asserts.
Knowing this, you can use assert to verify all the invariants everywhere in your program (all the preconditions/postconditions for every single method/function call) and not pay the price in production.
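You can even put your own checks behind the same mechanism (a sketch; checkInvariant is hypothetical, and the ASSERTION constant lives in scala.annotation.elidable's companion object):
import scala.annotation.elidable
import scala.annotation.elidable.ASSERTION

@elidable(ASSERTION)
def checkInvariant(ok: Boolean, msg: => String): Unit =
  if (!ok) throw new AssertionError(msg)
// With the -Xelide-below threshold set high enough, calls to checkInvariant
// produce no bytecode at all, just like assert.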
You would usually have a "test" build with all the assertions enabled; it would be slower, as it verifies all the assertions at all times. Then you could have a "production" build of your product without the assertions, eliminating all the internal state checks done through them.
require is not elidable; it makes more sense for libraries (including internal libraries) to use it to inform the caller of the preconditions for calling a given method/function.
This is only my subjective point of view.
I use require whenever I want a constraint on parameters.
As an example we can take the factorial for natural numbers. As we do not want to address negative numbers, we want to throw an IllegalArgumentException.
I would use assert whenever you want to make sure some conditions (like invariants) are always true during execution. I see it as a way of testing.
Here is an example implementation of factorial with require and assert
import scala.annotation.tailrec

def fac(i: Int): Long = {
  require(i >= 0, "i must be non negative") // this is for correct input
  @tailrec def loop(k: Int, result: Long = 1): Long = {
    assert(result == 1 || result >= k) // this is only for verification
    if (k > 0) loop(k - 1, result * k) else result
  }
  loop(i)
}
When result > 1, the loop has executed at least once, so result has to be greater than or equal to k. That is a loop invariant.
When you are sure that your code is correct, you can remove the assert, but the require would stay.
See here for a detailed discussion within the Scala language community.
I can add that the key to distinguishing require from assert is to understand the two paradigms they come from. Both are tools for software quality, but from different toolboxes: in summary, assert is a software-testing tool, which takes a corrective approach, whereas require is a design-by-contract tool, which takes a preventive approach.
Both require and assert are means of controlling the validity of state. Historically there were two distinct paradigms for dealing with invalid states. The first, which is mainstream, is the collection of methodologies and tools called software testing. The other is called design by contract. These two paradigms are not comparable.
Software testing ensures that code versatile enough to be capable of error-prone actions is not misused. Design by contract keeps code from having such capability in the first place. In other words, software testing is corrective, and design by contract is preventive.
assert is used to write unit tests: if a method passes all tests, each written as an assert expression, the code is qualified as error-free. So assert sits beside the operational code, as an independent body.
require is embedded within the code and is part of it, to assure that nothing harmful can happen.
In very simple language:
require is used to enforce a precondition on the caller of a function or the creator of an object of some class, whereas assert is used to check the code of the function itself.
So, if a precondition fails, you get an IllegalArgumentException; whereas if an assertion fails, it's not the caller's fault, and consequently you get an AssertionError.
require, ensure and invariant are concepts in the Design by Contract (DbC) development process.
require checks the preconditions that the caller should satisfy to consume the routine.
ensure checks the correctness of the return value (and also verifies that only the desired change has happened and nothing more).
invariant checks the validity of the class at all critical times.
DbC is a development methodology for building correct/robust software. For more details on DbC, a web search should hit a link from Eiffel Software. Hope this helps.
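In Scala terms (a sketch): require covers preconditions, and Predef's ensuring covers postconditions, mirroring Eiffel's require/ensure pair.
def abs(i: Int): Int = {
  require(i != Int.MinValue, "no non-negative representation") // precondition on the caller
  if (i < 0) -i else i
} ensuring (r => r >= 0, "postcondition violated")             // checked on the result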
Scaladocs/javadocs are pretty good as well:
assert()
Tests an expression, throwing an AssertionError if false. Calls to this method will not be generated if -Xelide-below is greater than ASSERTION.
require()
Tests an expression, throwing an IllegalArgumentException if false. This method is similar to assert, but blames the caller of the method for violating the condition.

What should a socket-based protocol look like for a C daemon?

I am writing a C daemon which my web application will use as a proxy to communicate with FTP servers. My web application enables users to connect and interact with FTP sites via AJAX. The reason I need a C daemon is that I have no way of keeping FTP connections alive across AJAX calls.
My web application will need to be able to tell my daemon to list, get, put, delete, move, and rename files on a given FTP server for a given user account. So when my application talks to the daemon, it needs to pass the following via some protocol I define: 1) action, 2) connection id, 3) user id, 4) any additional parameters for the action (note: connection information is stored in a database, so the daemon will talk to that as well).
So that's what I need my daemon to do. I'm thinking communication between my web app and the daemon will take place via a TCP socket, but I don't know exactly what data I would send. I need an example. For instance, should I just send something like this over the socket to the daemon?
action=list&connection_id=345&user_id=12345&path=/some/path
or should I do something hardcore at the byte level, like this?
+-----------------+-------------------------+-------------------+-----------------------------------+
| 1 byte (action) | 4 bytes (connection id) | 4 bytes (user id) | 255 bytes (additional parameters) |
+-----------------+-------------------------+-------------------+-----------------------------------+
| 0x01            | 345                     | 12345             | /some/path                        |
+-----------------+-------------------------+-------------------+-----------------------------------+
What does such communication over a socket normally look like?
Really it's mostly about whatever format is easiest for you to encode and parse, which is why, rather than reinventing the wheel with my own protocol, I personally would go with an existing remote procedure call solution. My second choice would be the binary layout, as that's easy to pack into and out of a struct.
You don't necessarily need to implement your own protocol. Have you thought about using something like XML-RPC, or even just plain XML? There are C libraries that should let you parse it.
Binary protocols are a bit easier to deal with. Just prepend the length to the message (or just the variable part of it) - TCP doesn't know about your application-level message boundaries. Pay attention to number endianness.
On the other hand, text-based protocols are more flexible.
Also, take a look at Google Protocol Buffers - could be very useful, though I'm not sure ajax is supported.
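To make the length-prefix advice concrete, here is a minimal framing sketch (in Scala with java.nio for brevity; ByteBuffer is big-endian by default, matching network byte order, so a C peer would produce the same header with htonl):
import java.nio.ByteBuffer

def frame(action: Byte, connectionId: Int, userId: Int, path: String): Array[Byte] = {
  val payload = path.getBytes("UTF-8")
  val buf = ByteBuffer.allocate(4 + 1 + 4 + 4 + payload.length)
  buf.putInt(1 + 4 + 4 + payload.length) // length prefix: everything after it
  buf.put(action)
  buf.putInt(connectionId)
  buf.putInt(userId)
  buf.put(payload)
  buf.array()
}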