DBIOAction cleanUp or asTry do not lead to internal execution - scala

I'm trying to test whether side effects get executed or not. Neither
DBIOAction.successful(()).cleanUp(_.fold {
  println("yeay!")
  DBIOAction.successful(())
} { _ =>
  println("aww.")
  DBIOAction.successful(())
})
nor
DBIOAction.successful(()).asTry.map {
  case Success(_) => println("yeay!")
  case Failure(_) => println("aww.")
}
print anything. I am not too familiar with Slick, but it may be that I need to place a run call somewhere. Is there a way to provide a lightweight runtime for testing purposes?

Consulting the documentation of I/O actions:
Operations that can be executed on a database are called database I/O
actions (DBIOAction). Several operations on queries and tables create
I/O actions, for example myQuery.result, myQuery.result.headOption,
myQuery += data or myTable.schema.create. Actions can be composed with
combinators like andThen, flatMap, DBIO.seq or transactionally.
Just like a query, an I/O action is only a description of an operation.
Creating or composing actions does not execute anything on a database.
Combined actions always consist of strictly linear sequences of other
actions. Parts of an action never run concurrently.
and about results:
Any action can be run on a database to obtain the results (or perform
side effects such as updating the database). Execution is always
asynchronous, i.e. it does not block the caller thread. Any kind of
action can be run to obtain a Future that is eventually completed with
a result when the execution is finished (myDatabase.run(myAction)).
Actions that produce a sequence of values usually support streaming
results as well. Such an action can be combined with a database to
produce a Reactive Streams Publisher (myDatabase.stream(myAction)).
The action is executed when a consumer subscribes to the Publisher.
you have to call database.run(ioAction) to have any side effects (including your println calls) evaluated.
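For testing, a lightweight in-memory database is usually enough to act as that runtime. A minimal sketch, assuming the H2 driver ("com.h2database" % "h2") is on the classpath; any JDBC backend supported by Slick works the same way:

import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._

object CleanUpDemo extends App {
  // An in-memory H2 database; no tables need to exist in it for this test.
  val db = Database.forURL("jdbc:h2:mem:test", driver = "org.h2.Driver")

  val action = DBIO.successful(()).asTry.map {
    case Success(_) => println("yeay!")
    case Failure(_) => println("aww.")
  }

  // Composing the action prints nothing; only running it against the database does.
  Await.result(db.run(action), 5.seconds)
  db.close()
}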

Related

postgresql libpqxx: Several queries as one transaction

When inserting/updating data in postgresql, it is easy to execute multiple statements in one transaction. (My goal here is to avoid a server round-trip for each statement, although the transactional isolation is often useful.)
When querying, I'm unclear if this is possible. I'd somehow need to know what function is going to consume each bit and how to separate the bits.
connection c("dbname=test user=postgres hostaddr=127.0.0.1");
work w(c);
w.exec("SELECT a, b FROM my_table WHERE c = 3;");
w.exec("SELECT x, y, z FROM my_other_table WHERE c = 'dog';");
w.commit();
Assume I've got functions my_parse_function() and my_other_parse_function() that can read rows from each of these queries, were I doing them separately.
If your goal is to avoid round trips, transactions don't help.
Transaction isolation in Postgres (as with most RDBMSs) doesn't rely on the server executing all of your statements at once. Each statement in your transaction will be sent and executed at the point of the exec() call; isolation is provided by the engine's concurrency control model, allowing multiple clients to issue commands simultaneously, and presenting each with a different view of the database state.
If anything, wrapping a sequence of statements in a transaction will add more communication overhead, as additional round-trips are required to issue the BEGIN and COMMIT commands.
If you want to issue several commands in one round-trip, you can do so by calling exec() with a single semicolon-separated multi-statement string. These statements will be implicitly treated as a single transaction, provided that there is no explicit transaction already active, and that the string doesn't include any explicit BEGIN/COMMIT commands.
If you want to send multiple queries, the protocol does allow for multiple result sets to be returned in response to a multi-query string, but exec() doesn't give you access to them; it's just a wrapper for libpq's PQexec(), which discards all but the last result.
Instead, you can use a pipeline, which lets you issue asynchronous queries via insert(), and then retrieve() the results at your leisure (blocking until they arrive, if necessary). Setting a retain() limit will allow the pipeline to accumulate statements, and then send them together as a multi-command string.

Parallel design of program working with Flink and scala

This is the context:
There is an input event stream.
There are some methods to apply to the stream; each applies different logic to evaluate an event and labels it a "good" or "bad" event.
An event can be a real "good" one only if it passes all the methods; otherwise it is a "bad" event.
There is an output event stream that carries the result for each event together with its eventID.
To solve this problem, I have two ideas:
We can apply each method sequentially to each event. But this is a kind of batch processing and doesn't exploit the advantages of stream processing; at the same time it takes Time(Method1) + Time(Method2) + Time(Method3) + ..., which may not be suitable for real-time processing.
We can pass the input stream to each method and run the methods in parallel; each method saves bad events into permanent storage, and the Main method then queries the permanent storage to get the result for each event. But this leaves some problems to solve:
how to execute the methods in parallel in the programming language (e.g. Scala), and what the performance cost is (network, CPUs, memory);
how to solve the synchronization problem: the methods need some time to calculate and save a flag into the permanent storage, while the Main method needs much less time to query that flag, so a delay issue occurs;
etc.
This is more of an open design question than a narrow coding question; I would like to hear your ideas for solving the problem. Looking forward to your opinions.
Parallel streams, each doing the full set of evaluations sequentially, is the more straightforward solution. But if that introduces too much latency, then you can fan out the evaluations to be done in parallel, and then bring the results back together again to make a decision.
To do the fan-out, look at the split operation on DataStream, or use side outputs. But before doing this n-way fan-out, make sure that each event has a unique ID. If necessary, add a field containing a random number to each event to use as the unique ID. Later we will use this unique ID as a key to gather back together all of the partial results for each event.
Once the event stream is split, each copy of the stream can use a MapFunction to compute one of the evaluation methods.
Gathering all of these separate evaluations of a given event back together is a bit more complex. One reasonable approach here is to union all of the result streams together, and then key the unioned stream by the unique ID described above. This will bring together all of the individual results for each event. Then you can use a RichFlatMapFunction (using Flink's keyed, managed state) to gather the results for the separate evaluations in one place. Once the full set of evaluations for a given event has arrived at this stateful flatmap operator, it can compute and emit the final result.
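A hedged sketch of that fan-out / union / keyed-gather shape in Flink's Scala API. The Event type, the three inline evaluation functions, and the expected count of 3 are illustrative assumptions, not part of any fixed API:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

case class Event(id: String, payload: String)
case class PartialResult(id: String, passed: Boolean)

// Keyed by event id: accumulates the partial results for one event and emits once all have arrived.
class Gather(expected: Int) extends RichFlatMapFunction[PartialResult, (String, Boolean)] {
  @transient private var acc: ValueState[(Int, Boolean)] = _

  override def open(parameters: Configuration): Unit =
    acc = getRuntimeContext.getState(
      new ValueStateDescriptor("acc", createTypeInformation[(Int, Boolean)]))

  override def flatMap(r: PartialResult, out: Collector[(String, Boolean)]): Unit = {
    val (seen, okSoFar) = Option(acc.value()).getOrElse((0, true))
    val (count, ok) = (seen + 1, okSoFar && r.passed)
    if (count == expected) { out.collect((r.id, ok)); acc.clear() }  // final verdict for this event
    else acc.update((count, ok))
  }
}

object FanOutJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events: DataStream[Event] = env.fromElements(Event("1", "ok"), Event("2", "bad data"))

    // Fan out: each evaluation method runs on its own copy of the stream.
    val m1 = events.map(e => PartialResult(e.id, e.payload.nonEmpty))
    val m2 = events.map(e => PartialResult(e.id, e.payload.length < 100))
    val m3 = events.map(e => PartialResult(e.id, !e.payload.contains("bad")))

    // Gather: union the partial results, key by event id, and combine them with keyed state.
    m1.union(m2, m3).keyBy(_.id).flatMap(new Gather(expected = 3)).print()

    env.execute("fan-out / gather")
  }
}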

Can you publish to a queue and write to the db within a Slick transaction, and still guarantee atomicity?

I have a Slick server for which I'd like to use the transactionally keyword to perform a double-write of data to my database and a RabbitMQ message queue. My code looks something like this:
val a = (for {
  _ <- coffees.map(c => (c.name, c.supID, c.price)) += ("Colombian_Decaf", 101, 8.99)
  _ = ch.txSelect()
  _ = ch.basicPublish("SELL Colombian_Decaf", QUEUE_NAME, MessageProperties.PERSISTENT_BASIC, "nop".getBytes())
  _ = ch.txCommit()
} yield ()).transactionally
My question is: Is it possible for the queue publish action to commit successfully, but have the DB insert fail? In this case, my system would be in an inconsistent state, as I would only want the message to be published to the queue if the value was successfully inserted into the database, and vice versa.
Thanks!
Unfortunately for you, the answer is that you can't easily guarantee consistency for such a system. What you want are distributed transactions, and they are fundamentally hard. To see why, consider the following thought experiment: what happens if your computer blows up (or, less radically, loses power) at the most unfortunate moment? For this code, one such bad moment is right after the line ch.txCommit() has fully executed, i.e. before the outer DB transaction has been committed as well. Fundamentally there is nothing you can do about such a scenario unless these two concurrent transactions are somehow aware of each other. Unfortunately, I don't know of any distributed transaction coordinator that covers both traditional SQL DBs and RabbitMQ. So your choices are:
1. Give up and do nothing (and develop some procedure to recover manually after disastrous events).
2. Implement some distributed transaction algorithm, such as 2-phase commit, yourself. This most probably requires some re-design and a complicated implementation.
3. Re-design your system to use some form of eventual consistency. This probably requires a bigger re-design but still might be easier to implement than #2.
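For a picture of what option 3 can look like with Slick, here is a hedged sketch of the common "transactional outbox" shape; the outbox table and its (payload, published) columns are hypothetical and not part of Slick or RabbitMQ:

// Record the outgoing message in the same DB transaction as the business write.
val insertAndRecord = (for {
  _ <- coffees.map(c => (c.name, c.supID, c.price)) += ("Colombian_Decaf", 101, 8.99)
  _ <- outbox.map(o => (o.payload, o.published)) += ("SELL Colombian_Decaf", false)
} yield ()).transactionally

// db.run(insertAndRecord) commits both rows atomically. A separate relay process polls the
// outbox table, publishes unpublished rows to RabbitMQ (ch.basicPublish), and only then marks
// them as published. The queue may see a message more than once (at-least-once delivery),
// but it never sees a message whose database write was lost.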

Does Slick's DBIO.seq method run actions sequentially?

Slick has DBIO.seq and DBIO.sequence for running many DBIOActions whereby the results of previous actions aren't required for subsequent actions.
I've looked at the source, and it's not obvious to me if DBIO.seq runs the actions sequentially. On the other hand, DBIO.fold has a very simple implementation which definitely does run the actions sequentially, as it just uses flatMap internally.
My question is: will order be guaranteed when using seq and sequence, like it is with fold?
The documentation states that the actions in DBIO.seq are run sequentially:
The simplest combinator is DBIO.seq which takes a varargs list of
actions to run in sequence
Also, in the source code for DBIO.seq you will see that a SynchronousDatabaseAction is invoked inside a foreach over the actions, which means that the actions are called sequentially and (internally) synchronously, without any parallel execution.
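A small sketch of what that guarantee means in practice, assuming the action is run against some database (for example the in-memory H2 setup shown in the first answer):

import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.H2Profile.api._   // or whichever profile matches your database

// The second action only starts after the first has completed.
val ordered = DBIO.seq(
  DBIO.successful(()).map(_ => println("first")),
  DBIO.successful(()).map(_ => println("second"))
)

// db.run(ordered) always prints "first" and then "second".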

How to use Scala Futures the right way?

I'm wondering if Futures are better used in conjunction with Actors only, rather than in a program that does not use Actors. Said differently, is performing asynchronous computation with Futures something that is better done within an actor system?
Here is why I'm saying that:
1 -
You perform a computation whose result would trigger some action that you may want to do in another thread.
For instance, I have a long operation to determine the price of something; from my main thread I decide to launch an asynchronous process for it. In the meantime I could be doing other things, and when the response is ready/available or communicated back to me, I continue on that path.
I can see that with actors this is handy, because you can pipe a result to an actor. But with a typical threading model, you can either block or ... ?
2 -
Another issue: let's say I need to update the ages of a list of participants by getting some information online. Let's assume I just have one Future for that task. Isn't closing over the participant list the wrong thing to do? Multiple threads may be accessing that participant list at the same time, so making the update within the Future would simply be wrong, and in that case we would need a Java concurrent collection, wouldn't we?
Maybe I see it the wrong way: futures are not meant to do side effects
at all.
But in that case, fair enough, no side effects; yet we still have the problem of getting a value back to the calling thread, which can only be done by blocking. I mean, let's imagine that the result would help the calling thread update some data structure. How do we do that update asynchronously without somehow closing over that data structure?
I believe callbacks such as onComplete can be used for
side effects (am I right here?)
Still, the callback would have to close over the data structure anyway. Hence I don't see how to avoid using actors.
PS: I like actors; I'm just trying to better understand the usage of Futures without actors. I read everywhere that one should use actors only when necessary, that is, when state needs to be managed. It seems to me that overall, using Futures without actors always involves blocking somewhere down the line, if the result needs to be communicated back at some point to the thread that initiated the asynchronous task.
Actors are good when you are dealing with mutable state, because they encapsulate the mutable state and allow only message-based interaction.
You can use a Future to execute work on a different thread. You don't have to block on a Future, because Scala's Futures compose; if you have multiple Futures in your code, you don't have to wait/block for all of them to complete. For example, if your pipeline is completely non-blocking or async (e.g., Play and Spray), you can return a Future back to the client.
Futures are lightweight compared to actors because you don't need a whole ActorSystem.
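A minimal sketch of that composition, using the participant example from the question; fetchAge is a hypothetical stand-in for the "look up information online" step:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

case class Participant(name: String, age: Int)

def fetchAge(name: String): Future[Int] = Future(name.length + 20)  // placeholder computation

val participants = List(Participant("alice", 0), Participant("bob", 0))

// Instead of closing over and mutating the shared list, build a new list asynchronously.
val updated: Future[List[Participant]] =
  Future.traverse(participants)(p => fetchAge(p.name).map(age => p.copy(age = age)))

// No blocking is needed: compose further with map/flatMap, return the Future from a
// non-blocking framework (e.g. Play), or do side effects in a callback when it completes.
updated.onComplete {
  case Success(list)  => println(s"updated: $list")
  case Failure(error) => println(s"aww: $error")
}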
Here is a quote from Martin Odersky that I really like.
There is no silver bullet for all concurrency issues; the right
solution depends on what one needs to achieve. Do you want to define
asynchronous computations that react to events or streams of values?
Or have autonomous, isolated entities communicating via messages? Or
define transactions over a mutable store? Or, maybe the primary
purpose of parallel execution is to increase the performance? For each
of these tasks, there is an abstraction that does the job: futures,
reactive streams, actors, transactional memory, or parallel
collections.
So choose your abstraction based on your use case and needs.