Slick has DBIO.seq and DBIO.sequence for running many DBIOActions where the results of previous actions aren't required by subsequent actions.
I've looked at the source, and it's not obvious to me whether DBIO.seq runs the actions sequentially. On the other hand, DBIO.fold has a very simple implementation which definitely does run the actions sequentially, as it just uses flatMap internally.
My question is: will order be guaranteed when using seq and sequence, like it is with fold?
The documentation states that the actions in DBIO.seq are run sequentially:
The simplest combinator is DBIO.seq which takes a varargs list of
actions to run in sequence
Also, in the source code for DBIO.seq, you will see that a SynchronousDatabaseAction is invoked inside a foreach, which means that each action is called sequentially and (internally) synchronously, with no parallel execution.
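As a quick sanity check, here is a minimal sketch (assuming a hypothetical items table and an H2 in-memory database, neither of which appears in the question) that relies on DBIO.seq preserving order: the inserts can only succeed if the schema creation has already run.

import scala.concurrent.Await
import scala.concurrent.duration._
import slick.jdbc.H2Profile.api._

// Hypothetical table, for illustration only.
class Items(tag: Tag) extends Table[(Int, String)](tag, "items") {
  def id = column[Int]("id", O.PrimaryKey)
  def name = column[String]("name")
  def * = (id, name)
}
val items = TableQuery[Items]

val db = Database.forURL("jdbc:h2:mem:test", driver = "org.h2.Driver")

// DBIO.seq runs the three actions one after another, in the given order.
val setup = DBIO.seq(
  items.schema.create,      // 1st
  items += ((1, "first")),  // 2nd: requires the table to exist already
  items += ((2, "second"))  // 3rd
)

Await.result(db.run(setup), 10.seconds)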
I'm trying to test whether side effects get executed or not. Neither
DBIOAction.successful(()).cleanUp(_.fold {
println("yeay!")
DBIOAction.successful(())
} { _ =>
println("aww.")
DBIOAction.successful(())
})
nor
DBIOAction.successful(()).asTry.map {
case Success(_) => println("yeay!")
case Failure(_) => println("aww.")
}
prints anything. I am not too familiar with Slick, but I suspect I need to call run somewhere. Is there a way to provide a lightweight runtime for testing purposes?
Consulting the documentation of I/O actions:
Operations that can be executed on a database are called database I/O
actions (DBIOAction). Several operations on queries and tables create
I/O actions, for example myQuery.result, myQuery.result.headOption,
myQuery += data or myTable.schema.create. Actions can be composed with
combinators like andThen, flatMap, DBIO.seq or transactionally.
Just like a query, an I/O action is only a description of an operation.
Creating or composing actions does not execute anything on a database.
Combined actions always consist of strictly linear sequences of other
actions. Parts of an action never run concurrently.
and about results:
Any action can be run on a database to obtain the results (or perform
side effects such as updating the database). Execution is always
asynchronous, i.e. it does not block the caller thread. Any kind of
action can be run to obtain a Future that is eventually completed with
a result when the execution is finished (myDatabase.run(myAction)).
Actions that produce a sequence of values usually support streaming
results as well. Such an action can be combined with a database to
produce a Reactive Streams Publisher (myDatabase.stream(myAction)).
The action is executed when a consumer subscribes to the Publisher.
you have to call database.run(ioAction) to have any side effect (including the println) evaluated.
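For testing, a minimal sketch (assuming Slick 3.x and an H2 in-memory database as the lightweight runtime; neither is specified in the question) that actually evaluates the action and its println:

import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._

val db = Database.forURL("jdbc:h2:mem:test", driver = "org.h2.Driver")

val action = DBIO.successful(()).asTry.map {
  case Success(_) => println("yeay!")
  case Failure(_) => println("aww.")
}

// Composing the action prints nothing; only running it does.
Await.result(db.run(action), 10.seconds)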
I'm using the Monix Scheduler to execute some tasks periodically, but I don't know how to not only execute them but also collect their results into some collection.
Let's say I have a scheduled task that returns a random number every time:
val task = Task { Math.random() }
implicit val io: SchedulerService = Scheduler.io()
task.map(_ + 2).map(println).executeOn(io).delayExecution(1.seconds).loopForever.runAsyncAndForget
In theory I could create a mutable, concurrent list before executing the task and append each result to it inside task.map, but I've heard that sharing mutable collections between threads is not good practice. Is there a nice way to collect all of the scheduled Task results? What should I use to achieve this in a proper, idiomatic Scala way, avoiding mutable collections?
The idiomatic way to collect repeated results with Monix is to use an Observable instead of a Task. It has methods such as zipMap to combine results with another Observable, and methods such as foldLeft to combine results with previous results of the same Observable.
Note that this generally requires composing all of your Observables in one place instead of the fire-and-forget approach in your example. Ideally, you have exactly one runAsync in your entire program, in your main function.
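For example, a minimal sketch (assuming Monix 3.x) that emits a value every second and collects the first ten results into a List instead of looping forever:

import monix.eval.Task
import monix.execution.Scheduler
import monix.reactive.Observable
import scala.concurrent.duration._

implicit val io: Scheduler = Scheduler.io()

// Tick once per second, compute the value, and gather the first 10 into a List.
val collected: Task[List[Double]] =
  Observable
    .intervalAtFixedRate(1.second)
    .map(_ => Math.random() + 2)
    .take(10)
    .toListL

// The single "run" at the edge of the program.
collected.runToFuture.foreach(println)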
This is the context:
There is an input event stream.
There are several methods to apply to the stream; each applies different logic to evaluate an event and labels it a "good" or "bad" event.
An event is truly "good" only if it passes all of the methods; otherwise it is a "bad" event.
There is an output event stream that carries each event's result and its eventID.
To solve this problem, I have two ideas:
We can apply each method sequentially to each event. But this is a kind of batch processing that doesn't exploit the advantages of stream processing; at the same time it takes Time(M1) + Time(M2) + Time(M3) + ..., which may not be suitable for real-time processing.
We can pass the input stream to each method and run the methods in parallel; each method saves the bad events into permanent storage, and then the Main method queries the permanent storage to get the result of each event. But this has some problems to solve:
how to execute the methods in parallel in the programming language (e.g. Scala), and what the performance implications are (network, CPU, memory);
how to solve the synchronization problem: the methods need some time to calculate and save their flags into permanent storage, while the Main method needs much less time to query the flags, so a delay issue occurs;
etc.
This is more of an open design question than a specific technical one; I would like to hear your ideas for solving the problem. Looking forward to your opinions.
Parallel streams, each doing the full set of evaluations sequentially, is the more straightforward solution. But if that introduces too much latency, then you can fan out the evaluations to be done in parallel, and then bring the results back together again to make a decision.
To do the fan-out, look at the split operation on DataStream, or use side outputs. But before doing this n-way fan-out, make sure that each event has a unique ID. If necessary, add a field containing a random number to each event to use as the unique ID. Later we will use this unique ID as a key to gather back together all of the partial results for each event.
Once the event stream is split, each copy of the stream can use a MapFunction to compute one of the evaluation methods.
Gathering all of these separate evaluations of a given event back together is a bit more complex. One reasonable approach here is to union all of the result streams together, and then key the unioned stream by the unique ID described above. This will bring together all of the individual results for each event. Then you can use a RichFlatMapFunction (using Flink's keyed, managed state) to gather the results for the separate evaluations in one place. Once the full set of evaluations for a given event has arrived at this stateful flatmap operator, it can compute and emit the final result.
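A rough sketch of that fan-out/fan-in shape, assuming Flink's Scala DataStream API, a hypothetical Event type, and three placeholder evaluation methods (none of which appear in the question; the real evaluations and state layout would differ):

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Hypothetical types for illustration only.
case class Event(id: String, payload: String)
case class Evaluation(eventId: String, method: String, good: Boolean)
case class Verdict(eventId: String, good: Boolean)

object GoodBadPipeline {
  def main(args: Array[String]): Unit = {
    val numMethods = 3 // assumed number of evaluation methods
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events: DataStream[Event] =
      env.fromElements(Event("e1", "hello"), Event("e2", "bad data"))

    // Fan out: each evaluation runs on its own copy of the stream.
    val m1 = events.map(e => Evaluation(e.id, "m1", e.payload.nonEmpty))
    val m2 = events.map(e => Evaluation(e.id, "m2", e.payload.length < 100))
    val m3 = events.map(e => Evaluation(e.id, "m3", !e.payload.contains("bad")))

    // Fan in: union the partial results, key by event id, gather them in keyed state.
    val verdicts: DataStream[Verdict] = m1.union(m2, m3)
      .keyBy(_.eventId)
      .flatMap(new RichFlatMapFunction[Evaluation, Verdict] {
        @transient private var partials: MapState[String, java.lang.Boolean] = _

        override def open(parameters: Configuration): Unit =
          partials = getRuntimeContext.getMapState(new MapStateDescriptor(
            "partials", classOf[String], classOf[java.lang.Boolean]))

        override def flatMap(in: Evaluation, out: Collector[Verdict]): Unit = {
          partials.put(in.method, in.good)
          var count = 0
          var allGood = true
          val it = partials.iterator()
          while (it.hasNext) { val e = it.next(); count += 1; allGood &&= e.getValue }
          if (count == numMethods) { // all evaluations have arrived for this event
            out.collect(Verdict(in.eventId, allGood))
            partials.clear()
          }
        }
      })

    verdicts.print()
    env.execute("good/bad fan-out fan-in sketch")
  }
}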
Reading the Scala source code for scala.concurrent.Future and scala.concurrent.impl.Future, it seems that every future composition via map dispatches a new task to an executor. I assume this commonly triggers a context switch for the current thread, and/or the assignment of a thread to the job.
Considering that function flows need to pass Futures between them to act on results without blocking (or without descending into callback spaghetti), isn't this "reactive" paradigm quite pricey in practice when code is written in a modular way, where each function does something small and passes the result along to others?
It depends on the execution context, so you can choose the strategy.
Your executor can also just run the task in the calling thread, keeping the map calls on the same thread. You can choose the strategy by passing an execution context explicitly or by providing it implicitly.
I would first test what the default fork/join pool does by logging which thread was used. As far as I know, newer versions of it sometimes utilize the submitting thread, but I don't know whether that is applied to Scala future callbacks.
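As a minimal sketch of choosing the strategy per call (the sameThread context below is a simplified stand-in; Scala 2.13 ships a similar built-in ExecutionContext.parasitic):

import scala.concurrent.{ExecutionContext, Future}

// A trivial ExecutionContext that runs work on the calling/completing thread.
val sameThread: ExecutionContext = new ExecutionContext {
  def execute(runnable: Runnable): Unit = runnable.run()
  def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
}

// The body runs on the global pool, but the map callback executes on
// whatever thread completed the future, with no extra dispatch.
val result: Future[Int] = Future(41)(ExecutionContext.global).map(_ + 1)(sameThread)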
How well do Scala parallel collection operations get along with the concurrency/parallelism used by Akka Actors (and Futures) with respect to efficient scheduling on the system?
Actors' and Futures' execution is handled by an ExecutionContext generally provided by the Dispatcher. What I find on parallel collections indicates they use a TaskSupport object. I found an ExecutionContextTaskSupport object that may connect the two, but I am not sure.
What is the proper way to mix the two concurrency solutions, or is it advised not to?
At present this is not supported / handled well.
Prior to Scala 2.11-M7, attempting to use the dispatcher as the ExecutionContext throws an exception.
That is, the following code in an actor's receive will throw a NotImplementedError:
import scala.collection.parallel.ExecutionContextTaskSupport

val par = List(1, 2, 3).par
par.tasksupport = new ExecutionContextTaskSupport(context.dispatcher)
par foreach println
Incidentally, this has been fixed in 2.11-M7, though it was not done to correct the above issue.
In reading through the notes on the fix it sounds like the implementation provided by ExecutionContextTaskSupport in the above case could have some overhead over directly using one of the other TaskSupport implementations; however, I have done nothing to test that interpretation or evaluate the magnitude of any impact.
A Note on Parallel Collections:
By default, parallel collections will use the global ExecutionContext (ExecutionContext.Implicits.global), just as you might for Futures. While this is well behaved, if you want to be constrained by the dispatcher (using context.dispatcher), as you are likely to do with Futures in Akka, you need to set a different TaskSupport as shown in the code sample above.
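Outside of an actor, a minimal sketch of the available options (assuming Scala 2.12's built-in parallel collections; on 2.13 the same API lives in the scala-parallel-collections module):

import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.{ExecutionContextTaskSupport, ForkJoinTaskSupport}
import scala.concurrent.ExecutionContext

// Default: backed by ExecutionContext.Implicits.global.
val parDefault = List(1, 2, 3).par

// Constrained to a specific ExecutionContext (e.g. an Akka dispatcher).
val parOnContext = List(1, 2, 3).par
parOnContext.tasksupport = new ExecutionContextTaskSupport(ExecutionContext.global)

// Or a dedicated, explicitly sized fork/join pool.
val parOnPool = List(1, 2, 3).par
parOnPool.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))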