State manager parallel transactions in RunAsync - service-fabric-stateful

In Service Fabric stateful services, there is RunAsync(cancellationToken), with using() blocks for state manager transactions.
The legacy code I want to refactor contains two queues, with dequeue attempts inside a while(true) loop with a 1-second delay. I would like to get rid of this unnecessary delay and instead use two distinct reactive queues (semaphores paired with reliable queues).
The problem is that the two distinct workflows depending on these two queues now need to be separated into two separate threads, because if the two queues run on a single thread, one Wait() will block the other code from running. (I know best practice would probably be to separate these two tasks into two microservices; next project.)
I came up with the code below as a solution:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    await Task.WhenAll(AsyncTask1(cancellationToken), AsyncTask2(cancellationToken)).ConfigureAwait(false);
}
And each task contains something like:
while (true)
{
    cancellationToken.ThrowIfCancellationRequested();
    using (var tx = this.StateManager.CreateTransaction())
    {
        var maybeMessage = await messageQueue.TryDequeueAsync(tx, cancellationToken).ConfigureAwait(false);
        if (maybeMessage.HasValue)
        {
            DoWork();
        }
        await tx.CommitAsync().ConfigureAwait(false);
    }
}
It seems to be working, but I just want to make sure that using(StateManager.CreateTransaction()) is OK to use in this parallel way.

According to the documentation:
Depending on the replica role, for a single-entry operation (like TryDequeueAsync) the ITransaction uses the Repeatable Read isolation level (when primary) or the Snapshot isolation level (when secondary).
Repeatable Read
Any Repeatable Read operation by default takes Shared locks.
Snapshot
Any read operation done using Snapshot isolation is lock free.
So if DoWork doesn't modify the reliable collection, then multiple transactions can be executed in parallel with no problems.
In the case of multiple reads / updates, this can cause deadlocks and should be done with care.
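For illustration, here is a minimal sketch of the read-then-update shape that can deadlock when two such transactions run in parallel (the dictionary, its name, and the key are assumptions, not from the question): each transaction takes a Shared lock on the same key via the read, and then both try to upgrade to an Exclusive lock for the write.

private async Task ReadThenUpdateAsync(CancellationToken cancellationToken)
{
    // Hypothetical dictionary; "counters" and the key are illustrative only.
    var counters = await this.StateManager
        .GetOrAddAsync<IReliableDictionary<string, long>>("counters")
        .ConfigureAwait(false);

    using (var tx = this.StateManager.CreateTransaction())
    {
        // Repeatable Read on the primary: this read takes a Shared lock.
        var current = await counters.TryGetValueAsync(tx, "processed").ConfigureAwait(false);

        // If another transaction also holds a Shared lock on "processed",
        // both writers now wait on each other's lock upgrade - a deadlock
        // that surfaces as a timeout.
        await counters.SetAsync(tx, "processed", current.HasValue ? current.Value + 1 : 1)
            .ConfigureAwait(false);

        await tx.CommitAsync().ConfigureAwait(false);
    }
}

One mitigation is to take the write lock up front: TryGetValueAsync has an overload accepting LockMode.Update, which avoids the Shared-to-Exclusive upgrade entirely.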

Related

When are GCD queues used and when do you know you need them? Swift

After reading about concurrent and serial queues, sync and async, I think I have an idea about how to create queues and the order they are executed in. My problem is that none of the tutorials I have seen actually give you many use cases. For example:
I have a network manager that uses URLSession and serializes JSON to send a request to my API. Does it make sense to wrap it in a .utility queue or a .userInitiated queue, or do I just not wrap it in a queue at all?
let task = LoginTask(username: username, password: password)
let networkQueue = DispatchQueue(label: "com.messenger.network",
                                 qos: DispatchQoS.userInitiated)
networkQueue.async {
    task.dataTask(in: dispatcher) { (user, httpCode, error) in
        self.presenter?.loginUserResponse(user: user, httpCode: httpCode, error: error)
    }
}
My question is: are there any guidelines I can follow to know when there is a need to use queues or not? I can't find this information anywhere. I realise Apple provides example usage, however it is very vague.
Dispatch queues are used in a multitude of use cases, so it's hard to enumerate them, but two very common use cases are as follows:
You have some expensive and/or time-consuming process that you want to run on some thread other than the current thread. Often this is used when you're on the main thread and you want to run something on a background thread.
A good example of this would be image manipulation, which is a notoriously computationally (and memory) intensive process. So, you'd create a queue for image manipulation and then you'd dispatch each image manipulation task to that queue. You might also dispatch the UI update when it's done back to the main queue (because all UI updates must happen on the main thread). A common pattern would be:
imageQueue.async {
    // manipulate the image here

    // when done, update the UI:
    DispatchQueue.main.async {
        // update the UI and/or model objects on the main thread
    }
}
You have some shared resource (it could be a simple variable, or some interaction with another shared resource like a file or database) whose access you want to synchronize regardless of which thread invokes it. This is often part of a broader strategy of making something that is not inherently thread-safe behave in a thread-safe manner.
The virtue of dispatch queues is that it greatly simplifies writing multi-threaded code, an otherwise very complicated technology.
The thing is that your example, initiating a network request, already runs the request on a background thread and URLSession manages all of this for you, so there's little value in using queues for that.
In the interest of full disclosure, there is a surprising variety of different tools using GCD directly (e.g. dispatch groups or dispatch sources) or indirectly (e.g. operation queues), above and beyond the basic dispatch queues discussed above:
Dispatch groups: Sometimes you will initiate a series of asynchronous tasks and you want to be notified when they're all done. You can use a dispatch group (see https://stackoverflow.com/a/28101212/1271826 for a random example). This saves you from having to keep track of when all of these tasks are done yourself.
Dispatch "apply" (now called concurrentPerform): Sometimes when you're running some massively parallel task, you want to use as many threads as you reasonably can. concurrentPerform lets you effectively perform a for loop in parallel, and Apple has optimized it for the number of cores and CPUs of your particular device, while not flooding it with too many concurrent tasks at any one time and exhausting the limited number of worker threads. See https://stackoverflow.com/a/39949292/1271826 for an example of running a for loop in parallel.
Dispatch sources:
For example, if you have some background task that is doing a lot of work and you want to update the UI with the progress, sometimes those UI updates can come more quickly than the UI can handle them. So you can use a dispatch source (a DispatchSourceUserDataAdd) to decouple the background process from the UI updates. See aforementioned https://stackoverflow.com/a/39949292/1271826 for an example.
Traditionally, a Timer runs on the main run loop. But sometimes you want to run it on a background thread, and doing that with a Timer is complicated. You can instead use a DispatchSourceTimer (a GCD timer) to run a timer on a queue other than the main queue. See https://stackoverflow.com/a/38164203/1271826 for an example of how to create and use a dispatch timer. Dispatch timers can also be used to avoid some of the strong reference cycles that are easily introduced with target-based Timer objects.
Barriers: Sometimes when using a concurrent queue, you want most things to run concurrently, but for other things to run serially with respect to everything else on the queue. A barrier is a way to say "add this task to the queue, but make sure it doesn't run concurrently with respect to anything else on that queue."
An example of a barrier is the reader-writer pattern, where reading from some memory resource can happen concurrently with respect to all other reads, but any writes must not happen concurrently with respect to anything else on the queue. See https://stackoverflow.com/a/28784770/1271826 or https://stackoverflow.com/a/45628393/1271826.
Dispatch semaphores: Sometimes you need to let two tasks running on separate threads communicate to each other. You can use semaphores for one thread to "wait" for the "signal" from another.
One common application of semaphores is to make an inherently asynchronous task behave in a more synchronous manner.
networkQueue.async {
    let semaphore = DispatchSemaphore(value: 0)
    let task = session.dataTask(with: url) { data, _, error in
        // process the response

        // when done, signal that we're done
        semaphore.signal()
    }
    task.resume()
    _ = semaphore.wait(timeout: .distantFuture)
}
The virtue of this approach is that the dispatched task won't finish until the asynchronous network request is done. So if you needed to issue a series of network requests, but not have them run concurrently, semaphores can accomplish that.
Semaphores should be used sparingly, though, because they're inherently inefficient (generally blocking one thread waiting for another). Also, make sure you never wait for a semaphore from the main thread (because you'd be defeating the purpose of having the asynchronous task). That's why in the above example I'm waiting on the networkQueue, not the main queue. All of this having been said, there are often better techniques than semaphores, but they are sometimes useful.
Operation queues: Operation queues are built on top of GCD dispatch queues, but offer some interesting advantages including:
The ability to wrap an inherently asynchronous task in a custom Operation subclass. (This avoids the disadvantages of the semaphore technique I discussed earlier.) Dispatch queues are generally used when running inherently synchronous tasks on a background thread, but sometimes you want to manage a bunch of tasks that are, themselves, asynchronous. A common example is the wrapping of asynchronous network requests in an Operation subclass.
The ability to easily control the degree of concurrency. Dispatch queues can be either serial or concurrent, but it's cumbersome to design the control mechanism to say, for example, "run the queued tasks concurrently with respect to each other, but no more than four at any given time." Operation queues make this much easier with the use of maxConcurrentOperationCount. (See https://stackoverflow.com/a/27022598/1271826 for an example.)
The ability to establish dependencies between various tasks (e.g. you might have a queue for downloading images and another queue for manipulating the images). With operation queues you can have one operation for the downloading of an image and another for the processing of the image, and you can make the latter dependent upon the completion of the former.
There are lots of other GCD related applications and technologies, but these are a few that I use with some frequency.

How do both sync and async work with thread safety?

In Swift, we can leverage DispatchQueue to prevent race conditions. By using a serial queue, all things are performed in order. From https://developer.apple.com/library/content/documentation/General/Conceptual/ConcurrencyProgrammingGuide/OperationQueues/OperationQueues.html:
Serial queues (also known as private dispatch queues) execute one task at a time in the order in which they are added to the queue. The currently executing task runs on a distinct thread (which can vary from task to task) that is managed by the dispatch queue. Serial queues are often used to synchronize access to a specific resource.
But we can easily create a deadlock (see How do I create a deadlock in Grand Central Dispatch?) by performing a sync inside an async:
let serialQueue = DispatchQueue(label: "Cache.Storage.SerialQueue")
serialQueue.async {
    serialQueue.sync {
        print("perform some job")
    }
    print("this can't be reached")
}
The only way to prevent deadlock is to use 2 serial queues, one each for the sync and async function versions. But this can cause a race condition when writeSync and writeAsync happen at the same time.
I see that the fs module supports both sync and async functions, like fs.writeFileSync(file, data[, options]) and fs.writeFile(file, data[, options], callback). By allowing both versions, does it mean users can use them in any order they want? So they can easily create a deadlock like what we did above?
So maybe fs has a clever way that we can apply to Swift? How do we support both sync and async in a thread-safe manner?
serialQueue.async {
    serialQueue.sync {
        print("perform some job")
    }
}
This deadlocks because this code queues a second task on the same dispatch queue and then waits for that second task to finish. The second task can't even start, however, because it is a serial queue and the first task is still executing (albeit blocked on an internal semaphore).
The way to avoid this kind of deadlock is to never do that. It's especially stupid when you consider that you can achieve the same effect with the following:
serialQueue.async {
    print("perform some job")
}
There are some use cases for running synchronous tasks on a different queue from the one you are on, e.g.
if the other queue is the main queue and you want to do some stuff in the UI before carrying on
as a means of synchronisation between tasks in different queues, for example if you want to make sure that all the current tasks in another queue have finished before carrying on.
However, there is never a reason to synchronously do something on the same queue; you might as well just do the thing directly. Or to put it another way: if you just write statements one after the other, they are already executing synchronously on the same queue.
I see that the fs module supports both sync and async functions, like fs.writeFileSync(file, data[, options]) and fs.writeFile(file, data[, options], callback). By allowing both versions, does it mean users can use them in any order they want? So they can easily create a deadlock like what we did above?
That depends on how the two APIs are implemented. The synchronous version of the call might just do the call without messing about on other threads. If it does grab another thread and then wait around until that other thread is finished, then yes, there is a potential for deadlock if the node.js server runs out of threads.
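For what it's worth, the same idea can be sketched in C# (the language of this document's other examples): the safe shape is for neither version to dispatch to, or block on, the other; both simply acquire one non-reentrant gate. All names here are illustrative, not from any of the questions above.

using System.IO;
using System.Threading;
using System.Threading.Tasks;

public class SafeWriter
{
    // One gate guards the shared resource for both entry points.
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    // Sync version: does the work inline on the calling thread.
    public void WriteSync(string path, string data)
    {
        gate.Wait();
        try { File.WriteAllText(path, data); }
        finally { gate.Release(); }
    }

    // Async version: awaits the same gate instead of blocking a thread.
    // (File.WriteAllTextAsync requires .NET Core 2.0 or later.)
    public async Task WriteAsync(string path, string data)
    {
        await gate.WaitAsync().ConfigureAwait(false);
        try { await File.WriteAllTextAsync(path, data).ConfigureAwait(false); }
        finally { gate.Release(); }
    }
}

Because neither method enqueues work behind the other on a serial queue, calling WriteSync while a WriteAsync is in flight just waits its turn; there is no self-wait, so no deadlock.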

Mongo DB reading 4 million documents

We have scheduled jobs that run daily. These jobs look for documents matching that day, take each document, do a minimal transform, and send it to a queue for downstream processing. Typically we have 4 million documents to process per day. Our aim is to complete the processing within one hour. I am looking for suggestions on best practices for reading 4 million documents from MongoDB quickly.
The MongoDB Async driver is the first stop for low overhead querying. There's a good example of using the SingleResultCallback on that page:
Block<Document> printDocumentBlock = new Block<Document>() {
    @Override
    public void apply(final Document document) {
        System.out.println(document.toJson());
    }
};
SingleResultCallback<Void> callbackWhenFinished = new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        System.out.println("Operation Finished!");
    }
};
collection.find().forEach(printDocumentBlock, callbackWhenFinished);
It is a common pattern in asynchronous database drivers to allow results to be passed on for processing as soon as they are available. The use of OS-level async I/O will help with low CPU overhead. Which brings up the next problem - how to get the data out.
Without seeing the specifics of your work, you probably want to place the results into an in-memory queue to be picked up by another thread at this point, so the reader thread can keep reading results. An ArrayBlockingQueue is probably appropriate. put is more appropriate than add because it will block the reader thread if the workers aren't able to keep up (keeping things balanced). Ideally, you don't want it to back up, which is where multiple threads will be necessary. If the order of the results is important, use a single worker thread; otherwise use a ThreadPoolExecutor with the queue passed into the constructor. Using the in-memory queue does open up the possibility of data loss if the results are being somehow discarded as they are read (i.e. if you were immediately sending off another query to delete them) and the reader process crashed.
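To keep this document's examples in one language, here is a rough C# analog of that bounded hand-off, with BlockingCollection<T> playing the role of ArrayBlockingQueue (all names and the capacity are assumptions):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ReaderWorkerPipeline
{
    public static void Run(IEnumerable<string> documents)
    {
        // Bounded capacity blocks the reader when workers fall behind,
        // mirroring ArrayBlockingQueue.put in the Java description above.
        var buffer = new BlockingCollection<string>(boundedCapacity: 10000);

        var reader = Task.Run(() =>
        {
            foreach (var doc in documents)
                buffer.Add(doc);        // blocks when the buffer is full
            buffer.CompleteAdding();    // lets workers drain and finish
        });

        // Use a single worker if result order matters; several otherwise.
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Run(() =>
            {
                foreach (var doc in buffer.GetConsumingEnumerable())
                    Transform(doc);     // stand-in for the 'minimal transform'
            });
        }

        Task.WaitAll(workers);
        reader.Wait();
    }

    private static void Transform(string doc) { /* illustrative no-op */ }
}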
At this point, either do the 'minimal transforms' on the worker thread(s), or serialize them in the workers and put them on a real queue (e.g. RabbitMQ, ZeroMQ). Putting them onto a real queue allows the work to be divided up amongst multiple machines trivially, and provides optional persistence allowing recovery of work, and those queues have great clustering options for scalability. Those machines can then put the results into the queue you mentioned in the question (assuming it has the capacity).
The bottleneck in a system like that is how quickly one machine can get through a single mongo query, and how many results the final queue can handle. All the other parts (MongoDB, queues, # of worker machines) are individually scalable. By doing as little work as possible on the querying machine and pushing that work onto other machines that impact can be greatly reduced. It sounds like your destination queue is out of your control.
When trying to work out where bottlenecks are, measurements are critical. Adding metrics to your application up front will let you know which areas need improvement when things aren't going well.
That set-up can build a pretty scalable system. I've built many similar systems before. Beyond that, you'll want to investigate getting your data into something like Apache Storm.

Can we prevent deadlocks and timeouts on ReliableQueues in Service Fabric?

We have a stateful service in Service Fabric with both a RunAsync method and a couple of service calls.
One service call allows enqueueing something in a ReliableQueue:
using (ITransaction tx = StateManager.CreateTransaction())
{
    await queue.EnqueueAsync(tx, message);
    queueLength = await queue.GetCountAsync(tx);
    await tx.CommitAsync();
}
The RunAsync on the other hand tries to dequeue things:
using (ITransaction tx = StateManager.CreateTransaction())
{
    await queue.TryDequeueAsync(tx);
    queueLength = await queue.GetCountAsync(tx);
    await tx.CommitAsync();
}
The GetCountAsync seems to cause deadlocks, because the two transactions block each other. Would it help if we switched the order, so first counting and then the dequeue/enqueue?
This is likely due to the fact that the ReliableQueue today is strict FIFO and allows only one reader or writer at a time. You're probably not seeing deadlocks, you're seeing timeouts (please correct me if that is not the case). There's no real way to prevent the timeouts other than to:
Ensure that the transactions are not long lived - any longer than you need and you're blocking other work on the queue.
Increase the default transaction timeout (the default is 4 seconds; you can pass in a different value, as sketched below).
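For example, the reliable queue operations have overloads that take an explicit timeout (a sketch; the 30-second value is just an illustration):

using (ITransaction tx = StateManager.CreateTransaction())
{
    // Wait up to 30 seconds for the queue lock instead of the 4-second default.
    var result = await queue.TryDequeueAsync(tx, TimeSpan.FromSeconds(30), cancellationToken);
    await tx.CommitAsync();
}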
Reordering things shouldn't cause any change.
Having two transactions in two different places shouldn't cause deadlocks, as they act like mutexes. What will cause them, though, is creating transactions within transactions.
Perhaps that is what's happening? I've developed the habit lately of naming functions that create transactions Transactional, i.e. DoSomethingTransactionalAsync, and if it's a private helper I'll usually create two versions, with one taking a tx and one creating a tx.
For example:
AddToProcessingQueueAsync(ITransaction tx, int num) and AddToProcessingQueueTransactionalAsync(int num).
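A minimal sketch of that convention (the queue name and method bodies are assumptions):

// Joins the caller's transaction; never creates its own, so it cannot
// accidentally nest a transaction inside another one.
private async Task AddToProcessingQueueAsync(ITransaction tx, int num)
{
    var queue = await StateManager
        .GetOrAddAsync<IReliableQueue<int>>("processingQueue")
        .ConfigureAwait(false);
    await queue.EnqueueAsync(tx, num).ConfigureAwait(false);
}

// Owns the transaction's lifetime, including the commit.
private async Task AddToProcessingQueueTransactionalAsync(int num)
{
    using (var tx = StateManager.CreateTransaction())
    {
        await AddToProcessingQueueAsync(tx, num).ConfigureAwait(false);
        await tx.CommitAsync().ConfigureAwait(false);
    }
}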

Service Fabric reliable queue long operation

I'm trying to understand some best practices for service fabric.
If I have a queue that is added to by a web service or some other mechanism, and a back-end task to process that queue, what is the best approach to handle long-running operations in the background?
Use TryPeekAsync in one transaction, process, and then if successful use TryDequeueAsync to finally dequeue.
Use TryDequeueAsync to remove an item, put it into a dictionary, and then remove it from the dictionary when complete. On startup of the service, check the dictionary for anything pending before the queue.
Both ways feel slightly wrong, but I can't work out if there is a better way.
One option is to process the queue in RunAsync, something along the lines of this:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var store = await StateManager.GetOrAddAsync<IReliableQueue<T>>("MyStore").ConfigureAwait(false);
    while (!cancellationToken.IsCancellationRequested)
    {
        using (var tx = StateManager.CreateTransaction())
        {
            var itemFromQueue = await store.TryDequeueAsync(tx).ConfigureAwait(false);
            if (!itemFromQueue.HasValue)
            {
                await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken).ConfigureAwait(false);
                continue;
            }

            // Process item here.
            // Remember to clone the dequeued item if it is a custom type and you are going to mutate it.
            // If success, await tx.CommitAsync();
            // If failure to process, either let it run out of the using transaction scope, or call tx.Abort();
        }
    }
}
Regarding the comment about cloning the dequeued item if you are to mutate it, look under the "Recommendations" part here:
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/
One limitation with Reliable Collections (both Queue and Dictionary) is that you only have parallelism of 1 per partition. So for high-activity queues it might not be the best solution. This might be the issue you're running into.
What we've been doing is to use ReliableQueues for situations where the write volume is very low. For higher-throughput queues, where we need durability and scale, we're using ServiceBus Topics. That also gives us the advantage that if a service was stateful only due to having the ReliableQueue, it can now be made stateless. Though this adds a dependency on a 3rd-party service (in this case ServiceBus), and that might not be an option for you.
Another option would be to create a durable pub/sub implementation to act as the queue. I've done tests before using actors for this, and it seemed to be a viable option without spending too much time on it, since we didn't have any issues depending on ServiceBus. Here is another SO question about that: Pub/sub pattern in Azure Service Fabric
If processing is very slow, use 2 queues: a fast one where you store the work without interruptions, and a slow one to process it. RunAsync is used to move messages from the fast queue to the slow one.
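A rough sketch of that pump (the queue names and the MyMessage type are assumptions; a second loop, not shown, would drain the slow queue and do the actual processing):

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var fastQueue = await StateManager.GetOrAddAsync<IReliableQueue<MyMessage>>("fastQueue");
    var slowQueue = await StateManager.GetOrAddAsync<IReliableQueue<MyMessage>>("slowQueue");

    while (!cancellationToken.IsCancellationRequested)
    {
        bool moved = false;
        using (var tx = StateManager.CreateTransaction())
        {
            var message = await fastQueue.TryDequeueAsync(tx);
            if (message.HasValue)
            {
                // One transaction covers both queues, so after commit the
                // message is in exactly one of them; a crash mid-move loses nothing.
                await slowQueue.EnqueueAsync(tx, message.Value);
                await tx.CommitAsync();
                moved = true;
            }
        }
        if (!moved)
        {
            // Fast queue was empty; back off briefly before polling again.
            await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);
        }
    }
}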