How to use Couchbase as a FIFO queue

With the Java client, how can I use Couchbase to implement a thread-safe FIFO queue? Many threads may push to and pop from the queue concurrently. Each object in the queue is a String[].

Couchbase doesn't have built-in functionality for creating queues, but you can build one yourself.
I'll explain how with a short example below.
Say we have a queue named queue whose items are stored under keys of the form <queue_name>:item:<index>. The index comes from a separate counter key, queue:index, which you increment when pushing and decrement when popping.
In Couchbase you can use the increment and decrement operations for this, because those operations are atomic and thread-safe.
So the push and pop functions will look something like this:
// Pseudocode: couchbase stands for a client handle; the method names are illustrative.
void push(String queue, String[] value) {
    int index = couchbase.increment(queue + ":index");
    couchbase.set(queue + ":item:" + index, value);
}
String[] pop(String queue) {
    int index = couchbase.get(queue + ":index");
    String[] result = couchbase.get(queue + ":item:" + index);
    couchbase.decrement(queue + ":index");
    return result;
}
Sorry for the code; I used Java and the Couchbase Java client a long time ago. If the Java client now has callbacks, like the Node.js client, you can rewrite this code to use them. I think that would be better.
You can also add a check to the set operation: use the add operation (called StoreMode.Add in the C# client), which throws an exception if an item with the given key already exists. You can catch that exception and call push again with the same arguments.
UPD: I'm sorry, it was too early in the morning, so I couldn't think clearly.
The code above pops the most recently pushed item, so it behaves as a stack. For FIFO, as @avsej said, you'll need two counters: queue:head and queue:tail. So for FIFO:
// Pseudocode: couchbase stands for a client handle; the method names are illustrative.
void push(String queue, String[] value) {
    int index = couchbase.increment(queue + ":tail");
    couchbase.set(queue + ":item:" + index, value);
}
String[] pop(String queue) {
    int index = couchbase.increment(queue + ":head") - 1;
    String[] result = couchbase.get(queue + ":item:" + index);
    return result;
}
Note: the code may look slightly different depending on the starting values of queue:tail and queue:head (zero, one, or something else).
You can also set a maximum value for the counters; after reaching it, queue:tail and queue:head are reset to 0 (just to limit the number of documents). You can also set an expiry on each document if you actually need it.
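The two-counter scheme can be tried without a cluster. Below is a minimal sketch, assuming an in-memory stand-in: a ConcurrentHashMap plays the role of the bucket and two AtomicLong values play the role of queue:tail and queue:head. The class name CounterQueueSketch and its methods are hypothetical and not part of any Couchbase SDK. Both counters start at 0 here, so pop needs no "- 1" adjustment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the head/tail counter scheme; this is not Couchbase code. */
final class CounterQueueSketch {
    // Stand-in for the bucket: key -> stored value.
    private final ConcurrentHashMap<String, String[]> store = new ConcurrentHashMap<>();
    // Stand-ins for the queue:tail and queue:head counter documents, both starting at 0.
    private final AtomicLong tail = new AtomicLong();
    private final AtomicLong head = new AtomicLong();

    void push(String[] value) {
        // Atomically claim the next slot (1, 2, 3, ...), then store the item under it.
        long index = tail.incrementAndGet();
        store.put("queue:item:" + index, value);
    }

    /** Returns the item in the next head slot, or null if nothing is stored there. */
    String[] pop() {
        // Atomically claim the next slot to read, then remove and return its item.
        long index = head.incrementAndGet();
        return store.remove("queue:item:" + index);
    }

    public static void main(String[] args) {
        CounterQueueSketch q = new CounterQueueSketch();
        q.push(new String[] {"first"});
        q.push(new String[] {"second"});
        System.out.println(q.pop()[0]); // first
        System.out.println(q.pop()[0]); // second
    }
}
```

Note that in this sketch, as in the snippet above, popping from an empty queue still advances head, so a real implementation would need to guard against head passing tail.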

Couchbase already provides a CouchbaseQueue data structure.
Example usage, taken from the SDK documentation linked below:
Queue<String> shoppingList = new CouchbaseQueue<String>("queueDocId", collection, String.class, QueueOptions.queueOptions());
shoppingList.add("loaf of bread");
shoppingList.add("container of milk");
shoppingList.add("stick of butter");
// What does the JSON document look like?
System.out.println(collection.get("queueDocId").contentAsArray());
//=> ["stick of butter","container of milk","loaf of bread"]
String item;
while ((item = shoppingList.poll()) != null) {
    System.out.println(item);
    // => loaf of bread
    // => container of milk
    // => stick of butter
}
// What does the JSON document look like after draining the queue?
System.out.println(collection.get("queueDocId").contentAsArray());
//=> []
Java SDK 3.1 CouchbaseQueue Doc

Vertx CompositeFuture

I am working on a solution where I am using vertx 3.8.4 and vertx-mysql-client 3.9.0 for asynchronous database calls.
Here is the scenario that I have been trying to resolve, in a proper reactive manner.
I have some mastertable records which are in inactive state.
I run a query and get the list of records from the database.
I did it like this:
Future<List<Master>> locationMasters = getInactiveMasterTableRecords();
locationMasters.onSuccess(locationMasterList -> {
    if (locationMasterList.size() > 0) {
        uploadTargetingDataForAllInactiveLocations(vertx, amazonS3Utility,
            locationMasterList);
    }
});
Now, in the uploadTargetingDataForAllInactiveLocations method, I have a list of items.
I need to iterate over this list and, for each item, download a file from AWS, parse the file, and insert the data into the database.
I understand this can be done using CompositeFuture.
Can someone from the Vert.x dev community help me with this, or point me to some documentation?
I did not find good material on this by googling.
I'm answering this as I was searching for something similar and I ended up spending some time before finding an answer and hopefully this might be useful to someone else in future.
I believe you want to use CompositeFuture in Vert.x only if you want to synchronize multiple actions. That is, you want an action to run either when all of the actions the composite future is built on succeed, or when at least one of them succeeds.
In the first case I would use CompositeFuture.all(List<Future> futures) and in the second case I would use CompositeFuture.any(List<Future> futures).
As per your question, below is sample code where, for each item in a list, we run an asynchronous operation (namely downloadAndProcessFile()) that returns a Future, and we want to execute doAction() once all the async operations have succeeded:
List<Future> futures = new ArrayList<>();
locationMasterList.forEach(elem -> {
    Promise<Void> promise = Promise.promise();
    futures.add(promise.future());
    Future<Boolean> processStatus = downloadAndProcessFile(); // doesn't need to be boolean
    processStatus.onComplete(asyncProcessStatus -> {
        if (asyncProcessStatus.succeeded()) {
            // eventually do stuff with the result
            promise.complete();
        } else {
            promise.fail("Error while processing file whatever");
        }
    });
});
CompositeFuture.all(futures).onComplete(compositeAsync -> {
    if (compositeAsync.succeeded()) {
        doAction(); // <-- here do what you want to do when all futures complete
    } else {
        // at least 1 future failed
    }
});
This solution is probably not perfect and I suppose can be improved but this is what I found works for me. Hopefully will work for someone else.
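For readers without a Vert.x project at hand, the same "run something once every per-item operation has succeeded" pattern can be sketched with the JDK's CompletableFuture.allOf. This is a plain-JDK analogue, not Vert.x code; the class AllOfSketch, the method downloadAndProcess, and the item names are hypothetical placeholders for the real download, parse, and insert steps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class AllOfSketch {
    // Hypothetical stand-in for "download a file, parse it, insert rows".
    // It returns the item's length as a fake row count and fails on an empty item.
    static CompletableFuture<Integer> downloadAndProcess(String item) {
        return CompletableFuture.supplyAsync(() -> {
            if (item.isEmpty()) {
                throw new IllegalArgumentException("empty item");
            }
            return item.length();
        });
    }

    // Completes with the summed result once every per-item future has succeeded;
    // completes exceptionally if any one of them fails.
    static CompletableFuture<Integer> processAll(List<String> items) {
        List<CompletableFuture<Integer>> futures = items.stream()
                .map(AllOfSketch::downloadAndProcess)
                .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                // All futures are complete here, so join() does not block.
                .thenApply(ignored -> futures.stream()
                        .mapToInt(CompletableFuture::join)
                        .sum());
    }

    public static void main(String[] args) {
        // "loc-1" has 5 characters and "loc-22" has 6, so this prints 11.
        System.out.println(processAll(Arrays.asList("loc-1", "loc-22")).join());
    }
}
```

As with CompositeFuture.all, a single failing item makes the combined future fail, so error handling belongs on the combined result.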

When exactly do we use async-await and then?

I am very confused about this. I request you to clarify the concept.
Consider the following scenarios:
Case 1:
int number = 0;
void calculate() {
  number = number + 2;
  print(number);
}
I know this works just fine. "2" will be printed on the terminal.
But why shouldn't I use async-await here, like this:
int number = 0;
void calculate() async {
  // Declared as Future<void> so that it can be awaited.
  Future<void> addition() async {
    number = number + 2;
  }
  await addition();
  print(number);
}
This seems logical to me, since print(number) should wait for number = number + 2 to finish. Why isn't this necessary? How does Dart know which operation to execute first?
How is it ensured that print(number) isn't executed before number = number + 2 and "0" is printed on the terminal?
Does the sequence in which we write these operations in the function matter?
Case 2:
Consider the case where I am interacting with SQFLite database and values fetched depend on each other.
Note: number1, number2, number3 will still have values before the following function is called.
void getValues() async {
  // The helpers return Future<void> and are async so that await and then() compile.
  Future<void> calculate1() async {
    number1 = await db.getNumber1(10);
  }
  Future<void> calculate2() async {
    number2 = await db.getNumber2(number1);
  }
  await calculate1().then((_) async {
    await calculate2().then((_) async {
      number3 = await db.getNumber3(number2);
    });
  });
}
I have a lot of these types of functions in my app and I am doing this everywhere.
I am kind of paranoid: if old values of number1 and number2 are passed as parameters to getNumber2() and getNumber3() respectively, then I'll be doomed.
async/await is just syntactic sugar for the underlying Future API. 95% of the time it will suffice, and it is preferred by the style guide.
One exception is when you have multiple futures and want to wait until all of them complete in parallel. In that case you'll need Future.wait([future1, future2, future3]), which cannot be expressed using await alone (though you can await the future that Future.wait returns).
Dart executes the statements of a synchronous function in order. So when the function is called, the calculation is done first and then the result is printed, and you will always get 2 printed.
You can think of it as one main thread, which is the UI thread. Operations you write there run line by line; each line finishes completely before the next one starts.
Now suppose you have something that you know will take time to complete, with either a result or an error. If you run it synchronously on the main UI thread, you block the app's UI. The operating system may then decide the app has frozen (an Application Not Responding error), when in fact the UI is just waiting for your long-running computation to finish.
To avoid this, we use asynchronous methods for slow operations, such as fetching data from a database, which return a value or an error in the "future". The main UI thread doesn't wait for the asynchronous work. If you have nothing to show the user until an asynchronous task completes, you show a loading indicator in the meantime.
Hope this helps!

RXJS : Idiomatic way to create an observable stream from a paged interface

I have a paged interface. Given a starting point, a request will produce a list of results and a continuation indicator.
I've created an observable that is built by constructing and flat-mapping an observable that reads the page. The result of this observable contains both the data for the page and a value to continue with. I pluck the data and flat-map it to the subscriber, producing a stream of values.
To handle the paging, I've created a subject for the next-page values. It's seeded with an initial value; then, each time I receive a response with a valid next page, I push to the pages subject and trigger another read, until there is no more to read.
Is there a more idiomatic way of doing this?
function records(start = 'LATEST', limit = 1000) {
  let pages = new rx.Subject();
  this.connect(start)
    .subscribe(page => pages.onNext(page));
  let records = pages
    .flatMap(page => {
      return this.read(page, limit)
        .doOnNext(result => {
          let next = result.next;
          if (next === undefined) {
            pages.onCompleted();
          } else {
            pages.onNext(next);
          }
        });
    })
    .pluck('data')
    .flatMap(data => data);
  return records;
}
That's a reasonable way to do it. It has a couple of potential flaws in it (that may or may not impact you depending upon your use case):
You provide no way to observe any errors that occur in this.connect(start)
Your observable is effectively hot. If the caller does not immediately subscribe to the observable (perhaps they store it and subscribe later), then they'll miss the completion of this.connect(start) and the observable will appear to never produce anything.
You provide no way to unsubscribe from the initial connect call if the caller changes its mind and unsubscribes early. Not a real big deal, but usually when one constructs an observable, one should try to chain the disposables together so it all cleans up properly if the caller unsubscribes.
Here's a modified version:
It passes errors from this.connect to the observer.
It uses Observable.create to create a cold observable that only starts its business when the caller actually subscribes, so there is no chance of missing the initial page value and stalling the stream.
It combines the this.connect subscription disposable with the overall subscription disposable.
Code:
function records(start = 'LATEST', limit = 1000) {
  return Rx.Observable.create(observer => {
    let pages = new Rx.Subject();
    let connectSub = new Rx.SingleAssignmentDisposable();
    let resultsSub = new Rx.SingleAssignmentDisposable();
    let sub = new Rx.CompositeDisposable(connectSub, resultsSub);
    // Make sure we subscribe to pages before we issue this.connect()
    // just in case this.connect() finishes synchronously (possible if it caches values or something?)
    let results = pages
      .flatMap(page => this.read(page, limit))
      // Read the continuation value from the result r, not from this
      .doOnNext(r => r.next !== undefined ? pages.onNext(r.next) : pages.onCompleted())
      .flatMap(r => r.data);
    resultsSub.setDisposable(results.subscribe(observer));
    // now query the first page
    connectSub.setDisposable(this.connect(start)
      .subscribe(p => pages.onNext(p), e => observer.onError(e)));
    return sub;
  });
}
Note: I've not used the ES6 syntax before, so hopefully I didn't mess anything up here.

Rx queue implementation and Dispatcher's buffer

I want to implement a queue that can take events/items from multiple producers on multiple threads and consume them all on a single thread.
This queue will work in a critical environment, so I am quite concerned about its stability.
I have implemented it using Rx capabilities, but I have 2 questions:
Is this implementation OK? Or maybe it is flawed in some way I do not know of? (as an alternative - manual implementation with Queue and locks)
What is Dispatcher's buffer length? Can it handle 100k of queued items?
The code below illustrates my approach using a simple TestMethod. Its output shows that all values are put in from different threads but are processed on another, single thread.
[TestMethod()]
public void RxTest()
{
    Subject<string> queue = new Subject<string>();
    queue
        .ObserveOnDispatcher()
        .Subscribe(s =>
            {
                Debug.WriteLine("Value: {0}, Observed on ThreadId: {1}", s, Thread.CurrentThread.ManagedThreadId);
            },
            () => Dispatcher.CurrentDispatcher.InvokeShutdown());
    for (int j = 0; j < 10; j++)
    {
        ThreadPool.QueueUserWorkItem(o =>
        {
            for (int i = 0; i < 100; i++)
            {
                Thread.Sleep(10);
                queue.OnNext(string.Format("value: {0}, from thread: {1}", i.ToString(), Thread.CurrentThread.ManagedThreadId));
            }
            queue.OnCompleted();
        });
    }
    Dispatcher.Run();
}
I'm not sure about the behaviour of Subject in heavily multithreaded scenarios. I can imagine though that something like BlockingCollection (and its underlying ConcurrentQueue) are well worn in the situations you're talking about. And simple to boot.
var queue = new BlockingCollection<long>();
// subscribing
queue.GetConsumingEnumerable()
    .ToObservable(Scheduler.NewThread)
    .Subscribe(i => Debug.WriteLine("Value: {0}, Observed on ThreadId: {1}", i, Thread.CurrentThread.ManagedThreadId));
// sending
Observable.Interval(TimeSpan.FromMilliseconds(500), Scheduler.ThreadPool)
    .Do(i => Debug.WriteLine("Value: {0}, Sent on ThreadId: {1}", i, Thread.CurrentThread.ManagedThreadId))
    .Subscribe(i => queue.Add(i));
You certainly don't want to touch queues and locks. The ConcurrentQueue implementation is excellent and will certainly handle the size queues you're talking about effectively.
Take a look at EventLoopScheduler. It's built into Rx and I think it does everything you want.
You can take any number of observables and call .ObserveOn(els) on each (els is your instance of an EventLoopScheduler). You're then marshalling multiple observables from multiple threads onto a single thread and queuing each call to OnNext serially.

rx reactive extension: how to have each subscriber get a different value (the next one) from an observable?

Using reactive extension, it is easy to subscribe 2 times to the same observable.
When a new value is available in the observable, both subscribers are called with this same value.
Is there a way to have each subscriber get a different value (the next one) from this observable?
Example of what I'm after:
source sequence: [1,2,3,4,5,...] (infinite)
The source is constantly adding new items at an unknown rate.
I'm trying to execute a lengthy async action for each item using N subscribers.
1st subscriber: 1,2,4,...
2nd subscriber: 3,5,...
...
or
1st subscriber: 1,3,...
2nd subscriber: 2,4,5,...
...
or
1st subscriber: 1,3,5,...
2nd subscriber: 2,4,6,...
I would agree with Asti.
You could use Rx to populate a queue (BlockingCollection) and then have competing consumers read from the queue. This way, if one consumer is faster for some reason, it can pick up the next item while the other consumer is still busy.
However, if you want to do it anyway, against good advice :), you could use the Select operator overload that provides the index of each element. You can then pass that down to your subscribers and they can filter on a modulus. (Yuck! Leaky abstractions, magic numbers, potentially blocking, potential side effects to the source sequence, etc.)
var source = Observable.Interval(TimeSpan.FromSeconds(1))
    // The indexed Select overload passes the element first, then its index
    .Select((element, index) => new { Index = index, Element = element });
var subscription1 = source.Where(x => x.Index % 2 == 0).Subscribe(x => DoWithThing1(x.Element));
var subscription2 = source.Where(x => x.Index % 2 == 1).Subscribe(x => DoWithThing2(x.Element));
Also remember that the work done on the OnNext handler if it is blocking will still block the scheduler that it is on. This could affect the speed of your source/producer. Another reason why Asti's answer is a better option.
Ask if that is not clear :-)
How about:
IObservable<TRet> SomeLengthyOperation(T input)
{
    // Defer keeps the observable cold: Observable.Start only runs on subscription
    return Observable.Defer(() => Observable.Start(() => {
        return someCalculatedValueThatTookALongTime;
    }, TaskPoolScheduler.Default));
}
someObservableSource
    .SelectMany(x => SomeLengthyOperation(x))
    .Subscribe(x => Console.WriteLine("The result was {0}", x));
You can even limit the number of concurrent operations:
someObservableSource
    .Select(x => SomeLengthyOperation(x))
    .Merge(4 /* at a time */)
    .Subscribe(x => Console.WriteLine("The result was {0}", x));
It's important for the Merge(4) to work, that the Observable returned by SomeLengthyOperation be a Cold Observable, which is what the Defer does here - it makes the Observable.Start not happen until someone Subscribes.