Atomic counters in Couchbase

I wanted to know whether Couchbase supports consistent incremental counters. From what I've read in this doc, it does not; it just encapsulates a read/write operation so you won't need to do it yourself. Of course, this doesn't work for me, because the data might have changed since the time it was read from the database.

Couchbase absolutely does. Just like memcached and Membase Server, it supports the incr/decr operations atomically within a cluster.
cb.set("mykey", 1)
x = cb.incr("mykey")
puts x #=> 2
incr both writes and returns the resulting value.
"The update operation occurs on the server and is provided at the protocol level." means that it is atomic on the cluster, and executed by the server.
"This simplifies what would otherwise be a two-stage get and set operation." means that instead of a two-stage operation, it is a single operation!

If you are using the Java API, since the release of version 2.0 the incr methods have been replaced by the counter method.
You need to use the counter method of your bucket. This method lets you specify the name of the counter document (which holds a long) and the increment. If the document does not exist, it is created. Many other parameters are described in the official documentation.
//Obtain the id from counter document and increment it
com.couchbase.client.java.Bucket bucket;
JsonLongDocument joCounter = bucket.counter("counter", 1);
//get the counter long value (might be useful to generate doc id)
long newCounter = joCounter.content();
This operation is atomic, so feel safe to use counters.
http://docs.couchbase.com/developer/java-2.0/documents-atomic.html

Related

MongoDB and ExpressJS: trying to understand code

function _allUsers(callback) {
    var db = connect.get();
    db.collection("users").find({}).toArray(function(err, data) {
        if (err) {
            callback(err);
        } else {
            callback(null, data);
        }
    });
}
I am trying to understand this code. I have been looking around the web, but I find the explanations kind of difficult to understand (I am new to the MEAN stack), so my questions are:
What does the collection method do? I am not sure, but is the string "users" just the name of our collection with all users?
Why do we have to use a callback in this situation? (I find callbacks very confusing.)
And why do we have to give the toArray function an anonymous function?
Instead of toArray, could I use the pretty() method without any anonymous function as a parameter?
The MEAN stack is a bundle of software programs supporting applications written entirely in JavaScript. This means you can use JavaScript from your database to your back end and front end.
MEAN stands for the first characters of each program included in the stack: MongoDB, Express.js, AngularJS and NodeJS.
1
MongoDB is a NoSQL database which uses BSON (similar to JSON) to store so-called documents. Think of a document as a single entity or row in a traditional database. These entities (or rows) are stored in collections (collections of documents), which can be compared to tables.
So the answer to your 1st question is: it opens up the users collection, which grants access to all the user documents.
2
NodeJS is asynchronous by design. This allows NodeJS to perform a lot of operations while running on a single thread*. Because NodeJS is single-threaded, we need a way to write our code in a non-blocking fashion, meaning we can start an operation, proceed with executing other code, and come back whenever that operation is finished.
In your case, we request access to the users collection, which takes some time. To allow other parts of our application to continue processing, we use a callback. When we have access to our collection, our callback is executed and we can perform whatever operation we wanted to do when we first requested access.
*NodeJS actually runs on multiple threads, but a developer never has to worry about multithreading; NodeJS does that for us.
3
This is exactly what the previous point is about.
The .toArray() method returns an array that contains all the documents from a cursor. The method fully iterates the cursor, loading all the documents into RAM and exhausting the cursor. Source
.toArray() is a computationally intensive operation. Since we do not want to wait until .toArray() is finished, but would rather proceed with processing the rest of our code, we give it a callback so that we can come back to our collection processing whenever it's ready.
4
From what I can read in the docs, I guess you could indeed write blocking code and do it this way:
var users = db.collection("users").find({}).toArray();
This, however, will block your code entirely. There is never a good reason to do this.
Disclaimer: I left out or oversimplified details in this explanation for ease of understanding.
db.collection('users') returns the users collection instance.
We use a callback because the operation is asynchronous.
The anonymous function passed to toArray is its callback.
Whether you need one depends on the library in use.
"without any anonymous function as a parameter": Express.js programming is asynchronous, so we need callbacks or Promises.
You can think of a collection as a table in MySQL. A collection consists of documents (rows/items/records in MySQL). Your example calls the users collection and finds all documents (records) in it.
About the callbacks: NodeJS/Express code is commonly callback-oriented. This is the pattern it uses, and most code follows it, because it is asynchronous. If you need to be sure that some snippet is executed right after some other snippet, you have to use a callback (or a promise).
Whether to call toArray() depends on what your callback expects. You can skip calling this method if the callback expects the cursor object returned by the find() method. It all depends on your callback.
You can use a non-anonymous function too, but you have to keep the asynchronous logic in mind and continue using callbacks/promises. You can read more about callbacks and promises in this Quora article.
Here you can find more about the find() method.

Atomic get and delete in memcached?

Is there a way to do atomic get-and-delete in memcached?
In other words, I want to get the value for a key if it exists and delete it immediately, so this value can be read once and only once.
I think this pseudocode might work, but note the caveat postscript:
# When setting:
SET key-0 value
SET key-ns 0
# When getting:
ns = INCR key-ns
GET key-{ns - 1}
Constraint: I have millions of keys that could be accessed millions of times, and only a small percentage will have a value set at any given time. I don't want to have to update an atomic counter for every key with every get access request as above.
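For what it's worth, here is a rough Java sketch of the pseudocode above, using the spymemcached client (the client choice, key layout and expirations are my assumptions):
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class ReadOnce {
    public static void main(String[] args) throws Exception {
        MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // When setting: store the value in slot 0 plus a per-key namespace
        // counter (incr needs the counter stored as a decimal string).
        mc.set("mykey-0", 0, "value").get();
        mc.set("mykey-ns", 0, "0").get();

        // When getting: incr is atomic, so each caller claims a distinct slot
        // and only the first one finds the value; later callers see null.
        long ns = mc.incr("mykey-ns", 1);
        Object value = mc.get("mykey-" + (ns - 1));
        System.out.println(value);

        mc.shutdown();
    }
}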
The canonical, yet generic, answer to your question is: a lock-free hash table with a relaxed memory model.
The more relaxed your memory model, the more you gain from a good lock-free design; it's a way to get more performance out of the same chipset.
Here is a talk about that. I don't think it's possible to answer your question with a single post on hash tables and lock-free programming, and I'm not even going to try.
You cannot do this with memcached in a single command, since there is no API that supports exactly what you're asking for. What I would do to get the behavior you're looking for is to implement some sort of marking scheme to signify that another client has or hasn't read the data. For example, you could create a JSON document as follows:
{
    "data": "value",
    "used": false
}
When you get the item, check whether it has already been used by another client by examining the used field. If it hasn't been used, then set the value using the CAS ID you got when you read the item, and make sure that the document is updated to reflect the fact that a client has already accessed this key.
If the set operation fails because the CAS is invalid, this means that another client has obtained this item and already updated it in memcached to signify that it has been used. In this case you just cancel whatever you were doing with the item and move on.
If the set operation succeeds, this means your client is the sole owner of this data. You can now delete it from memcached and do whatever processing on it you like.
Note that when doing the set I would also add an expiration time of about 5 seconds. This way, if your application crashes, your documents will clean themselves up if you don't finish the entire process of deleting them.
To put some code to the answer from @mikewied, I think the basic gist is... (using Node.js):
var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

var getOnce = function(key, callback) {
    // gets is the check-and-set get (vs. regular get)
    memcache.gets(key, function(err, data) {
        if (!data) {
            // Cache miss, nothing to see here.
            callback(null);
        } else {
            var yourData = data[key];
            // Do a check-and-set to remove the data from the cache.
            // This sets the value to null *only* if no one else already did.
            memcache.cas(key, null /* new data */, data.cas, 10, function(err) {
                if (err) {
                    // Check-and-set failed! (Here we'll treat it like a cache miss.)
                    yourData = null;
                }
                callback(yourData);
            });
        }
    });
};
I'm not an expert on Memcached and so I may be wrong. My answer is from reading the documentation and my experience using Memcached.
IMO this is not possible with memcached's current implementation.
To demonstrate why this is not currently possible, here is a simple example of the race condition:
two processes start at the same time
both execute a get/delete at the same time
memcached replies to both get commands at the same time
done (the desired result was to have one get/delete execute atomically and the second get/delete fail; instead memcached did get, get, delete, failed delete)
To get an atomic get/delete you would need:
a new atomic command for memcached, let's call it get_delete, or
some sort of synchronization lock shared by all the memcached clients, to ensure both the get and delete commands are executed while the lock is held
so all clients would grab the synchronization lock whenever they need to enter the critical section (i.e. get, delete), then release the lock after the critical section
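Memcached's add command is atomic (it succeeds only if the key does not exist yet), so a crude best-effort version of such a client-side lock can be sketched as follows with the spymemcached Java client; the lock key naming and expiry are my assumptions, and this is not a robust distributed lock:
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class GetDelete {
    public static Object getAndDelete(MemcachedClient mc, String key) throws Exception {
        String lockKey = key + ".lock";
        // add fails if lockKey already exists, which makes it usable as a
        // crude mutex; the 5s expiry frees the lock if this client crashes.
        if (!mc.add(lockKey, 5, "1").get()) {
            return null; // another client is in the critical section
        }
        try {
            Object value = mc.get(key);   // critical section: get...
            if (value != null) {
                mc.delete(key).get();     // ...then delete
            }
            return value;
        } finally {
            mc.delete(lockKey);           // release the lock
        }
    }
}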

MongoDB critical section

I need to perform a few operations (reads and writes) on my MongoDB without another process interrupting. It's for an online game, and when a user sends resources to another user, the following steps are performed:
Check his resource value
Abort if it's not enough
Insert a resource transaction
Decrement his resource value
Increment the other one's resource value
I'm concerned that, between checking whether the amount is sufficient and inserting the resource transaction, some other transaction may already have been inserted, making the values invalid. How can I make sure that this part is executed in this order?
I can see two ways:
Use client side transactions to hold a "lock": http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
Or use versioning, whereby you hold a field with a $inc'd version number which gets updated every time you save and must be included in your query whenever you go to save (see the sketch after this answer). A good example is within Vermongo: https://github.com/thiloplanz/v7files/wiki/Vermongo
Those seem to be the two most plausible ways I see of getting this done.
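A minimal sketch of the versioning option, using the MongoDB Java driver (the field names version and resources are my assumptions):
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.inc;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoCollection;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;

public class OptimisticSave {
    // Returns true if the save won the race; false means someone else saved
    // in between, so the caller should re-read the document and retry.
    public static boolean save(MongoCollection<Document> users, Object id,
                               long expectedVersion, int newResources) {
        UpdateResult result = users.updateOne(
                and(eq("_id", id), eq("version", expectedVersion)),
                combine(set("resources", newResources), inc("version", 1)));
        return result.getModifiedCount() == 1;
    }
}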
Transaction is an almost forbidden word when talking about Mongo, but you can perform steps 1, 2 and 4 using a single atomic update with $inc that uses the resource value as a condition, and then perform steps 3 and 5. You will have no support for rolling back a step if a later step fails.
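As a hedged illustration of that conditional update (collection and field names are my assumptions, not from the original answer), in Java:
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Filters.gte;
import static com.mongodb.client.model.Updates.inc;

import com.mongodb.client.MongoCollection;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;

public class SendResources {
    public static boolean send(MongoCollection<Document> users,
                               MongoCollection<Document> transactions,
                               Object from, Object to, int amount) {
        // Steps 1, 2 and 4 in one atomic update: the decrement happens only
        // if the sender still has enough resources.
        UpdateResult debit = users.updateOne(
                and(eq("_id", from), gte("resources", amount)),
                inc("resources", -amount));
        if (debit.getModifiedCount() == 0) {
            return false; // step 2: abort, not enough resources
        }
        // Steps 3 and 5; as noted above, there is no automatic rollback of
        // the decrement if either of these fails.
        transactions.insertOne(new Document("from", from)
                .append("to", to).append("amount", amount));
        users.updateOne(eq("_id", to), inc("resources", amount));
        return true;
    }
}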
I am an engineer at Tokutek.
TokuMX is a MongoDB replacement server that uses the same protocol and drivers and supports native multi-statement transactions on non-sharded setups. What you want can be accomplished with a serializable transaction, which takes document-level locks on the documents you touch. It would be done something like this:
> db.beginTransaction("serializable");
> if (resourcesInsufficient()) { db.rollbackTransaction(); }
> // insert and update
> db.commitTransaction()
Again, this is not supported in sharding but may be useful for your application. More details, features and limitations are discussed here.

can ZooKeeper get znode data and znode data version (stat) in one single operation?

I am developing an application that uses ZooKeeper as the datastore. For one of the methods in the application, I need optimistic concurrency control. For example, I need to implement a get method which gets the znode data, and I use the znode data version for the optimistic concurrency check. From what I understand, one can't get the znode data and the znode data version in one single operation. If there is high contention to update the znode data, the get method will not work, since the znode data might change after it is read. So I am asking: is there a way to get the znode data and the znode data version (or znode stat) in one single operation, without any locking attempt in between?
In Java, you can achieve what you want easily:
Stat stat = new Stat();
byte[] data = zk.getData("/path", null, stat);
This reads the data and the version information (in the stat object) in a single operation. When you write the data back, you pass the version number you got when you read it:
zk.setData("/path", data, stat.getVersion());
If there is a version mismatch, the method will throw KeeperException.BadVersionException, which gives you an optimistic lock.
In Python, using Kazoo, it is also trivial to get both the data and the stat and to implement optimistic locking. Here is a sketch:
while True:
    data, stat = zk.get("/path")
    # do something with the data and then:
    try:
        zk.set("/path", new_data, stat.version)
        break
    except BadVersionError:
        continue  # or pass
Also, do use pre-made recipes when you can, as they are already extensively debugged and should handle all the corner cases.

Multi-collection, multi-document 'transactions' in MongoDB

I realise that MongoDB, by its very nature, doesn't and probably never will support these kinds of transactions. However, I have found that I do need to use them in a somewhat limited fashion, so I've come up with the following solution, and I'm wondering: is this the best way of doing it, and can it be improved upon? (Before I go and implement it in my app!)
Obviously the transaction is controlled via the application (in my case, a Python web app). For each document in this transaction (in any collection), the following fields are added:
'lock_status': bool (true = locked, false = unlocked),
'data_old': dict (of any old values - current values really - that are being changed),
'data_new': dict (of values replacing the old (current) values - should be an identical list to data_old),
'change_complete': bool (true = the update to this specific document has occurred and was successful),
'transaction_id': ObjectId of the parent transaction
In addition, there is a transaction collection which stores documents detailing each transaction in progress. They look like:
{
    '_id': ObjectId,
    'date_added': datetime,
    'status': bool (true = all changes successful, false = in progress),
    'collections': array of collection names involved in the transaction
}
And here's the logic of the process. Hopefully it works in such a way that if it's interrupted, or fails in some other way, it can be rolled back properly.
1: Set up a transaction document
2: For each document that is affected by this transaction:
Set lock_status to true (to 'lock' the document from being modified)
Set data_old and data_new to their old and new values
Set change_complete to false
Set transaction_id to the ObjectId of the transaction document we just made
3: Perform the update. For each document affected:
Replace any affected fields in that document with the data_new values
Set change_complete to true
4: Set the transaction document's status to true (as all data has been modified successfully)
5: For each document affected by the transaction, do some clean up:
remove the data_old and data_new, as they're no longer needed
set lock_status to false (to unlock the document)
6: Remove the transaction document set up in step 1 (or as suggested, mark it as complete)
I think that logically works in such a way that if it fails at any point, all data can be either rolled back or the transaction can be continued (depending on what you want to do). Obviously all rollback/recovery/etc. is performed by the application and not the database, by using the transaction documents and the documents in the other collections with that transaction_id.
Is there any glaring error in this logic that I've missed or overlooked? Is there a more efficient way of going about it (e.g. less writing/reading from the database)?
As a generic response, multi-document commits in MongoDB can be performed as two-phase commits, which are somewhat extensively documented in the manual (see: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/).
The pattern suggested by the manual is briefly the following (see the sketch after this list):
Set up a separate transactions collection that includes the target document, source document, value and state (of the transaction)
Create a new transaction object with initial as the state
Start making the transaction and update its state to pending
Apply the transaction to both documents (target, source)
Update the transaction state to committed
Use find to determine whether the documents reflect the transaction state; if OK, update the transaction state to done
In addition:
You need to manually handle failure scenarios (something didn't happen as planned)
You need to manually implement a rollback, basically by introducing a new state value, canceling
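A hedged Java-driver sketch of the manual's pattern as listed above (the accounts collection, field names and balance semantics are illustrative assumptions, not from the original answer):
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Filters.ne;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.inc;
import static com.mongodb.client.model.Updates.pull;
import static com.mongodb.client.model.Updates.push;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.types.ObjectId;

public class TwoPhaseCommit {
    public static void transfer(MongoCollection<Document> transactions,
                                MongoCollection<Document> accounts,
                                String source, String target, int value) {
        // Create the transaction document in state "initial".
        ObjectId txnId = new ObjectId();
        transactions.insertOne(new Document("_id", txnId)
                .append("source", source).append("destination", target)
                .append("value", value).append("state", "initial"));

        // Move it to "pending".
        transactions.updateOne(and(eq("_id", txnId), eq("state", "initial")),
                set("state", "pending"));

        // Apply the transaction to both documents; the pendingTransactions
        // guard keeps this step idempotent if a recovery process re-runs it.
        accounts.updateOne(and(eq("_id", source), ne("pendingTransactions", txnId)),
                combine(inc("balance", -value), push("pendingTransactions", txnId)));
        accounts.updateOne(and(eq("_id", target), ne("pendingTransactions", txnId)),
                combine(inc("balance", value), push("pendingTransactions", txnId)));

        // Mark the transaction committed.
        transactions.updateOne(and(eq("_id", txnId), eq("state", "pending")),
                set("state", "committed"));

        // Clear the pending markers, then mark the transaction done.
        accounts.updateOne(eq("_id", source), pull("pendingTransactions", txnId));
        accounts.updateOne(eq("_id", target), pull("pendingTransactions", txnId));
        transactions.updateOne(and(eq("_id", txnId), eq("state", "committed")),
                set("state", "done"));
    }
}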
Some specific notes for your implementation:
I would discourage you from adding fields like lock_status, data_old and data_new to the source/target documents. These should be properties of the transaction, not of the documents themselves.
To generalize the concept of target/source documents, I think you could use DBrefs: http://www.mongodb.org/display/DOCS/Database+References
I don't like the idea of deleting transaction documents when they are done. Setting state to done seems like a better idea since this allows you to later debug and find out what kind of transactions have been performed. I'm pretty sure you won't run out of disk space either (and for this there are solutions as well).
In your model how do you guarantee that everything has been changed as expected? Do you inspect the changes somehow?
MongoDB 4.0 adds support for multi-document ACID transactions.
Java Example:
try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    collection.insertOne(clientSession, docOne);
    collection.insertOne(clientSession, docTwo);
    clientSession.commitTransaction();
}
Note that this works on a replica set. You can still have a replica set with one node and run it on a local machine.
https://stackoverflow.com/a/51396785/4587961
https://docs.mongodb.com/manual/tutorial/deploy-replica-set-for-testing/
MongoDB 4.0 is adding (multi-collection) multi-document transactions: link