Click counter implementation with MongoDB - asynchronous write

I would like to implement a click counter using MongoDB (e.g. user clicks a link, count the total clicks).
My intuitive approach is to have an in-memory, low-priority thread pool that drains a blocking queue of click messages and persists them to MongoDB in the background asynchronously.
So my question is - does MongoDB's native Java Driver have some async capabilities that do just that?
If it doesn't, is there an alternative Mongo driver that might have benefits over rolling my own async code?

Well, not really async, but if you use a WriteConcern of NONE it is sort of async in that the data only gets as far as the socket's buffer before the insert returns. The downside is that you won't know whether the insert worked. In the face of a failure you could silently drop a lot of clicks.
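For illustration, here is a minimal sketch of that fire-and-forget style with the legacy 2.x Java driver (database, collection and field names are made up; in later driver versions WriteConcern.NONE was renamed UNACKNOWLEDGED):
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;

public class ClickCounter {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        DBCollection clicks = client.getDB("analytics").getCollection("clicks");
        // UNACKNOWLEDGED (formerly NONE): the call returns once the message is
        // handed to the socket buffer, so a failed write is silently dropped.
        clicks.setWriteConcern(WriteConcern.UNACKNOWLEDGED);
        clicks.update(new BasicDBObject("linkId", "some-link"),
                      new BasicDBObject("$inc", new BasicDBObject("count", 1)),
                      true /* upsert */, false /* multi */);
        client.close();
    }
}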
There is an Asynchronous Java Driver that allows you to use futures or callbacks to get the results of the insert. Going that approach there should be no need to roll your own queue or have a background thread (the driver has its own receive threads).
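As a rough illustration of the callback style (shown here with the official mongodb-driver-async client purely for illustration; the Asynchronous Java Driver's own API differs), the driver invokes your callback on its own threads, so no hand-rolled queue or background thread is required:
import org.bson.Document;
import com.mongodb.async.client.MongoClient;
import com.mongodb.async.client.MongoClients;
import com.mongodb.async.client.MongoCollection;

public class AsyncClickInsert {
    public static void main(String[] args) throws InterruptedException {
        MongoClient client = MongoClients.create("mongodb://localhost");
        MongoCollection<Document> clicks =
                client.getDatabase("analytics").getCollection("clicks");
        clicks.insertOne(new Document("linkId", "some-link"),
                (result, t) -> {
                    // runs on the driver's own threads; failures are surfaced here
                    if (t != null) {
                        System.err.println("click insert failed: " + t);
                    }
                });
        Thread.sleep(1000);   // demo only: give the async insert time to complete
        client.close();
    }
}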
HTH - Rob.
P.S. Full disclosure - I work on the Asynchronous Java Driver.

Related

Asynchronous database queries in PostgreSQL using drivers

I would like to use asynchronous queries in PostgreSQL. I know that asynchronous drivers exist in many programming languages, e.g. asyncpg (Python), vertx-postgres (Java) and so on.
Do I need to configure PostgreSQL somehow to use asynchronous features? There is an "Asynchronous Behavior" section in postgresql.conf. Do I need to uncomment and edit these values to use PostgreSQL optimally in an asynchronous way?
The parameters in the "Asynchronous Behavior" section of the documentation are largely unrelated.
You should instead study the Asynchronous Command Processing section. It covers the C client library (libpq), but that is probably what the libraries you mention use under the hood.
Don't forget that there can only be one query at a time on a PostgreSQL database connection, regardless of whether processing is synchronous or not.
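As a concrete illustration that the asynchrony lives entirely in the client (a sketch assuming the Vert.x reactive client, vertx-pg-client; host, credentials and pool size are placeholders), no postgresql.conf changes are needed:
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.pgclient.PgPool;
import io.vertx.sqlclient.PoolOptions;
import io.vertx.sqlclient.Row;

public class AsyncQuery {
    public static void main(String[] args) {
        PgConnectOptions connectOptions = new PgConnectOptions()
                .setHost("localhost").setPort(5432)
                .setDatabase("mydb").setUser("user").setPassword("secret");
        // The pool hands each query to a free connection, since a single
        // connection can only process one query at a time.
        PgPool pool = PgPool.pool(connectOptions, new PoolOptions().setMaxSize(4));
        // The query is dispatched without blocking the calling thread;
        // the handler runs when the result set arrives.
        pool.query("SELECT now()").execute(ar -> {
            if (ar.succeeded()) {
                for (Row row : ar.result()) {
                    System.out.println(row.getValue(0));
                }
            } else {
                ar.cause().printStackTrace();
            }
            pool.close();
        });
    }
}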

Implementing a "live" stream to drive an Akka 2.4 Persistence Query

I have been investigating the experimental Akka Persistence Query module and am very interested in implementing a custom read journal for my application. The documentation describes two main flavors of queries: ones that return the current state of the journal (e.g. CurrentPersistenceIdsQuery) and ones that return a subscribe-able stream that emits events as they are committed to the journal via the write side of the application (e.g. AllPersistenceIdsQuery).
For my contrived application, I am using Postgres and Slick 3.1.1 to drive the guts of these queries. I can successfully stream database query results by doing something like:
override def allPersistenceIds = {
  val db = Database.forConfig("postgres")
  val metadata = TableQuery[Metadata]
  val query = for (m <- metadata) yield m.persistenceId
  Source.fromPublisher(db.stream(query.result))
}
However, the stream is signaled as complete as soon as the underlying Slick DB action is completed. This doesn't seem to fulfill the requirement of a perpetually open stream that is capable of emitting new events.
My questions are:
Is there a way to do it purely using the Akka Streams DSL? That is, can I set up a flow that cannot be closed?
I have done some exploring on how the LevelDB read journal works and it seems to handle new events by having the read journal subscribe to the write journal. This seems reasonable but I must ask - in general, is there a recommended approach for dealing with this requirement?
The other approach I have thought about is polling (e.g. periodically have my read journal query the DB and check for new events / ids). Would someone with more experience than I be able to offer some advice?
Thanks!
It's not as trivial as this one line of code; however, you're on the right track already.
In order to implement an "infinite" stream you'll need to query multiple times - i.e. implement polling, unless the underlying db allows for an infinite query (which here it does not AFAICS).
The polling needs to keep track of the "offset": if you're querying by some tag and you issue another poll, you need to start that (now 2nd) query from the "last emitted element", and not from the beginning of the table again. So you need something, most likely an Actor, that keeps this offset.
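As a rough sketch of that polling-plus-offset idea (using the Akka Streams Java DSL of a recent Akka release for illustration; EventStore, Event and eventsAfter are made-up placeholders for whatever Slick query you run):
import java.time.Duration;
import java.util.List;
import akka.NotUsed;
import akka.stream.javadsl.Source;

public class PollingEvents {
    // Hypothetical blocking DAO call returning events with a sequence number
    // greater than `offset`, ordered ascending.
    interface EventStore {
        List<Event> eventsAfter(long offset);
    }

    static class Event {
        final long offset;
        final String payload;
        Event(long offset, String payload) { this.offset = offset; this.payload = payload; }
    }

    static Source<Event, NotUsed> liveEvents(EventStore store) {
        return Source.tick(Duration.ZERO, Duration.ofSeconds(1), "poll")
                .<Event>statefulMapConcat(() -> {
                    long[] lastOffset = {0L};   // offset kept per materialization
                    return tick -> {
                        List<Event> batch = store.eventsAfter(lastOffset[0]);
                        if (!batch.isEmpty()) {
                            lastOffset[0] = batch.get(batch.size() - 1).offset;
                        }
                        return batch;   // empty poll emits nothing, stream stays open
                    };
                })
                .mapMaterializedValue(cancellable -> NotUsed.getInstance());
    }
}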
The Query Side LevelDB plugin is not the best role model for other implementations as it assumes much about the underlying journal and how those work. Also, LevelDB is not meant for production with Akka Persistence – it's a Journal we ship in order to have a persistent journal you can play around with out of the box (without starting Cassandra etc).
If you're looking for inspiration, the MongoDB plugins actually should be a pretty good source for that, as they have very similar limitations to the SQL stores. I'm not sure if any of the SQL journals currently implement the Query side.
One can use the Postgres replication API to get an 'infinite' stream of database events. It's supported by the Postgres JDBC driver starting from version 42.0.0, see the related pull request.
However, it's not a real stream but rather a buffered synchronous reader from the database WAL.
// pgConnection is assumed to be the connection unwrapped to org.postgresql.PGConnection,
// opened with the replication connection properties; the stream builder is
// reached via getReplicationAPI().
PGReplicationStream stream =
    pgConnection
        .getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName("test_decoding")
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
while (true) {
    ByteBuffer buffer = stream.read();
    // process logical changes
}
It would be nice to have an Akka Streams adapter (Source) in alpakka project for this reader.
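Until such an adapter exists, one rough way to wrap the blocking reader yourself (a sketch only; it reuses the replication-stream setup from the snippet above and should be materialized on a dispatcher intended for blocking calls) is Source.unfoldResource from the Java DSL:
import java.nio.ByteBuffer;
import java.util.Optional;
import akka.NotUsed;
import akka.stream.javadsl.Source;
import org.postgresql.PGConnection;
import org.postgresql.replication.PGReplicationStream;

public class WalSource {
    // pgConnection: a PGConnection opened with the replication properties, as above
    static Source<ByteBuffer, NotUsed> walChanges(PGConnection pgConnection) {
        return Source.<ByteBuffer, PGReplicationStream>unfoldResource(
                // create: open the replication stream lazily when the Source is run
                () -> pgConnection.getReplicationAPI()
                        .replicationStream()
                        .logical()
                        .withSlotName("test_decoding")
                        .withSlotOption("include-xids", false)
                        .withSlotOption("skip-empty-xacts", true)
                        .start(),
                // read: blocks until the next logical change arrives
                (PGReplicationStream stream) -> Optional.of(stream.read()),
                // close: release the stream when the Source completes or fails
                PGReplicationStream::close);
    }
}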

Why use locking in MongoDB?

MongoDB is from the NoSQL era, and isn't locking something related to RDBMS? From Wikipedia:
Optimistic concurrency control (OCC) is a concurrency control method for relational database management systems...
So why do I find is_locked in PyMongo, and why does locking still exist even in a driver that makes non-blocking calls? Motor has is_locked too.
NoSQL does not automatically mean no locks.
There are always some operations that do require a lock, for example building an index.
And the official MongoDB documentation is a more reliable source than Wikipedia (no offense meant to Wikipedia :) )
http://docs.mongodb.org/manual/faq/concurrency/
Mongo does in-place updates, so it needs to lock in order to modify the database. There are other things that need locks, so read the link @Tigra provided for more info.
This is pretty standard as far as databases go, and it isn't an RDBMS-specific thing (Redis also does this, but on a per-key basis).
There are plans to implement collection-level (instead of database-level) locking: https://jira.mongodb.org/browse/SERVER-1240
Some databases, like CouchDB, get around the locking problem by only appending new documents. They create a new, unique revision id and once the document is finished writing, the database points to the new revision. I'm sure there's some kind of concurrency control when changing which revision is used, but it doesn't need to block the database to do that. There are certain downsides to this, such as compaction needing to be run regularly.
MongoDB implements a database-level locking system. This means that operations which are not atomic will lock on a per-database level, unlike SQL, where most techs lock on a table level for basic operations.
In-place updates only occur with certain operators - $set being one of them; the MongoDB documentation used to have a page that listed all of them, but I can't find it now.
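For example (a sketch with the legacy 2.x Java driver, invented names), $set touches only the named field, whereas passing a plain document replaces the whole matched document:
import java.util.Date;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class SetExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        DBCollection users = client.getDB("app").getCollection("users");
        // $set modifies just the named field in place ...
        users.update(new BasicDBObject("name", "alice"),
                     new BasicDBObject("$set", new BasicDBObject("lastLogin", new Date())));
        // ... whereas a plain document replaces the whole matched document.
        users.update(new BasicDBObject("name", "bob"),
                     new BasicDBObject("name", "bob").append("lastLogin", new Date()));
        client.close();
    }
}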
MongoDB currently implements a read/write lock whereby each is separate but they can block each other.
Locks are utterly vital to any database, for example, how can you ensure a consistent read of a document if it is currently being written to? And if you write to the document how do you ensure that you only apply that single update at once and not multiple updates at the same time?
I am unsure how version control can prevent this in CouchDB; locks are really quite vital for a consistent read and are separate from version control, i.e. what if you wish to apply a read lock to the same version, or read a document that is currently being written to a new revision? You will obviously see a lock queue appear. Even though version control might help a little with write lock saturation, there will still be a write lock and it will still need to operate at some level.
As for concurrency features: MongoDB has the ability (for one), if the data is not in RAM, to yield an operation in favour of other operations. This means that locks will not just sit there waiting for data to be paged in; other operations will run in the meantime.
As a side note, MongoDB actually has more locks than this: it also has a JavaScript lock, which is global and blocking and does not have the normal concurrency features of regular locks.
and even in driver that makes non-blocking calls
Hmm, I think you might be confused about what is meant by a "non-blocking" application or server: http://en.wikipedia.org/wiki/Non-blocking_algorithm

How to async bulk get(s) in memcached?

Is it possible to send a bulk gets call to memcached using spymemcached? (Note: gets, in order to get the CAS ID.)
There is not a way to send bulk gets in spymemcached. In the future though all operations will most likely be sent as bulk operations under the hood. This means that you will send everything as a single operation and they will all get packaged up in spymemcached as a single bulk operation. This functionality might be available in version 3.0.
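As a workaround (a sketch only, not a true bulk operation on the wire), you can issue the individual asyncGets calls without waiting in between and then collect the futures, which at least overlaps the round trips:
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import net.spy.memcached.CASValue;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.internal.OperationFuture;

public class BulkGets {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        List<String> keys = Arrays.asList("k1", "k2", "k3");

        // Issue all gets (with CAS) without waiting in between ...
        Map<String, OperationFuture<CASValue<Object>>> futures = new HashMap<>();
        for (String key : keys) {
            futures.put(key, client.asyncGets(key));
        }
        // ... then collect the results.
        for (Map.Entry<String, OperationFuture<CASValue<Object>>> e : futures.entrySet()) {
            CASValue<Object> cas = e.getValue().get();   // null if the key is missing
            if (cas != null) {
                System.out.println(e.getKey() + " -> cas=" + cas.getCas() + " value=" + cas.getValue());
            }
        }
        client.shutdown();
    }
}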

Is there any method to guarantee a transaction from the user end?

Since MongoDB does not support transactions, is there any way to guarantee transaction?
What do you mean by "guarantee transaction"?
There are two concepts in MongoDB that are similar:
Atomic operations
Using safe mode / getlasterror ...
http://www.mongodb.org/display/DOCS/Last+Error+Commands
If you simply need to know if there was an error when you run an update, for example, you can use the getlasterror command; from the docs ...
getlasterror is primarily useful for write operations (although it is set after a command or query too). Write operations by default do not have a return code: this saves the client from waiting for client/server turnarounds during write operations. One can always call getLastError if one wants a return code.
If you're writing data to MongoDB on multiple connections, then it can sometimes be important to call getlasterror on one connection to be certain that the data has been committed to the database. For instance, if you're writing to connection #1 and want those writes to be reflected in reads from connection #2, you can assure this by calling getlasterror after writing to connection #1.
Alternatively, you can use atomic operations for cases where you need to increment a value for example (like an upvote, etc.) more about that here:
http://www.mongodb.org/display/DOCS/Atomic+Operations
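Putting the two together, here is a sketch with the legacy 2.x Java driver (collection and field names are invented): an atomic $inc issued with WriteConcern.SAFE, where the driver calls getLastError under the hood and reports failures:
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;
import com.mongodb.WriteResult;

public class Upvote {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        DBCollection posts = client.getDB("app").getCollection("posts");
        // Atomic $inc, acknowledged: SAFE (later renamed ACKNOWLEDGED) makes the
        // driver check getLastError and raise an exception on failure.
        WriteResult result = posts.update(
                new BasicDBObject("_id", "post-42"),
                new BasicDBObject("$inc", new BasicDBObject("upvotes", 1)),
                false /* upsert */, false /* multi */,
                WriteConcern.SAFE);
        System.out.println("documents updated: " + result.getN());
        client.close();
    }
}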
As a side note, MySQL's default storage engine doesn't have transactions either! :)
http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
MongoDB only supports atomic operations. There is no way to implement transactions in the sense of ACID on top of MongoDB. Such transaction support must be implemented in the core. But you will never see full transaction support due to the CAP theorem. You cannot have speed, durability and consistency at the same time.
I think it's one of the things you choose to forgo when you choose a NoSQL solution.
If transactions are required, perhaps NoSQL is not for you. Time to go back to ACID relational databases.
Unfortunately MongoDB doesn't support transactions out of the box, but you can actually implement ACID optimistic transactions on top of it. I wrote an example and some explanation on a GitHub page.
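That GitHub example isn't reproduced here, but the basic optimistic idea can be sketched with a version field used as a compare-and-swap token (legacy 2.x Java driver, invented names; assumes the document already exists):
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.WriteResult;

public class OptimisticUpdate {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost");
        DBCollection accounts = client.getDB("bank").getCollection("accounts");

        boolean committed = false;
        while (!committed) {
            DBObject current = accounts.findOne(new BasicDBObject("_id", "acc-1"));
            long version = ((Number) current.get("version")).longValue();
            long balance = ((Number) current.get("balance")).longValue();

            // The update only matches if nobody bumped the version in the meantime.
            WriteResult result = accounts.update(
                    new BasicDBObject("_id", "acc-1").append("version", version),
                    new BasicDBObject("$set", new BasicDBObject("balance", balance - 100))
                            .append("$inc", new BasicDBObject("version", 1)));
            committed = result.getN() == 1;   // 0 means a concurrent write won; retry
        }
        client.close();
    }
}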