atomic operations and atomic transactions

Can someone explain to me what the difference is between atomic operations and atomic transactions? It seems to me that these two are the same thing. Is that correct?

The concept of atomicity is common to both atomic transactions and atomic operations, but the two terms usually belong to different domains.
Atomic Transactions are associated with Database operations where a set of actions must ALL complete or else NONE of them complete. For example, if someone is booking a flight, you want to both get payment AND reserve the seat OR do neither. If either one were allowed to succeed without the other also succeeding, the database would be inconsistent.
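As a rough illustration of that idea in Python with psycopg2 (the table and column names here are invented for the example, not taken from any real schema), wrapping both steps in one transaction might look like:

    import psycopg2

    conn = psycopg2.connect("dbname=booking")  # assumed DSN, for illustration only
    try:
        with conn:  # in psycopg2, the with-block is one transaction
            with conn.cursor() as cur:
                # take the payment...
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                            (300, 42))
                # ...and reserve the seat (hypothetical schema)
                cur.execute("UPDATE seats SET passenger_id = %s WHERE seat_no = %s",
                            (42, "12A"))
        # leaving the with-block commits; an exception inside rolls BOTH updates back
    finally:
        conn.close()

Either both UPDATEs become visible together at COMMIT, or neither does.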
Atomic Operations on the other hand are usually associated with low-level programming with regards to multi-processing or multi-threading applications and are similar to Critical Sections.
For example, if two threads both access and modify the same variable, each thread goes through the following steps:
Read the variable from storage into local memory.
Modify the value in local memory.
Write the modified value back to the original storage location.
But in a multi-threaded system, an interrupt or other context switch might happen after the first thread has read the value but before it has written it back. The second thread (or interrupt handler) will then read and modify the OLD value and write its modified value back to storage. When the first thread resumes, it doesn't know that something has changed, so it writes its own change back over the second thread's value. Hence the update that the second thread made to the variable is lost.
If an operation is atomic, it is guaranteed to complete without being interrupted once it begins. This is usually accomplished using hardware-level primitives like Test-and-Set or Compare-and-Swap.
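To make the lost-update scenario concrete, here is a minimal Python sketch (my own illustration, not from the original answer): the unprotected increment compiles to a separate read, add, and store, so concurrent updates can be lost, while the locked version turns the read-modify-write into a critical section.

    import threading

    counter = 0
    lock = threading.Lock()

    def unsafe_increment(n):
        global counter
        for _ in range(n):
            counter += 1          # read-modify-write: updates can be lost

    def safe_increment(n):
        global counter
        for _ in range(n):
            with lock:            # critical section: one thread at a time
                counter += 1

    threads = [threading.Thread(target=unsafe_increment, args=(100_000,))
               for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)  # may print less than 200000, depending on interpreter timing

Swapping in safe_increment always yields 200000, because the lock serializes the three steps.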

To get a wider picture, you can take a look at:
MySQL Transactions and Atomic Operations
Atomicity (database systems)
Atomicity (Programming)
Some quotes from the above-cited resources:
About databases:
In an atomic transaction, a series of database operations either all occur, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and irreducibility.
About programming:
In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition: they either successfully change the state of the system, or have no apparent effect.
I have seen the word transaction used more often for databases and operation in programming, especially in kernel-level programming.

In a statement:
An atomic transaction is the smallest set of operations that performs the required steps: either all of those operations happen (successfully) or the whole transaction fails.
An atomic operation usually has nothing to do with transactions. To my knowledge the term comes from hardware programming, where a set of operations (or a single one) must complete as one instantaneous step.

Related

Allowing multiple users access to a PSQL database

I'm a rookie in this topic; all I have ever done is make a connection to a database for one user, so I'm not familiar with giving multiple users access to a database.
My case is: 10 facilities will use my program for recording when workers come and leave. The database will be on the main server, and all I made was one user while I was programming/testing the program. My question is: can multiple remote locations connect to the database through one database user (there should be no collisions, because they are all writing different rows, but to the same tables), and if that's not the case, what should I do?
Good relational databases handle this quite well; it is the “I” in the so-called ACID properties of transactions in relational databases, and it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on results of previous statements in indirect ways. This assumption can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways. The discussion when to use which one exceeds the scope of this answer.
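As a small illustration of the pessimistic variant (the first bullet above), here is a hedged Python/psycopg2 sketch; the worker_log table and its columns are invented for the example:

    import psycopg2

    conn = psycopg2.connect("dbname=attendance")  # assumed DSN
    with conn:  # one transaction
        with conn.cursor() as cur:
            # lock the row so nobody else can modify it until we commit
            cur.execute("SELECT hours FROM worker_log WHERE worker_id = %s FOR UPDATE",
                        (42,))
            hours = cur.fetchone()[0]
            # later statements can safely rely on the value we just read
            cur.execute("UPDATE worker_log SET hours = %s WHERE worker_id = %s",
                        (hours + 8, 42))

The FOR UPDATE clause blocks any concurrent transaction that tries to modify the same row until this transaction commits or rolls back.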
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a certain order (e.g., always update the account with the lower account number first).
When you encounter a deadlock, your code has to retry the transaction.
Contrary to common belief, a deadlock is not necessarily a bug.
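A hedged sketch of both points in Python/psycopg2 (the accounts table is invented for the example): lock rows in a fixed order to avoid most deadlocks, and retry if the database reports one anyway.

    import psycopg2
    from psycopg2 import errors

    def transfer(conn, frm, to, amount, retries=3):
        first, second = sorted((frm, to))   # always lock the lower id first
        for attempt in range(retries):
            try:
                with conn:                  # one transaction; rolls back on error
                    with conn.cursor() as cur:
                        for acct in (first, second):
                            cur.execute("SELECT 1 FROM accounts WHERE id = %s FOR UPDATE",
                                        (acct,))
                        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                                    (amount, frm))
                        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                                    (amount, to))
                return
            except errors.DeadlockDetected:  # psycopg2 >= 2.8
                if attempt == retries - 1:
                    raise                    # give up after the last retry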
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.

Manual locking for MongoDB

I have these operations:
Find a doc from collection.
Manipulate doc.prop based on its current value ("prop" is a string).
Update the doc back to the collection.
So in this case, I have to make sure these operations are atomic, because updating doc.prop must be based on the current value.
Here are two approaches:
1. Add a "valueKey" (Number) property to the doc, and require that valueKey matches when updating the doc; increase valueKey after each update. If valueKey does not match, mark the update as a failure and retry.
2. Use "fsyncLock", provided by MongoDB, to lock the whole mongod instance during the operations.
The 1st approach works well, but when facing a huge volume of these operations at the same time, the failures and retries would be frequent.
The 2nd approach, which I haven't tried, is I think meant for backing up the database and is not a good fit in this case.
So I'm wondering is there any other efficient approach?
The first approach is called an optimistic lock. Optimistic locks assume that the probability of collision is low, otherwise, as you already pointed out, there are a lot of retries. Those retries can also be destructive - if a text is edited, it might make sense to merge the edits, but it hardly ever makes sense for a phone number.
Locking the entire database is an extreme form of a pessimistic (offline) lock, where the concurrency of the system is deliberately reduced. However, that has problems because clients don't know what's going on: their edits simply fail, which is about the worst possible user experience.
So pessimistic locks really only make sense if clients have a chance of actually knowing that something is locked. For instance, you'd somehow need to inform the user that it's not possible to edit the item she wants to edit, because someone else is already in edit-mode for that item. This also has problems, especially if another user left the screen and is blocking all other users.
If you wanted to go for a pessimistic lock, however, that should absolutely never be implemented by something like a global database lock: simply lock the item itself and implement the business logic for the locking in your code.
Moral: This isn't a technology problem, it's a logical problem. Google Docs demonstrates one way to allow concurrent editing by multiple users, but it's hard to implement, has limited use in other types of applications, and is still deemed annoying by some users. Git and the like show another method, where the logic of branches, merging and conflicts is exposed to the user as well, but asynchronously (multi-version concurrency control).
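For the record, approach 1 from the question (the optimistic lock) might look roughly like this with PyMongo; the "prop" and "valueKey" fields follow the question, while the database/collection names and retry policy are assumptions:

    from pymongo import MongoClient

    coll = MongoClient().mydb.docs            # assumed database/collection names

    def update_prop(doc_id, transform, max_retries=5):
        for _ in range(max_retries):
            doc = coll.find_one({"_id": doc_id})
            res = coll.update_one(
                {"_id": doc_id, "valueKey": doc["valueKey"]},  # only if unchanged
                {"$set": {"prop": transform(doc["prop"])},
                 "$inc": {"valueKey": 1}})                     # bump the version
            if res.modified_count == 1:
                return True                   # our compare-and-set won
        return False                          # too much contention; caller decides

The filter on valueKey makes the update a server-side compare-and-set: it only applies if no one else modified the doc since we read it.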

Write operation during a long cursor operation

I use MongoDB 2.4 with a single DB.
I find all items in a collection (50,000+) and, for each one, I insert it into another one.
    it = coll1.find()
    while (it.hasNext()) {
        coll2.save(it.next())
    }
Is it a performance issue to make intensive writes while a cursor is open on the same database?
This essentially comes down to a question about concurrency ( http://docs.mongodb.org/manual/faq/concurrency/ ): can reads perform well under MongoDB's database-level, writer-greedy lock while a write-intensive load is running?
MongoDB should be able to juggle your read lock with the write lock quite well here, interleaving operations and yielding the current operation under certain conditions that it sees fit, to keep performance up (see the link supplied above).
This is, of course, in contrast to SQL, where read and write operations are isolated; as such, MongoDB's concurrency rules actually break the I in ACID. Of course, in SQL the lock is much more granular, so you would normally get comparable performance.
If you do see a performance hit, mainly due to IO (reading requires IO as well, remember), then you might find it prudent to batch your writes into groups of maybe 1000, taking about a 5-second break after each batch to let the IO subside.
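A hedged sketch of that batching idea with a modern PyMongo (the question targets MongoDB 2.4, so treat this purely as illustrative; database and collection names are assumed):

    import time
    from pymongo import MongoClient

    db = MongoClient().mydb                  # assumed database name
    batch, BATCH_SIZE = [], 1000

    for doc in db.coll1.find():
        batch.append(doc)
        if len(batch) == BATCH_SIZE:
            db.coll2.insert_many(batch)      # one round trip per 1000 docs
            batch = []
            time.sleep(5)                    # let the IO subside, as suggested above
    if batch:
        db.coll2.insert_many(batch)          # flush the remainder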
No, because cursors are not atomic: each read is its own atomic operation. This means Mongo is not subject to the burden of ensuring that the cursor represents a single snapshot in time.

Why use Locking in MongoDB?

MongoDB is from the NoSQL era, and isn't a lock something related to RDBMSs? From Wikipedia:
Optimistic concurrency control (OCC) is a concurrency control method for relational database management systems...
So why do I find is_locked in PyMongo, and why does a lock still exist even in a driver that makes non-blocking calls? Motor has is_locked too.
NoSQL does not automatically mean no locks.
There are always some operations that require a lock.
For example, building an index.
And the official MongoDB documentation is a more reliable source than Wikipedia (no offense meant to Wikipedia :) ):
http://docs.mongodb.org/manual/faq/concurrency/
Mongo does in-place updates, so it needs to lock in order to modify the database. There are other things that need locks too, so read the link @Tigra provided for more info.
This is pretty standard as far as databases and it isn't an RDBMS-specific thing (Redis also does this, but on a per-key basis).
There are plans to implement collection-level (instead of database-level) locking: https://jira.mongodb.org/browse/SERVER-1240
Some databases, like CouchDB, get around the locking problem by only appending new documents. They create a new, unique revision id and once the document is finished writing, the database points to the new revision. I'm sure there's some kind of concurrency control when changing which revision is used, but it doesn't need to block the database to do that. There are certain downsides to this, such as compaction needing to be run regularly.
MongoDB implements database-level locking. This means that operations which are not atomic lock on a per-database level, unlike SQL, where most engines lock on a table level for basic operations.
In-place updates only occur with certain operators ($set being one of them); the MongoDB documentation used to have a page that listed all of them, but I can't find it now.
MongoDB currently implements a read/write lock whereby each is separate but they can block each other.
Locks are utterly vital to any database. For example, how can you ensure a consistent read of a document if it is currently being written to? And if you write to the document, how do you ensure that you apply only that single update at once, and not multiple updates at the same time?
I am unsure how version control can prevent this in CouchDB; locks are really quite vital for a consistent read and are separate from version control. For instance, what if you wish to apply a read lock to the same version, or read a document that is currently being written to a new revision? You will obviously see a lock queue appear. Even though version control might help a little with write-lock saturation, there will still be a write lock, and it will still need to operate at some level.
As for concurrency features: MongoDB has the ability (for one), if the data is not in RAM, to yield an operation in favor of other operations. This means that locks will not just sit there waiting for data to be paged in; other operations will run in the meantime.
As a side note, MongoDB actually has more locks than this. It also has a JavaScript lock, which is global and blocking and does not have the normal concurrency features of regular locks.
and even in a driver that makes non-blocking calls
Hmm, I think you might be confused about what is meant by a "non-blocking" application or server: http://en.wikipedia.org/wiki/Non-blocking_algorithm

Is there any method to guarantee a transaction from the user end

Since MongoDB does not support transactions, is there any way to guarantee transactional behavior?
What do you mean by "guarantee transaction"?
There are two concepts in MongoDB that are similar:
Atomic operations
Using safe mode / getlasterror ...
http://www.mongodb.org/display/DOCS/Last+Error+Commands
If you simply need to know whether there was an error when you run an update, for example, you can use the getlasterror command. From the docs:
getlasterror is primarily useful for write operations (although it is set after a command or query too). Write operations by default do not have a return code: this saves the client from waiting for client/server turnarounds during write operations. One can always call getLastError if one wants a return code.
If you're writing data to MongoDB on multiple connections, then it can sometimes be important to call getlasterror on one connection to be certain that the data has been committed to the database. For instance, if you're writing to connection #1 and want those writes to be reflected in reads from connection #2, you can assure this by calling getlasterror after writing to connection #1.
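In today's drivers the same guarantee is expressed as a write concern rather than an explicit getlasterror call; a hedged PyMongo sketch (database and collection names assumed):

    from pymongo import MongoClient

    client = MongoClient(w=1)   # acknowledged writes: the driver waits for the
                                # server's answer, which is what calling
                                # getLastError used to achieve
    db = client.mydb
    result = db.items.update_one({"_id": 1}, {"$set": {"x": 2}})
    print(result.acknowledged)  # True once the server has confirmed the write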
Alternatively, you can use atomic operations for cases where you need to increment a value, for example (an upvote, etc.); more about that here:
http://www.mongodb.org/display/DOCS/Atomic+Operations
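For instance, an upvote with the atomic $inc operator needs no read-modify-write cycle at all; a minimal PyMongo sketch (the collection name and document id are assumptions):

    from pymongo import MongoClient

    posts = MongoClient().mydb.posts   # assumed collection

    # $inc is applied atomically on the server: concurrent upvotes cannot be lost
    posts.update_one({"_id": 42}, {"$inc": {"upvotes": 1}})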
As a side note, MySQL's default storage engine doesn't support transactions either! :)
http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
MongoDB only supports atomic operations. There is no way to implement transactions in the ACID sense on top of MongoDB; such transaction support would have to be implemented in the core. And you will never see full transaction support there, due to the CAP theorem: you cannot have consistency, availability and partition tolerance at the same time.
I think it's one of the things you choose to forgo when you choose a NoSQL solution.
If transactions are required, perhaps NoSQL is not for you. Time to go back to ACID relational databases.
Unfortunately, MongoDB doesn't support transactions out of the box, but you can actually implement ACID optimistic transactions on top of it. I wrote an example and some explanation on a GitHub page.