Berkeley DB: Number of lock objects for hash access method

This page says that "for the Hash access method, you only need a single lock object".
http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/lock_max.html
Does this mean that all the processes/threads that access the database will try to lock the same lock object? Doesn't that cause very high lock contention?
Thanks!
--Michi

What it's describing here is how to calculate the number of lock objects required by your application, although the default lock object configuration (1000) is usually enough. It's describing how many lock objects a given data access operation will require, so that you can multiply that by the number of concurrent data access operations and configure the number of lock objects appropriately. It's not really talking about lock contention.
For the HASH access method, a given key value maps directly to a hash bucket. There is only one page that needs to be looked at (and locked) in order to reach the data. This is different from Btree (which needs to traverse the internal index nodes in order to get to the data) and Queue (which needs to lock each record and the page that the record resides on).
In recent releases we've actually eliminated some locks that weren't required, so that a simpler way of putting it would be:
Each database operation is going to require:
one lock object for the page (Btree, Hash, or Recno) or record (Queue) that is being accessed,
plus one lock object for the metadata page,
plus one lock object if a Btree page split is required,
plus one lock object per page if Queue is being used.
Basically, typically 2-3 lock objects per data access. Transactions accumulate lock objects until the transaction is complete, so if a transaction in your application typically accesses 10 records, that transaction will require 20-30 lock objects. If you can have up to 10 concurrent threads in your application, then you would need to configure your system to have about 300 lock objects. It's always better to configure more than you need so that you don't run out; the memory overhead of over-allocating lock objects is minimal (they are small structures).
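To make that concrete, here is a back-of-envelope sizing sketch, assuming the com.sleepycat.db Java binding (the environment path and the 2x headroom factor are arbitrary choices):

import com.sleepycat.db.Environment;
import com.sleepycat.db.EnvironmentConfig;
import java.io.File;

public class LockSizing {
    public static void main(String[] args) throws Exception {
        // Back-of-envelope numbers from the discussion above:
        // ~3 lock objects per record access, 10 records per transaction,
        // 10 concurrent threads, doubled for headroom.
        int locksPerRecord = 3;
        int recordsPerTxn = 10;
        int threads = 10;
        int lockObjects = locksPerRecord * recordsPerTxn * threads * 2; // 600

        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(true);
        config.setInitializeLocking(true);
        config.setMaxLockObjects(lockObjects);

        Environment env = new Environment(new File("/tmp/dbenv"), config);
        env.close();
    }
}

The same value can also be set with the set_lk_max_objects directive in a DB_CONFIG file in the environment's home directory.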
I hope that helps.
Dave

Related

mongo save documents in monotonically increasing sequence

I know mongo docs provide a way to simulate auto_increment.
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
But it is not concurrency-proof as guaranteed by, say, MySQL.
Consider the sequence of events:
client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1
In this case, it is possible to save a doc with an id less than the current max that is already saved. For MySQL, this can never happen since the auto-increment id is assigned by the server.
How do I prevent this? One way is to do optimistic looping at each client, but with many clients this will result in heavy contention. Is there a better way?
The use case for this is to ensure the id is "forward-only". This is important for, say, a chat room where many messages are posted and paginated; I do not want new messages to be inserted into a previous page.
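To clarify, the optimistic looping I have in mind looks roughly like this (a sketch using the MongoDB Java driver; the database, collection, and field names are placeholders):

import com.mongodb.MongoWriteException;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

public class MonotonicInsert {
    public static void main(String[] args) {
        MongoCollection<Document> messages =
            MongoClients.create().getDatabase("chat").getCollection("messages");
        while (true) {
            // Read the current max _id and try to claim max+1 as our id.
            Document last = messages.find().sort(Sorts.descending("_id")).first();
            long next = (last == null) ? 1 : last.getLong("_id") + 1;
            try {
                messages.insertOne(new Document("_id", next).append("text", "hello"));
                break; // insert succeeded: we won the race for this id
            } catch (MongoWriteException e) {
                // duplicate key: another client claimed this id first; retry
            }
        }
    }
}

Every client that loses the race has to re-read the max and retry, which is why I expect heavy contention with many clients.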
But it is not concurrency-proof as guaranteed by, say, MySQL.
That depends on the definition of concurrency-proof, but let's see.
In this case, it is possible to save a doc with an id less than the current max that is already saved.
That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?
For MySQL, this can never happen since the auto-increment id is assigned by the server.
That sounds like folklore. Most relational DBs offer fine-grained control over these features, since strict guarantees severely impact concurrency.
MySQL guarantees neither that there are no gaps, nor that a transaction with a higher AUTO_INCREMENT id won't become visible to other readers before a transaction that acquired a lower AUTO_INCREMENT value has committed, unless you keep a table-level lock, which severely impacts concurrency.
For gaplessness, consider a transaction rollback of the first of two concurrent inserts. Does the second insert now get a new id assigned while it's being committed? No - from the InnoDB documentation:
You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")
and
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”
Also, you're completely ignoring the problem of replication, where sequences lead to even more trouble:
Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")
The sheer length of the documentation of the InnoDB behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because we can't even be sure of the counter value during a partition...

DB2 Read committed without locking?

We have a transaction that modifies a record. The transaction must call a web service, rolling back the transaction if the service fails (so it can't commit beforehand). Because the record is modified, the client app holds a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without seeking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during the read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to the read_uncommitted isolation level, because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not an isolation level, maybe some modifier I can tack onto my SQL? Anything?
I assume you are talking about JDBC isolation levels, not DB2 ones. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock that satisfied the predicates, while read_committed only keeps the lock until another row matching the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are any lock escalations occurring?
Have you tried CURRENTLY_COMMITTED (assuming you are on 9.7+)?
Before currently committed, there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED, and DB2_SKIPDELETED.
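For reference, the isolation level can also be pinned per connection in plain JDBC rather than relying on the datasource default (a sketch; the URL and credentials are placeholders, and the DB2 JDBC driver must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:db2://localhost:50000/SAMPLE", "user", "password");
        // read committed corresponds to cursor stability (CS) in DB2 terms
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        conn.close();
    }
}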
The lowest isolation level that reads committed rows is read committed.
Usually, you process rows in a DB2 database like this (sketched in code below):
1. Read the database row with no read locks (an ordinary SELECT with read committed).
2. Process the data so you have a row with changed values.
3. Read the database row again, this time with a read lock (SELECT ... FOR UPDATE).
4. Check whether the database row from step 1 matches the database row from step 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to step 2.
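In JDBC, that loop looks roughly like this (a sketch; the account table, its columns, and the balance logic are made up for illustration, and autocommit is assumed off):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OptimisticUpdate {
    // Hypothetical schema: account(id INT PRIMARY KEY, balance INT).
    static void addToBalance(Connection conn, int id, int delta) throws Exception {
        conn.setAutoCommit(false);
        while (true) {
            // Step 1: ordinary read; no lock is kept under read committed.
            int original = readBalance(conn, id, false);
            // Step 2: process the data.
            int changed = original + delta;
            // Step 3: read again, this time acquiring an update lock.
            int current = readBalance(conn, id, true);
            // Steps 4 and 5: if nothing changed in between, write and commit.
            if (current == original) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE account SET balance = ? WHERE id = ?")) {
                    ps.setInt(1, changed);
                    ps.setInt(2, id);
                    ps.executeUpdate();
                }
                conn.commit();
                return;
            }
            // Step 6: rows differ; release the lock and start over.
            conn.rollback();
        }
    }

    static int readBalance(Connection conn, int id, boolean forUpdate) throws Exception {
        String sql = "SELECT balance FROM account WHERE id = ?" + (forUpdate ? " FOR UPDATE" : "");
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}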

Locking ID before doing an insert - Oracle 10G

I have a table whose primary keys are numbers that are not sequential.
Company policy is to register new rows with the lowest available ID value. E.g.:
table.ID = [11,13,14,16,17]
lowest available ID = 12
I have an algorithm that gives me the lowest available ID. I want to know how to prevent this ID from being used by someone else before I perform the insertion.
Would it be possible to do this in the database, or would it have to be done in the programming language?
Thanks.
The company policy is extremely short-sighted. Unless the company's goal is to build applications that do not scale and the company is unconcerned with performance.
If you really wanted to do this, you'd need to serialize all your transactions that touch this table, essentially turning your nice, powerful server into a single-threaded, single-user, low-end machine. There are any number of ways to do this. The simplest (though not simple) method would be to do a SELECT ... FOR UPDATE on the row with the largest key less than the new key you want to insert (11 in this case). Once you acquired the lock, you would need to re-confirm that 12 is vacant. If it is, you could then insert the row with an id of 12. Otherwise, you'd need to restart the process, looking for the new key and trying to lock the row with an id one less than that key. When your transaction commits, the lock would be released and the next session that was blocked waiting for a lock would be able to proceed.

This assumes that you can control every process that tries to insert data into this table and that they would all implement exactly the same logic. It will lock up the system if you ever allow transactions to span waits for human input, because humans will inevitably go to lunch with rows locked. And all that serialization will radically reduce the scalability of your application.
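If you were determined to implement it anyway, the shape would be roughly this (a JDBC sketch; the table name, the columns, and the findLowestFreeId() helper standing in for your algorithm are all made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LowestFreeIdInsert {
    static void insertWithLowestId(Connection conn) throws Exception {
        conn.setAutoCommit(false);
        while (true) {
            int candidate = findLowestFreeId(conn); // e.g. 12 in the example above
            // Lock the row with the largest key below the candidate (11 here).
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT id FROM t WHERE id = (SELECT MAX(id) FROM t WHERE id < ?) FOR UPDATE")) {
                lock.setInt(1, candidate);
                lock.executeQuery();
            }
            // Re-confirm the candidate is still vacant now that we hold the lock.
            try (PreparedStatement check = conn.prepareStatement(
                    "SELECT COUNT(*) FROM t WHERE id = ?")) {
                check.setInt(1, candidate);
                try (ResultSet rs = check.executeQuery()) {
                    rs.next();
                    if (rs.getInt(1) == 0) {
                        try (PreparedStatement ins = conn.prepareStatement(
                                "INSERT INTO t (id) VALUES (?)")) {
                            ins.setInt(1, candidate);
                            ins.executeUpdate();
                        }
                        conn.commit(); // releases the lock
                        return;
                    }
                }
            }
            conn.rollback(); // candidate was taken meanwhile; recompute and retry
        }
    }

    static int findLowestFreeId(Connection conn) throws Exception {
        // Placeholder for your lowest-available-id algorithm.
        return 12;
    }
}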
I would strongly encourage you to push back against the ridiculous "requirement" rather than implementing something this hideous.

CacheManager GetData performance issue

We are using Enterprise Library 5 and the CacheManager that it provides in our web application. Everything seems to be working fine up to the point where we start a heavy load test on the application.
We are caching records from the database using a key based on their ID. We are not always requesting a single item from the cache; sometimes we need to get a list of items. For this we have a LINQ query that does a Select(e => CacheManager.GetData(id_from_list)) and returns the list of items from the cache.

Most of the time this works fine, but under heavy load the GetData method becomes a bottleneck due to the locking that the cache manager performs on both read and write operations. Basically, only one thread can read data from the cache at a time. We did create several cache managers based on the type of the items, which allows several threads to get data from different cache managers, but the issue remains under heavy load (one bottleneck per cache manager); it improved the application up to a point, but not enough.

Has anyone else encountered this problem, and did you find a way to overcome it?

NOTE: We tried caching lists of items, composing the key from the IDs of the items in the list. This solved the problem and CacheManager.GetData is no longer a bottleneck ... BUT ... obviously this is not a good solution, as each item could end up in the cache thousands of times across many lists.
You may consider adapting the CacheManager to use a read/write lock (which I think is much more suitable for this situation) instead of the exclusive locking that it uses now.
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx
Basically, a read/write lock is appropriate when multiple reader threads need simultaneous access to the data, and only the occurrence of a write will cause incoming readers to block.
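A minimal sketch of the pattern, shown here with Java's ReentrantReadWriteLock for brevity (in .NET the equivalent is ReaderWriterLock or, better, ReaderWriterLockSlim):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockCache<K, V> {
    private final Map<K, V> items = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public V getData(K key) {
        lock.readLock().lock(); // many readers may hold the read lock at once
        try {
            return items.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void add(K key, V value) {
        lock.writeLock().lock(); // exclusive: blocks all readers and writers
        try {
            items.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}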
Read/write locks have other problems when put under load, however, such as write starvation. Depending on the implementation, a write may always wait for all reads to finish first; with a constant stream of reads, a write never gets a chance to happen.

Memcached, Locking and Race Conditions

We are trying to update memcached objects when we write to the database, to avoid having to read them from the database after inserts/updates.
For our forum post object we have a ViewCount field containing the number of times a post is viewed.
We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.
Any idea how to deal with this kind of issue? It would seem that some sort of locking is needed, but how do you do it reliably across servers in a farm?
If you're dealing with data that doesn't necessarily need to be updated in real time, and to me the view count is one of those, then you could set an expiry on the objects that are stored in memcache.
Once that expiration happens, it'll go back to the database and read the new value, but until then it will leave it alone.
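For example, with a client that takes an expiry on set (a sketch using the spymemcached Java client; the key and the 60-second TTL are illustrative):

import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;

public class PostCache {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        client.set("post:42", 60, "serialized post, including ViewCount"); // expires in 60s
        Object post = client.get("post:42"); // returns null after expiry -> reload from DB
        client.shutdown();
    }
}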
Of course for new posts you may want this updated more often, but you can code for this.
Memcache only stores one copy of your object in one of its instances, not in many of them, so I wouldn't worry about object locking or anything. That is for the database to handle, not your cache.
Edit:
Memcache offers no guarantee that when you're getting and setting from varied servers that your data won't get clobbered.
From memcache docs:
A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
Race conditions and stale data
One thing to keep in mind as you design your application to cache data, is how to deal with race conditions and occasional stale data.
Say you cache the latest five comments for display in a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar is rendered 50 times per second! Thus, once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.
Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.
One should be mindful of possible issues in populating or repopulating the cache. Remember that the process of checking memcached, fetching from SQL, and storing into memcached is not atomic at all!
I'm thinking - could a solution be to store the ViewCount separately from the Post object, and then do an INCR on it? Of course this would require reading 2 separate values from memcached when displaying the information.
Memcached operations are atomic. The server process will queue the requests and serve each one completely before going on to the next, so there's no need for locking.
Edit: memcached has an increment command, which is atomic. You just have to store the counter as a separate value in the cache.
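For example (a sketch using the spymemcached Java client; the key names are made up):

import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;

public class ViewCounter {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // incr does not create missing keys, so seed the counter first.
        client.set("post:42:views", 0, "0");
        long views = client.incr("post:42:views", 1); // atomic on the server
        System.out.println("views = " + views);
        client.shutdown();
    }
}

Since the increment happens atomically on the server, the web servers in the farm can all bump the counter without clobbering each other.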
We encountered this in our system. We modified get so that:
If the value is unset, it sets it with a flag ('g') and an 8-second TTL, and returns false so the calling function generates it.
If the value is not flagged (!== 'g'), then unserialize and return it.
If the value is flagged (=== 'g'), then wait 1 second and try again until it's not flagged. It will eventually be set by the other process or expired by the TTL.
Our database load dropped by a factor of 100 when we implemented this.
// $m is a connected Memcached instance.
function get($m, $key) {
    $value = $m->get($key);
    if ($value === false) {
        // Miss: flag the key as "being generated" with an 8-second TTL and
        // return false so the caller regenerates and stores the real value.
        $m->set($key, 'g', 8);
        return false;
    }
    while ($value === 'g') {
        // Another process is generating the value; poll until the real
        // value appears or the TTL expires the flag.
        sleep(1);
        $value = $m->get($key);
    }
    return $value;
}