Is it possible to grab a list of memcached keys based on some regex? I understand that one solution is to store the keys in the database and grab the list when I need to delete those keys, but that incurs additional cost on the db.
I was wondering if there is another way to do it without DB overhead.
Cheers,
Mickey
No, there's no way to do that. The documentation suggests a way to simulate a namespace, but that's it.
memcached is fast because it doesn't do this sort of thing.
If it did all the stuff your database could do, it'd be as fast as your database and someone would need to come along and build something with more constrained semantics that optimized for speed over functionality.
For example, here is a C# helper that strips unsafe characters from a key and enforces memcached's 250-character key limit:
using System.Text.RegularExpressions;

private static string CleanKey(string key)
{
    // memcached keys cannot contain spaces or control characters,
    // so keep only a conservative character set
    var regex = new Regex("[^a-zA-Z0-9-]");
    var clean = regex.Replace(key, string.Empty);

    // memcached limits keys to 250 characters
    return clean.Length > 250 ? clean.Substring(0, 250) : clean;
}
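Incidentally, the namespace simulation the documentation suggests boils down to keeping a version number per namespace. A minimal sketch in Python, assuming the python-memcached client and a local server; bumping the version "deletes" every key in the namespace without any regex scan:

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def ns_key(namespace, key):
    # every real key embeds the namespace's current version number
    mc.add("ns:" + namespace, 1)          # no-op if the version already exists
    version = mc.get("ns:" + namespace)
    return "%s:%s:%s" % (namespace, version, key)

def invalidate_namespace(namespace):
    # bumping the version orphans all old keys; they expire via LRU
    mc.incr("ns:" + namespace)

mc.set(ns_key("posts", "42"), "hello")
invalidate_namespace("posts")
print(mc.get(ns_key("posts", "42")))      # None - the old key is unreachable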
I am working on an app, backed by Amazon DynamoDB, where users can create posts. One of the attributes of a post item in the database is postId. I am looking for the best practice for setting this value upon creation. So far, I have thought of:
Counting the current items in the DB and then assigning the value as postId = dbcount + 1. I cannot find a count method for DynamoDB using Swift, and the ways I have found (a scan, or the table description's item count) are either inefficient or inaccurate. Also, I thought of the scenario of two users posting at the same time.
I could create a UUID with Swift and set the postId to this value.
Of these two options, which route is better? Is there a preferred industry standard? Option 2 seems to be the better choice, but I am not sure. Are there any other potential alternatives? Thank you!
I would definitely stay away from option 1 - as you said the potential for a race condition is too high and it could be expensive to implement too.
A UUID would certainly work and is likely to be the least painful. However, there are other options too. An atomic counter would work. A bit more complicated, you could even use a conditional write, but the logic for that would be a pain.
The advantage of the UUID is that you generate it yourself, so it can be used immediately for, as an example, a row of data in a child table.
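For what it's worth, a minimal sketch of option 2 in Python with boto3; the Posts table and its attributes are hypothetical, and in Swift, UUID().uuidString gives you the same client-side ID:

import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
posts = dynamodb.Table("Posts")  # hypothetical table with postId as its key

def create_post(author, body):
    # generated client-side, so two users posting at the same instant
    # cannot collide the way a count-based ID would
    post_id = str(uuid.uuid4())
    posts.put_item(Item={"postId": post_id, "author": author, "body": body})
    return post_id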
How can I implement incr/decr on top of a key/value store?
I'm using a key/value store that doesn't support incr and decr, which is why I want to build them myself. I have used incr and decr in Redis and Memcached, and as mentioned in some of the answers, they are a perfect example of how I want mine to behave, so thanks to those who mentioned this.
The point of having an incr() function is that it's all internal to the store. You don't have to pull data out and push it back in.
What you're doing sounds like you want to put some logic in your code that pulls the data out, increments it and pushes it back in... While it's not very hard (I think I've just described how you'd do it), it does defeat the point somewhat.
To get the benefit you'd need to change the source of your key store. Might be easy.
But a lot of caches already have this. If you really need this for speed, perhaps you should find an alternate store like memcached that does support it.
Memcached has this functionality built in.
edit: it looks like you're not going to get an atomic update without updating the source, as there doesn't appear to be a lock function. If there is (and this is not pretty), you can lock the value, get it, increment it in your application, put it, and unlock it. Suboptimal though.
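Roughly, that lock dance would look like the sketch below; the store client and its lock/unlock calls are hypothetical, so substitute whatever your store actually exposes:

def incr(store, key, delta=1):
    # serialize the read-modify-write with the store's (hypothetical) lock
    store.lock(key)
    try:
        value = int(store.get(key) or 0) + delta
        store.put(key, value)
        return value
    finally:
        store.unlock(key)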
it kind of seems like without a compareAndSet then you are out of luck. But it will help to consider the problem from another angle. For example, if you were implementing an atomic counter that shows the number of upvotes for a question, then one way would be to have a "table" per question and to put a +1 for each upvote and -1 for each downvote. Then to "get" you would sum the "table". For this to work I assume "tables" are inexpensive and you don't care how long "get" takes to compute, you only mentioned incr/decr.
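A toy in-memory sketch of that idea; in a real key/value store each +1/-1 event would be written under its own key, and "get" would sum them:

votes = {}  # question_id -> list of +1/-1 events (the per-question "table")

def upvote(question_id):
    votes.setdefault(question_id, []).append(+1)

def downvote(question_id):
    votes.setdefault(question_id, []).append(-1)

def score(question_id):
    # "get" sums the whole table: cheap writes, expensive reads
    return sum(votes.get(question_id, []))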
If you wish to atomically increment or decrement an int value associated with a key of e.g. type string, and if you'll know all of the keys in advance of having to perform the atomic operations on any of them, use Dictionary<string, int[]> and pre-populate the dictionary with a single-item array for each key value. It will then be possible to perform atomic operations (e.g. increment) on items via code like Threading.Interlocked.Increment(MyDict[keyString][0]);. If you need to be able to deal with keys that are not known in advance, you may need to use a ConcurrentDictionary instead of Dictionary, but you need to be careful if two threads try to simultaneously create dictionary entries for the same key.
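Python has no Interlocked equivalent, but the same shape can be approximated with a lock around a dictionary; this is a swapped-in technique rather than a literal translation:

import threading
from collections import defaultdict

class CounterMap:
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def incr(self, key, delta=1):
        # one lock serializes all updates, standing in for Interlocked
        with self._lock:
            self._counts[key] += delta
            return self._counts[key]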
Since increment and decrement are simple addition and subtraction operations that are "commutative", what you need to implement is a PN-Counter. It is a CRDT (commutative replicated data type). Various examples of how to implement this on Riak are available around the web and on Github.
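For illustration, a minimal state-based PN-Counter sketch; it assumes one instance per replica, with states exchanged and merged out of band:

class PNCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # per-replica increment totals
        self.n = {}  # per-replica decrement totals

    def incr(self, delta=1):
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + delta

    def decr(self, delta=1):
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + delta

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order
        for rid, c in other.p.items():
            self.p[rid] = max(self.p.get(rid, 0), c)
        for rid, c in other.n.items():
            self.n[rid] = max(self.n.get(rid, 0), c)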
What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to MongoDB's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
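In Python with pymongo, for instance, that same command translates to the following (database and collection names are placeholders):

from pymongo import MongoClient

db = MongoClient()["mydb"]  # placeholder database name

# equivalent to the $cmd findOne above
result = db.command("distinct", "collection_name", key="tags")
print(result["values"])  # the list of unique tags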
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!
I have a set of rather complex ORM modules that inherit from Class::DBI. Since the data changes quite infrequently, I am considering using a caching/memoization layer on top of this to speed things up. I found a module, Class::DBI::Cacheable, but no ratings or reviews for it on RT. I would appreciate hearing from people who have used this or any other Class::DBI caching scheme.
Thanks a ton.
I too have rolled my own ORM plenty of times, I hate to say! Caching/memoization is pretty easy if all your fetches happen through a single API (or subclasses thereof).
For any fetch based on a unique key you can just cache based on a concatenation of the keys. A naive approach might be:
my %_cache;

sub get_object_from_db {
    my ($self, $table, %table_lookup_key) = @_;

    # concatenate a unique key for this object; include the table name
    # so identical lookup keys in different tables don't collide
    my $cache_key = join('|', $table,
                         map { "$_|$table_lookup_key{$_}" }
                         sort keys %table_lookup_key);

    return $_cache{$cache_key}
        if exists $_cache{$cache_key};

    # otherwise get the object from the db and cache it in the hash
    # before returning
}
Instead of a hash, you can use the Cache:: suite of modules on CPAN to implement time and memory limits in your cache.
If you're going to cache for some time you might want to think about a way to expire objects in the cache. If for instance all your updates also go through the ORM you can clear (or update) the cache entry in your update() ORM method.
A final point to think carefully about: you're returning the same object each time, which has implications. If, e.g., one piece of code retrieves an object and updates a value but doesn't commit that change to the db, all other code retrieving that object will see that change. This can be very useful if you're stringing together a series of operations - they can all update the object and then you can commit it at the end - but it may not be what you intend. I usually set a flag on the object when it is fresh from the database, and invalidate that flag in the setter methods if the object is updated - that way you can always check the flag if you really want a fresh object.
On a few occasions we've rolled our own, but we limited it to special cases where profiling indicated we needed a boost (for example large joins). Since our applications typically use a custom abstraction layer (akin to a home-grown ORM) on top of the DB access, that's where we implemented the caching. We achieved good results that we were satisfied with and it didn't take a whole lot of effort. Of course, since we weren't using a CPAN ORM, we didn't really have any choice about using a CPAN caching module, either.
It was strictly case-by-case and opt-in. Whether you end up using a CPAN solution or rolling your own, it's probably a good idea to restrict it to cases where profiling indicates you need help, and make sure that it's opt-in so your caching doesn't undermine your application in subtle ways by being active when you didn't expect it.
I have used memcached before to cache objects, but not with Class::DBI (ORM makes me feel dirty).
When creating a web application that somehow displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), would it be better to use a uniform-length identifier like a hash, or the item's unique key in the database (1, 2, 3, etc.)?
Besides revealing a little (and, I think, immaterial) information about the internals of your app, why would using a hash be better than just using the unique ID?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening this question up again because Dmitriy brought up the good point of not tying the naming to a DB-specific property. Will that sort of coupling prevent me from optimizing/normalizing the database in the future?
The platform uses PHP/Python with MyISAM on MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
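As a toy example, a 16-bit maximal-period Galois LFSR (taps 16, 14, 13, 11) visits every nonzero 16-bit value exactly once before repeating, in a hard-to-guess order; a real ID space would want a wider register:

def lfsr_ids(seed=1):
    state = seed & 0xFFFF
    assert state != 0  # zero is the one state an LFSR can never leave
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # taps 16, 14, 13, 11

ids = lfsr_ids()
print([next(ids) for _ in range(5)])  # [1, 46080, 23040, 11520, 5760]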
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical IDs.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems... safer.
But watch out for collisions, obviously.
With hashes you:
- Are free to merge the database with a similar one (or a backup), if necessary.
- Are not doing anything that could aid guessing attacks, even a bit.
- Are not disclosing more private information about a user than necessary; e.g. if somebody sees user number 2 in your current database log in, they learn that he is an oldie.
- (Provided that you use a long hash or a GUID) are greatly helping yourself in case you're bought by YouTube and they decide to integrate your databases.
- Are helping yourself in case a search engine appears that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
Will your users have to remember/use the value? Or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a GUID. If you're on the .NET platform, try System.Guid.
However, the most important reason not to use a GUID is performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check whether they are valid BEFORE asking your database whether they exist. This can help you fend off attacks with random hashes, as you don't need to burden your database with fake lookups.
Therefore, if your hash has some kind of well-defined format, with for example a checksum at the end, you can check whether it's correct without needing to go to the database.
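A sketch of that idea in Python; the secret, separator, and checksum length are all placeholder choices:

import hashlib
import hmac

SECRET = b"change-me"  # placeholder; load from configuration in practice

def make_token(record_id):
    body = format(record_id, "x")
    check = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:8]
    return body + "-" + check

def looks_valid(token):
    # cheap format-plus-checksum test that runs BEFORE any database
    # lookup, so randomly guessed tokens never reach the database
    body, _, check = token.rpartition("-")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(check, expected)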