Query several hashes from Redis efficiently

I want to store some object data in Redis and display all the objects in a table.
In SQL I would just fetch the entire row for every object and display it in a view.
In Redis, I don't want to query each hash separately, since that would be unbearably slow.
Assuming I know the hash keys and the field names I want to pull, is there a way to do this efficiently?

I'm not sure why you believe querying each hash would be unbearably slow. If you loop through your hash keys and issue an HMGET for each with the field names, you should be fine, provided you pipeline the requests.
Alternatively, you could do this in a Lua script that accepts (some of) the key names as KEYS and the fields as ARGV, returning the answer in whatever format you need.
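A minimal sketch of the pipelined-HMGET approach (the function and key/field names are made up for illustration). It works with any redis-py style client: `pipeline()` batches all the HMGETs into a single round trip, and `execute()` returns every reply at once.

```python
def fetch_hashes(client, hash_keys, fields):
    """Return {hash_key: [value, ...]} for the given fields, in one round trip.

    `client` is assumed to be a redis-py style client (e.g. redis.Redis()).
    """
    pipe = client.pipeline()
    for key in hash_keys:
        # Queue one HMGET per hash; nothing is sent yet.
        pipe.hmget(key, fields)
    # One network round trip for all queued commands.
    replies = pipe.execute()
    return dict(zip(hash_keys, replies))
```

With a real client this would be called as, say, `fetch_hashes(redis.Redis(), ["obj:1", "obj:2"], ["name", "price"])`.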

Store all the hash keys in a set; let's call it 'hashkeyset'.
Use the SORT command to retrieve all the hash values: SORT hashkeyset GET *->field0 GET *->field1 ... GET *->fieldN
You can find more about SORT at this link: http://redis.io/commands/sort
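For example (hypothetical key and field names), with the object keys collected in a set; note that ALPHA is needed because the set members are not numbers:

```
> SADD hashkeyset obj:1 obj:2
> HMSET obj:1 name "apple" price "1"
> HMSET obj:2 name "pear" price "2"
> SORT hashkeyset ALPHA GET *->name GET *->price
1) "apple"
2) "1"
3) "pear"
4) "2"
```

The replies come back as one flat list, field by field, in the sorted key order.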

Related

DynamoDB - Most efficient way of deleting a partition?

Let's say I have a partition-key that is User:user#email.com and it has several sort-keys like Data, Sale:001, Contact:001.
Now, what if I want to delete this user?
I have thought of two possible ways using the API.
1 - Scan
First do a SCAN where partition-key=User:user#email, get the results and do a batch delete on each returned item with the respective sort-key.
2 - Query
For this I would first need to change all sort keys to have a common prefix, for example User|Data, User|Sale:001, User|Contact:001, and then do a query where
partition-key=User:user#email.com and sort_key.begins_with(User)
after getting the results I would then do a batch delete just like the scan option.
It isn't clear to me which option is best, because I'm not sure whether Scan has the "intelligence" to scan only inside that specific partition, or whether it would scan every record in the table. In DynamoDB you pay for each KB of items that is "searched".
If it is intelligent, then I think it would cost the same as the query option, without needing to add a prefix to my sort keys.
Scan() doesn't support partition-key=User:user#email except as a filter expression.
So yes, the whole table would be read. Only the records that match would actually be returned.
Query(), on the other hand, requires partition-key=User:user#email as a key condition expression. You don't need to make any changes to your sort key design, since including a key condition for the sort key is optional.
The partition key equality test is required, and must be specified in
the following format:
partitionKeyName = :partitionkeyval
If you also want to provide a condition for the sort key, it must be
combined using AND with the condition for the sort key. Following is
an example, using the = comparison operator for the sort key:
partitionKeyName = :partitionkeyval AND sortKeyName = :sortkeyval
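A sketch of the "query, then batch delete" approach in request-shape form (table and attribute names "MyTable", "pk", "sk" are made up). These dicts are the shapes you would pass to a boto3 low-level client's `query` and `batch_write_item` calls:

```python
def build_query_params(table, pk_value):
    """Query everything in one partition; no sort-key condition is needed."""
    return {
        "TableName": table,
        "KeyConditionExpression": "pk = :pkval",
        "ExpressionAttributeValues": {":pkval": {"S": pk_value}},
    }

def build_batch_delete(table, items):
    """Turn queried items into a BatchWriteItem payload.

    DynamoDB accepts at most 25 delete requests per batch, so a real
    caller would chunk `items` accordingly.
    """
    return {
        "RequestItems": {
            table: [
                {"DeleteRequest": {"Key": {"pk": item["pk"], "sk": item["sk"]}}}
                for item in items
            ]
        }
    }
```

Because the key condition pins the partition key, only that partition is read (and billed), with no sort-key prefix required.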

PostgreSql Dynamic JSON Indexing

I am new to the PostgreSQL world. We chose this DB so that we could run filter queries over our JSON results: contains, less than, greater than, etc. The JSON results are dynamic, and we cannot know in advance what keys will be generated as output. The table (result_id (int64), jsondata (jsonb)) data looks like this:
id1,{k1:vab,k2:abc,k3:def}
id1,{k1:abv,k2:v7,k3:ghu}
id1,{k1:v5,k2:vdd,k3:vew}
id1,{k1:v6,k2:v9s,k3:ved}
id2,{k4:vw,k5:vds,k6:vdss}
id2,{k4:v1,k5:fgg,k6:dd}
id2,{k4:qw,k5:gfd,k6:ess}
id2,{k4:er,k5:dfs,k6:fss}
My queries would be something like
Select * from table where result_id = 'id1' and jsondata->>'k1' like '%ab%'
My script outputs a json content that I store in this table.
Each JSON key is represented as a grid column, and the key's values are the column data. The grid offers filtering capabilities, which means filtering on the JSON data.
My problem is that filtering can happen on any JSON key, but the key names are not static. The keys in the JSON output might change when the script content changes, so previously indexed keys would become irrelevant. If the script is not changed, the keys remain constant.
How do I apply indexing so that my JSON filter operations become faster? The same table contains many keys within the same JSON row and across rows. Wouldn't it be inefficient to index all keys so that filtering can be made efficient?
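One standard approach (a sketch, not from the original thread; the table name is hypothetical) is a single GIN index over the whole jsonb column, which covers every key without per-key indexes:

```sql
-- One GIN index serves containment queries on any key in the column.
CREATE INDEX idx_jsondata ON results USING GIN (jsondata);

-- Containment queries like this can use the index:
SELECT * FROM results
WHERE result_id = 'id1' AND jsondata @> '{"k1": "vab"}';
```

Note that `@>` matches exact contained values; substring filters such as `jsondata->>'k1' LIKE '%ab%'` would instead need a pg_trgm expression index per key, which doesn't fit well when keys are dynamic.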

Redis HMSET Documentation: What does it mean by 'hash'?

Redis HMSET command documentation describes it as:
"Sets the specified fields to their respective values in the hash stored at key. This command overwrites any existing fields in the hash. If key does not exist, a new key holding a hash is created."
What does the word 'hash' mean in this case? Does it mean a hash table, or a hash code computed for the given field/value pairs? I would like to think it means the former, i.e., a hash table, but I would still like to clarify, as the documentation is not explicit.
Hash refers to the Redis Hash Data-Type:
Redis Hashes are maps between string fields and string values, so they
are the perfect data type to represent objects (e.g. A User with a
number of fields like name, surname, age, and so forth)
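A quick redis-cli session (hypothetical key and field names) shows such a hash being created and read back; note that recent Redis versions deprecate HMSET in favor of the variadic HSET:

```
> HMSET user:1 name "John" surname "Smith" age "30"
OK
> HGETALL user:1
1) "name"
2) "John"
3) "surname"
4) "Smith"
5) "age"
6) "30"
```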

How to query Cassandra by date range

I have a Cassandra ColumnFamily (0.6.4) that will have new entries from users. I'd like to query Cassandra for those new entries so that I can process that data in another system.
My sense was that I could use a TimeUUIDType as the key for my entry, and then query on a KeyRange that starts either with "" as the startKey, or whatever the lastStartKey was. Is this the correct method?
How does get_range_slice actually create a range? Doesn't it have to know the data type of the key? There's no declaration of the data type of the key anywhere. In the storage_conf.xml file, you declare the type of the columns, but not of the keys. Is the key assumed to be of the same type as the columns? Or does it do some magic sniffing to guess?
I've also seen reference implementations where people store TimeUUIDType in columns. However, this seems to have scale issues as this particular key would then become "hot" since every change would have to update it.
Any pointers in this case would be appreciated.
When sorting data, only the column keys are important; the data stored is of no consequence, and neither is the auto-generated timestamp. The CompareWith attribute is what matters here: if you set CompareWith to UTF8Type, the keys will be interpreted as UTF-8 strings; if you set it to TimeUUIDType, the keys are automatically interpreted as timestamps. You do not have to specify the data type. Look at the SlicePredicate and SliceRange definitions on this page: http://wiki.apache.org/cassandra/API This is a good place to start. You might also find this article useful: http://www.sodeso.nl/?p=80 In the third part or so, he talks about slice-ranging his queries.
Doug,
Writing to a single column family can sometimes create a hot spot if you are using an Order-Preserving Partitioner, but not if you are using the default Random Partitioner (unless a subset of users create vastly more data than all other users!).
If you sorted your rows by time (using an Order-Preserving Partitioner) then you are probably even more likely to create hotspots, since you will be adding rows sequentially and a single node will be responsible for each range of the keyspace.
Columns and Keys can be of any type, since the row key is just the first column.
Virtually, the cluster is a circular hash key ring, and keys get hashed by the partitioner to get distributed around the cluster.
Beware of using dates as row keys, however: even the randomization of the default RandomPartitioner is limited, and you could end up with your data clustered unevenly.
What's more, if that date changes, you would have to delete the previous row, since you can only do inserts in C*.
Here is what we know:
A slice range is a range of columns in a row, with a start value and an end value. This is used mostly for wide rows, as columns are ordered. Known column names defined in the CF are indexed, however, so they can be retrieved by specifying names.
A key slice is a key associated with the sliced column range, as returned by Cassandra.
The equivalent of a WHERE clause uses secondary indexes. You may use inequality operators there, but there must be at least ONE equals clause in your statement (also see https://issues.apache.org/jira/browse/CASSANDRA-1599).
Using a key range is ineffective with a RandomPartitioner, as the MD5 hash of your key doesn't preserve lexical ordering.
What you want to use is a column-family-based index using a wide row:
CompositeType(TimeUUID | UserID)
In order for this not to become hot, add a first meaningful key (a "shard key") that splits the data across nodes, such as the user type or the region.
Having more data than necessary in Cassandra is not a problem; it's how it is designed. So what you must ask yourself is "what do I need to query?", and then design a column family for it, rather than trying to fit everything into one CF like you would in an RDBMS.
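In modern CQL terms, such a sharded, time-ordered column family might look like the following sketch (table and column names are invented for illustration):

```sql
-- "shard" spreads writes across nodes; within a partition, rows are
-- ordered by the timeuuid, which makes time-range queries cheap.
CREATE TABLE user_events (
    shard      text,       -- e.g. region or user-type bucket
    event_time timeuuid,
    user_id    text,
    payload    text,
    PRIMARY KEY (shard, event_time)
) WITH CLUSTERING ORDER BY (event_time ASC);

-- Range query for new entries since a known point in time:
SELECT * FROM user_events
WHERE shard = 'eu' AND event_time > maxTimeuuid('2013-01-01');
```

To pick up new entries for downstream processing, a reader would iterate over the known shards and remember the last timeuuid it saw per shard.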

What's a real world example of something you would represent with a hash?

I'm just trying to get a grip on when you would need to use a hash and when it might be better to use an array. What kind of real-world object would a hash represent, say, in the case of strings?
I believe sometimes a hash is referred to as a "dictionary", and I think that's a good example in itself. If you want to look up the definition of a word, it's nice to just do something like:
definition['pernicious']
Instead of trying to figure out the correct numeric index that the definition would be stored at.
This answer assumes that by "hash" you're basically just referring to an associative array.
I think you're looking at things from the wrong direction. It is not the object that determines whether you should use a hash, but the manner in which you access it. A common use of a hash is a lookup table. If your objects are strings and you want to check whether they exist in a dictionary, looking them up will (assuming the hash works properly) be O(1). With sorting, the time would instead be O(log n), which may not be acceptable.
Thus, hashes are ideal for use with Dictionaries (hashmaps), sets (hashsets), etc.
They are also a useful way of representing an object without storing the object itself (for passwords).
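That last use is the hash *function* rather than the hash *table*: you store a fingerprint of a value instead of the value itself. A minimal sketch using Python's standard library:

```python
import hashlib

# A fixed-size digest stands in for the original value; you can compare
# digests without ever storing the value itself.
digest = hashlib.sha256(b"abc").hexdigest()
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

(For real password storage you would use a salted key-derivation function such as bcrypt or scrypt, not a bare SHA-256, but the principle is the same.)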
The phone book - key = name, value = phone number.
I also think of the old World Book Encyclopedias (actual books). Each article is "hashed" into a single book (cat goes in the "C" volume).
Any time you have data that is well served by a 1-to-1 map.
For example, grades in a class:
"John Smith" => "B+"
"Jacob Jenkens" => "C"
etc
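The grades example maps directly onto a literal hash (a Python dict here), giving constant-time lookup by name:

```python
# name -> grade; no numeric indices to keep track of.
grades = {
    "John Smith": "B+",
    "Jacob Jenkens": "C",
}

print(grades["John Smith"])   # B+
```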
In general, hashes are used to find things fast: a hash map associates one thing with another quickly, and a hash set simply stores things "fast".
Also consider the hash function's complexity and cost when deciding between a hash container and an ordinary ordered (less-than based) container: the additional size of the hash value, the time needed to compute a good hash, and the time needed for a full equality comparison on a hash collision may in fact add up to more than just walking a tree structure of logarithmic depth using less-than comparisons.
When you need to associate one variable with another. There isn't a "type limit" to what can be a key/value in a hash.
Hashes have many uses. Aside from cryptographic uses, they are commonly used for quick lookups of information. To get similarly quick lookups using an array, you would need to keep the array sorted and then use a binary search. With a hash you get the fast lookup without having to sort. This is the reason most scripting languages implement hashing under one name or another (dictionaries, et al.).
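A small sketch contrasting the two strategies that paragraph describes: binary search over a sorted array versus a direct hashed lookup (the word list is invented):

```python
import bisect

words = sorted(["apple", "banana", "cherry", "date"])

def contains_sorted(arr, item):
    """O(log n) membership test via binary search on a sorted list."""
    i = bisect.bisect_left(arr, item)
    return i < len(arr) and arr[i] == item

# Hashed alternative: average O(1) membership, no sorting required.
lookup = set(words)

print(contains_sorted(words, "cherry"), "cherry" in lookup)   # True True
```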
I use one often for a "dictionary" of settings for my app.
Setting | Value
I load them from the database or a config file into a hashtable for use by my app.
Works well, and is simple.
One example could be zip code associated with an area, city or any postal address.
A good example is a cache with lots of elements in it. You have some identifier by which you want to look up a value (say a URL, and you want to find the corresponding cached web page). You want these lookups to be as fast as possible, and you don't want to search through all the stored pages every time some URL is requested. A hash table is a great data structure for a problem like this.
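A tiny sketch of such a URL-keyed cache using the standard library (`fetch_page` is a made-up stand-in for real page retrieval):

```python
from functools import lru_cache

calls = []   # tracks how many real "fetches" happen

@lru_cache(maxsize=128)
def fetch_page(url):
    """Pretend to fetch a page; the decorator hashes `url` for O(1) lookups."""
    calls.append(url)
    return f"<html>content of {url}</html>"

fetch_page("http://example.com")   # miss: actually fetched
fetch_page("http://example.com")   # hit: served from the hash-backed cache
print(len(calls))                  # 1
```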
One real-world example I just wrote is adding up the amounts people spent on meals when filing expense reports. I needed a daily total, with no idea how many items would exist on a particular day and no idea what the date range for the expense report would be. There are restrictions on how much a person can expense, with many variables (which city, weekend, etc.).
The hash table was the perfect tool to handle this. The key was the date; the value was the receipt amount (converted to USD). The receipts could come in any order; I just kept fetching the value for that date and adding to it until the job was done. Displaying was easy as well.
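The tally described above can be sketched in a few lines (dates and amounts are invented): receipts arrive in any order, and a hash keyed by date accumulates the daily totals.

```python
from collections import defaultdict

receipts = [
    ("2013-05-01", 12.50),
    ("2013-05-02", 30.00),
    ("2013-05-01", 8.25),   # same day, arrives later -- order doesn't matter
]

totals = defaultdict(float)   # date -> running total in USD
for date, amount_usd in receipts:
    totals[date] += amount_usd

print(totals["2013-05-01"])   # 20.75
```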
(php code)
$david = new stdclass();
$david->name = "david";
$david->age = 12;
$david->id = 1;
$david->title = "manager";
$joe = new stdclass();
$joe->name = "joe";
$joe->age = 17;
$joe->id = 2;
$joe->title = "employee";
// Option 1: store users by numeric index
$users = [];
$users[] = $david;
$users[] = $joe;
// Option 2: store users by title (an associative array, i.e. a hash)
$users = [];
$users[$david->title] = $david;
$users[$joe->title] = $joe;
Now the question: who is the manager?
Answer:
$users["manager"]