Redis, find by hash, find by field value

I want to store book information in Redis hashes, for instance:
HMSET books key "83-7197-669-0" title "Access. DB desing" price 79.0 publisher "Helion" year 2002
HMSET books key "83-7197-786-7" title "Access XP" price 65.0 publisher "Helion" year 2003
Then I want to find the book with key 83-7197-669-0. I tried:
1) HGET books key "83-7197-669-0"
(error) ERR wrong number of arguments for 'hget' command
2) HGETALL books
1) "key"
2) "83-7197-786-7"
3) "title"
4) "Access XP"
5) "price"
6) "65.0"
7) "publisher"
8) "Helion"
9) "year"
10) "2003"
I don't know why, but I only see the second book here.
Next, I wanted to find a book with a given price, with no success. I don't even know what to try. Any ideas?

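A Redis hash works like a map of maps: in your example, books is the key of the outer map, while key, title, price and the other fields are keys of the inner map. Because both HMSET commands write to the same hash (books), the second call overwrites the fields set by the first, which is why HGETALL shows only the second book.
To look a book up by its key, make the key part of the hash name and read that hash back with HGETALL: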
HMSET books:83-7197-669-0 title "Access. DB desing" price 79.0 publisher "Helion" year 2002
HGETALL books:83-7197-669-0
This returns all the fields of that hash. I hope this gives you a good start with Redis hashes.

As you can see, attributes such as "key" and "title" all end up as fields of the same Redis hash.
The solution is simple: drop the key field, use the book's key as the name of the hash, and store the rest of the data under it. Redis is basically a key-value store on steroids, so change the first command to
HMSET "83-7197-669-0" title <title> price <price> ...
Depending on your use case, you could also use HSET to store everything in books, with the book key as the field and all other attributes encoded as a single serialized value.
If you are looking for a way to access Redis data through multiple keys, you are likely using the wrong tool. Relational data stores are better for that; Redis is still a high-speed key-value store with some nice extras baked in.
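
The difference between the two layouts can be sketched without a server. The snippet below is only an in-memory stand-in for Redis (a dict of dicts); hmset and hgetall mimic the commands of the same name, and find_by_price is a hypothetical helper that simply scans every book, which is what a lookup by field value amounts to when no extra index exists.

```python
# In-memory stand-in for Redis: top-level key -> hash (field -> value).
store = {}

def hmset(key, mapping):
    """Mimic HMSET: merge fields into the hash stored at `key`."""
    store.setdefault(key, {}).update(mapping)

def hgetall(key):
    """Mimic HGETALL: return a copy of all fields of the hash at `key`."""
    return dict(store.get(key, {}))

# Original layout: both books share one hash, so the second write wins.
hmset("books", {"key": "83-7197-669-0", "title": "Access. DB desing", "price": "79.0"})
hmset("books", {"key": "83-7197-786-7", "title": "Access XP", "price": "65.0"})
shared = hgetall("books")

# Suggested layout: one hash per book, named books:<key>.
hmset("books:83-7197-669-0", {"title": "Access. DB desing", "price": "79.0"})
hmset("books:83-7197-786-7", {"title": "Access XP", "price": "65.0"})

def find_by_price(price):
    """Scan every books:* hash and return the book keys whose price matches."""
    return sorted(
        name.split(":", 1)[1]
        for name, fields in store.items()
        if name.startswith("books:") and float(fields["price"]) == price
    )
```

The scan touches every book, so it only suits small data sets; that cost is the reason the answers above recommend a different tool for queries by field value.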

Related

How can I use key/value dashboard variables in Grafana + InfluxDB?

I’m trying to suss out how to format my key/value pair dashboard variable. I’ve got a variable whose definitions are:
sensor_list = 4431,8298,11041,13781
sensor_kv = 4431 : Storage,8298 : Stairs,11041 : Closet,13781 : Attic
However, I can't seem to use it effectively for queries and dashboard formatting with InfluxDB. For example, I've got a panel whose query is this:
SELECT last("battery_ok") FROM "autogen"."Acurite-Tower" WHERE ("id" =~ /^$sensor_list$/) AND $timeFilter GROUP BY time($__interval) fill(null)
That works, but if I replace it with the KV, I can't get the value:
SELECT last("battery_ok") FROM "autogen"."Acurite-Tower" WHERE ("id" =~ /^$sensor_kv$/) AND $timeFilter GROUP BY time($__interval) fill(null)
^ that comes back with no data.
I'm also at a loss as to how to access the value of the KV pair in, say, the template values for a repeating panel. ${sensor_kv:text} returns the word "All" but ${sensor_kv:value} actually causes a straight up error: "Error: Variable format value not found"
My goal here is twofold:
To use the key side of the kv map as the ID to query from in the DB
To use the value side as the label of the stat panel and also as the alias of the measurement if I'm querying in a graph
I’ve read the formatting docs and all they mention are lists; there are no key/value examples on there, and certainly none that do this. It’s clearly a new-ish feature (here is the GH issue where its implementation is merged) so I’m hoping there’s just a doc miss somewhere.

In the PR that you linked there is a small comment saying that each key/value pair has to contain spaces around the colon.
So when you define comma-separated pairs in the Values field, they should look like
key1 : value1, key2 : value2
These will not work
key1:value1, key2:value2
key1 :value1, key2 :value2
key1: value1, key2: value2
Let's say the name of the custom variable is var1.
Then you can access the key with ${var1}, $var1, ${var1:text} or [[var1:text]]
(some data sources are satisfied with $var1; some understand only ${var1:text}).
And you can access the value with ${var1:value} or [[var1:value]].
Tested in Grafana 8.4.7
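
The spacing rule can be illustrated with a small parser. This is only a model of the behaviour described in this answer, not Grafana's actual parsing code; the function name parse_pairs is hypothetical, and treating a malformed entry as a plain value is an assumption.

```python
def parse_pairs(definition):
    """Split 'k1 : v1, k2 : v2' into (text, value) pairs.

    A pair is recognised only when ' : ' (space, colon, space) separates
    the two halves; otherwise the whole entry is kept as both text and value.
    """
    pairs = []
    for entry in definition.split(","):
        entry = entry.strip()
        if " : " in entry:
            text, value = entry.split(" : ", 1)
            pairs.append((text.strip(), value.strip()))
        else:
            pairs.append((entry, entry))
    return pairs
```

With the definition from the question, parse_pairs("4431 : Storage,8298 : Stairs") yields the pairs ("4431", "Storage") and ("8298", "Stairs"), while an entry such as key1:value1 is not split at all.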

I realise this might not be all the information you're after, but hope it will be useful. I came across this question when trying to implement something similar myself (also using InfluxDB), and I have managed to access both keys and values in a query
My query looks like this:
SELECT
"Foo.${VariableName:text}.Bar.${VariableName:value}"
FROM "db"
WHERE (filters, filters) AND $timeFilter GROUP BY "bas"
So as you see, my use case was a bit different from what you're trying to achieve, but it demonstrates that it's basically possible to access both the key and the value in a query.

Key/value variables work with some data sources where it makes sense, e.g. MySQL https://grafana.com/docs/grafana/latest/datasources/mysql/:
Another option is a query that can create a key/value variable. The query should return two columns that are named __text and __value. The __text column value should be unique (if it is not unique then the first value is used). The options in the dropdown will have a text and value that allows you to have a friendly name as text and an id as the value.
But that's not the case for InfluxDB: https://grafana.com/docs/grafana/latest/datasources/influxdb/ InfluxDB can't return a key=>value result; it returns only time series (which are not key=>value), or only values, or only keys.
Workarounds:
1.) Use a supported DB (MySQL, PostgreSQL) just to get correct key=>value results. You don't really need to create a table for that; a combination of SELECT, UNION, ... will give you the desired result.
2.) Use a hidden variable that "translates" the value to the key, which is then used in the query. E.g. https://community.grafana.com/t/how-to-alias-a-template-variable-value/10929/3
Of course everything has pros and cons; for example, multi-value variables may not work as expected.

How to use Berkeley DB's non-SQL, Key/Value API to implement fuzzy query (LIKE key word)

I can understand this Blog, but it does not seem applicable to the case of using Berkeley DB's non-SQL, Key/Value API to implement "SELECT * FROM table WHERE name LIKE '%abc%'"
Table structure
-------------------------------------------
key     data(name)
-------------------------------------------
0       abc
1       abcd
2       you
3       spring
.       sabcd
.       timeab
.
I guess iterating over all records is not an efficient way, but it does the trick.

You're correct. Absent any other tables, you'd have to scan all the entries and test each data item. In many cases, it's as simple as this.
If you're using SQL LIKE, I doubt you'll be able to do better unless your data items have a well-defined structure.
However, if the "WHERE name LIKE %abc%" query you have is really WHERE name="abc", then you might choose to take a performance penalty on your db_put call to create a reverse index, in addition to your primary table:
-------------------------------------------
key(name)   data(index)
-------------------------------------------
abc         0
abcd        1
sabcd       4
spring      3
timeab      5
you         2
This table, sorted in alphabetical order, requires a lexical key comparison function and uses BDB's support for duplicate keys. Now, to find the key for your entry, you could simply do a db_get("abc") or, better, open a cursor with DB_SET_RANGE on "abc".
Depending on the kinds of LIKE queries you need to do, you may be able to use the reverse index technique to narrow the search space.
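
The two access paths can be compared in a few lines. This is a sketch in plain Python rather than the Berkeley DB API: a sorted list plays the role of the reverse index, and bisect stands in for positioning a cursor at the first key greater than or equal to the search string. The record keys 4 and 5 are taken from the reverse index table above; like_contains and like_prefix are hypothetical names.

```python
import bisect

# Primary table from the question: record key -> name.
primary = {0: "abc", 1: "abcd", 2: "you", 3: "spring", 4: "sabcd", 5: "timeab"}

# Reverse index: (name, record key) pairs kept in lexical order.
reverse_index = sorted((name, key) for key, name in primary.items())

def like_contains(fragment):
    """LIKE '%fragment%': no index helps, so test every record."""
    return sorted(key for key, name in primary.items() if fragment in name)

def like_prefix(prefix):
    """LIKE 'prefix%': seek to the first name >= prefix, then walk forward."""
    start = bisect.bisect_left(reverse_index, (prefix,))
    matches = []
    for name, key in reverse_index[start:]:
        if not name.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(key)
    return matches
```

Only the prefix form benefits from the index; a pattern with a leading wildcard still falls back to the full scan.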

How to "EXPIRE" the "HSET" child key in redis?

I need to expire all keys in redis hash, which are older than 1 month.

This is not possible, for the sake of keeping Redis simple.
Quoth Antirez, creator of Redis:
Hi, it is not possible, either use a different top-level key for that
specific field, or store along with the filed another field with an
expire time, fetch both, and let the application understand if it is
still valid or not based on current time.

Redis does not support a TTL on anything in a hash other than the top-level key, which expires the whole hash. If you are using a sharded cluster, there is another approach you could use. It may not be useful in all scenarios, and its performance characteristics might differ from what you expect. Still worth mentioning:
When having a hash, the structure basically looks like:
hash_top_key
- child_key_1 -> some_value
- child_key_2 -> some_value
...
- child_key_n -> some_value
Since we want to add TTL to the child keys, we can move them to top keys. The main point is that the key now should be a combination of hash_top_key and child key:
{hash_top_key}child_key_1 -> some_value
{hash_top_key}child_key_2 -> some_value
...
{hash_top_key}child_key_n -> some_value
We are using the {} notation on purpose. This allows all those keys to fall in the same hash slot. You can read more about it here: https://redis.io/topics/cluster-tutorial
To mirror the usual hash operations, we could do:
HDEL hash_top_key child_key_1 => DEL {hash_top_key}child_key_1
HGET hash_top_key child_key_1 => GET {hash_top_key}child_key_1
HSET hash_top_key child_key_1 some_value => SET {hash_top_key}child_key_1 some_value [some_TTL]
HGETALL hash_top_key =>
    keyslot = CLUSTER KEYSLOT {hash_top_key}
    keys = CLUSTER GETKEYSINSLOT keyslot n
    MGET keys
The interesting one here is HGETALL. First we get the hash slot for all our children keys. Then we get the keys for that particular hash slot and finally we retrieve the values. We need to be careful here since there could be more than n keys for that hash slot and also there could be keys that we are not interested in but they have the same hash slot. We could actually write a Lua script to do those steps in the server by executing an EVAL or EVALSHA command. Again, you need to take into consideration the performance of this approach for your particular scenario.
Some more references:
https://redis.io/commands/cluster-keyslot
https://redis.io/commands/cluster-getkeysinslot
https://redis.io/commands/eval
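
Why the {} notation keeps the child keys together can be checked locally. The sketch below reimplements the slot calculation used by Redis Cluster (CRC16, XMODEM variant, modulo 16384, applied to the hash tag when one is present); child_key is a hypothetical helper that builds the combined key names used in this answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot of a key: only the non-empty text between the first
    '{' and the next '}' is hashed, otherwise the whole key."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384

def child_key(hash_top_key: str, child: str) -> str:
    """Build the top-level key that replaces a hash field."""
    return "{" + hash_top_key + "}" + child
```

Every key built by child_key for the same hash_top_key lands in the same slot, which is what makes the CLUSTER GETKEYSINSLOT step possible; unrelated keys can share that slot too, as noted above.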

This is possible in KeyDB, which is a fork of Redis. Because it's a fork, it's fully compatible with Redis and works as a drop-in replacement.
Just use the EXPIREMEMBER command. It works with sets, hashes, and sorted sets.
EXPIREMEMBER keyname subkey [time]
You can also use TTL and PTTL to see the expiration
TTL keyname subkey
More documentation is available here: https://docs.keydb.dev/docs/commands/#expiremember

You can use a Sorted Set in Redis as a TTL container, with the timestamp as the score.
For example, whenever you insert an event string into the set, you can set its score to the event time.
You can then get the data of any time window by calling
zrangebyscore "your set name" min-time max-time
Moreover, you can expire entries by calling zremrangebyscore "your set name" min-time max-time to remove old events.
The only drawback is that you have to do the housekeeping from an outside process to keep the size of the set under control.
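
A minimal sketch of that housekeeping, using an in-memory stand-in instead of a Redis connection. The TimedSet class is hypothetical; its three methods only imitate the commands of the same name, with a simplified argument order.

```python
import bisect

class TimedSet:
    """In-memory stand-in for a Redis sorted set scored by timestamp."""

    def __init__(self):
        self._items = []  # sorted list of (score, member)

    def zadd(self, score, member):
        # Re-adding a member replaces its old score, as in Redis.
        self._items = [(s, m) for s, m in self._items if m != member]
        bisect.insort(self._items, (score, member))

    def zrangebyscore(self, lo, hi):
        return [m for s, m in self._items if lo <= s <= hi]

    def zremrangebyscore(self, lo, hi):
        before = len(self._items)
        self._items = [(s, m) for s, m in self._items if not lo <= s <= hi]
        return before - len(self._items)

MONTH = 30 * 24 * 3600  # one month in seconds (assumed 30 days)

events = TimedSet()
events.zadd(1_000, "old-event")
events.zadd(MONTH + 500, "recent-event")

# Periodic cleanup: drop everything older than one month.
now = MONTH + 2_000
removed = events.zremrangebyscore(0, now - MONTH)
```

After the cleanup only "recent-event" is left; against a real server the same two calls would be issued by a scheduled job.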

Elon Musk will soon send people to the moon and we still cannot expire fields in Redis :(
Anyway, the solution I came up with is this.
Let's say I want entries to expire roughly every 3 minutes.
I keep the data in 3 fields: 0, 1 and 2.
Then I take the current time in minutes modulo 3.
If the result is, for example, 0,
I use only fields 1 and 2 and delete field 0;
when it changes to 1, I use fields 2 and 0 and delete field 1.
I am not using this and I haven't tested it, but I just wanted to let you know it's possible.

There is a Java framework, Redisson, which implements a hash Map object with per-entry TTL support. It uses hmap and zset Redis objects under the hood. Usage example:
RMapCache<Integer, String> map = redisson.getMapCache("map");
map.put(1, "value", 30, TimeUnit.DAYS); // this entry ("value" is a placeholder) expires in 30 days
This approach is quite useful.

We had the same problem discussed here.
We have a Redis hash, a key to hash entries (name/value pairs), and we needed to hold individual expiration times on each hash entry.
We implemented this by prepending n bytes of prefix data, containing the encoded expiration information, to each hash entry value when we write it. We also set the key to expire at the time contained in the value being written.
Then, on read, we decode the prefix and check for expiration. This adds overhead; however, the reads are still O(n) and the entire key will expire when the last hash entry has expired.
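
One possible shape of such a prefix, as a sketch. The answer does not say how the expiration is encoded, so the 8-byte big-endian Unix timestamp used here is an assumption, and encode and decode are hypothetical names.

```python
import struct
import time

# Assumed layout: 8-byte big-endian expiry time in Unix seconds, then the payload.
PREFIX = struct.Struct(">Q")

def encode(value: bytes, ttl_seconds: int, now=None) -> bytes:
    """Prepend the absolute expiry time to the value before writing it."""
    now = time.time() if now is None else now
    return PREFIX.pack(int(now) + ttl_seconds) + value

def decode(stored: bytes, now=None):
    """Return the payload, or None if the entry has expired."""
    now = time.time() if now is None else now
    (expires_at,) = PREFIX.unpack_from(stored)
    if now >= expires_at:
        return None
    return stored[PREFIX.size:]
```

The caller would pass the result of encode to HSET and run decode on whatever HGET or HGETALL returns, treating None as a missing field.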

For a Node.js implementation, I added a custom expiryTime field to the object I save in the hash. Then, at regular intervals, I clear the expired hash entries with the following code:
client.hgetall(HASH_NAME, function(err, reply) {
  if (reply) {
    Object.keys(reply).forEach(key => {
      // each value is a JSON string carrying its own expiryTime (ms since epoch)
      if (reply[key] && JSON.parse(reply[key]).expiryTime < Date.now()) {
        client.hdel(HASH_NAME, key);
      }
    });
  }
});

If your use-case is that you're caching values in Redis and are tolerant of stale values but would like to refresh them occasionally so that they don't get too stale, a hacky workaround is to just include a timestamp in the field value and handle expirations in whatever place you're accessing the value.
This allows you to keep using Redis hashes normally without needing to worry about any complications that might arise from the other approaches. The only cost is a bit of extra logic and parsing on the client end. Not a perfect solution, but it's what I typically do as I haven't needed TTL for any other reason and I'm usually needing to do extra parsing on the cached value anyways.
So basically it'll be something like this:
In Redis:
hash_name
- field_1: "2021-01-15;123"
- field_2: "2021-01-20;125"
- field_3: "2021-02-01;127"
Your (pseudo)code:
val = redis.hget(hash_name, field_1)
sep = val.index_of(";")
timestamp = val.substring(0, sep)
if now() > timestamp:
    # stale: fetch a fresh value and store it with a new timestamp
    new_val = get_updated_value()
    new_timestamp = now() + EXPIRY_LENGTH
    redis.hset(hash_name, field_1, new_timestamp + ";" + new_val)
    val = new_val
else:
    # still fresh: drop the "timestamp;" prefix
    val = val.substring(sep + 1)
# proceed to use val
The biggest caveat imo is that you don't ever remove fields so the hash can grow quite large. Not sure there's an elegant solution for that - I usually just delete the hash every once in a while if it feels too big. Maybe you could keep track of everything you've stored somewhere and remove them periodically (though at that point, you might as well just be using that mechanism to expire the fields manually...).

You could store the key/values differently in Redis to achieve this, by adding a prefix or namespace to your keys when you store them, e.g. "hset_":
Get a key/value: GET hset_key is equivalent to HGET hset key
Add a key/value: SET hset_key value is equivalent to HSET hset key value
Get all keys: KEYS hset_* is equivalent to HKEYS hset
Get all values: this takes 2 operations; first get all keys with KEYS hset_*, then get the value for each key
Add a key/value with a TTL or expiry, which is the topic of the question:
SET hset_key value
EXPIRE hset_key <seconds>
Note:
KEYS looks for matching keys across the whole database, which may hurt performance, especially if you have a big database. SCAN 0 MATCH hset_* might be better since it doesn't block the server, but performance is still an issue with a big database.
You may create a new database to store the keys you want to expire separately, especially if they are a small set of keys.
Thanks to @DanFarrell, who highlighted the performance issue related to KEYS.

You can. Here is an example.
redis 127.0.0.1:6379> hset key f1 1
(integer) 1
redis 127.0.0.1:6379> hset key f2 2
(integer) 1
redis 127.0.0.1:6379> hvals key
1) "1"
2) "1"
3) "2"
redis 127.0.0.1:6379> expire key 10
(integer) 1
redis 127.0.0.1:6379> hvals key
1) "1"
2) "1"
3) "2"
redis 127.0.0.1:6379> hvals key
1) "1"
2) "1"
3) "2"
redis 127.0.0.1:6379> hvals key
Use the EXPIRE or EXPIREAT command.
If you want to expire only specific fields in the hash that are older than 1 month, that is not possible.
The Redis EXPIRE command applies to the whole hash, i.e. to all of its fields at once.
If you create a new hash key per day, you can set a time to live on each key.
hset key-20140325 f1 1
expire key-20140325 100
hset key-20140325 f1 2

You could use Redis Keyspace Notifications, via psubscribe and "__keyevent@<DB-INDEX>__:expired".
With that, each time a key expires, you get a message published on your Redis connection.
Regarding your question: you create a temporary "normal" key using SET with an expiration time in s/ms. Its name should match the name of the key that you wish to delete from your set.
When the temporary key expires, its name is published to your Redis connection subscribed to "__keyevent@0__:expired", so you can easily delete the corresponding key from your original set.
A simple example in practice is on this page: https://medium.com/@micah1powell/using-redis-keyspace-notifications-for-a-reminder-service-with-node-c05047befec3
Docs: https://redis.io/topics/notifications (look for the flag xE)

static async setCount(ip: string, count: number) {
  const val = await redisClient.hSet(ip, 'ipHashField', count)
  // the TTL applies to the whole hash stored at `ip`, not to a single field
  await redisClient.expire(ip, this.expireTime)
}
Try expiring the key itself.

Are there any alternatives to HBASE in particular with regards to key range scans?

The most relevant feature that I appreciate in HBASE is the key range scan, where you can combine your keys under a higher level key with lower level ones, which allows you to obtain a hierarchy of data related to the higher level keys.
For example:
CUSTOMER ID = C100
DEPARTMENT ID = D100
USER ID = U100
The key for the above example would be
C100D100U100K01 : "my data for k01"
C100D100U100K02 : "my data for k02"
C100D100U100K03 : "my data for k03"
...
With the above, you can fetch all of the data related to a customer by performing a range scan on C100*, or, if more detail is needed, narrow it down by department with C100D100* or by user with C100D100U100*, and so on.
Are there any alternatives to HBase in this regard in the NoSQL spectrum of solutions?

Any hierarchical key-value store would work. There's a (short) list on Wikipedia : Hierarchical key-value store.
The one I know best is GT.M, where your sample data could look like this :
customer("C100","D100","U100","K01")="my data for k01"
customer("C100","D100","U100","K02")="my data for k02"
customer("C100","D100","U100","K03")="my data for k03"
So customer("C100") gives you access to all the data of a single customer, customer("C100","D100") gives you access to all the data for a single department of a single customer, etc.

Couchbase has similar functionality if you use views (an index). You can create a view on all the keys, and do range queries over them. As far as I know, you can only wildcard over the end of a key but not the beginning, e.g.:
AAABBBCCCDDD* // yes
*BBBCCCDDDEEE // no
AAA*CCCDDDEEE // no
This is because it sorts the keys, and when you query you're getting a sub-range. However, you can get around this by creating views that sort by a different order.
More info: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views.html

Riak has secondary indexes that would allow querying data by matching the index or by range scan. The results from secondary indexes can be used as an input for Riak's MapReduce. Check this for more details: riak secondary indexes

hbase rowkey design

I am moving from mysql to hbase due to increasing data.
I am designing rowkey for efficient access pattern.
I want to achieve 3 goals.
Get all results of email address
Get all results of email address + item_type
Get all results of particular email address + item_id
I have 4 attributes to choose from
user email
reverse timestamp
item_type
item_id
What should my rowkey look like to get rows efficiently?
Thanks

Assuming your main access is by email you can have your main table key as
email + reverse time + item_id (assuming item_id gives you uniqueness)
You can have an additional "index" table with email+item_type+reverse time+item_id and email+item_id as keys that maps to the first table (so retrieving by these is a two step process)
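
The key layouts above could be built as follows. This is a sketch under assumptions that are not in the answer: components are joined with "|" (which must not occur in any component), and the reverse timestamp is Long.MAX_VALUE minus the epoch time in milliseconds, zero-padded so that newer rows sort first. All function names are hypothetical.

```python
MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, a common base for reverse timestamps
SEP = "|"             # assumed separator; must not occur in any component

def reverse_ts(epoch_millis: int) -> str:
    """Zero-padded so that newer events sort first lexicographically."""
    return f"{MAX_LONG - epoch_millis:019d}"

def main_key(email, epoch_millis, item_id):
    """Main table: email + reverse time + item_id."""
    return SEP.join([email, reverse_ts(epoch_millis), item_id])

def type_index_key(email, item_type, epoch_millis, item_id):
    """Index table: email + item_type + reverse time + item_id."""
    return SEP.join([email, item_type, reverse_ts(epoch_millis), item_id])

def item_index_key(email, item_id):
    """Index table: email + item_id."""
    return SEP.join([email, item_id])
```

A prefix scan on the email (plus separator) over the main table then returns all rows for that address, newest first, while the two index keys serve the email + item_type and email + item_id lookups.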

Maybe you are already headed in the right direction with concatenated row keys. In any case, the following comes to mind from your post:
The partitioning key likely consists of your reverse timestamp plus the most frequently queried natural key; would that be the email? Let us suppose so: then choose the prefix based on which of the two (reverse timestamp or email) provides the most balanced, non-skewed distribution of your data. That makes your region servers happier.
Choose based on the better-balanced distribution of records:
reverse timestamp plus most frequently queried natural key
e.g. reversetimestamp-email
or email-reversetimestamp
In that manner you will avoid hot-spotting on your region servers.
Good performance on additional (secondary) indexes is not "baked into" HBase yet; they have a design doc for it (look under SecondaryIndexing in the wiki).
But you can build your own in a couple of ways:
a) use a coprocessor to write the item_type as the rowkey to a separate table, with a column containing the rowkey of the original fact table (user_email-reverse timestamp, or vice versa)
b) if disk space is not an issue and/or the rows are small, just go ahead and duplicate the entire row in the second (and, for the item_id case, third) tables.
