Why can't `blake2_256` prevent the "first key pair" in a StorageDoubleMap from being compromised when using decl_storage?

decl_storage! is a "procedural macro" used for storing data to make it available in subsequent blocks.
The documentation says that if the user is able to set the first key pair in the double_map, then we cannot trust that key pair, and so we must use a cryptographic hasher such as blake2_256 to prevent "other values of all storage items being compromised".
Then it goes on to say that if the user is able to set the second key pair in the double_map, then we cannot trust that key pair, and so we must use a cryptographic hasher such as blake2_256 to prevent "other items in storage with the same first key being compromised".
With regard to the first key pair, why does it say that it's just to prevent "other values of all storage items being compromised"? Isn't blake2_256 also used to prevent the first key pair itself from being compromised (rather than just "other values")?

Let's say the hash of module1.someValue is 0x12345678
hash of module2.doubleMapValue.firstKey(value1) is 0x1234
hash of module2.doubleMapValue.secondKey(value2) is 0x5678
This means module2.doubleMapValue.fullKey(value1, value2) (the concatenation 0x1234 ++ 0x5678 = 0x12345678) and module1.someValue have the same storage key, i.e. the two values are stored in the same place.
If a user is able to control both keys of module2.doubleMapValue and figure out values of value1 and value2 that produce this collision, they will be able to overwrite the value of module1.someValue and cause security issues.
That's why the hash function for key1 of a double map needs to be a cryptographic hasher if its value is controlled by a user. Otherwise a user may be able to craft value1 such that it collides with the storage of any other module, and hence compromise it.
Even if a user does not control key2, double maps provide a "clear all keys with prefix hash(key1)" feature that could be hijacked to cause trouble as well.
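To make the risk concrete, here is a toy TypeScript sketch (my own illustration, not Substrate's actual storage layout): with a hypothetical 16-bit hasher, an attacker can brute-force a first key whose hash collides with another module's storage prefix almost instantly, whereas with a full cryptographic hash such as blake2_256 the same search is computationally infeasible.

import { createHash } from "crypto";

// Hypothetical weak hasher: keep only the first 4 hex chars (16 bits) of sha256.
function weakHash(data: string): string {
    return createHash("sha256").update(data).digest("hex").slice(0, 4);
}

// Stand-in for the storage prefix of module1.someValue.
const targetPrefix = weakHash("module1.someValue");

// Brute-force a user-controlled first key that lands on the same prefix.
for (let i = 0; i < 1_000_000; i++) {
    const candidate = `attacker-key-${i}`;
    if (weakHash(candidate) === targetPrefix) {
        console.log(`collision found after ${i + 1} tries: ${candidate}`);
        break; // with a 256-bit cryptographic hash this loop would realistically never hit
    }
}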

Related

Key, Value, Hash and Hash function for HashTable

I'm having trouble understanding what the Hash Function does and doesn't do, as well as what exactly a Bucket is.
From my understanding:
A HashTable is a data structure that maps keys to values using a Hash Function.
A HashFunction is meant to map data from an array of arbitrary/unknown size to a data array of fixed size.
There can be duplicate Values in the original data array, but this is irrelevant.
Each Value will have a unique Key. Thus, each Key has exactly 1 Value.
The HashFunction will generate a HashCode for each (Value, Key) pair. However, Collisions can occur in which multiple (Value, Key) pairs map to the same HashCode.
This can be remedied by using either Chaining/Open Addressing methods.
The HashCode is the index value indicating the position of a particular entry from the original data array within the Bucket array.
The Bucket array is the fixed data array constructed that will contain the entries from the original array.
My questions:
How are the Keys generated for each value? Is the HashFunction meant to generate both Key and HashCode values for each entry? Does each Bucket thus contain only one entry (assuming a Chaining implementation to remedy Collision)?
How are the Keys generated for each value?
The key is not generated; it is provided by you and serves as the input to the hash function, which in turn converts that key into an index of the hash table. Simply speaking:
H(key)=index
so the value you are looking for is:
hash_table[index] = value
Is the HashFunction meant to generate HashCode values for each entry?
It all depends on the implementation of the hash function and hash table. Some hash functions generate a hashcode from the provided key and then take it modulo size (where size is the size of the hash table) to get the index. Others convert the key directly into an index. In either case, the ultimate goal of the hash function is to find the location of the searched-for data within the hash table in constant time.
Does each Bucket thus contain only one entry (assuming a Chaining implementation to remedy Collision)?
Ideally each key would map to a unique index, but that's usually not the case: the number of buckets (i.e. indices) is far smaller than the number of keys, so the average length of a chain per bucket (i.e. the number of collisions per bucket) is number of keys / number of buckets.
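Tying these pieces together, here is a minimal chained hash table sketch in TypeScript (my own illustration): the caller supplies the key, the hash function converts it into a bucket index by taking the hashcode modulo the table size, and each bucket chains (key, value) pairs so that colliding keys can coexist.

class HashTable<V> {
    private buckets: [string, V][][];
    constructor(private size: number = 16) {
        this.buckets = Array.from({ length: size }, () => [] as [string, V][]);
    }
    private index(key: string): number {
        let h = 0;
        for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // simple string hashcode
        return h % this.size; // hashcode modulo table size gives the bucket index
    }
    set(key: string, value: V): void {
        const bucket = this.buckets[this.index(key)];
        const entry = bucket.find(([k]) => k === key);
        if (entry) entry[1] = value;    // same key: overwrite the value
        else bucket.push([key, value]); // same index, different key: chain it
    }
    get(key: string): V | undefined {
        return this.buckets[this.index(key)].find(([k]) => k === key)?.[1];
    }
}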

How are same hash vs same key handled?

This question is not specific to any programming language; I am more interested in the generic logic.
Generally, associative maps take a key and map it to a value. As far as I know, implementations require the keys to be unique otherwise values get overwritten. Alright.
So let us assume that the above is done by some hash implementation.
What if two DIFFERENT keys get the same hash value? I am thinking of this in the form of an underlying array whose indices are the result of a hash on said keys. It could be possible that more than one unique key gets mapped to the same index, yes? If so, how does such an implementation handle this?
How is handling same hash different from handling same key? Since same key results in overwriting and same hash HAS to retain the value.
I understand hashing with collision, so I know chaining and probing. Do implementations iterate over the current values which are hashed to a particular index and determine if the key is the same?
While I was searching for the answer I came across these links:
1. What happens when a duplicate key is put into a HashMap?
2. HashMap with multiple values under the same key
They don't answer my question however. How do we distinguish between same hash vs same key?
By comparing the keys. If you look at object-oriented implementations of hash maps, you'll find that they usually require two methods to be implemented on the key type:
bool equal(Key key1, Key key2);
int hash(Key key);
If only the hash function can be given and no equality function, that restricts the hash map to be based on the language's default equality. This is not always desirable as sometimes keys need to be compared with a different equality function. For example, if the keys are strings, an application may need to do a case-insensitive comparison, and then it would pass a hash function that converts to lowercase before hashing, and an equal function that ignores case.
The hash map stores the key alongside each corresponding value. (Usually, that's a pointer to the key object that was originally stored.) Any lookup into the hash map has to make a key comparison after finding a matching hash, to verify that the key actually matches.
For example, for a very simple hash map that stores a list in each bucket, the list would be a list of (key, value) pairs, and any lookup compares the keys of each list entry until it finds a match. In pseudocode:
Array<List<Pair<Key, Value>>> buckets;

Value lookup(Key k_sought) {
    int h = hash(k_sought);
    List<Pair<Key, Value>> bucket = buckets[h];
    for (kv in bucket) {
        Key k_found = kv.0;
        Value v_found = kv.1;
        if (equal(k_sought, k_found)) {
            return v_found;
        }
    }
    throw Not_found;
}
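As a concrete instance of the case-insensitive example above, here is a short TypeScript sketch (the names are my own): the hash function must normalize the key exactly as the equality function does, so that keys that compare equal always hash alike.

function equal(k1: string, k2: string): boolean {
    return k1.toLowerCase() === k2.toLowerCase(); // case-insensitive comparison
}

function hash(key: string): number {
    const k = key.toLowerCase(); // normalize the same way equal() does
    let h = 0;
    for (const ch of k) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    return h;
}

// Required invariant: equal(a, b) implies hash(a) === hash(b).
// Here equal("Foo", "foo") is true, and hash("Foo") === hash("foo") holds.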
You cannot tell what a key is from the index, so no, you cannot iterate over the values to find any information about the keys. You will either have to guarantee zero collisions or store the information that was hashed to give the index.
If you only have values stored in your structure, there is no way to tell whether they have the same key or just the same hash. You need to store the key along with the value to know.

Redis HMSET Documentation: What does it mean by 'hash'?

Redis HMSET command documentation describes it as:
"Sets the specified fields to their respective values in the hash stored at key. This command overwrites any existing fields in the hash. If key does not exist, a new key holding a hash is created."
What does the word 'hash' mean in this case? Does it mean a hash table? Or a hash code computed for the given field/value pairs? I would like to think it means the former, i.e. a hash table, but I would still like to clarify, as the documentation is not explicit.
Hash refers to the Redis Hash Data-Type:
Redis Hashes are maps between string fields and string values, so they are the perfect data type to represent objects (e.g. a User with a number of fields like name, surname, age, and so forth)
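For instance, a user object maps naturally onto a Redis hash; a minimal sketch using the node-redis v4 client (key and field names are illustrative):

import { createClient } from "redis";

const client = createClient();
await client.connect();

// one Redis hash per user: a map of string fields to string values
await client.hSet("user:1000", { name: "Ada", surname: "Lovelace", age: "36" });
console.log(await client.hGetAll("user:1000"));
// { name: 'Ada', surname: 'Lovelace', age: '36' }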

Cuckoo Hashing: What is the best way to detect collisions in hash functions?

I implemented a hashmap based on cuckoo hashing.
My hash functions take values of any length and return keys of type long. To match the keys to my array size n, I do key % n.
I'm thinking about following scenario:
Insert value A with key A.key into location A.key % n
Find value B with key A.key
So for this example I get the entry for value A and it is not recognized that value B hasn't even been inserted. This happens if my hash function returns the same key for two different values. Collisions with different keys but same locations are no problem.
What is the best way to detect those collisions?
Do I have to check every time I insert or search an item if the original values are equal?
As with most hashing schemes, in cuckoo hashing the hash code tells you where to look in the table for the element in question, but the expectation is that you store both the key and the value in the table. Before returning the stored value, you first check the key stored at that slot against the key you're looking for. That way, if you get the same hash code for two objects, you can determine which object was stored at that slot.
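A sketch of that lookup in TypeScript (my own illustration of the scheme described above): each slot stores the key next to the value, and both candidate slots are checked with a full key comparison.

type Slot<V> = { key: string; value: V } | null;

// Check both candidate slots; compare the stored key, not just the hash,
// so two keys that share a hash code are never confused with each other.
function lookup<V>(
    t1: Slot<V>[], t2: Slot<V>[],
    h1: (k: string) => number, h2: (k: string) => number,
    key: string
): V | undefined {
    const a = t1[h1(key) % t1.length];
    if (a !== null && a.key === key) return a.value;
    const b = t2[h2(key) % t2.length];
    if (b !== null && b.key === key) return b.value;
    return undefined; // absent, even if another key hashed to these slots
}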

How to "EXPIRE" the "HSET" child key in redis?

I need to expire all keys in a Redis hash that are older than 1 month.
This is not possible, for the sake of keeping Redis simple.
Quoth Antirez, creator of Redis:
Hi, it is not possible. Either use a different top-level key for that specific field, or store along with the field another field with an expire time, fetch both, and let the application determine whether it is still valid based on the current time.
Redis does not support having a TTL on hash fields, only on the top-level key, which would expire the whole hash. If you are using a sharded cluster, there is another approach you can use. This approach may not be useful in all scenarios and its performance characteristics might differ from the expected ones. Still worth mentioning:
When having a hash, the structure basically looks like:
hash_top_key
- child_key_1 -> some_value
- child_key_2 -> some_value
...
- child_key_n -> some_value
Since we want to add TTL to the child keys, we can move them to top keys. The main point is that the key now should be a combination of hash_top_key and child key:
{hash_top_key}child_key_1 -> some_value
{hash_top_key}child_key_2 -> some_value
...
{hash_top_key}child_key_n -> some_value
We are using the {} notation on purpose. This allows all those keys to fall in the same hash slot. You can read more about it here: https://redis.io/topics/cluster-tutorial
Now if we want to perform the same operations as with hashes, we could do:
HDEL hash_top_key child_key_1 => DEL {hash_top_key}child_key_1
HGET hash_top_key child_key_1 => GET {hash_top_key}child_key_1
HSET hash_top_key child_key_1 some_value => SET {hash_top_key}child_key_1 some_value [some_TTL]
HGETALL hash_top_key =>
    keyslot = CLUSTER KEYSLOT {hash_top_key}
    keys = CLUSTER GETKEYSINSLOT keyslot n
    MGET keys
The interesting one here is HGETALL. First we get the hash slot for all our child keys. Then we get the keys in that particular hash slot, and finally we retrieve the values. We need to be careful here, since there could be more than n keys for that hash slot, and there could also be keys we are not interested in that share the same hash slot. We could write a Lua script to do those steps on the server by executing an EVAL or EVALSHA command. Again, you need to take into consideration the performance of this approach for your particular scenario.
Some more references:
https://redis.io/commands/cluster-keyslot
https://redis.io/commands/cluster-getkeysinslot
https://redis.io/commands/eval
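A minimal sketch of the per-child-key commands above, using the node-redis v4 client (an assumption on my part; the TTL value is illustrative). The {topKey} hash tag keeps every child key in the same cluster slot:

import { createClient } from "redis";

const client = createClient();
await client.connect();

const child = (topKey: string, childKey: string) => `{${topKey}}${childKey}`;

// HSET hash_top_key child_key_1 some_value  ->  SET with a per-entry TTL
await client.set(child("hash_top_key", "child_key_1"), "some_value", { EX: 3600 });
// HGET hash_top_key child_key_1  ->  GET
const value = await client.get(child("hash_top_key", "child_key_1"));
// HDEL hash_top_key child_key_1  ->  DEL
await client.del(child("hash_top_key", "child_key_1"));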
This is possible in KeyDB, which is a fork of Redis. Because it's a fork, it's fully compatible with Redis and works as a drop-in replacement.
Just use the EXPIREMEMBER command. It works with sets, hashes, and sorted sets.
EXPIREMEMBER keyname subkey [time]
You can also use TTL and PTTL to see the expiration
TTL keyname subkey
More documentation is available here: https://docs.keydb.dev/docs/commands/#expiremember
You can use a sorted set in Redis to get a TTL container, with the timestamp as the score.
For example, whenever you insert a event string into the set you can set its score to the event time.
Thus you can get data of any time window by calling
zrangebyscore "your set name" min-time max-time
Moreover, we can expire old events by calling zremrangebyscore "your set name" min-time max-time to remove them.
The only drawback here is that you have to do the housekeeping from an outside process to maintain the size of the set.
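A short sketch of this approach with the node-redis v4 client (my own illustration; key names are made up):

import { createClient } from "redis";

const client = createClient();
await client.connect();

// the event timestamp doubles as the score
const now = Date.now();
await client.zAdd("events", { score: now, value: "event payload" });

// fetch any time window, e.g. the last minute
const lastMinute = await client.zRangeByScore("events", now - 60_000, now);

// housekeeping: drop everything older than one month
const monthAgo = now - 30 * 24 * 60 * 60 * 1000;
await client.zRemRangeByScore("events", 0, monthAgo);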
Elon Musk will soon send people to the moon and we still cannot expire fields on Redis :(
Anyway, the solution I've come up with is:
Let's say I want fields to expire every 3 minutes. I hold the data in 3 fields named 0, 1 and 2, and I take the current time in minutes modulo 3.
If the modulo is, for example, 0, then I use only fields 1 and 2 and delete field 0; when it changes to 1, I use fields 2 and 0 and delete field 1, and so on.
I'm not using this myself and I haven't tested it, but I'm just letting you know it's possible; a rough sketch follows below.
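A hypothetical sketch of that rotating-bucket idea (untested, as the author says; the node-redis v4 client and key names are assumed):

const slot = Math.floor(Date.now() / 60_000) % 3;  // current time in minutes, mod 3
await client.hDel("myhash", String(slot));         // this bucket's turn to expire: clear it
// write new data only to the two live buckets
await client.hSet("myhash", String((slot + 1) % 3), "some value");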
There is the Redisson Java framework, which implements a hash map object with entry TTL support. It uses hash and zset Redis objects under the hood. Usage example:
RMapCache<Integer, String> map = redisson.getMapCache("map");
map.put(1, "value", 30, TimeUnit.DAYS); // this entry expires in 30 days
This approach is quite useful.
We had the same problem discussed here.
We have a Redis hash, a key to hash entries (name/value pairs), and we needed to hold individual expiration times on each hash entry.
We implemented this by adding n bytes of prefix data containing encoded expiration information when we write the hash entry values; we also set the key to expire at the time contained in the value being written.
Then, on read, we decode the prefix and check for expiration. This is additional overhead; however, reads are still O(n), and the entire key will expire when the last hash entry has expired.
Regarding a NodeJS implementation, I have added a custom expiryTime field in the object I save in the HASH. Then, after a specific period of time, I clear the expired HASH entries using the following code:
client.hgetall(HASH_NAME, function(err, reply) {
    if (reply) {
        Object.keys(reply).forEach(key => {
            // each value is a JSON blob carrying its own expiryTime timestamp
            if (reply[key] && JSON.parse(reply[key]).expiryTime < (new Date).getTime()) {
                client.hdel(HASH_NAME, key); // remove the expired entry
            }
        })
    }
});
If your use-case is that you're caching values in Redis and are tolerant of stale values but would like to refresh them occasionally so that they don't get too stale, a hacky workaround is to just include a timestamp in the field value and handle expirations in whatever place you're accessing the value.
This allows you to keep using Redis hashes normally without needing to worry about any complications that might arise from the other approaches. The only cost is a bit of extra logic and parsing on the client end. Not a perfect solution, but it's what I typically do as I haven't needed TTL for any other reason and I'm usually needing to do extra parsing on the cached value anyways.
So basically it'll be something like this:
In Redis:
hash_name
- field_1: "2021-01-15;123"
- field_2: "2021-01-20;125"
- field_2: "2021-02-01;127"
Your (pseudo)code:
val = redis.hget(hash_name, field_1)
timestamp = val.substring(0, val.index_of(";"))
if now() > timestamp:
    new_val = get_updated_value()
    new_timestamp = now() + EXPIRY_LENGTH
    redis.hset(hash_name, field_1, new_timestamp + ";" + new_val)
    val = new_val
else:
    val = val.substring(val.index_of(";") + 1)  // skip past the ";" separator
// proceed to use val
The biggest caveat imo is that you don't ever remove fields so the hash can grow quite large. Not sure there's an elegant solution for that - I usually just delete the hash every once in a while if it feels too big. Maybe you could keep track of everything you've stored somewhere and remove them periodically (though at that point, you might as well just be using that mechanism to expire the fields manually...).
You could store key/values in Redis differently to achieve this, by just adding a prefix or namespace to your keys when you store them, e.g. "hset_"
Get a key/value: GET hset_key (equivalent to HGET hset key)
Add a key/value: SET hset_key value (equivalent to HSET hset key value)
Get all keys: KEYS hset_* (equivalent to HGETALL hset, keys only)
Get all values: done in 2 ops, first get all keys with KEYS hset_*, then get the value for each key
Add a key/value with TTL or expire, which is the topic of the question:
SET hset_key value
EXPIRE hset_key <seconds>
Note: KEYS looks up matching keys in the whole database, which may affect performance, especially with a big database. SCAN 0 MATCH hset_* might be better as long as it doesn't block the server, but performance is still an issue in the case of a big database.
You may create a separate database for storing these keys, especially if they are a small set of keys.
Thanks to @DanFarrell, who highlighted the performance issue related to KEYS.
You can. Here is an example.
redis 127.0.0.1:6379> hset key f1 1
(integer) 1
redis 127.0.0.1:6379> hset key f2 2
(integer) 1
redis 127.0.0.1:6379> hvals key
1) "1"
2) "2"
redis 127.0.0.1:6379> expire key 10
(integer) 1
redis 127.0.0.1:6379> hvals key
1) "1"
2) "2"
(10 seconds later)
redis 127.0.0.1:6379> hvals key
(empty list or set)
Use EXPIRE or EXPIREAT command.
If you want to expire specific keys in the hash that are older than 1 month: that is not possible.
The Redis expire command applies to the whole hash, not to individual fields.
But if you use a daily hash key, you can set a time-to-live per day:
hset key-20140325 f1 1
expire key-20140325 100
hset key-20140325 f1 2
You could use the Redis Keyspace Notifications by using psubscribe and "__keyevent@<DB-INDEX>__:expired".
With that, each time that a key will expire, you will get a message published on your redis connection.
Regarding your question: basically, you create a temporary "normal" key using SET with an expiration time in s/ms. Its name should match the name of the key that you wish to delete in your hash.
When your temporary key expires, a message holding its name is published to your Redis connection on the "__keyevent@0__:expired" channel, so you can easily delete the corresponding key from your original hash.
A simple example in practice on that page: https://medium.com/@micah1powell/using-redis-keyspace-notifications-for-a-reminder-service-with-node-c05047befec3
doc: https://redis.io/topics/notifications (look for the xE flags)
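A brief sketch of such a subscriber with the node-redis v4 client (an assumption on my part; requires notify-keyspace-events to include the expired-event flags):

import { createClient } from "redis";

const sub = createClient();
await sub.connect();
await sub.pSubscribe("__keyevent@0__:expired", (expiredKey) => {
    // the message is the name of the key that just expired; derive the
    // matching hash field from it and delete that field here
    console.log(`expired: ${expiredKey}`);
});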
static async setCount(ip: string, count: number) {
    // store the count in a hash field, then put a TTL on the whole hash key
    await redisClient.hSet(ip, 'ipHashField', count)
    await redisClient.expire(ip, this.expireTime)
}
Try expiring your key this way.