Twemproxy key distribution - hash

How does twemproxy use the hash function and the key distribution method to decide which key goes to which shard?
Is there a library or some code where I can specify the hash function and a set of servers, and have it tell me which shard a given key will go to?
(Mine uses distribution: ketama and hash: fnv1a_64.)
Any help is appreciated.
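As a rough illustration of the general idea (hash the key, then walk a sorted ring of server points to the next point at or after that hash), here is a minimal Python sketch. It is not twemproxy's code and is not expected to reproduce twemproxy's actual shard placement: the `HashRing` class, the `points_per_server` value, and the way ring points are derived from `"server-i"` strings are all assumptions made for the example. Only the FNV-1a 64-bit constants are the standard published ones.

```python
import bisect

FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime


def fnv1a_64(data: bytes) -> int:
    """Plain FNV-1a over bytes, kept to 64 bits."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


class HashRing:
    """Hypothetical consistent-hash ring; NOT byte-compatible with twemproxy."""

    def __init__(self, servers, points_per_server=160):
        ring = []
        for server in servers:
            for i in range(points_per_server):
                # Assumption: ring points come from hashing "server-i".
                ring.append((fnv1a_64(f"{server}-{i}".encode()), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    def shard_for(self, key: str) -> str:
        """Return the server owning the first ring point at or after hash(key)."""
        h = fnv1a_64(key.encode())
        idx = bisect.bisect_left(self._points, h)
        if idx == len(self._points):  # past the last point: wrap around
            idx = 0
        return self._servers[idx]


ring = HashRing(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
print(ring.shard_for("user:42"))
```

To get answers that match a running twemproxy instance, the ring construction would have to follow twemproxy's own source exactly, so treat this only as a way to see how a hash function and a ketama-style distribution fit together.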

Related

Setting Key of System.Security.Cryptography.AesManaged in C#

When I instantiate AesManaged in C#, it already has the .Key property set. Is it safe to use it? I.e. is it cryptographically strong and random enough for each new instantiation of AesManaged (and every time I call .GenerateKey() on an existing instance)?
All the examples I've seen first generate a random password and then use a key derivation function like Rfc2898DeriveBytes or PasswordDeriveBytes to generate the Key (e.g. How to use 'System.Security.Cryptography.AesManaged' to encrypt a byte[]?). This requires additional information, such as a salt value, the number of password iterations, and which hash algorithm to use.
I understand I need all that if I want my users to come up with passwords. I then need to produce random cryptographically strong Keys from them. But if everything is generated by the computer, do I need to programmatically generate random passwords and then Keys from them, or can I just use whatever AesManaged.Key contains?
Yes, you can use the default Key and IV values if you like. You can also explicitly regenerate a new random one with:
SymmetricAlgorithm.GenerateKey() or SymmetricAlgorithm.CreateEncryptor(null, null)
It depends on what you are protecting and how many owners of information you need to support. If speed / volume doesn't matter, then you are still better off adopting PBKDF2 by using Rfc2898DeriveBytes for the iterations.
Regardless, you don't want to share the key across multiple users / tenants / security "realms". You can certainly use the default key for a single application, though. If you do, combine the salt with it.
The reasons we use user-defined passwords and salts are to avoid attacks that exploit common/weak passwords or passwords shared between users, and to ensure that we, as application owners, don't know their keys.
The reason we use PBKDF2 (derivation with many iterations) is to slow down the attacker. The penalty we pay once per user is paid many times over by an attacker.
If you just need a random key for a single application or system, then the default is usable, assuming, of course, that it provides the strength you need.
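To make the two options concrete, here is a small language-neutral sketch in Python rather than C#. It contrasts a key taken straight from the OS random source with a key stretched from a password via PBKDF2. The function names, the 32-byte key size, the SHA-256 choice, and the iteration count are assumptions for the example, not values taken from AesManaged or Rfc2898DeriveBytes.

```python
import hashlib
import secrets

KEY_BYTES = 32  # assumption: a 256-bit AES key


def random_key() -> bytes:
    """Machine-generated key: no password, salt, or iterations needed."""
    return secrets.token_bytes(KEY_BYTES)


def key_from_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Password-derived key: PBKDF2 makes each guess cost `iterations` HMAC calls."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=KEY_BYTES
    )


machine_key = random_key()
salt = secrets.token_bytes(16)  # stored next to the ciphertext, not secret
user_key = key_from_password("correct horse battery staple", salt)
```

The derivation step only earns its cost when the input is something a human chose; a key that already comes from a cryptographic random generator gains nothing from being run through it.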

Random identifiers: secure vs. unique. What to choose?

Sometimes when we want to identify something, we generate an identifier for the object.
Sometimes we just use rand; sometimes we want something more reliable. Currently I am choosing between:
Data::UUID
Crypt::PRNG
Would there be any difference between the results of these two methods?
$id = Data::UUID->new->create_bin; #
$id = Crypt::PRNG::random_bytes(16); # https://metacpan.org/pod/Crypt::PRNG#random_bytes
Both are 16 random bytes. Regardless of the interface, are there any further differences?
UPD
In my case I use the ID as a random string to identify a query to Stripe.
It depends on what it's used for.
If it's used to identify something, as your variable name suggests, it needs to be unique.
For example, if two people shared the same session id, they would share the same session.
For example, if two people shared the same temporary file name, they would share the same file.
If it's an encryption key, you want it to be random so that it has the most entropy possible.
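The distinction can be sketched in Python (not Perl) with two stdlib calls that play roughly the same roles. This assumes a time-based UUID is a fair stand-in for a "built to be unique" generator; whether Data::UUID behaves exactly like `uuid.uuid1()` is an assumption, and the function names are made up for the example.

```python
import secrets
import uuid


def make_unique_id() -> bytes:
    """Time-based UUID: designed to avoid collisions, but partly predictable."""
    return uuid.uuid1().bytes


def make_secret_id() -> bytes:
    """CSPRNG output: designed to be unguessable; collisions are merely improbable."""
    return secrets.token_bytes(16)


unique_id = make_unique_id()
secret_id = make_secret_id()
print(unique_id.hex(), secret_id.hex())
```

Both values are 16 bytes, so the interface alone does not tell you which guarantee you are getting. For a value that only has to label a request, uniqueness is the property that matters; for anything an attacker must not guess, unpredictability is.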

What hash function does Vertica use

I'm looking for a way to assign devices to different groups for an A/B test.
To identify unique devices, they are assigned unique strings as keys; I have no control over this.
I thought about hashing. We're using a Vertica DB, and it has a built-in function for hashing. But because I don't know what algorithm the function uses, I can't reproduce it in the controller that assigns the devices to the A/B test groups.
I'm looking to apply the function to the unique device key.
I looked through the Vertica documentation for the function, but to no avail.
Help would be appreciated.
The HASH() function is proprietary; however there are plans to make it open source in an upcoming release.
For segmentation, any SQL function can be used as long as it's immutable.
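Since HASH() can't be reproduced outside the database, one workaround is to use a standard hash that both the controller and the database can compute. Here is a minimal Python sketch for the controller side. The function name `ab_group`, the choice of MD5, and the idea that the database side can compute the same digest are assumptions for the example; they would need to be checked against what your Vertica version offers.

```python
import hashlib


def ab_group(device_key: str, groups=("A", "B")) -> str:
    """Map a device key to a group deterministically via a standard hash."""
    digest = hashlib.md5(device_key.encode("utf-8")).hexdigest()
    # Use the first 8 hex digits (32 bits) so the same arithmetic is easy
    # to repeat in SQL on the hex string of the digest.
    bucket = int(digest[:8], 16) % len(groups)
    return groups[bucket]


print(ab_group("device-0001"))
```

MD5 is fine here because the goal is a stable, evenly spread bucket number, not security. The only requirement is that both sides hash exactly the same bytes (same encoding, no stray whitespace).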

Separate data encryption

I store some sensitive data. The data is divided into parts, and I want to control access to each part separately. Let's assume that I have 1000 files. I want to encrypt each file with the same symmetric encryption algorithm.
I guess that breaking a key is easier when an attacker has 1000 ciphertexts than when he has only one, so I think I should use a separate key for each file.
My question is the following:
Should I use a separate key for each file?
If I should, there is the problem of storing 1000 keys. So I want to have one secret key and use some algorithm of my own to calculate a separate key for each file from the secret key. Is it a good idea?
If you consider a passive adversary and use a CPA-secure cipher (like AES in a suitable mode), it is sufficient to use only one key for all files. Even if the adversary knows the cipher you use, and even knows the plaintexts, he cannot reconstruct the key with non-negligible probability. Here is a more detailed answer.
If you also consider an active adversary (one who can replace ciphertexts), you should use Authenticated Encryption. But as I understand it, this is not your case.
"So I want to have one secret key and use some algorithm of my own to calculate a separate key for each file from the secret key. Is it a good idea?"
In general, developing your own algorithm or scheme is a bad idea. You can easily make some unseen mistake in the algorithm or implementation, and your data will be vulnerable. It is better to use well-known algorithms and implementations that have been peer-reviewed by lots of people and shown to be secure.
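If per-file keys are still wanted, a standard key derivation construction can replace a home-grown algorithm. The Python sketch below uses the expand step of HKDF (RFC 5869), limited to a single 32-byte output block. It assumes the master key is already a uniformly random 32-byte secret, and the function name and the `b"file-key:"` label are made up for the example.

```python
import hashlib
import hmac
import secrets


def derive_file_key(master_key: bytes, file_id: str) -> bytes:
    """HKDF-Expand (RFC 5869), single block: T(1) = HMAC(master, info || 0x01)."""
    info = b"file-key:" + file_id.encode("utf-8")  # hypothetical context label
    return hmac.new(master_key, info + b"\x01", hashlib.sha256).digest()


master = secrets.token_bytes(32)  # the one secret that has to be stored
key_a = derive_file_key(master, "report-0001.pdf")
key_b = derive_file_key(master, "report-0002.pdf")
```

Only the master key needs to be stored; each file key can be recomputed from the file's identifier when needed, and knowing one derived key does not reveal the master key or the other file keys.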

Meaning of Open hashing and Closed hashing

Open Hashing (Separate Chaining):
In open hashing, keys are stored in linked lists attached to cells of a hash table.
Closed Hashing (Open Addressing):
In closed hashing, all keys are stored in the hash table itself without the use of linked lists.
I am unable to understand why they are called open, closed, and separate. Can someone explain it?
The use of "closed" vs. "open" reflects whether or not we are locked in to using a certain position or data structure (this is an extremely vague description, but hopefully the rest helps).
For instance, the "open" in "open addressing" tells us the index (aka. address) at which an object will be stored in the hash table is not completely determined by its hash code. Instead, the index may vary depending on what's already in the hash table.
The "closed" in "closed hashing" refers to the fact that we never leave the hash table; every object is stored directly at an index in the hash table's internal array. Note that this is only possible by using some sort of open addressing strategy. This explains why "closed hashing" and "open addressing" are synonyms.
Contrast this with open hashing - in this strategy, none of the objects are actually stored in the hash table's array; instead once an object is hashed, it is stored in a list which is separate from the hash table's internal array. "open" refers to the freedom we get by leaving the hash table, and using a separate list. By the way, "separate list" hints at why open hashing is also known as "separate chaining".
In short, "closed" always refers to some sort of strict guarantee, like when we guarantee that objects are always stored directly within the hash table (closed hashing). Then, the opposite of "closed" is "open", so if you don't have such guarantees, the strategy is considered "open".
You have an array that is the "hash table".
In Open Hashing, each cell in the array points to a list containing the collisions. The hashing has produced the same index for all items in the linked list.
In Closed Hashing you use only one array for everything. You store the collisions in the same array. The trick is to use some smart way to jump from collision to collision until you find what you want. And do this in a reproducible / deterministic way.
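Here is a minimal Python sketch of both strategies side by side, assuming a fixed-size table, a plain mod-based index, linear probing as the "jump" rule, and no deletion or resizing. The class names are made up for the example.

```python
class ChainedTable:
    """Open hashing (separate chaining): each cell holds its own list."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # collisions pile up in the list

    def get(self, key):
        for existing, value in self.buckets[hash(key) % len(self.buckets)]:
            if existing == key:
                return value
        raise KeyError(key)


class ProbingTable:
    """Closed hashing (open addressing): everything lives in one array."""

    _EMPTY = object()

    def __init__(self, size=8):
        self.slots = [self._EMPTY] * size

    def _probe(self, key):
        # Deterministic jump sequence: start at the hashed index, step by one.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                break  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)
```

With integer keys 1, 9, and 17 and a table of size 8, all three keys hash to index 1. The chained table keeps them in the list at cell 1, while the probing table spreads them over cells 1, 2, and 3 of the same array.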
The name open addressing refers to the fact that the location ("address") of the element is not determined by its hash value. (This method is also called closed hashing).
In separate chaining, each bucket is independent and holds some sort of ADT (list, binary search tree, etc.) of the entries with the same index.
In a good hash table, each bucket has zero or one entries, because we need O(1) operations for insert, search, etc.
This is an example of separate chaining using C++ with a simple hash function based on the mod operator (clearly, a bad hash function)