How to search an encrypted attribute? - postgresql

I have a sensitive attribute that must be encrypted at all times except during display (not my rule and I think it's overkill, but I must follow this rule). Additionally, the secret used to encrypt/decrypt this data must not be on or accessible through the database. So currently I have a session for the user that stores their encrypted password and decrypts this data when needed. However, now I need to find records by the encrypted attribute. I currently utilize ActiveSupport::MessageEncryptor for encryption/decryption of the attribute. Here's the direction I think I should go to accomplish this:
decryptor = ActiveSupport::MessageEncryptor.new(encrypted_password)
Family.where("decryptor.decrypt_and_verify(name) == ?", some_search_name)
Obviously the first side of that condition does not work as-is, but I need some way to do that. Any ideas?

Quick Primer to Passwords in the DB
This goes to show that encryption in the database is hard, and that you shouldn't do it unless you have thought carefully through your threat model and understand what all the tradeoffs are. To be honest, I have serious doubts that an ORM can ever give you the security you need where you need encryption (for important knowledge reasons), and on PostgreSQL, it is particularly hard because of the possibility of key disclosure in the log files. In general you really need to properly protect both encrypted and plain text with regard to passwords, so you really don't want a relational interface here but a functional one, with a query running under a totally different set of permissions.
Now, I can't tell from your example whether you are trying to protect passwords, but if you are, that's entirely the wrong way to go about it. My example below uses MD5. I am aware that MD5 is frowned upon by the crypto community because of its relatively short output, but it has the advantage here of not requiring the pgcrypto extension, and attacking it is likely harder than attacking the password directly (for short password strings it is likely "good enough", particularly when combined with other measures).
Now what you want to do is this: you want to salt the password, then hash it, and then search the hashed value. The most performant way to do this would be to have a users table which does not include the password, but does include the salt, and a shadow table which includes the hashed password but not the user-accessible data. The shadow table would be restricted to its owner and that owner would have access to the user table too.
Then you could write a function like this:
CREATE OR REPLACE FUNCTION get_userid_by_password(in_username text, in_password text)
RETURNS INT LANGUAGE SQL AS
$$
    -- $1 = in_username, $2 = in_password
    SELECT shadow.user_id
      FROM shadow
      JOIN users ON users.id = shadow.user_id
     WHERE users.username = $1
       AND shadow.hashed_password = md5(users.salt || $2);
$$ SECURITY DEFINER;
-- SECURITY DEFINER: the function runs with the privileges of its owner.
ALTER FUNCTION get_userid_by_password(text, text) OWNER TO shadow_owner;
You would then have to drop to SQL to run this function (don't go through your ORM). However, you could index shadow.hashed_password and the query could use that index (because the matching hash can be generated before scanning the table), and you are reasonably protected against SQL injection giving away the password hashes. You still have to make sure that logging of these queries is not generally enabled, and there are a host of other things to consider, but it gives you an idea of how best to manage passwords per se. Alternatively, in your ORM you could do something that results in a SQL query like:
SELECT * FROM users WHERE id = get_userid_by_password($username, $password)
(The above is pseudocode and intended only for illustration purposes. If you use a raw query like that assembled as a text string you are asking for SQL injection.)
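To make the point about not assembling the query as a text string concrete, here is a minimal sketch of the same lookup with bound parameters. It is an illustration only and makes several assumptions: it uses Python with an in-memory SQLite database as a stand-in for PostgreSQL, registers its own md5() function because SQLite has none, and the sample user, salt and password are made up. It does not reproduce the SECURITY DEFINER permission split described above.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Hex MD5 digest of a string, mirroring what PostgreSQL's md5(text) returns."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def get_userid_by_password(conn, username, password):
    """Return the matching user id, or None. Values are bound, never interpolated."""
    row = conn.execute(
        "SELECT shadow.user_id FROM shadow "
        "JOIN users ON users.id = shadow.user_id "
        "WHERE users.username = ? "
        "AND shadow.hashed_password = md5(users.salt || ?)",
        (username, password),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex)  # stand-in: SQLite has no built-in md5()
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, salt TEXT);
    CREATE TABLE shadow (user_id INTEGER, hashed_password TEXT);
    """
)
# Made-up sample data for the sketch.
conn.execute("INSERT INTO users VALUES (1, 'alice', 's4lt')")
conn.execute("INSERT INTO shadow VALUES (1, ?)", (md5_hex("s4lt" + "hunter2"),))

print(get_userid_by_password(conn, "alice", "hunter2"))
print(get_userid_by_password(conn, "alice", "' OR '1'='1"))
```

The second call passes a classic injection string as the password. Because it is bound as a value rather than spliced into the SQL text, it is simply hashed and fails to match.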
What if it isn't a password?
If you need reversible encryption, then you need to go further. Note that in the example above, the index could be used because I was searching merely for equality on the hashed value. Searching on the decrypted value means that indexes are not usable: every row has to be decrypted before it can be compared. If you index the unencrypted data, then why are you encrypting it in the first place? Decryption also places a burden on the processor, so it will be slow.
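One way to keep equality searches indexable, in the same spirit as the hashed-password lookup, is to store a keyed hash of the value next to the ciphertext and search on that. This is not something the question's code does; it is a swapped-in technique (sometimes called a blind index), sketched here in Python under assumptions: the table and column names are hypothetical, SQLite stands in for PostgreSQL, the key is a placeholder that would have to live only in the application, and the actual encryption is left out.

```python
import hashlib
import hmac
import sqlite3


def blind_index(key, value):
    """Keyed hash of a normalised value; equal inputs give equal tokens."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key: it must stay in the application, never in the database.
INDEX_KEY = b"application-side secret"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE families ("
    "id INTEGER PRIMARY KEY, name_ciphertext BLOB, name_index TEXT)"
)
conn.execute("CREATE INDEX families_name_index ON families (name_index)")

for name in ("Smith", "Jones"):
    ciphertext = b"<opaque ciphertext>"  # placeholder: encryption is out of scope
    conn.execute(
        "INSERT INTO families (name_ciphertext, name_index) VALUES (?, ?)",
        (ciphertext, blind_index(INDEX_KEY, name)),
    )


def find_family_ids(conn, key, name):
    """Equality search on the token column; the plaintext never reaches the DB."""
    rows = conn.execute(
        "SELECT id FROM families WHERE name_index = ? ORDER BY id",
        (blind_index(key, name),),
    )
    return [row[0] for row in rows]


print(find_family_ids(conn, INDEX_KEY, "smith"))
```

The tradeoff is the one discussed above: rows with the same name get the same token, so the column leaks which records are equal, and it only supports exact matches.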
In all cases you need to carefully think through your threat model and ask how other vulnerabilities could make your passwords less secure.

Related

Searching for non-hashed data

I have a user table with a password column that uses md5 hash. Over time, some of its hashes were changed to plain text (users asked for immediate password change, without using the method that would apply hash).
I have a modest amount of data, so I will do it by hand, but I want to know: is there something along the lines of
select * from TableName where Column is not hashed
or
update from TableName
set Column = md5(current value)
where Column is not hashed
or something like that?
As noted previously, using MD5 to hash private credentials, or anywhere else in the process of authentication and authorization, is officially strongly discouraged.
However, your best chance to detect whether a field stores an MD5 value or a non-hashed value and convert it on-the-fly is something like the following:
UPDATE TableName
SET Column = md5(Column)
WHERE Column !~ '^[a-f0-9]{32}$'
Some may remain if a really clever person generated an MD5 hash of something and used that directly as a password. That cannot be detected, but authentication will fail in such a case, as the stored value will not match the MD5 hash of the password entered at login.
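If you would rather check the values in the application before touching the table, the same test can be sketched in Python. The function names and sample values below are made up for illustration; the pattern is the one from the UPDATE statement, applied with fullmatch so that a trailing newline cannot slip through.

```python
import hashlib
import re

# Same shape as the SQL pattern: exactly 32 lowercase hex characters.
MD5_HEX = re.compile(r"[a-f0-9]{32}")


def looks_like_md5(value):
    """True if the stored value has the shape of an MD5 hex digest."""
    return MD5_HEX.fullmatch(value) is not None


def normalise(value):
    """Hash values that are still plain text; leave MD5-shaped values alone."""
    if looks_like_md5(value):
        return value
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Made-up column contents: one MD5-shaped value, one plain-text password.
stored = ["5f4dcc3b5aa765d61d8327deb882cf99", "letmein"]
fixed = [normalise(value) for value in stored]
print(fixed)
```

As with the SQL version, a plain-text password that happens to be 32 lowercase hex characters is indistinguishable from a hash and is left untouched.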
You should also not think about transferring plain-text passwords down to the database for hashing and comparison as the attack surface really is pretty high already. Even if you use decent TLS for your database connections who guarantees you that an administrator or attacker hasn't enabled logging slow statements with parameters directly at the database?
Instead, the application should use a library to generate a salt and salt-hashed password directly and only transfer the salt and hash to the storage. The format specified by crypt is industry accepted (and therefore highly recommended), there are solid libraries for any programming language available and once a certain algorithm becomes deprecated, you can incrementally change it without a coordinated one-shot upgrade.
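As a rough sketch of hashing in the application, the following uses PBKDF2 from Python's standard library. Treat it as a stand-in: it is not the crypt format itself, the "$"-separated record layout is a made-up imitation of that style, and the iteration count is an assumption you would tune. A maintained password-hashing library is the better choice in practice.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the sketch; tune for your hardware


def hash_password(password):
    """Return one self-describing record: scheme$iterations$salt$digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


record = hash_password("correct horse")
print(verify_password("correct horse", record))
print(verify_password("wrong guess", record))
```

Only the record string goes to the database. Because the scheme and iteration count travel with each record, old and new parameters can coexist while you migrate.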

Reason behind MD5 Hashing?

I have sometimes seen and have been recommended to store Strings and associative array keys as MD5 hash values. Now I have learnt about hashing from MIT - OCW 6.046j and it seems more like a scheme to store data in an efficient format for fast searching and to prevent people from getting back the original.
But don't languages supporting associative arrays / dictionaries do this internally? What additional advantage is the MD5 hash giving?
Most languages support this internally; for example, see Java's hashCode(), which is used when storing keys in a HashMap:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
But there are scenarios where you want to do it yourself.
Scenario 1 - using as key in a database:
Let's suppose you have a big no-sql-ish database of letters and metadata of these letters. You want to be able to find a letter's metadata quickly without searching. What would your index be?
One option is to use a running index that's unrelated to the letter's content, but then you have to search the database before you can find a document's metadata. Another option is to create a signature for the document composed of its prefix (just one example out of many), but some documents may share this property ("Dear John,").
So how about taking into account the entire document? That's where you can use md5 as the row-key for your documents.
In this scenario you're relying on there being no collisions, and the usual argument in favour of that assumption is that your chances of running into a demented gorilla are (usually) greater. Bear in mind that this holds for accidental collisions only, since MD5 collisions can be constructed deliberately. The Secure Hash Algorithm family produces even fewer collisions.
I mention this since databases normally do not do this out of the box (frameworks may...).
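A small sketch of Scenario 1 in Python, with a plain dict standing in for the database and made-up function names: the row key is derived from the whole document, so anyone holding the letter can compute its key and fetch the metadata directly.

```python
import hashlib

# A dict stands in for the key-value store in this sketch.
metadata_store = {}


def row_key(document):
    """Derive the row key from the entire document text."""
    return hashlib.md5(document.encode("utf-8")).hexdigest()


def store(document, metadata):
    """Save metadata under the content-derived key and return that key."""
    key = row_key(document)
    metadata_store[key] = metadata
    return key


def lookup(document):
    """Fetch metadata with a single key lookup, no search required."""
    return metadata_store.get(row_key(document))


store("Dear John, the weather is fine.", {"sender": "Mary"})
store("Dear John, please send money.", {"sender": "Tom"})

print(lookup("Dear John, please send money."))
```

Both letters share the "Dear John," prefix, yet they get different keys because the whole content is hashed.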
Scenario 2 - One-way hash for password storage:
note: This may no longer apply for md5, but it does for the SHA-family variants.
In this scenario, you want to store passwords in your database, but storing plain-text passwords has drawbacks if the database is compromised (users often share passwords across sites, which may lead to accounts on other sites being compromised as well). The use of hashing here is to store the hashed password, so that when a user attempts to log in you compare only the hash and not the password itself. This way you don't need the password stored locally and it is a lot harder to crack.

What is SaltKey in t-sql?

What is the purpose of SaltKey in T-SQL? For example, in the aspdotnetstorefront database there is a table named customer; we encrypt/decrypt the password, and there is another field called SaltKey. What is its purpose?
Your question is vague, but I think you are looking for information about a salt, which is a cryptographic concept and not a relational database one. From Wikipedia:
The benefit provided by using a salted password is making a lookup
table assisted dictionary attack against the stored values
impractical, provided the salt is large enough. That is, an attacker
would not be able to create a precomputed lookup table (i.e. a rainbow
table) of hashed values (password + salt), because it would take too
much space. A simple dictionary attack is still very possible,
although much slower since it cannot be precomputed.
See http://en.wikipedia.org/wiki/Salt_%28cryptography%29; it has to do with cryptography, not T-SQL.
See also http://crackstation.net/hashing-security.html, which will help you understand what a salt is.
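To illustrate what a column like SaltKey is for, here is a minimal Python sketch. Everything in it is an assumption made for the example: the customer rows, the use of a random hex salt, and a single round of SHA-256, which is chosen for brevity and is not a recommendation. It is not how aspdotnetstorefront actually computes its values.

```python
import hashlib
import os


def new_salt():
    """Random per-customer salt, stored next to the hash (the SaltKey column)."""
    return os.urandom(16).hex()


def salted_hash(salt, password):
    """Hash the salt together with the password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Hypothetical rows: email -> (SaltKey, Password). Both customers chose "sunshine".
customers = {}
for email in ("ann@example.com", "bob@example.com"):
    salt = new_salt()
    customers[email] = (salt, salted_hash(salt, "sunshine"))


def check_login(email, password):
    """Re-hash the attempt with the customer's own salt and compare."""
    salt, stored = customers[email]
    return salted_hash(salt, password) == stored


print(customers["ann@example.com"][1] == customers["bob@example.com"][1])
print(check_login("ann@example.com", "sunshine"))
```

Because each row has its own salt, identical passwords produce different stored values, so one precomputed table cannot be reused across customers.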

Get original value from HASH

I came across a table the other day on our enterprise system -- dba_users (oracle).
I was able to find a hashed password in this table for each employee, and also their username.
As far as I know (from googling) the username + password is concatenated, then hashed.
Question: knowing the "salt" (my own username), the "original value" (my own password), and also the hashed value...is there a danger here of being able to figure-out the hash?
Also -- googling oracle 10g hash -- seems like some folks think they've figured the hash algorithm out. And I've read about "rainbow tables" and "offline dictionary attacks"... And finally, I've googled oracle 11g, and one of the features in that version is they hide the hashed password in dba_users so end-users can't see it.
Anyway, I'm scratching my head over why I (i.e., end-users) have access to this table, and why the DBA doesn't seem too worried about it.
The whole point of hashing the password and then storing the hash to database is that if you do so you won't have to worry about who can see the password in the table.
To emphasize: a (properly calculated) hash value of a password which is stored in database is completely useless without the original.
As far as I know, for algorithms like MD5 there is no direct way to recover the original password from the hash (short of guessing candidate passwords and hashing each one). That's why most services nowadays don't send you your password when you click the 'I forgot' link; instead they offer to set a new one, because they only hold the hash.
Elaborating on what @Goran Jovic said: each password is concatenated (i.e. salted) with the username before hashing, so there is no danger in making this available. To make sense of this, you have to understand how a rainbow table works. A series of passwords is cracked with a rainbow table by loading a precomputed series of password hashes into memory, then searching through the table of hashes of unknown passwords to see whether any of them matches an entry in the rainbow table (for which you know the corresponding password). By salting the hashes with a unique identifier (i.e. a username), however, you defeat this attack, because even a password that has already been seen will hash differently depending on the user associated with it. So instead of covering the millions of possible passwords, the rainbow table must now contain a hash of every possible password combined with every possible username. That result space is simply too large to precompute and search and, barring a quantum leap in computing technology, impossible to beat using a brute-force tactic.
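The effect can be sketched in a few lines of Python. This is a simplification under stated assumptions: a plain dict of precomputed hashes stands in for a real rainbow table (which uses hash chains to save space), MD5 is used only because it is short, the username "scott" and the password list are made up, and this is not Oracle's actual algorithm.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Precomputed once by the attacker, reusable against any unsalted database.
common_passwords = ["123456", "password", "qwerty"]
precomputed = {md5_hex(p): p for p in common_passwords}

unsalted = md5_hex("password")          # hash stored without a salt
salted = md5_hex("scott" + "password")  # username prepended as the salt

print(precomputed.get(unsalted))  # found in the precomputed table
print(precomputed.get(salted))    # not in the table

# Knowing the username, the attacker can still guess passwords one by one,
# but has to redo that work for every single user.
targeted = [p for p in common_passwords if md5_hex("scott" + p) == salted]
print(targeted)
```

So the salt removes the benefit of precomputation, while a targeted dictionary attack against one known user remains possible, just slower.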

How to search the value when value is stored as encrypted

In my database I store the student information in encrypted form.
Now I want to run a search that lists all students whose name starts with "something" or contains "something".
Does anybody have an idea how to perform this type of query?
Please suggest
Any decent encryption algorithm has as one of its core features the fact that it's impossible to deduce anything about the plaintext just by looking at the encrypted text. If you were able to tell, just by looking at the encrypted text, that the plaintext contained the string william, any attackers would be able to get that information just as easily, and you may as well not be encrypting at all.
The only way to perform this kind of operation on the data is to have access to the decrypted data. Using the model you've described - where the database only ever sees the encrypted data - it's not possible for the database to do this work, as the database has no access to the data it needs.
You need to have the data you're wanting to search on decrypted. The only complete way to do this is to have the application pull all the data out of the database, decrypt it, then do the filtering/sorting/whatever in your application. Obviously this is not going to scale well - but that's surely something you took into consideration when you decided to encrypt the data before putting it in the database.
Another option would be to store fragments of the data unencrypted. For example, if you have a first_name field and you want to be able to retrieve all records where first_name begins with a, have a first_name_first_letter field. Obviously this isn't going to scale well either - if you want to search for all records where first_name contains ill, you're going to have to store the complete first_name unencrypted.
There's a more serious problem with this solution though: by storing unencrypted data, you're leaking information about the encrypted data. The more unencrypted data you store, the more you leak. The more you leak, the more clues you're leaving for an attacker to defeat your encryption - plus, if you've stored the bit they were interested in unencrypted, they've already won.
Another question points to SQLCipher - it's an implementation of SQLite that does the encryption in the database. It seems to be targeted towards your use case - it's even already used in a couple of iPhone apps.
However, it has the database doing the encryption, not the application. This lets the database also handle the decryption, and hence the database is able to inspect the contents of the fields and do the searching you're looking for.
If you're still insisting on not doing the encryption in the database, this won't work for you.
If all you want is the equivalent of "starts with" and "contains", you might be able to do something with a bit field and the bitwise logical operators.
Not sure on the syntax you'd use, exactly (I'm a bit rusty on SQL) but the idea would be to create an additional field for each record which has a bit set for each letter that occurs in the name, then do something like:
SELECT * from someTable where (searchValue & bitField)>0
You then need to iterate over those records, decrypt them, and determine whether they actually meet the criteria you really wanted to search on (since you'd get a superset of the desired records back from the search).
You'd obviously be leaking some information about the contents of the field by doing this, but you can probably reduce that by encrypting the bitfields as well, or by turning on a few extra bits in each bitfield, so you can't tell "bob" from "bobby", for example.
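Here is a rough Python sketch of the bit-field idea, under some assumptions of mine: one bit per letter a to z, a list of tuples standing in for the table, and plain-text names standing in for the values you would actually have to decrypt in the second step. It also uses a stricter test than the query above, requiring every letter of the search term to be present rather than just one.

```python
def letter_mask(text):
    """Set bit 0 for 'a', bit 1 for 'b', ... for each letter that occurs."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


# Stand-in table: (name, bit field). In the real scheme the name column
# would be ciphertext and only the bit field would be readable.
records = [(name, letter_mask(name)) for name in ("William", "Lilian", "Bob")]


def candidates(records, fragment):
    """Step 1 (database side): keep rows whose bit field has all wanted bits."""
    wanted = letter_mask(fragment)
    return [name for name, mask in records if (mask & wanted) == wanted]


def search_contains(records, fragment):
    """Step 2 (application side): 'decrypt' the candidates and check for real."""
    return [
        name for name in candidates(records, fragment)
        if fragment.lower() in name.lower()
    ]


print(candidates(records, "ill"))
print(search_contains(records, "ill"))
```

"Lilian" survives the first step because it contains both letters of "ill", and is only rejected after the real comparison. That is the superset behaviour described above.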
I'm curious about what sort of security goal you're trying to meet with this encryption, though. If you describe the model a bit more, you might get better answers.