Searching for non-hashed data - postgresql

I have a user table with a password column that uses md5 hash. Over time, some of its hashes were changed to plain text (users asked for immediate password change, without using the method that would apply hash).
I have modest amount of data, i will do it by hand, but i want to know: there's something along the lines of
select * from TableName where Column is not hashed
or
update from TableName
set Column = md5(current value)
where Column is not hashed
or something like that?

As noted previously, the use of MD5 for hashing private credentials or otherwise use within the process of authentication and authorization is officially highly discouraged.
However, your best chance to detect whether a field stores an MD5 value or a non-hashed value and convert it on-the-fly is something like the following:
UPDATE TableName
SET Column = md5(Column)
WHERE Column !~ '^[a-f0-9]{32}$'
There might be remaining ones in case someone really clever guy generated an MD5 hash of something and used that directly as a password. That will not be detectable but authentication must fail on such a case as the stored value would not match the MD5 hash of the entered password at login.
You should also not think about transferring plain-text passwords down to the database for hashing and comparison as the attack surface really is pretty high already. Even if you use decent TLS for your database connections who guarantees you that an administrator or attacker hasn't enabled logging slow statements with parameters directly at the database?
Instead, the application should use a library to generate a salt and salt-hashed password directly and only transfer the salt and hash to the storage. The format specified by crypt is industry accepted (and therefore highly recommended), there are solid libraries for any programming language available and once a certain algorithm becomes deprecated, you can incrementally change it without a coordinated one-shot upgrade.

Related

Reason behind MD5 Hashing?

I have sometimes seen and have been recommended to store Strings and associative array keys as MD5 hash values. Now I have learnt about hashing from MIT - OCW 6.046j and it seems more like a scheme to store data in an efficient format for fast searching and to prevent people from getting back the original.
But don't languages supporting associative arrays / dictionaries do this internally? What additional advantage is the MD5 hash giving?
Most languages may support this internally, for example see Java's hashcode(), which is used when storing keys in a HashMap:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
But there are scenarios where you want to do it yourself.
Scenario 1 - using as key in a database:
Let's suppose you have a big no-sql-ish database of letters and metadata of these letters. You want to be able to find a letter's metadata quickly without searching. What would your index be?
One option is using a running index that's unrelated to the letter's content, but then you have to search the database before being able to find a document's metadata. Another option is to create a signature for the document composed of it's prefix (it's just an example out of many), but some documents may share this property ("Dear John,").
So how about taking into account the entire document? That's where you can use md5 as the row-key for your documents.
In this scenario you're relying on having no collisions, and the argument in favour of this assumption usually mention your chances of running into a demented gorilla being (usually) greater. The Secure Hash Algorithm family produce even less collisions.
I mention this since databases normally do not do this out of the box (frameworks may...).
Scenario 2 - One-way hash for password storage:
note: This may no longer apply for md5, but it does for the SHA-family variants.
In this scenario, you want to store passwords on your database, but storing plain-text passwords may have drawbacks if the database is compromised (user often share passwords across sites - may lead to accounts on other sites compromised as well). The usage of hashing here is storing the hashed password and when a user attempts to log-in you only compare the hash and not the password itself. This way you don't need the password stored locally and it is a lot harder to crack it.

How to search an encrypted attribute?

I have a sensitive attribute that must be encrypted at all times except during display (not my rule and I think it's overkill, but I must follow this rule). Additionally, the secret used to encrypt/decrypt this data must not be on or accessible through the database. So currently I have a session for the user that stores their encrypted password and decrypts this data when needed. However, now I need to find records by the encrypted attribute. I currently utilize ActiveSupport::MessageEncryptor for encryption/decryption of the attribute. Here's the direction I think I should go to accomplish this:
decryptor = ActiveSupport::MessageEncryptor.new(encrypted_password)
Family.where("decryptor.decrypt_and_verify(name) == ?", some_search_name)
Obviously the first side of that condition does not work as-is, but I need some way to do that. Any ideas?
Quick Primer to Passwords in the DB
This goes to show that encryption in the database is hard, and that you shouldn't do it unless you have thought carefully through your threat model and understand what all the tradeoffs are. To be honest, I have serious doubts that an ORM can ever give you the security you need where you need encryption (for important knowledge reasons), and on PostgreSQL, it is particularly hard because of the possibility of key disclosure in the log files. In general you really need to properly protect both encrypted and plain text with regard to passwords, so you really don't want a relational interface here but a functional one, with a query running under a totally different set of permissions.
Now, I can't tell in your example whether you are trying to protect passwords, but if you are, that's entirely the wrong way to go about it. My example below is going to use MD5. Now I am aware that MD5 is frowned upon by the crypto community because of the relatively short output, but it has the advantage in this case of not requiring pg_crypto to support and being likely stronger than attacking the password directly (in the context of short password strings, it is likely "good enough" particularly when combined with other measures).
Now what you want to do is this: you want to salt the password, then hash it, and then search the hashed value. The most performant way to do this would be to have a users table which does not include the password, but does include the salt, and a shadow table which includes the hashed password but not the user-accessible data. The shadow table would be restricted to its owner and that owner would have access to the user table too.
Then you could write a function like this:
CREATE OR REPLACE FUNCTION get_userid_by_password(in_username text, in_password text)
RETURNS INT LANGUAGE SQL AS
$$
SELECT user_id
FROM shadow
JOIN users ON users.id = shadow.user_id
WHERE users.username = $1 AND shadow.hashed_password = md5(users.salt || $2);
$$ SECURITY DEFINER;
ALTER FUNCTION get_userid_by_password(text, text) OWNER TO shadow_owner;
You would then have to drop to SQL to run this function (don't go through your ORM). However you could index shadow.hashed_password and have it work with an index here (because the matching hash could be generated before scanning the table), and you are reasonably protected against SQL injections giving away the password hashes. You still have to make sure that logging will not be generally enabled of these queries and there are a host of other things to consider, but it gives you an idea of how best to manage passwords per se. Alternatively in your ORM you could do something that would have a resulting SQL query like:
SELECT * FROM users WHERE id = get_userid_by_password($username, $password)
(The above is pseudocode and intended only for illustration purposes. If you use a raw query like that assembled as a text string you are asking for SQL injection.)
What if it isn't a password?
If you need reversible encryption, then you need to go further. Note that in the example above, the index could be used because I was searching merely for an equality on the encrypted data. Searching for an unencrypted data means that indexes are not usable. If you index the unencrypted data then why are you encrypting it in the first place? Also decryption does place burdens on the processor so it will be slow.
In all cases you need to carefully think through your threat model and ask how other vulnerabilities could make your passwords less secure.

Get original value from HASH

I came across a table the other day on our enterprise system -- dba_users (oracle).
I was able to find a hashed password in this table for each employee, and also their username.
As far as I know (from googling) the username + password is concatenated, then hashed.
Question: knowing the "salt" (my own username), the "original value" (my own password), and also the hashed value...is there a danger here of being able to figure-out the hash?
Also -- googling oracle 10g hash -- seems like some folks think they've figured the hash algorythm out. And I've read about "rainbow tables" and "offline dictionary attacks"... And finally, I've googled oracle 11g, and one of the features in that version is they hide the hashed password in dba_users so end-users can't see it.
Anyway, I'm scratching my head over why I (i.e., end-users) have access to this table, and why the DBA dosen't seem too worried about it.
The whole point of hashing the password and then storing the hash to database is that if you do so you won't have to worry about who can see the password in the table.
To emphasize: a (properly calculated) hash value of a password which is stored in database is completely useless without the original.
As far as I know, for algorithms like md5, there isn't a way to reverse engineer the original password from hash whatsoever. That's why most services nowadays don't send you your password when you click 'I forgot' link - instead they offer to set a new one (as soon as you provide the old password and the service compares hashes).
Elaborating on what #Goran Jovic said, by concatenating (ie salting) each hashed password with the username.There is no danger in making this available. To make sense of this, you have to understand how a rainbow table work. The way a series of passwords is cracked with a rainbow table is by loading up a precomputed series of hashes from passwords into memory. Then, searching through the table of hashes associated unknown passwords to see if you can find a match in the rainbow table (of which you know the password corresponding with a specific hash). By salting the hashes with a unique identifier (ie a username), however, you defeat this attack because even a password that's already been seen will hash differently depending n the user associated with it. Therefore now, instead of a rainbow table having to handle the millions of different password combinations possible, it must now contain a hash of every possible password plus every possible username. The results space of this is simply too large to search and unless there was a quantum leap in computing technology impossible to beat using a bruteforce tactic.

What is the recommended way to encrypt user passwords in a database?

In a web application written in Perl and using PostgreSQL the users have username and password. What would be the recommended way to store the passwords?
Encrypting them using the crypt() function of Perl and a random salt? That would limit the useful length of passswords to 8 characters and will require fetching the stored password in order to compare to the one given by the user when authenticating (to fetch the salt that was attached to it).
Is there a built-in way in PostgreSQL to do this?
Should I use Digest::MD5?
Don't use SHA1 or SHA256, as most other people are suggesting. Definitely don't use MD5.
SHA1/256 and MD5 are both designed to create checksums of files and strings (and other datatypes, if necessary). Because of this, they're designed to be as fast as possible, so that the checksum is quick to generate.
This fast speed makes it much easier to bruteforce passwords, as a well-written program easily can generate thousands of hashes every second.
Instead, use a slow algorithm that is specifically designed for passwords. They're designed to take a little bit longer to generate, with the upside being that bruteforce attacks become much harder. Because of this, the passwords will be much more secure.
You won't experience any significant performance disadvantages if you're only looking at encrypting individual passwords one at a time, which is the normal implementation of storing and checking passwords. It's only in bulk where the real difference is.
I personally like bcrypt. There should be a Perl version of it available, as a quick Google search yielded several possible matches.
MD5 is commonly used, but SHA1/SHA256 is better. Still not the best, but better.
The problem with all of these general-purpose hashing algorithms is that they're optimized to be fast. When you're hashing your passwords for storage, though, fast is just what you don't want - if you can hash the password in a microsecond, then that means an attacker can try a million passwords every second if they get their hands on your password database.
But you want to slow an attacker down as much as possible, don't you? Wouldn't it be better to use an algorithm which takes a tenth of a second to hash the password instead? A tenth of a second is still fast enough that users won't generally notice, but an attacker who has a copy of your database will only be able to make 10 attempts per second - it will take them 100,000 times longer to find a working set of login credentials. Every hour that it would take them at a microsecond per attempt becomes 11 years at a tenth of a second per attempt.
So, how do you accomplish this? Some folks fake it by running several rounds of MD5/SHA digesting, but the bcrypt algorithm is designed specifically to address this issue. I don't fully understand the math behind it, but I'm told that it's based on the creation of Blowfish frames, which is inherently slow (unlike MD5 operations which can be heavily streamlined on properly-configured hardware), and it has a tunable "cost" parameter so that, as Moore's Law advances, all you need to do is adjust that "cost" to keep your password hashing just as slow in ten years as it is today.
I like bcrypt the best, with SHA2(256) a close second. I've never seen MD5 used for passwords but maybe some apps/libraries use that. Keep in mind that you should always use a salt as well. The salt itself should be completely unique for each user and, in my opinion, as long as possible. I would never, ever use just a hash against a string without a salt added to it. Mainly because I'm a bit paranoid and also so that it's a little more future-proof.
Having a delay before a user can try again and auto-lockouts (with auto-admin notifications) is a good idea as well.
The pgcrypto module in PostgreSQL has builtin suppotr for password hashing, that is pretty smart about storage, generation, multi-algorithm etc. See http://www.postgresql.org/docs/current/static/pgcrypto.html, the section on Password Hashing Functions. You can also see the pgcrypto section of http://www.hagander.net/talks/hidden%20gems%20of%20postgresql.pdf.
Use SHA1 or SHA256 hashing with salting. Thats the way to go for storing passwords.
If you don't use a password recovery mechanism (Not password reset) I think using a hashing mechanism is better than trying to encrypt the password. You can just check the hashes without any security risk. Even you don't know the password of the user.
I would suggest storing it as a salted md5 hash.
INSERT INTO user (password) VALUES (md5('some_salt'||'the_password'));
You could calculate the md5 hash in perl if you wish, it doesn't make much difference unless you are micro-optimizing.
You could also use sha1 as an alternative, but I'm unsure if Postgres has a native implementation of this.
I usually discourage the use of a dynamic random salt, as it is yet another field that must be stored in the database. Plus, if your tables were ever compromised, the salt becomes useless.
I always go with a one-time randomly generated salt and store this in the application source, or a config file.
Another benefit of using a md5 or sha1 hash for the password is you can define the password column as a fixed width CHAR(32) or CHAR(40) for md5 and sha1 respectively.

Using a hash of data as a salt

I was wondering - is there any disadvantages in using the hash of something as a salt of itself?
E.g. hashAlgorithm(data + hashAlgorithm(data))
This prevents the usage of lookup tables, and does not require the storage of a salt in the database. If the attacker does not have access to the source code, he would not be able to obtain the algorithm, which would make brute-forcing significantly harder.
Thoughts? (I have a gut feeling that this is bad - but I wanted to check if it really is, and if so, why.)
If the attacker does not have access to the source code
This is called "security through obscurity", which is always considered bad. An inherently safe method is always better, even if the only difference lies in the fact that you don't feel save "because they don't know how". Someone can and will always find the algorithm -- through careful analysis, trial-and-error, or because they found the source by SSH-ing to your shared hosting service, or any of a hundred other methods.
Using a hash of the data as salt for the data is not secure.
The purpose of salt is to produce unpredictable results from inputs that are otherwise the same. For example, even if many users select the same input (as a password, for example), after applying a good salt, you (or an attacker) won't be able to tell.
When the salt is a function of the data, an attacker can pre-compute a lookup table, because the salt for every password is predictable.
The best salts are chosen from a cryptographic pseudo-random number generator initialized with a random seed. If you really cannot store an extra salt, consider using something that varies per user (like a user name), together with something application specific (like a domain name). This isn't as good as a random salt, but it isn't fatally flawed.
Remember, a salt doesn't need to be secret, but it cannot be a function of the data being salted.
This offers no improvement over just hashing. Use a randomly generated salt.
The point of salting is to make it so two chronologically distinct values' hashes differ, and by so doing breaks pre-calculated lookup tables.
Consider:
data = "test"
hash = hash("test"+hash("test"))
Hash will be constant whenever data = "test". Thus, if the attacker has the algorithm (and the attacker always has the algorithm) they can pre-calculate hash values for a dictionary of data entries.
This is not salt - you have just modified the hash function. Instead of using lookup table for the original hashAlgorithm, attacker can just get the table for your modified one; this does not prevent the usage of lookup tables.
It is always better to use true random data as salt. Imagine an implementation where the username ist taken as salt value. This would lead to reduced security for common names like "root" or "admin".
I you don't want to create and manage a salt value for each hash, you could use a strong application wide salt. In most cases this would be absolutely sufficient and many other things would be more vulnerable than the hashes.