Rainbow tables and John the Ripper [closed]

I am working on a uni project and I have to present the tool "John the Ripper" and the usage of "Rainbow tables" with it.
I played around with the different modes of "John the Ripper" and read up on the concept of "Rainbow tables".
The problem is that I cannot understand how these two are connected, and how (if it is possible at all) I can use my own "Rainbow tables" when cracking a password hash.

They solve the same problem, but in opposite directions:
Password-cracking software like JtR dynamically performs hashing of large lists of candidate plaintexts until a plaintext is found that produces a hash that matches the target hash. If no candidate plaintext produces a match, then the original plaintext has not been discovered and the hash has not been "cracked".
Rainbow tables compare a given hash to a large (but finite) list of precomputed hashes. If a matching hash is not already present in the rainbow table, the plaintext cannot be discovered with that table.
This is the classic "time/memory trade-off" concept. Cracking takes more computation power and time, but less storage. Rainbow tables take less computation power and time, but much more storage (often terabytes in size).
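To make the contrast concrete, here is a minimal Python sketch, with a toy wordlist and MD5 chosen purely for illustration. Note that a real rainbow table does not store every hash/plaintext pair the way this dict does; it compresses them into hash chains:

```python
import hashlib

target = hashlib.md5(b"letmein").hexdigest()  # the hash we want to crack

wordlist = ["password", "123456", "letmein", "qwerty"]

# Cracking (the JtR approach): hash candidates on the fly until one matches.
cracked = next((w for w in wordlist
                if hashlib.md5(w.encode()).hexdigest() == target), None)

# Precomputed lookup (the idea a rainbow table amortizes): pay the hashing
# cost once up front, then answer any number of queries by simple lookup.
table = {hashlib.md5(w.encode()).hexdigest(): w for w in wordlist}
looked_up = table.get(target)  # fast, but only finds precomputed plaintexts

print(cracked, looked_up)  # letmein letmein
```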
And because modern GPUs can attempt billions of unsalted candidate passwords per second, rainbow tables are only more useful than GPU-based attacks in a very specific and constrained set of circumstances:
The length of the password and its composition (whether specific characters are required, etc.) are known in advance, and the resulting keyspace is small enough to be generated in bulk and stored in a rainbow table (often no more than 9 or 10 characters, depending on composition)
The password was probably randomly generated (because most non-randomly-generated candidates, of much greater length and complexity, can be attempted on GPU)
So unless you're a pentester with specific knowledge that a high-value password was randomly generated but is also relatively short (which would be rare in practice), rainbow tables are largely outdated.
It also makes no sense to "build a rainbow table" on the fly for a new target because the speed of using a rainbow table is only achievable after it has been built. You can simply run through the equivalent GPU attack faster ... and still have your 4TB of disk space available for something else.


How can I make sure that a hash function won't produce the same cypher for 2+ different entries?

Edit: some people flagged this question as a potential duplicate of this other one. While I agree that knowing how the birthday paradox applies to hash functions is useful, the 2 questions (and respective answers) address 2 different, albeit related, subjects.
The other question asks "what are the odds of a collision", whereas this question's main focus is "how can I make sure that a collision never happens".
I have a data lake stored in S3 where each day an ETL script dumps additional data from the day before.
Due to how the pipeline is built, it is possible for a very inconsiderate user with admin access to produce duplicates in said data lake by manually interacting with the dump files coming from our OLTP database, and triggering the ETL script when it's not supposed to run.
I thought that a good idea to prevent data duplication was to insert a form of security measure in my ETL script:
Produce a hash for each entry.
Store said hashes somewhere else (like a dynamodb table).
Whenever new data comes in, hash that as well and compare it with the already existing hashes.
If any new hash matches one of the existing hashes, reject the associated entry entirely. (A minimal sketch of these steps follows this list.)
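Here is a minimal sketch of those steps, with SHA-256 and an in-memory set standing in for the DynamoDB table (both are my assumptions, not part of the question):

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    # Canonicalize first, so logically identical records always hash the same.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Stand-in for the hashes already stored elsewhere (step 2).
existing_hashes = {entry_hash({"id": 1, "amount": 42})}

incoming = [{"id": 1, "amount": 42}, {"id": 2, "amount": 7}]
# Steps 3-4: hash each new entry and reject the ones already seen.
accepted = [e for e in incoming if entry_hash(e) not in existing_hashes]
print(accepted)  # [{'id': 2, 'amount': 7}]
```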
However, I know very little about hashing and I was reading that, although unlikely, 2 different sources can produce the same hash.
I understand it's really hard for it to happen in this situation, but I was wondering if there is a way to be 100% sure about it.
Any idea is much appreciated.
Long answer: what you want to study and explore is called "perfect hashing" (i.e. hashing guaranteed not to have collisions): https://en.wikipedia.org/wiki/Perfect_hash_function
Short answer: a cryptographic collision-resistant algorithm like SHA-1 is probably safe to use for all but the largest (petabytes-a-day) datasets, and even then it's probably all right. Git uses SHA-1 internally, and code repositories probably deal with the most files on the planet and rarely have collisions.
See for details: https://ericsink.com/vcbe/html/cryptographic_hashes.html#:~:text=Git%20uses%20hashes%20in%20two,computed%20when%20it%20was%20stored.
Medium answer: this is actually a pretty hard problem overall and a frequent area of study in computer science, and a lot depends on your particular use case and the context you're operating in. Cuckoo hashing, collision-resistant algorithms, and hashing in general are all good terms to research. There's also a lot of art and science behind the trade-off between space (memory) and time (compute) when picking these methods. A good rule of thumb is that perfect hashing will generally take more space and time than a collision-resistant cryptographic hash like SHA-1.

How should column-level PostgreSQL encryption be implemented in SQLAlchemy using pgcrypto? [closed]

For example, there is a repo for doing this in Django: https://sourcegraph.com/github.com/dcwatson/django-pgcrypto.
There is some discussion in the SQLAlchemy manual, but I am using non-byte columns: http://docs.sqlalchemy.org/en/rel_0_9/core/types.html
I am running Flask on Heroku using SQLAlchemy.
A code example and/or some discussion would be most appreciated.
There are a bunch of stages to this kind of decision making; it's not just "shove a plugin into the stack and that encryption thing is taken care of".
First, you really need to classify each column for its attractiveness to attackers & what searches/queries need to use it, whether it's a join column / index candidate, etc. Some data needs much stronger protection than other data.
Consider who you're trying to protect against:
Casual attacker (e.g. SQL injection holes used for remote table copies)
Stolen database backup (tip: Encrypt these too)
Stolen/leaked log files, possibly including queries and parameters
Attacker with direct non-superuser SQL level access
Attacker with direct superuser SQL-level access
Attacker who gains direct access to the "postgres" OS user, so they can modify configuration, copy/edit logs, install malicious extensions, alter function definitions, etc
Attacker who gains root on the DB server
Of course, there's also the app server, upstream compromise of trusted sources for programming languages and toolkits, etc. Eventually you reach a point where you have to say "I can't realistically defend against this". You can't protect against somebody coming in, saying "I'm from the Government and I'll do x/y/z to you unless you allow me to install a rootkit on this customer's server". The point is that you've got to decide what you do have to protect against, and make your security decisions based on that.
A good compromise can be to do as much of the crypto as possible in the app, so PostgreSQL never sees the encryption/decryption keys. Use one-way hashing whenever possible, rather than using reversible encryption, and when you hash, properly salt your hashes.
That means pgcrypto doesn't actually do you much good, because you're never sending plaintext to the server, and you're not sending key material to the server either.
It also means that two people with the same plaintext for column SecretValue have totally different values for SecretValueSalt, SecretValueHashedBytes in the database. So you can't join on it, use it in a WHERE clause usefully, index it usefully, etc.
For that reason, you'll often compromise with security. You might do an unsalted hash of part of the datum, so you get a partial match, then fetch all the results to your application and filter them on the application side where you have the full information required. So your storage for SecretValue now looks like SecretValueFirst10DigitsUnsaltedHash, SecretValueHashSalt, SecretValueHashBytes. But with better column names.
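A hedged Python sketch of that compromise, with placeholder names and salted SHA-256 standing in for whatever hash you actually settle on:

```python
import hashlib
import os

def store_secret_value(plaintext: str):
    # Unsalted hash of a fixed-length prefix: identical prefixes collide on
    # purpose, giving the database something it can index and pre-filter on.
    partial = hashlib.sha256(plaintext[:10].encode()).digest()

    # Salted hash of the full value: unique per row, so it leaks nothing
    # across rows, but the app can verify exact matches after fetching.
    salt = os.urandom(16)
    full = hashlib.sha256(salt + plaintext.encode()).digest()

    # The three columns described above, in order.
    return partial, salt, full
```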
If in doubt, just don't send plaintext of anything sensitive to the database. That means pgcrypto isn't much use to you, and you'll be doing mostly application-side crypto. The #1 reason for that is that if you send plaintext (or worse, key material) to the DB, it might get exposed in log files, pg_stat_activity, etc.
You'll pretty much always want to store encrypted data in bytea columns. If you really insist you can hex- or base64 encode it and shove it in a text column, but developers and DBAs who have to use your system later will cry.
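For the SQLAlchemy side, a minimal sketch of app-side encryption into a bytea-backed column. The cryptography package's Fernet is my choice here, not something the question prescribes, and real key handling is elided:

```python
from cryptography.fernet import Fernet
from sqlalchemy import Column, Integer, LargeBinary
from sqlalchemy.orm import declarative_base

Base = declarative_base()
fernet = Fernet(Fernet.generate_key())  # in reality, load the key from app config

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    # LargeBinary maps to bytea on PostgreSQL; only ciphertext ever
    # reaches the server, and no key material is sent either.
    body = Column(LargeBinary, nullable=False)

doc = Document(body=fernet.encrypt(b"sensitive payload"))
plaintext = fernet.decrypt(doc.body)  # decryption happens app-side too
```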

MD5 hashing collision in username hashing [duplicate]

This question already has answers here:
What is the clash rate for md5? [closed]
This question does not need any code; it's just a conceptual question about MD5 hashing.
My app manages a community of users.
I use MD5 hashing to reduce a user nickname of arbitrary length to a hash. I expect the MD5 of every nick to be different, because this MD5(nick) will be kind of my user ID for every user.
Is this always true? I'm sure I'm missing something, and there can be collisions in the long term (millions of users === millions of different nicks with different lengths).
MD5 collisions for random data (e.g. usernames) are rare enough that you'd probably never see them. The problem is that MD5 has been broken with respect to collision resistance, so an attacker could easily generate a pair of usernames that have the same hash, with whatever security and/or functionality implications that would have for your design.
The usual way to generate a short identifier in your situation is to simply associate each username with a sequentially-generated number in the account database. The application uses the number internally, and only references the username when it needs to display something to a user.

How safe is it to rely on hashes for file identification?

I am designing a storage cloud software on top of a LAMP stack.
Files could have an internal ID, but it would have many advantages to store them in the server's filesystem not under an incrementing ID as the filename, but under a hash as the filename.
Also, hashes as identifiers in the database would have a lot of advantages if the currently centralized database ever has to be sharded or decentralized, or if some sort of master-master high-availability environment has to be set up. But I am not sure about that yet.
Clients can store files under any string (usually some sort of path and filename).
This string is guaranteed to be unique, because the first level is something like "buckets" that users have to register, as in Amazon S3 and Google Storage.
My plan is to store files as hash of the client side defined path.
This way the storage server can directly serve the file without needing the database to ask which ID it is because it can calculate the hash and thus the filename on the fly.
But I am afraid of collisions. I am currently thinking about using SHA-1 hashes.
I have heard that Git uses hashes as revision identifiers as well.
I know that the chances of collisions are really really low, but possible.
I just cannot judge this. Would you or would you not rely on hash for this purpose?
I could also use some normalization of the encoding of the path, maybe base64 as the filename, but I really do not want that because it could get messy, paths could get too long, and there could be other complications.
Assuming you have a hash function with "perfect" properties (and cryptographic hash functions approach that ideal), the theory that applies is the same theory that applies to birthday attacks. What this says is that, given a maximum number of files, you can make the collision probability as small as you want by using a larger hash digest size. SHA-1 has a 160-bit digest, so for any practical number of files the probability of collision is going to be just about zero. If you look at the table in the link, you'll see that a 128-bit hash with 10^10 files has a collision probability on the order of 10^-18.
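The approximation behind that table is the birthday bound, p ≈ n^2 / 2^(b+1) for n items and a b-bit digest. A quick check of the quoted figure:

```python
n = 10**10  # files stored
b = 128     # digest size in bits
p = n**2 / 2**(b + 1)  # birthday-bound collision probability
print(p)    # ~1.5e-19, i.e. on the order of the 10^-18 quoted above
```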
As long as the probability is low enough I think the solution is good. Compare with the probability of the planet being hit by an asteroid, undetectable errors in the disk drive, bits flipping in your memory etc. - as long as those probabilities are low enough we don't worry about them because they'll "never" happen. Just take enough margin and make sure this isn't the weakest link.
One thing to be concerned about is the choice of the hash function and its possible vulnerabilities. Is there any other authentication in place, or does the user simply present a path and retrieve a file?
If you think about an attacker trying to brute-force the scenario above, they would need to request on the order of 2^95 files before they hit some other random file stored in the system (2^128 possible hashes divided by 10^10 ≈ 2^33 stored files; again assuming a 128-bit hash and 10^10 files, and in practice you'll have far fewer files and a longer hash). That is an astronomically big number, and the speed at which you can brute-force it is limited by the network and the server. A simple policy of locking the user out after x attempts can completely close this hole (which is why many systems implement this sort of policy). Building a secure system is complicated and there will be many points to consider, but this sort of scheme can be perfectly secure.
Hope this is useful...
EDIT: another way to think about this is that practically every encryption or authentication system relies on certain events having very low probability for its security. E.g., I could get lucky and guess a prime factor of a 512-bit RSA key, but it is so unlikely that the system is considered very secure.
Whilst the probability of a collision might be vanishingly small, imagine serving a highly confidential file from one customer to their competitor just because there happens to be a hash collision.
= end of business
I'd rather use hashing for things that are less critical when collisions DO occur ;-)
If you have a database, store the files under GUIDs - so not an incrementing index, but a proper globally unique identifier. They work nicely when it comes to distributed shards / high availability etc.
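A minimal sketch: Python's uuid4 gives a random 122-bit identifier that any shard can mint independently (the path layout is purely illustrative):

```python
import uuid

file_id = uuid.uuid4()     # random GUID, no central counter needed
path = f"files/{file_id}"  # store the file under its GUID, not its hash
print(file_id, path)
```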
Imagine the worst-case scenario and assume it will happen the week after you are featured in Wired magazine as an amazing startup ... that's a good stress test for the algorithm.

Salting Your Password: Best Practices? [closed]

I've always been curious... Which is better when salting a password for hashing: prefix, or postfix? Why? Or does it matter, so long as you salt?
To explain: We all (hopefully) know by now that we should salt a password before we hash it for storage in the database [Edit: So you can avoid things like what happened to Jeff Atwood recently]. Typically this is done by concatenating the salt with the password before passing it through the hashing algorithm. But the examples vary... Some examples prepend the salt before the password. Some examples add the salt after the password. I've even seen some that try to put the salt in the middle.
So which is the better method, and why? Is there a method that decreases the chance of a hash collision? My Googling hasn't turned up a decent analysis on the subject.
Edit: Great answers folks! I'm sorry I could only pick one answer. :)
Prefix or suffix is irrelevant, it's only about adding some entropy and length to the password.
You should consider those three things:
The salt has to be different for every password you store. (This is quite a common misunderstanding.)
Use a cryptographically secure random number generator.
Choose a long enough salt. Think about the birthday problem.
There's an excellent answer by Dave Sherohman to another question on why you should use randomly generated salts instead of a user's name (or other personal data). If you follow those suggestions, it really doesn't matter where you put your salt.
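A minimal sketch of those three points in Python, with os.urandom as the CSPRNG and 16-byte salts. Plain SHA-256 is a placeholder here; as the later answers argue, a slow KDF is preferable:

```python
import hashlib
import os

def hash_password(password: bytes) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # CSPRNG, fresh for every password, 128 bits long
    digest = hashlib.sha256(salt + password).digest()  # prefix vs. suffix: same effect
    return salt, digest    # store both; the salt is not a secret
```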
I think it's all semantics. Putting it before or after doesn't matter except against a very specific threat model.
The fact that it's there is supposed to defeat rainbow tables.
The threat model I alluded to would be the scenario where the adversary has rainbow tables of common salts appended/prepended to the password (say, the NSA). You're guessing they have it either appended or prepended, but not both. That's silly, and it's a poor guess.
It'd be better to assume that they have the capacity to store these rainbow tables, but not, say, tables with strange salts interspersed in the middle of the password. In that narrow case, I would conjecture that interspersed would be best.
Like I said. It's semantics. Pick a different salt per password, a long salt, and include odd characters in it like symbols and ASCII codes: ©¤¡
The real answer, which nobody seems to have touched upon, is that both are wrong. If you are implementing your own crypto, no matter how trivial a part you think you're doing, you are going to make mistakes.
HMAC is a better approach, but even then if you're using something like SHA-1, you've already picked an algorithm which is unsuitable for password hashing due to its design for speed. Use something like bcrypt or possibly scrypt and take the problem out of your hands entirely.
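With the bcrypt package, the whole problem collapses to two calls (a minimal sketch):

```python
import bcrypt  # third-party: pip install bcrypt

hashed = bcrypt.hashpw(b"hunter2", bcrypt.gensalt(rounds=12))
# gensalt() generates a random salt and embeds it, along with the cost
# factor, in the output, so no separate salt column is needed to verify:
assert bcrypt.checkpw(b"hunter2", hashed)
```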
Oh, and don't even think about comparing the resulting hashes for equality with your programming language's or database's string comparison utilities. Those compare character by character and short-circuit as false as soon as a character differs. So attackers can use statistical timing methods to try to work out what the hash is, one character at a time.
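The standard fix in Python is hmac.compare_digest, which takes time independent of where the inputs first differ:

```python
import hmac

def digests_match(expected: bytes, candidate: bytes) -> bool:
    # Constant-time comparison: no character-by-character short-circuit,
    # so timing reveals nothing about how much of the hash matched.
    return hmac.compare_digest(expected, candidate)
```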
It shouldn't make any difference. The hash will be no more easily guessable wherever you put the salt. Hash collisions are both rare and unpredictable, by virtue of being intentionally non-linear. If it made a difference to the security, that would suggest a problem with the hashing, not the salting.
If using a cryptographically secure hash, it shouldn't matter whether you pre- or postfix; the point of hashing is that a single bit change in the source data (no matter where) should produce a different hash.
What is important, though, is using long salts, generating them with a proper cryptographic PRNG, and having per-user salts. Storing the per-user salts in your database is not a security issue; using a site-wide salt is.
First of all, the term "rainbow table" is consistently misused. A "rainbow" table is just a particular kind of lookup table, one that allows a particular kind of data compression on the keys. By trading computation for space, a lookup table that would take 1000 TB can be compressed a thousandfold, so that it can be stored on a single drive.
You should be worried about hash-to-password lookup tables, rainbow or otherwise.
#onebyone.livejournal.com:
The attacker has 'rainbow tables' consisting not of the hashes of dictionary words, but of the state of the hash computation just before finalising the hash calculation.
It could then be cheaper to brute-force a password file entry with postfix salt than prefix salt: for each dictionary word in turn you would load the state, add the salt bytes into the hash, and then finalise it. With prefixed salt there would be nothing in common between the calculations for each dictionary word.
For a simple hash function that scans linearly through the input string, such as a linear congruential generator, this is a practical attack. But a cryptographically secure hash function is deliberately designed to have multiple rounds, each of which uses all the bits of the input string, so that computing the internal state just prior to the addition of the salt is not meaningful after the first round. For example, SHA-1 has 80 rounds.
Moreover, password hashing algorithms like PBKDF2 compose their hash function many times (it is recommended to iterate PBKDF2 a minimum of 1000 times, each iteration applying SHA-1 twice), making this attack doubly impractical.
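In Python this composition is a single standard-library call (1000 iterations matches the minimum quoted above; current guidance is far higher):

```python
import hashlib
import os

salt = os.urandom(16)
# PBKDF2 applies its PRF (HMAC-SHA1 here) once per iteration,
# so an attacker must pay the full iteration count for every guess.
digest = hashlib.pbkdf2_hmac("sha1", b"hunter2", salt, 1000)
```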
Use bcrypt if your platform has a provider for it. I love that you don't have to worry about creating the salts, and you can make the hashes even stronger if you want.
Inserting the salt an arbitrary number of characters into the password is the least expected case, and therefore the most "secure" socially, but it's really not very significant in the general case as long as you're using long, unique-per-password strings for salts.