How to get Perl crypt to encrypt more than 8 characters? - perl

Only the first 8 characters are encrypted when the Perl crypt function is used. Is there a way to get it to use more characters?
As an example:
$crypted_password = crypt ("PassWord", "SALT");
and
$crypted_password = crypt ("PassWord123", "SALT");
return exactly the same result: $crypted_password has the same value in both cases.
I would love to use crypt because it is a quick and easy way to get some non-reversible encryption, but this limit makes it useless for anything serious.

To quote from the documentation:
Traditionally the result is a string of 13 bytes: two first bytes of the salt, followed by 11 bytes from the set [./0-9A-Za-z], and only the first eight bytes of PLAINTEXT mattered. But alternative hashing schemes (like MD5), higher level security schemes (like C2), and implementations on non-Unix platforms may produce different strings.
So the exact return value of crypt is system dependent, but it often uses an algorithm that only looks at the first 8 bytes of the password. These two things combined make it a poor choice for portable password encryption. If you're using a system with a stronger encryption routine and don't try to check those passwords on incompatible systems, you're fine. But it sounds like you're using an OS with the old crappy DES routine.
So a better option is to use a module off of CPAN that does the encryption in a predictable, more secure way.
Some searching turns up a few promising-looking options (which I haven't used, so I can't recommend one over another; I just looked for promising keywords on metacpan):
Crypt::SaltedHash
Authen::Passphrase::SaltedDigest
Crypt::Bcrypt::Easy
Crypt::Password::Util
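To illustrate what "predictable, more secure" means in practice, here is a minimal sketch in Python rather than Perl, using only the standard library. It is not the API of any of the CPAN modules listed above; the function names and the iteration count are assumptions chosen for the example. The point it shows is that a salted, iterated hash takes the whole password into account, so "PassWord" and "PassWord123" no longer collide.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Every byte of the password affects the digest."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("PassWord")
print(verify_password("PassWord", salt, stored))     # the original password matches
print(verify_password("PassWord123", salt, stored))  # the longer password does not
```

Store the salt and the iteration count next to the digest so the check can be repeated later, possibly on a different machine.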

Related

Basics of MD5: How to know hash bit length and symmetry?

I'm curious about some basics of MD5 encryption I couldn't get from Google, Java questions here nor a dense law paper:
1-How to measure, in bytes, an MD5 hash string? And does it depend on whether the string is UNICODE or ANSI?
2-Is MD5 an asymmetric algorithm?
Example: If my app talks (http) to a REST webservice using a key (MD5_128 hash string, ANSI made of 9 chars) to unencrypt received data, does that account for 9x8=72 bytes in an asymmetric algorithm?
I'm using Windevs 25 in Windows, using functions like Encrypt and HashString, but I lack knowledge about encryption.
Edit: Not answered yet, but it seems like I need to know more about charsets before jumping to hashes and encryption. https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
An MD5 hash is 128 bits, i.e. 16 bytes. The result is binary, not text, so it is neither "ANSI" nor "Unicode". Like all hashes, it is a one-way function rather than an asymmetric (public-key) cipher, which should be obvious from the fact that you can hash inputs which are longer than 128 bits: many inputs must map to the same output. Since it is one-way, you cannot "unencrypt" (decrypt) it. This is by design and intentional.
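A quick way to check the sizes yourself is the sketch below, written in Python (not WinDev) with the standard hashlib module. The sample string is an arbitrary assumption. It shows that the digest is always 16 raw bytes, or 32 characters when printed as hex, and that the character encoding only matters for turning the input text into bytes before hashing.

```python
import hashlib

text = "h\u00e9llo"  # arbitrary sample input containing a non-ASCII character

hasher = hashlib.md5(text.encode("utf-8"))
raw = hasher.digest()         # binary digest
hex_text = hasher.hexdigest() # the same digest printed as hexadecimal text

print(len(raw))       # 16 bytes = 128 bits, whatever the input length
print(len(hex_text))  # 32 hex characters, two per byte

# The encoding changes the bytes that get hashed, so it changes the digest,
# but it never changes the digest size.
utf8_digest = hashlib.md5(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.md5(text.encode("latin-1")).hexdigest()
print(utf8_digest == latin1_digest)
```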

How are SHA-3 variants named?

How should we succinctly refer to SHA-3 variants of specific width? The precedent set by SHA-2 naming is unfortunately ambiguous if applied to SHA-3. Specifically, we have SHA-0 and SHA-1 (160 bits), followed by SHA-2 (224, 256, 384, or 512 bits), where SHA-224, SHA-256, SHA-384, and SHA-512 refer to the SHA-2 variants. SHA-3 supports the same bit counts as SHA-2, but a different naming convention is needed to distinguish between SHA-2 and SHA-3. SHA-3-224, SHA-3-256, SHA-3-384, and SHA-3-512 seem reasonable (if clumsy), but I can find no established naming convention of any sort.
I believe they have been finalized as follows
"SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512"
SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions
There is no convention yet. Even the standard itself is not published AFAIK.
I'd use SHA3-256 etc. (like MD6-256).
Same naming scheme is also used in BouncyCastle library.
As for SHA-3-256 and friends, I personally don't like the idea of using the same char - in algorithm name and as property separator. If you necessarily need to keep the dash in algorithm name, I'd go with SHA-3/256 -- similar scheme is used in cipher transformation naming in JCA.
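As one more data point on naming, Python's standard hashlib spells the variants sha3_224, sha3_256, sha3_384 and sha3_512. The sketch below is only an illustration of that one library's choice, not an authority on the convention; the input b"abc" is arbitrary. It also shows why a distinct name is needed: SHA-2 and SHA-3 at the same width give different digests.

```python
import hashlib

# Digest width in bits for each SHA-3 variant, keyed by hashlib's own name.
widths = {}
for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512"):
    hasher = hashlib.new(name, b"abc")
    widths[name] = hasher.digest_size * 8
    print(name, widths[name])

# Same output width, different algorithm family, different result.
sha2_digest = hashlib.sha256(b"abc").digest()
sha3_digest = hashlib.sha3_256(b"abc").digest()
print(len(sha2_digest) == len(sha3_digest), sha2_digest == sha3_digest)
```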

How strong is this hashing technique?

Use AES/Rijndael or any symmetric encryption.
Encrypt the hidden value using itself as the key and a random IV.
Store the ciphertext + IV. Discard everything else.
To check the hash: try to decrypt using provided plaintext. If provided == decrypted, then it's OK.
Ignore ciphertext length problems.
Is this secure?
There is an existing method of generating a hash or MAC using a block cipher like AES. It's called CBC-MAC. Its operation is pretty simple. Just encrypt the data to be hashed using AES in CBC mode and output the last block of the ciphertext, discarding all prior blocks of the ciphertext. The IV for CBC would normally be left as zero, and the AES key can be used to produce a MAC.
CBC-MAC does have some limitations. Do not encrypt and MAC your data using the same key and IV, or the MAC will simply be equal to the last block of the ciphertext. Also, the size of the hash/MAC is limited to the size of block cipher. Using AES with CBC-MAC produces a 128 bit MAC, and MACs are usually expected to be at least this size.
Something worth noting is that CBC-MAC is a very inefficient way to produce a MAC. A better way to go would be to use SHA2-256 or SHA2-512 in HMAC. In my recent tests, using SHA256 in HMAC produces a result approximately as fast as AES in CBC-MAC, and the HMAC in this case is twice as wide. However, new CPUs will be produced with hardware acceleration for AES, allowing AES in CBC-MAC mode to be used to very quickly produce a 128 bit MAC.
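For reference, the HMAC alternative mentioned above can be sketched with Python's standard hmac and hashlib modules. The helper names, the key and the message below are made up for the example; the point is only that the tag is 256 bits wide and that verification should use a constant-time comparison.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag (32 bytes) over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"example-key-not-for-production"  # hypothetical key for the demo
message = b"data to authenticate"
tag = make_tag(key, message)

print(len(tag) * 8)                                # width of the tag in bits
print(check_tag(key, message, tag))                # untouched message verifies
print(check_tag(key, b"data to authenticatE", tag))  # modified message does not
```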
As described, it has a problem in that it reveals information about the length of the data being hashed. That in itself would be some kind of weakness.
Secondly ... it is not clear that you would be able to check the hash. It would be necessary to store the randomly generated IV with the hash.
I was thinking about this while bicycling home, and one other possible issue came to mind. With a typical hashing scheme to store a password, it is best to run the hash for a bunch of iterations (e.g., PBKDF2). This makes it much more expensive to run a brute force attack. One possibility to introduce that idea into your scheme might be to repeatedly loop over the encrypted data (e.g., feed the encrypted block back into itself).

How do I properly implement Unicode passwords?

Adding support for Unicode passwords is an important feature that should not be ignored by developers.
Still, adding support for Unicode in passwords is a tricky job because the same text can be encoded in different ways in Unicode and you don't want to prevent people from logging in because of this.
Let's say that you'll store the passwords as UTF-8. Mind that this question is not about Unicode encodings; it is about Unicode normalization.
Now the question is: how should you normalize the Unicode data?
You have to be sure that you'll be able to compare it. You also need to be sure that the release of the next Unicode standard will not invalidate your password verification.
Note: still there are some places where Unicode passwords will probably never be used, but this question is not about why or when to use Unicode passwords, it is about how to implement them in the proper way.
1st update
Is it possible to implement this without using ICU, like using OS for normalizing?
A good start is to read Unicode TR 15: Unicode Normalization Forms. Then you realize that it is a lot of work and prone to strange errors - you probably already know this part since you are asking here. Finally, you download something like ICU and let it do it for you.
IIRC, it is a multistep process. First you decompose the sequence until you cannot further decompose - for example é would become e + ´. Then you reorder the sequences into a well-defined ordering. Finally, you can encode the resulting byte stream using UTF-8 or something similar. The UTF-8 byte stream can be fed into the cryptographic hash algorithm of your choice and stored in a persistent store. When you want to check if a password matches, perform the same procedure and compare the output of the hash algorithm with what is stored in the database.
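The steps above can be sketched without ICU in Python, whose standard unicodedata module implements the normalization forms. This is a minimal illustration, assuming NFD (canonical decomposition plus canonical reordering) as the form and using hypothetical helper names. The bare SHA-256 is only there to keep the example short; a real password store would feed the normalized bytes into a salted, iterated scheme instead.

```python
import hashlib
import unicodedata


def password_bytes(password: str) -> bytes:
    """Decompose and canonically reorder (NFD), then encode as UTF-8."""
    normalized = unicodedata.normalize("NFD", password)
    return normalized.encode("utf-8")


def fingerprint(password: str) -> str:
    """Hash the normalized bytes. Plain SHA-256 is used for illustration only."""
    return hashlib.sha256(password_bytes(password)).hexdigest()


composed = "caf\u00e9"     # 'é' as a single precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

print(composed == decomposed)                            # the raw strings differ
print(fingerprint(composed) == fingerprint(decomposed))  # the normalized hashes agree
```

Whichever form you pick, apply the same one both when the password is set and when it is checked.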
A question back to you- can you explain why you added "without using ICU"? I see a lot of questions asking for things that ICU does (we* think) pretty well, but "without using ICU". Just curious.
Secondly, you may be interested in StringPrep/NamePrep and not just normalization: StringPrep - to map strings for comparison.
Thirdly, you may be interested in UTR#36 and UTR#39 for other Unicode security implications.
*(disclosure: ICU developer :)

What text encoding scheme do you use when you have binary data that you need to send over an ascii channel?

If you have binary data that you need to encode, what encoding scheme do you use?
I know about:
Hex encoding. Very simple, but quite verbose, expands one byte to two.
Base 64. Most common, not so verbose, expands three bytes to four.
Base 85. Not common, less verbose again, expands four bytes to five.
Are there any other encoding schemes in common use? If so, what are their advantages and disadvantages?
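The expansion ratios listed above are easy to confirm with a short Python sketch using the standard base64 module. The 12-byte sample is an arbitrary assumption, chosen because 12 is a multiple of both 3 and 4, so neither Base64 nor Base85 needs padding here.

```python
import base64

data = bytes(range(12))  # 12 arbitrary bytes

encoded = {
    "hex": base64.b16encode(data),     # 1 byte  -> 2 characters
    "base64": base64.b64encode(data),  # 3 bytes -> 4 characters
    "base85": base64.b85encode(data),  # 4 bytes -> 5 characters
}

for name, value in encoded.items():
    print(name, len(value), value.decode("ascii"))
```

Note that Base85 exists in several incompatible alphabets; b85encode is just the variant Python ships, and its output includes punctuation that may need escaping in some channels.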
Edit: This is useful, for example, when trying to store arbitrary data in a cookie. Cookies can only store text, not arbitrary data, so you need to convert it in some way, preferably with a way to convert it back. Further, assume that you are using a stateless server so that you cannot save the state on the server and just put an identifier into the cookie. Of course, if you do this you would also need some way of verifying that what the user is passing back to you is what you passed to the user, for example a signature.
Also, since the current consensus is that you should use base64 since it is widespread, I will also point out that this is what I use... I am just curious if anyone used anything else, and if so, why.
Edit: Just in case someone stumbles across this, if you do want to use Base64 to store data in a cookie, you need to use a modified Base64 implementation. See this answer for the reason why.
For encoding cookie values, you need to be careful. See this older answer:
With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.
Base64 encoding can generate = symbols for certain inputs, and this technically is not permitted in cookies (version 0 cookies, anyway, which are the most widely supported). In practice, I suspect the = will actually work fine, but maybe not.
I would suggest that to be absolutely sure that your encoded binary is cookie-compatible, then basic hex encoding is safest (e.g. in java).
edit: As @Paul helpfully pointed out, there is a modified version of Base 64 that is "URL safe" (and, I assume, "cookie safe"). Using a modified version of a standard algorithm rather dilutes its charm, mind you.
edit: @shoosh pointed out that the = is only used to denote the end of the base64 string, so you could trim the =, set the cookie, then reattach the = again when you need to decode it.
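Both suggestions, the URL-safe alphabet and trimming the trailing =, can be combined. Here is a minimal sketch in Python with the standard base64 module; the function names are made up for the example. The URL-safe variant replaces + and / with - and _, and the padding is rebuilt from the length before decoding.

```python
import base64


def encode_cookie_value(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_cookie_value(text: str) -> bytes:
    """Reattach the padding (length must be a multiple of 4), then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


value = encode_cookie_value(b"\xfb\xff\xfe")
print(value)                       # '-__-' where standard Base64 would give '+//+'
print(decode_cookie_value(value))  # the original three bytes

short = encode_cookie_value(b"hi")  # would normally end in '='
print(short, decode_cookie_value(short))
```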
Base64 wins because it's so common that I don't have to ever worry about rolling my own encoder/decoder. I haven't run into any applications where I've been worried about saving bandwidth or filespace in encoded binary data.
Once upon a time, there was UTF-7. It's officially deprecated, but it still works as an ACE (ASCII Compatible Encoding). Now there's IDN.
uuencode is popular in some circles
HTML and XML encode unicode using this syntax
Base64 is the de-facto standard. Using anything else is asking for trouble.