Do we use hash functions for anything 'critical' under the assumption that collisions will never occur?


I'm currently learning the basics of cryptography and started to wonder. I understand that if an attacker wanted to 'pretend to be you' they could theoretically find a collision for your password or whatever it may be that identifies you, then authenticate themselves with that hash value.
Are there any other less obvious uses for hash functions perhaps aside from information security where in the almost impossible off chance that a collision occurs something rather strange would happen? Or in fact are there any real world examples of when this has happened?
I wonder because from what I understand if we use a strong enough hash function we pretty much assume that a collision will certainly not happen... but what if it did? Do we ever use hash functions for anything 'critical'?
edit: This is purely a speculative question.

The number of tries required would be so large (and so would the time needed to process them) that a login by an unauthorized user is highly improbable.
To prevent that kind of attack, you can add a safeguard such as a forced delay after 3 failed tries. With that in place, the time needed to carry the attack through to a result would be too long for the attacker.
See http://en.wikipedia.org/wiki/Brute-force_attack.
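A minimal sketch of the delay idea, in Python. The class name LoginThrottle, the limit of 3 failures and the 30-second lockout are assumptions picked for illustration; the clock is passed in so the behaviour can be checked without actually waiting.

```python
import time

class LoginThrottle:
    """Lock an account for a while after too many consecutive failed logins."""

    def __init__(self, max_failures=3, lockout_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock
        self.failures = {}      # user -> consecutive failure count
        self.locked_until = {}  # user -> time at which the lock expires

    def is_locked(self, user):
        """True while the user is still inside a lockout window."""
        return self.clock() < self.locked_until.get(user, float("-inf"))

    def record(self, user, success):
        """Update the counters after a login attempt that was allowed to run."""
        if success:
            self.failures.pop(user, None)
            return
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.max_failures:
            # Start a lockout window and reset the counter for the next round.
            self.locked_until[user] = self.clock() + self.lockout_seconds
            self.failures[user] = 0
```

With these assumed numbers an attacker gets at most 3 guesses per 30 seconds per account, so a million guesses would take about 10^7 seconds, roughly 116 days.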
Hash functions can also be used to create checksums, see http://en.wikipedia.org/wiki/Checksum.
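As a small illustration of the checksum use, here is a Python sketch using the standard zlib and hashlib modules. The two helper names and the sample messages are made up for the example.

```python
import hashlib
import zlib

def crc32_checksum(data: bytes) -> int:
    """32-bit CRC: cheap, good at catching accidental corruption."""
    return zlib.crc32(data) & 0xFFFFFFFF

def sha256_checksum(data: bytes) -> str:
    """Cryptographic hash used as a checksum."""
    return hashlib.sha256(data).hexdigest()

original = b"The quick brown fox"
received = b"The quick brown fix"   # one byte changed in transit

crc_matches = crc32_checksum(original) == crc32_checksum(received)
sha_matches = sha256_checksum(original) == sha256_checksum(received)
print(crc_matches, sha_matches)
```

Note that a 32-bit CRC only has about four billion possible values, so it is meant for accidental errors; someone who wants a collision on purpose can find one easily, which is exactly the situation the question worries about.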

Hash functions are also used to digitally sign documents. Assume you have got a document like a PDF and you want your boss to sign it. You would send it to him, he would sign it and you could send it over to someone else in his name.
Or you could prepare a special document that abuses a hash collision. You do not necessarily need a full hash collision. MD5, for example, processes the plain text block by block. If you can find a collision for a single block, you have won.
Let's say "asdasd" causes a collision with "qweqwe".
You can create a PDF like so:
Headline
if("asdasd" == "asdasd")
Good text...
else
Evil text...
Your boss will see "Good text...". After you have his digital signature for this document you replace one "asdasd" by "qweqwe".
Headline
if("asdasd" == "qweqwe")
Good text...
else
Evil text...
Now you can send the evil PDF with a valid signature.
It is not as easy as I described, but I think you get the idea.
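To make the block trick concrete, here is a Python sketch with a toy hash. Everything in it is an assumption for illustration: toy_hash is not MD5, it is a made-up iterated hash with a 16-bit state and 8-byte blocks, weak enough that a colliding pair of blocks can be found in a fraction of a second. The point it shows is the one above: once two different blocks lead to the same internal state, any identical text appended after them gives the same final hash.

```python
import hashlib
from itertools import count

BLOCK = 8  # toy block size in bytes (an assumption; MD5 uses 64-byte blocks)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression step with a deliberately tiny 16-bit state."""
    return hashlib.sha256(state + block).digest()[:2]

def toy_hash(message: bytes) -> bytes:
    """Iterated hash: fold fixed-size blocks into the state one at a time."""
    if len(message) % BLOCK:
        raise ValueError("message length must be a multiple of BLOCK")
    state = b"\x00\x00"
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state

def find_colliding_blocks(prefix: bytes):
    """Search for two different blocks that give the same state after prefix."""
    seen = {}  # state -> first block that produced it
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = toy_hash(prefix + block)
        if state in seen:
            return seen[state], block
        seen[state] = block

prefix = b"Headline"                          # exactly one block
suffix = b"Good text / Evil text".ljust(24)   # padded to three blocks

block_a, block_b = find_colliding_blocks(prefix)
signed = prefix + block_a + suffix   # the document the boss signs
forged = prefix + block_b + suffix   # the document sent afterwards

print(signed != forged, toy_hash(signed) == toy_hash(forged))
```

A signature computed over toy_hash(signed) would verify for forged as well. Real attacks need far more work than this search, but the structure is the same.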


Why is ".map" slower than a "while/for loop" in Dart (Flutter)?

I saw this article:
https://itnext.io/comparing-darts-loops-which-is-the-fastest-731a03ad42a2
It says that ".map" is slow, and backs this up with benchmark results.
But I don't understand why it is slower than a while/for loop.
How does it work at a low level?
I think it's because .map calls an unnamed function like this: (_){ }
Can you explain that in detail?

It's because mapping an array creates a copy of each value rather than modifying the original array.
Since a while/for loop does not copy the values but rather just accesses them using their index, it is a lot faster.

Can you explain that in detail?
It's like saying "I don't understand why hitchhiking on the back of a construction truck is so much slower than taking the high speed train to my destination".
The only detail that is important is that map is not a loop. map() internally probably uses a loop of some kind.
This person is misusing a method call that is meant for something else, just because a side effect of that call, when combined with a call that materializes the iterable, like toList(), is that it loops through the given iterable. It doesn't even have that side effect on its own.
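Python's built-in map() is lazy in a comparable way, so a short Python sketch (not Dart) can show the point. The function record and the list names are made up for the example; list() plays the role that toList() plays in Dart.

```python
visited = []

def record(x):
    visited.append(x)   # the side effect someone might (mis)use map() for
    return x * 2

numbers = [1, 2, 3]

lazy = map(record, numbers)    # builds a lazy iterator; record() has not run
seen_before = list(visited)    # still empty at this point

doubled = list(lazy)           # materializing the iterator finally runs record()
seen_after = list(visited)

# If the goal is just to visit every element, a plain loop says so directly.
total = 0
for n in numbers:
    total += n

print(seen_before, seen_after, doubled, total)
```

The mapping step does no work by itself; all the iteration happens inside the materializing call, plus the cost of building the new list.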
Stop reading "tutorials" or "tips" by people misusing language features. map() is not a loop. If you need a loop, use a loop. The same goes for the ternary operator. It's not an if; if you need an if, use one.
Use language features for what they are meant for. Stop misusing language features because their side effect does what you want, and then wondering why they don't work as well as the feature actually meant for the job.
Sorry if this seems a bit ranty, but I have seen countless examples by now. I don't know where it comes from. My personal guess is "internet tutorials", because everybody can write one. Please don't read them. Read a good book. It was written by professionals, proofread, edited, and checked. Internet tutorials are free, written by random people and worth about as much as they cost.

Is it possible to safely validate offline license keys clientside?

Is it possible to validate license keys on a client application in such a way that it becomes very difficult to crack?
Consider the following simple example:
var status = secure_function_that_checks_license();
if (status == "REGISTERED")
print("Welcome, user");
else
print("Access denied");
No matter how elaborate your function is, in the end you will always have to branch based on the result it gives.
This thread explains a bit more about generating and verifying keys but doesn't explain how to avoid the branching problem.
Is the only way to do this in a secure way to use some sort of online activation scheme?

First of all, remember that there is no prevention when it comes to cracking, there is only stalling: if your code is worth cracking, it will be cracked.
Now, in the obfuscation process there is a practice named inlining. It simply replaces each function call with the actual function body, so your code will be harder to crack since there is much more code to modify.

Is this encryption method secure?

I developed an application in C++ using Crypto++ to encrypt information and store the file on the hard drive. I use an integrity string to check if the password entered by the user is correct. Can you please tell me if the implementation generates a secure file? I am new to the world of cryptography and I wrote this program based on what I have read.
string integrity = "ImGood";
string plaintext = integrity + string("some text");
byte password[pswd.length()]; // The password is filled somewhere else
byte salt[SALT_SIZE]; // SALT_SIZE is 32
byte key[CryptoPP::AES::MAX_KEYLENGTH];
byte iv[CryptoPP::AES::BLOCKSIZE];
CryptoPP::AutoSeededRandomPool rnd;
rnd.GenerateBlock(iv, CryptoPP::AES::BLOCKSIZE);
rnd.GenerateBlock(salt, SALT_SIZE);
CryptoPP::PKCS5_PBKDF2_HMAC<CryptoPP::SHA512> gen;
gen.DeriveKey(key, CryptoPP::AES::MAX_KEYLENGTH, 32,
              password, pswd.length(),
              salt, SALT_SIZE,
              256);
CryptoPP::StringSink* sink = new CryptoPP::StringSink(cipher);
CryptoPP::Base64Encoder* base64_enc = new CryptoPP::Base64Encoder(sink);
CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption cfb_encryption(key, CryptoPP::AES::MAX_KEYLENGTH, iv);
CryptoPP::StreamTransformationFilter* aes_enc = new CryptoPP::StreamTransformationFilter(cfb_encryption, base64_enc);
CryptoPP::StringSource source(plaintext, true, aes_enc);
sstream out;
out << iv << salt << cipher;
The information in the string stream "out" is then written to a file. Another thing is that I don't know what the "purpose" parameter in the derivation function means, I'm guessing it is the desired length of the key so I put 32, but I'm not sure and I can't find anything about it in the Crypto++ manual.
Any opinion, suggestion or mistake pointed out is appreciated.
Thank you very much in advance.

A file can be "secure" only if you define what you mean by "secure".
Usually, you will be interested in two properties:
Confidentiality: the data that is encrypted shall remain unreadable to attackers; revealing the plaintext data requires knowledge of a specific secret.
Integrity: any alteration of the data should be reliably detected; attackers shall not be able to modify the data in any way (even "blindly") without the modification being noticed by whoever decrypts the data.
Your piece of code, apparently, fulfils confidentiality (to some extent) but not integrity. Your string called "integrity" is a misnomer: it is not an integrity check. Its role is apparently to detect accidental password mistakes, not attacks; thus, it would be less confusing if that string was called passwordVerifier instead. An attacker can alter any bit beyond the first 48 bits without the decryption process noticing anything.
Adding integrity (the genuine thing) requires the use of a MAC. Combining encryption and a MAC securely is subject to subtleties; therefore, it is recommended to use for encryption and MAC an authenticated encryption mode which does both, and does so securely (i.e. that specific combination was explicitly reviewed by hordes of cryptographers). Usual recommended AE modes include GCM and EAX.
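To show what a MAC buys you, here is a Python sketch of the "manual" variant (a MAC computed over the IV and ciphertext) using only the standard hmac module. It is an illustration under assumptions, not a replacement for GCM or EAX: the function names are made up, the ciphertext is random bytes standing in for real AES output, and the MAC key is assumed to be independent of the encryption key.

```python
import hashlib
import hmac
import os

def tag_ciphertext(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 over IV and ciphertext (the IV has a fixed length)."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()

def verify(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = tag_ciphertext(mac_key, iv, ciphertext)
    return hmac.compare_digest(expected, tag)

mac_key = os.urandom(32)      # assumed to be separate from the encryption key
iv = os.urandom(16)
ciphertext = os.urandom(64)   # stand-in for the real encrypted data
tag = tag_ciphertext(mac_key, iv, ciphertext)

tampered = bytearray(ciphertext)
tampered[10] ^= 0x01          # the attacker flips a single bit "blindly"

print(verify(mac_key, iv, ciphertext, tag))        # untouched file
print(verify(mac_key, iv, bytes(tampered), tag))   # altered file
```

The decryptor checks the tag first and refuses to decrypt anything if it does not match. An authenticated encryption mode does the same job in one reviewed package, which is why it is the recommended route.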
An important point to note is that, in a context where integrity matters, data cannot be processed before having been verified. This has implications for big files: if your huge file is adorned with a single MAC (whether "manually" or as part of an AE mode), then you must first verify the complete file before beginning to do anything with the plaintext data. This does not work well with streamed processing (e.g. if playing a huge video). A workaround is to split the data into individual chunks, each with its own MAC, but then some care must be taken about the ordering of chunks (attackers could try to remove, duplicate or reorder chunks): things become complex. Complexity, on a general basis, is bad for security.
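The chunking workaround could look roughly like the following Python sketch. The layout is an assumption made for illustration: each chunk's MAC also covers the chunk index and a "last chunk" flag, so that removing, duplicating or reordering chunks is detected. All function names are hypothetical.

```python
import hashlib
import hmac
import struct

def chunk_tag(key: bytes, index: int, is_last: bool, chunk: bytes) -> bytes:
    """MAC over (index, last-flag, chunk) so a chunk is bound to its position."""
    header = struct.pack(">QB", index, 1 if is_last else 0)
    return hmac.new(key, header + chunk, hashlib.sha256).digest()

def protect(key: bytes, chunks):
    """Return a list of (chunk, tag) pairs for an ordered list of chunks."""
    last = len(chunks) - 1
    return [(c, chunk_tag(key, i, i == last, c)) for i, c in enumerate(chunks)]

def verify_stream(key: bytes, tagged):
    """Yield verified chunks one by one; raise ValueError on any tampering."""
    saw_last = False
    for i, (chunk, tag) in enumerate(tagged):
        if saw_last:
            raise ValueError("data after the final chunk")
        if hmac.compare_digest(tag, chunk_tag(key, i, False, chunk)):
            yield chunk
        elif hmac.compare_digest(tag, chunk_tag(key, i, True, chunk)):
            saw_last = True
            yield chunk
        else:
            raise ValueError(f"chunk {i} failed verification")
    if not saw_last:
        raise ValueError("stream truncated")
```

Even this small sketch leaves things out: truncation is only noticed at the end, after earlier chunks have already been handed over, and chunks from a different file protected with the same key could be spliced in unless a per-file value is also covered by the MAC. That is the kind of complexity the paragraph above warns about.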
There are contexts where integrity does not matter. For instance, if your attack model is "the attacker steals the laptop", then you only have to care about confidentiality. However, if the attack model is "the attacker steals the laptop, modifies a few files, and puts it back in my suitcase without me noticing", then integrity matters: the attacker could plant a modification in the file, and infer parts of the secret itself based on your external behaviour when you next access the file.
For confidentiality, you use CFB, which is a bit old-style, but not wrong. For the password-to-key transform, you use PBKDF2, which is fine; the iteration count, though, is quite low: you use 256. Typical values are 20000 or more. The theory is that you should make actual performance measures to set this count to as high a value as you can tolerate: a higher value means slower processing, both for you and for the attacker, so you ought to crank that up (depending on your patience).
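The "measure, then set the count" advice can be sketched in Python with hashlib.pbkdf2_hmac from the standard library. The helper names, the 20000-iteration probe and the floor of 20000 are assumptions for the example; the target time is whatever delay you can tolerate.

```python
import hashlib
import os
import time

def derive_key(password: bytes, salt: bytes, iterations: int, length: int = 32) -> bytes:
    """PBKDF2 with HMAC-SHA512, returning `length` bytes of key material."""
    return hashlib.pbkdf2_hmac("sha512", password, salt, iterations, dklen=length)

def calibrate(target_seconds: float, probe_iterations: int = 20_000) -> int:
    """Estimate an iteration count that takes about target_seconds on this machine."""
    salt = os.urandom(32)
    start = time.perf_counter()
    derive_key(b"benchmark password", salt, probe_iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    estimate = int(probe_iterations * target_seconds / elapsed)
    return max(probe_iterations, estimate)            # never go below the probe count

iterations = calibrate(target_seconds=0.25)
salt = os.urandom(32)
key = derive_key(b"correct horse battery staple", salt, iterations)
print(iterations, len(key))
```

The chosen count has to be stored next to the salt in the file, since decryption needs the same value to rebuild the key.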
Mandatory warning: you are in the process of defining your own crypto, which is a path fraught with perils. Most people who do that produce weak systems, and that includes trained cryptographers; in fact, being a trained cryptographer does not mean that you know how to define a secure protocol, but rather that you know better than defining your own protocol. You are thus highly encouraged to rely on an existing protocol (or format), rather than making your own. I suggest OpenPGP, with (for instance) GnuPG as support library. Even if you need for some reason (e.g. licence issues) to reimplement the format, using a standard format is still a good idea: it avoids introducing weaknesses, it promotes interoperability, and it can be tested against existing systems.

Is input validation necessary?

This is a very naive question about input validation in general.
I learned about input validation techniques such as parse and validatestring. In fact, MATLAB built-in functions are full of those validations and parsers. So, I naturally thought this is the professional way of developing code. With these techniques, you can be sure of the data format of the input variables. Otherwise your code will reject the inputs and return an error.
However, some people argue that if there is a problem in an input variable, the code will cause an error and stop. You'll notice the problem anyway, so what's the point of those complicated validations? Given that the validation code itself takes some effort and time, often with quite complicated flow control, I had to admit this opinion has a point. With massive input validation, the readability of the code may be compromised.
I would like to hear opinions from advanced users on this issue.

Here is my experience, I hope it matches best practice.
First of all, let me mention that I typically work in situations where I have full control, and won't build my own UI as @tom mentioned. In general, if there is at any point a large probability that your program gets junk inputs it will be worth checking for them.
Some tradeoffs that I typically make to decide whether I should check my inputs:
Development time vs debug time
If erroneous inputs are hard to debug (for example because they don't cause errors but just undesirable outcomes) the balance will typically be in favor of checking, otherwise not.
If you are not sure where you will end up (re)using the code, it may help to enforce any assumptions that are required on the input.
Development time vs runtime experience
If your code takes an hour to run, and will break at the end when an invalid input value occurs, you would want to check for this at the beginning of the code.
If the code runs into an error whilst opening a file, the user may not understand it immediately. If you report instead that no valid filename was specified, this may be easier to deal with.
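The last two points can be sketched like this, in Python rather than MATLAB. The function names and the two inputs (a file path and an iteration count) are hypothetical; the loop is only a stand-in for an hour of real work.

```python
import os

def validate_inputs(data_path, n_iterations):
    """Check everything up front and explain problems in the user's terms."""
    if not isinstance(data_path, str) or not data_path:
        raise ValueError("no valid filename specified")
    if not os.path.isfile(data_path):
        raise FileNotFoundError(f"input file not found: {data_path!r}")
    if isinstance(n_iterations, bool) or not isinstance(n_iterations, int):
        raise ValueError("n_iterations must be an integer")
    if n_iterations <= 0:
        raise ValueError("n_iterations must be positive")

def run_long_job(data_path, n_iterations):
    """Validate first, so bad input fails in milliseconds instead of at the end."""
    validate_inputs(data_path, n_iterations)
    with open(data_path, "rb") as handle:
        data = handle.read()
    total = 0
    for _ in range(n_iterations):   # stand-in for the expensive computation
        total += len(data)
    return total
```

The checks cost a few lines, but each one replaces a confusing failure deep inside the job with a message that names the actual input at fault.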

The really (really) short story:
Break your design down into user interface, business logic and data - (see MVC pattern)
In your UI layer, do "common sense" validation, e.g. if the input is a $ cost value then it should be >= 0, be able to be parsed into a decimal etc.
In your business logic layer, validate the value, e.g. the $ cost value might not be allowed to be greater than the profit margin (etc.)
In your data layer, validate the data operation, e.g. that insert operation succeeded
The extra really short story: YES! Validate all inputs.
For extra reading credits see: this!
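A rough Python sketch of the three layers, using the $ cost example from above. All names are hypothetical, and a plain dict stands in for the real data store.

```python
from decimal import Decimal, InvalidOperation

def parse_cost(raw: str) -> Decimal:
    """UI layer: common-sense checks on the raw text the user typed."""
    try:
        cost = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}")
    if not cost.is_finite() or cost < 0:
        raise ValueError("cost must be a finite amount >= 0")
    return cost

def check_cost_against_margin(cost: Decimal, profit_margin: Decimal) -> Decimal:
    """Business layer: enforce a rule that only the domain logic knows about."""
    if cost > profit_margin:
        raise ValueError("cost may not exceed the profit margin")
    return cost

def save_cost(store: dict, item_id: str, cost: Decimal) -> None:
    """Data layer: confirm that the write actually took effect."""
    store[item_id] = cost
    if store.get(item_id) != cost:
        raise RuntimeError("insert operation failed")

store = {}
cost = check_cost_against_margin(parse_cost(" 12.50 "), Decimal("20"))
save_cost(store, "item-1", cost)
print(store)
```

Each layer only checks what it is in a position to know, so the rules do not get duplicated or tangled together.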

Which is better in PHP: suppress warnings with '@' or run extra checks with isset()?

For example, if I implement some simple object caching, which method is faster?
1. return isset($cache[$cls]) ? $cache[$cls] : $cache[$cls] = new $cls;
2. return @$cache[$cls] ?: $cache[$cls] = new $cls;
I read somewhere @ takes significant time to execute (and I wonder why), especially when warnings/notices are actually being issued and suppressed. isset() on the other hand means an extra hash lookup. So which is better and why?
I do want to keep E_NOTICE on globally, both on dev and production servers.

I wouldn't worry about which method is FASTER. That is a micro-optimization. I would worry more about which is more readable code and better coding practice.
I would certainly prefer your first option over the second, as your intent is much clearer. Also, it is best to avoid edge-case problems by always explicitly testing variables to make sure you are getting what you expect to get. For example, what if the class stored in $cache[$cls] is not of type $cls?
Personally, if I typically would not expect the index on $cache to be unset, then I would also put error handling in there rather than using ternary operations. If I could reasonably expect that that index would be unset on a regular basis, then I would make class $cls behave as a singleton and have your code be something like
return $cls::get_instance();

The isset() approach is better. It is code that explicitly states the index may be undefined. Suppressing the error is sloppy coding.
According to the article "10 Performance Tips to Speed Up PHP", warnings take additional execution time; the article also claims the @ operator is "expensive."
Cleaning up warnings and errors beforehand can also keep you from
using @ error suppression, which is expensive.
Additionally, @ will not suppress errors as far as custom error handlers are concerned:
http://www.php.net/manual/en/language.operators.errorcontrol.php
If you have set a custom error handler function with
set_error_handler() then it will still get called, but this custom
error handler can (and should) call error_reporting() which will
return 0 when the call that triggered the error was preceded by an @.
If the track_errors feature is enabled, any error message generated by
the expression will be saved in the variable $php_errormsg. This
variable will be overwritten on each error, so check early if you want
to use it.

@ temporarily changes the error_reporting state, that's why it is said to take time.
If you expect a certain value, the first thing to do to validate it is to check that it is defined. If you get notices, it's probably because you're missing something. Using isset() is, in my opinion, good practice.

I ran timing tests for both cases, using hash keys of various lengths, also using various hit/miss ratios for the hash table, plus with and without E_NOTICE.
The results were: with error_reporting(E_ALL) the isset() variant was faster than the @ variant by some 20-30%. Platform used: command line PHP 5.4.7 on OS X 10.8.
However, with error_reporting(E_ALL & ~E_NOTICE) the difference was within 1-2% for short hash keys, and up to 10% for longer ones (16 chars).
Note that the first variant executes 2 hash table lookups, whereas the variant with @ does only one lookup.
Thus, @ is inferior in all scenarios and I wonder if there are any plans to optimize it.

I think you have your priorities a little mixed up here.
First of all, if you want to get a real world test of which is faster - load test them. As stated though suppressing will probably be slower.
The problem here is that if you have performance issues with regular code, you should upgrade your hardware or optimize the overall logic of your code rather than prevent proper execution and error checking.
Suppressing errors to steal the tiniest fraction of a speed gain won't do you any favours in the long run. Especially if you think that this error may keep happening time and time again, and cause your app to run more slowly than if the error was caught and fixed.
