Why is MD5 hashing so hard in Swift 3? - swift

Ok, so every now and then you come across problems that you've solved before using various frameworks and libraries found on the internet; your problem is solved relatively quickly and easily, and you also learn why it was a problem in the first place.
However, sometimes you come across problems that make absolutely zero sense, and it's even worse when the solutions make negative sense.
My problem is that I want to take Data and make an MD5 hash out of it.
I find all kinds of solutions but none of them work.
What's really bugging me is how unnecessarily complicated the solutions seem to be for such a trivial task as getting an MD5 hash of anything.
I am trying to use the Crypto and CommonCrypto frameworks by Soffes and they seem fairly easy, right? Right?
Yes!
But why am I still getting the error fatal error: unexpectedly found nil while unwrapping an Optional value?
From what I understand, the data served by myData.md5 in the Crypto extension by Soffes seems to be optional. But why?
The code I am trying to execute is:
print(" md5 result: " + String(data: myData.md5, encoding: .utf8)!)
where myData definitely has data in it, because after the above line of code I send that data to a server, and the data exists.
On top of that, printing the byte count via print(String(myData.md5.count)) works perfectly.
So my question is basically: How do I MD5 hash a Data and print it as a string?
Edit:
What I have tried
That works
MD5:ing the string test in a PHP script gives me 098f6bcd4621d373cade4e832627b4f6
and the Swift code "test".md5() also gives me 098f6bcd4621d373cade4e832627b4f6
That doesn't work
Converting the UInt8 byte array from Data.md5() to a string that represents the correct MD5 value.
The different tests I've done are the following:
var hash = ""
for byte in myData.data.md5() {
    hash += String(format: "%02x", byte)
}
print("loop = " + hash) // test 1
print("myData.md5().toHexString() = " + myData.md5().toHexString()) //test 2
print("CryptoSwift.Digest.md5([UInt8](myData)) = " + CryptoSwift.Digest.md5([UInt8](myData)).toHexString()) //test 3
All three tests with the 500 byte test data give me the MD5 value 56f6955d148ad6b6abbc9088b4ae334d
while my PHP script gives me 6081d190b3ec6de47a74d34f6316ac6b
Test Sample (64 bytes):
Raw data:
FFD8FFE0 00104A46 49460001 01010048 00480000 FFE13572 45786966 00004D4D
002A0000 0008000B 01060003 00000001 00020000 010F0002 00000012 00000092
Test 1, 2 and 3 MD5: 7f0a012239d9fde5a46071640d2d8c83
PHP MD5: 06eb0c71d8839a4ac91ee42c129b8ba3
PHP Code: echo md5($_FILES["file"]["tmp_name"])

The simple answer to your question is:
String(data: someData, encoding: .utf8)
returns nil if someData is not properly UTF-8 encoded data. If you try to unwrap nil like this:
String(data: someData, encoding: .utf8)!
you get:
fatal error: unexpectedly found nil while unwrapping an Optional value
So at its core, this has nothing to do with hashing or crypto.
Both the input and the output of MD5 (or any hash algorithm, for that matter) are binary data, not text or strings. So the output of MD5 is not UTF-8 encoded data. That is why the above String initializer always fails.
If you want to display binary data in your console, you need to convert it to a readable representation. The most common ones are hexadecimal digits or Base 64 encoding.
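For the hexadecimal route, a couple of lines suffice. A minimal sketch (plain Foundation; hexString is a hypothetical helper built around the same %02x format the question's loop uses):
import Foundation

// Render binary data (e.g. an MD5 digest) as lowercase hex for display.
func hexString(_ data: Data) -> String {
    return data.map { String(format: "%02x", $0) }.joined()
}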
Note: Some crypto libraries allow you to feed strings into their hash functions. They will silently convert the string to a binary representation using some character encoding. If the encodings do not match, the hash values will not match across systems and programming languages. So you'd better understand what they really do in the background.
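To avoid that trap, you can encode the string yourself before hashing, so the encoding choice is explicit. A sketch using the same CryptoSwift calls that appear in the question (the digest value comes from the tests above):
import CryptoSwift

let bytes = Array("test".utf8)            // encode explicitly as UTF-8
print(Digest.md5(bytes).toHexString())    // 098f6bcd4621d373cade4e832627b4f6, same as PHP's md5("test")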

I use a library called 'CryptoSwift' for creating hashes, as well as encrypting data before I send it/store it. It's very easy to use.
It can be found here: https://github.com/krzyzanowskim/CryptoSwift and you can even install it with CocoaPods by adding pod 'CryptoSwift' to your Podfile.
Once installed, hashing a Data object is as simple as calling Data.md5()! It really is that easy. It also supports other hashing algorithms such as SHA.
You can then just print the MD5 object and CryptoSwift will convert it to a String for you.
The full docs on creating digests can be found here: https://github.com/krzyzanowskim/CryptoSwift#calculate-digest
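For instance, a minimal sketch (Swift 3 syntax; the input and expected digest are taken from the question's own tests):
import CryptoSwift

let data = "test".data(using: .utf8)!     // the bytes to hash
print(data.md5().toHexString())           // 098f6bcd4621d373cade4e832627b4f6
print("test".md5())                       // same digest via the String extension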

Thanks to Jacob King, I tried a much simpler MD5 framework called CryptoSwift.
The user Codo inspired me to look deeper into my PHP script by suggesting that I was not in fact hashing the content of my data but the filename, which was correct.
The original question, however, was not about which framework to use, or suggestions as to why my app and my PHP script return different MD5 values.
The question was originally about why I get the error
fatal error: unexpectedly found nil while unwrapping an Optional value
at the line of code saying
print(" md5 result: " + String(data: myData.md5, encoding: .utf8)!)
So the answer to that is that I should not try to UTF-8 decode the 16-byte data output of the md5() function, but instead call toHexString() on the digest it returns.
So the proper line of code looks like the following:
print("md5 result: " + myData.md5().toHexString())
BONUS
My PHP script now contains the following code:
move_uploaded_file($_FILES["file"]["tmp_name"], $target_dir); //save data to disk
$md5_of_data = md5_file ($target_dir); //get MD5 of saved data
BONUS-BONUS
The problem and solution is part of a small framework called AssetManager that I'm working on, which can be found here: https://github.com/aidv/AssetManager

Related

Why does the Blockcypher signer tool return some extra characters compared to the bip32 dart package?

I'm trying to sign a transaction skeleton Blockcypher returns, in order to send it along, following https://www.blockcypher.com/dev/bitcoin/#creating-transactions.
For this example, I'll use the completely-unsafe 'raw raw raw raw raw raw raw raw raw raw raw raw' mnemonic, which using dart bip32 package creates a BIP32 with private key 0x05a2716a8eb37eb2aaa72594573165349498aa6ca20c71346fb15d82c0cbbf7c and address mpQfiFFq7SHvzS9ebxMRGVohwHTRJJf9ra for BTC testnet.
Blockcypher Tx Skeleton tosign is 1cbbb4d229dcafe6dc3363daab8de99d6d38b043ce62b7129a8236e40053383e.
Using Blockcypher signer tool:
$ ./signer 1cbbb4d229dcafe6dc3363daab8de99d6d38b043ce62b7129a8236e40053383e 05a2716a8eb37eb2aaa72594573165349498aa6ca20c71346fb15d82c0cbbf7c
304402202711792b72547d2a1730a319bd219854f0892451b8bc2ab8c17ec0c6cba4ecc4022058f675ca0af3db455913e59dadc7c5e0bd0bf1b8ef8c13e830a627a18ac375ab
On the other hand, using bip32 I get:
String toSign = txSkel['tosign'][0];
var uToSign = crypto.hexToBytes(toSign);
var signed = fromNode.sign(uToSign);
var signedHex = bufferToHex(signed);
var signedHexNo0x = signedHex.substring(2);
where fromNode is the bip32.BIP32 node. Output is signedHexNo0x = 2711792b72547d2a1730a319bd219854f0892451b8bc2ab8c17ec0c6cba4ecc458f675ca0af3db455913e59dadc7c5e0bd0bf1b8ef8c13e830a627a18ac375ab.
At first sight they seem to be completely different buffers, but after a detailed look, the Blockcypher signer output only has a few extra characters compared to that of bip32:
Blockcypher signer output (I split it into several lines for you to see it clearly):
30440220
2711792b72547d2a1730a319bd219854f0892451b8bc2ab8c17ec0c6cba4ecc4
0220
58f675ca0af3db455913e59dadc7c5e0bd0bf1b8ef8c13e830a627a18ac375ab
bip32 output (also intentionally split):
2711792b72547d2a1730a319bd219854f0892451b8bc2ab8c17ec0c6cba4ecc4
58f675ca0af3db455913e59dadc7c5e0bd0bf1b8ef8c13e830a627a18ac375ab
I'd expect two 64-character numbers to give a 128-character signature, which the bip32 output matches. The Blockcypher signer output, however, has 140 characters, i.e. 12 more than the former, which is clear when it is split into lines as above.
I'd be really thankful to anyone throwing some light on this issue, which I need to understand and correct. I need to implement the solution in dart, I cannot use the signer script other than for testing.
The dart bip32 package doesn't seem to encode the signature in DER format, but rather as a plain (r, s) pair. However, DER is what Bitcoin requires. For more information see:
https://bitcoin.stackexchange.com/questions/92680/what-are-the-der-signature-and-sec-format
You can either add the extra DER bytes yourself, according to your r and s, or check whether the dart bip32 library offers a DER encoding.
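If you go the manual route, the wrapping is mechanical: each of r and s becomes a DER INTEGER (tag 0x02, length, value, with a leading 0x00 whenever the high bit is set), and the pair is wrapped in a SEQUENCE (tag 0x30). A hedged sketch, written in Swift only for illustration; the same byte layout applies in dart:
// Wrap a raw (r, s) signature in DER, as Bitcoin requires.
// Single length bytes are assumed, which holds for 32-byte r and s.
func derEncode(r: [UInt8], s: [UInt8]) -> [UInt8] {
    func derInteger(_ value: [UInt8]) -> [UInt8] {
        var v = Array(value.drop(while: { $0 == 0 }))              // minimal encoding: strip leading zeros
        if v.isEmpty || v[0] & 0x80 != 0 { v.insert(0x00, at: 0) } // keep the integer positive
        return [0x02, UInt8(v.count)] + v
    }
    let body = derInteger(r) + derInteger(s)
    return [0x30, UInt8(body.count)] + body                        // SEQUENCE of the two INTEGERs
}
For the 32-byte r and s above, this yields exactly the extra bytes you observed: 30 44, then 02 20 before r and 02 20 before s.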

Reading files with encoding OEM 850 in Swift

I have to read text files in Swift/Cocoa, which are encoded as OEM 850. Does anybody know how to do this?
You can first read the file in as raw data and then convert that data to a string value according to your encoding. A small wrinkle in your case:
There are two types which represent the known string encodings: NSStringEncoding (String.Encoding in Swift) and CFStringEncoding. Apple only directly defines a subset of the known encodings as NSStringEncoding/String.Encoding values. The remaining known encodings have CFStringEncoding values, and the function CFStringConvertEncodingToNSStringEncoding() is provided to map these to NSStringEncoding. Unfortunately for you, OEM 850 is only directly provided as a CFStringEncoding...
That sounds worse than it is. In Objective-C you can get the encoding you require using:
NSStringEncoding dosLatin1 = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingDOSLatin1);
Note: “DOS Latin 1” is one of the names for the same encoding that “OEM 850” refers to (see Wikipedia for a list), and it is the one Apple chose, hence the kCFStringEncodingDOSLatin1.
In Swift this is messier:
let dosLatin1 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)))
Once you have the encoding, the rest is straightforward. Without error checking, an outline is:
let sourceURL = ...                                // your file's URL
let rawData = try! Data(contentsOf: sourceURL)     // traps if the read fails
let convertedString = String(data: rawData, encoding: dosLatin1)   // String?
In real code you must check that the file read and the conversion are successful. Reading raw data from a URL in Swift will throw if the read fails, and converting the data to a string produces an optional (String?), as the conversion may fail.
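A checked version of the outline might look like this (a sketch only, reusing sourceURL and dosLatin1 from above):
do {
    let rawData = try Data(contentsOf: sourceURL)
    if let convertedString = String(data: rawData, encoding: dosLatin1) {
        print(convertedString)
    } else {
        print("data is not valid OEM 850")
    }
} catch {
    print("reading failed: \(error)")
}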
HTH

NSData encoding

Currently, I'm trying to parse an NSData in my iOS app. The problem is, I can't seem to find a proper Hebrew encoding for parsing. I must decode the data using the Windows-1255 encoding (the Hebrew encoding for Windows) or the ISO 8859-8 encoding, or I'll get plain gibberish. The closest I've got to solving the issue was using
CFStringConvertEncodingToNSStringEncoding(CFStringEncodings.ISOLatinHebrew)
yet it fails with the error 'CFStringEncodings' is not convertible to 'CFStringEncoding' (notice Encodings vs. Encoding).
What can I do in order to encode the data correctly?
Thanks!
The problem is that CFStringEncodings is an enumeration based on CFIndex (which in turn is a type alias for Int), whereas CFStringEncoding is a type alias for UInt32. Therefore you have to convert the .ISOLatinHebrew value explicitly to a CFStringEncoding:
let cfEnc = CFStringEncodings.ISOLatinHebrew
let enc = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEnc.rawValue))
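From there, decoding your data is one more line. A sketch (hebrewData stands in for your actual NSData):
let hebrewString = NSString(data: hebrewData, encoding: enc)   // nil if the bytes are not valid ISO 8859-8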
Turns out I needed to get my hands a bit dirty.
I saw that CFStringEncodings is related to the file CFStringEncodingsExt.h, so I searched that file for some help. There I came across a huge CF_ENUM that included exactly what I needed: all of the CFStringEncodings with their UInt32 values!
So it turns out that kCFStringEncodingISOLatinHebrew = 0x0208 /* ISO 8859-8 */.
I encourage everyone facing this encoding issue to go to that file and search for the needed encoding.

ColdFusion hash SHA-1 doesn't look the same as the sample

I'm working on a script to hash a "fingerprint" for communicating with the SecurePay Direct Post API.
The issue is that I'm trying to create a SHA-1 string that matches the sample provided, so that I can ensure things get posted accurately.
The example SHA-1 string appears as
01a1edbb159aa01b99740508d79620251c2f871d
However, my string, when converted, appears as
7871D5C9A366339DA848FC64CB32F6A9AD8FCADD
completely different...
My code for this is as follows:
<cfset variables.finger_print = "ABC0010|txnpassword|0|Test Reference|1.00|20110616221931">
<cfset variables.finger_print = hash(variables.finger_print,'SHA-1')>
<cfoutput>
#variables.finger_print#
</cfoutput>
I'm using ColdFusion 8 to do this.
It generates a 40-character hash, but I can see it's generating completely different strings.
Hopefully someone out there has done this before and can point me in the right direction.
Thanks in advance.
Edit:
The article for creating the hash only contains the following information.
Example: Setting the fingerprint Fields joined with a | separator:
ABC0010|txnpassword|0|Test Reference|1.00|20110616221931
SHA1 the above string: 01a1edbb159aa01b99740508d79620251c2f871d
When hashing the above example string with ColdFusion's hash(), I get
7871D5C9A366339DA848FC64CB32F6A9AD8FCADD
instead of the documented
01a1edbb159aa01b99740508d79620251c2f871d
Sorry, but I do not see how the sample string could possibly produce that result, given that PHP, CF, and Java all say otherwise. I suspect an error in the documentation. The one thing that stands out is the use of "txnpassword" instead of a sample value, as with the other fields. Perhaps they used a different value to produce the string and forgot to plug it into the actual example?
Update:
Example 5.2.1.12, on page 27, makes more sense. Ignoring case, the results from ColdFusion match exactly. I noticed the description also mentions something about a summarycode value, which is absent from the example in section 3.3.6. So that tends to support the theory of a documentation error in the earlier example.
Code:
<cfset input = "ABC0010|mytxnpasswd|MyReference|1000|201105231545|1">
<cfoutput>#hash(input, "sha-1")#</cfoutput>
Result:
3F97240C9607E86F87C405AF340608828D331E10

MD5 file hash for the same unchanged file is different each time C#

Good evening all,
I've been working on an MD5 tool in C# that takes a file, runs it through my Hasher class, and pops the result into a database along with the filename and directory.
The issue I'm having is that each time I run the test, the MD5 result for the same identical file (i.e. unchanged in any way) is completely different.
Below is the code I use
HashAlgorithm hmacMd5 = new HMACMD5();
byte[] hash;
string md5Result = "";
try
{
    using (Stream fileStream = new FileStream(fileLocation, FileMode.Open))
    {
        using (Stream bufferedStream = new BufferedStream(fileStream, 5600000))
        {
            hash = hmacMd5.ComputeHash(bufferedStream);
            foreach (byte x in hash)
            {
                md5Result += x;
            }
        }
    }
}
catch (UnauthorizedAccessException uae) { }
return md5Result;
Here are the results for 3 separate runs of hello.mp2:
1401401571161052548110297623915056204169177
16724366215610475211823021169211793421
56154777074212779619017828183239971
Quite puzzling.
My only rational thought as to why I'm getting these results is the way I'm concatenating the bytes into a string.
Can anyone spot an issue here?
Regards,
Ric
You should use System.Security.Cryptography.MD5 instead.
HMACMD5 doesn't compute a hash, it computes a message authentication code.
HMACMD5 is a type of keyed hash algorithm that is constructed from the MD5 hash function and used as a Hash-based Message Authentication Code (HMAC). The HMAC process mixes a secret key with the message data, hashes the result with the hash function, mixes that hash value with the secret key again, then applies the hash function a second time. The output hash will be 128 bits in length.
Since you're not supplying the HMAC key, one is being generated randomly on your behalf, causing you to see different results.
My suggestion is that you are not computing MD5 hashes, since MD5 produces a fixed-length output of 32 hex digits.
Also, the fact that you don't see any digits from 0xA to 0xF is quite puzzling.
You might want to check against a "real" result with an online MD5 calculator such as this one.
You shouldn't have a bufferedStream in between. I would guess a different number of bytes is buffered in each run.