Currently, I'm trying to parse an NSData in my iOS app. The problem is that I can't find a proper Hebrew encoding to parse it with. I must decode the data using the Windows-1255 encoding (the Hebrew code page for Windows) or the ISO 8859-8 encoding, or I get plain gibberish. The closest I've got to solving the issue was using
CFStringConvertEncodingToNSStringEncoding(CFStringEncodings.ISOLatinHebrew)
yet it fails to compile with 'CFStringEncodings' is not convertible to 'CFStringEncoding' (notice Encodings vs Encoding).
What can I do in order to decode the data correctly?
Thanks!
The problem is that CFStringEncodings is an enumeration based on CFIndex
(which in turn is a type alias for Int), whereas CFStringEncoding is a type
alias for UInt32. Therefore you have to convert the .ISOLatinHebrew
value explicitly to a CFStringEncoding:
let cfEnc = CFStringEncodings.ISOLatinHebrew
let enc = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEnc.rawValue))
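In Swift 3 and later the enum case is spelled in lower camel case and the result has to be wrapped in String.Encoding. A minimal sketch of the whole decoding step, assuming the four sample bytes below (a made-up ISO 8859-8 payload) stand in for the real NSData:

```swift
import Foundation

// Map the Core Foundation constant to a Foundation string encoding.
// CFStringEncodings is Int-based, CFStringEncoding is UInt32, hence the cast.
let cfHebrew = CFStringEncoding(CFStringEncodings.isoLatinHebrew.rawValue)
let hebrew = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(cfHebrew))

// Hypothetical sample: the ISO 8859-8 bytes for shin, lamed, vav, final mem.
let sample = Data([0xF9, 0xEC, 0xE5, 0xED])

// The initializer is failable, so handle nil instead of force-unwrapping.
let decoded = String(data: sample, encoding: hebrew)
print(decoded ?? "could not decode as ISO 8859-8")
```

For Windows-1255 the same pattern should work with CFStringEncodings.windowsHebrew in place of isoLatinHebrew.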
Turns out I needed to get my hands a bit dirty.
I saw that CFStringEncodings is related to the file CFStringEncodingsExt.h, so I searched that file for help. There I came across a huge CF_ENUM that included exactly what I needed: all of the CFStringEncodings with their UInt32 values!
It turned out that kCFStringEncodingISOLatinHebrew = 0x0208, /* ISO 8859-8 */
I encourage everyone facing this encoding issue to open that file and search for the encoding they need.
I am trying to solve a problem with character encoding in API results. I have already tried to convince the API provider to add a UTF-8 charset definition to the response header, but without success, so I need to convert the result myself in Dart/Flutter.
I have already tried many things with Utf8Decoder and utf8.encode/decode, but without a proper result.
The example below is in Polish. What I should have is:
Czesi walczą z pożarem
What I have in API results is:
Czesi walczÄ z pożarem
How can I decode the string to proper format?
Thanks!
I have to read text files in Swift/Cocoa, which are encoded as OEM 850. Does anybody know how to do this?
You can first read the file in as raw data and then convert that data to a string value according to your encoding. A small wrinkle in your case:
There are two types which represent the known string encodings, NSStringEncoding (String.Encoding in Swift) and CFStringEncoding. Apple only directly defines a subset of the known encodings as NSStringEncoding/String.Encoding values. The remaining known encodings have CFStringEncoding values and the function CFStringConvertEncodingToNSStringEncoding() is provided to map these to NSStringEncoding. Unfortunately for you OEM 850 is only directly provided by CFStringEncoding...
That sounds worse than it is. In Objective-C you can get the encoding you require using:
NSStringEncoding dosLatin1 = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingDOSLatin1);
Note: “DOS Latin 1” is one of the names for the same encoding that “OEM 850” refers to (see Wikipedia for a list) and is the one Apple chose, hence kCFStringEncodingDOSLatin1.
In Swift this is messier:
let dosLatin1 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)))
Once you have the encoding the rest is straightforward. Without error checking, an outline is:
let sourceURL = ...
let rawData = try Data(contentsOf: sourceURL)
let convertedString = String(data: rawData, encoding: dosLatin1)
In real code you must check that the file read and the conversion are successful. Reading raw data from a URL in Swift throws if the read fails, and converting the data to a string produces an optional (String?) because the conversion may fail.
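A sketch of what the checked version could look like. The helper readText(at:encoding:), the TextFileError type and the temporary sample file are made up for illustration; only the Foundation calls are real API:

```swift
import Foundation

// "DOS Latin 1" is Apple's name for OEM 850 / code page 850.
let dosLatin1 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(
    CFStringEncoding(CFStringEncodings.dosLatin1.rawValue)))

// Hypothetical error type for the case where the bytes do not fit the encoding.
enum TextFileError: Error {
    case undecodable(URL)
}

// Hypothetical helper: reads a file and decodes it with the given encoding.
func readText(at url: URL, encoding: String.Encoding) throws -> String {
    let rawData = try Data(contentsOf: url)       // throws if the read fails
    guard let text = String(data: rawData, encoding: encoding) else {
        throw TextFileError.undecodable(url)      // conversion returned nil
    }
    return text
}

// Usage with a temporary file holding the code page 850 bytes for "é" (0x82) and "ü" (0x81).
let sampleURL = FileManager.default.temporaryDirectory.appendingPathComponent("oem850-sample.txt")
var result: String? = nil
do {
    try Data([0x82, 0x81]).write(to: sampleURL)
    result = try readText(at: sampleURL, encoding: dosLatin1)
    print(result ?? "")
} catch {
    print("Reading failed: \(error)")
}
```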
HTH
Ok, so every now and then you come across problems that you've solved before using various frameworks and libraries and whatnot found on the internet and your problem is solved relatively quick and easy and you also learn why your problem was a problem in the first place.
However, sometimes you come across problems that make absolute 0 sense, and even worse when the solutions make negative sense.
My problem is that I want to take Data and make an MD5 hash out of it.
I find all kinds of solutions but none of them work.
What's really bugging me out actually is how unnecessarily complicated the solutions seem to be for a trivial task as getting an MD5 hash out of anything.
I am trying to use the Crypto and CommonCrypto frameworks by Soffes and they seem fairly easy, right? Right?
Yes!
But why am I still getting the error fatal error: unexpectedly found nil while unwrapping an Optional value?
From what I understand, the data served by myData.md5 in the extension of Crypto by Soffes seem to be "optional". But why?
The code I am trying to execute is:
print(" md5 result: " + String(data: myData.md5, encoding: .utf8)!)
where myData has data in it 100% because after the above line of code, I send that data to a server, and the data exists.
On top of that, printing the count of myData.md5.count by print(String(myData.md5.count)) works perfectly.
So my question is basically: How do I MD5 hash a Data and print it as a string?
Edit:
What I have tried
That works
MD5:ing the string test in a PHP script gives me 098f6bcd4621d373cade4e832627b4f6
and the Swift code "test".md5() also gives me 098f6bcd4621d373cade4e832627b4f6
That doesn't work
Converting the UInt8 byte array from Data.md5() to a string that represents the correct MD5 value.
The different tests I've done are the following:
var hash = ""
for byte in myData.data.md5() {
hash += String(format: "%02x", byte)
}
print("loop = " + hash) //test 1
print("myData.md5().toHexString() = " + myData.md5().toHexString()) //test 2
print("CryptoSwift.Digest.md5([UInt8](myData)) = " + CryptoSwift.Digest.md5([UInt8](myData)).toHexString()) //test 3
All three tests with the 500 byte test data give me the MD5 value 56f6955d148ad6b6abbc9088b4ae334d
while my PHP script gives me 6081d190b3ec6de47a74d34f6316ac6b
Test Sample (64 bytes):
Raw data:
FFD8FFE0 00104A46 49460001 01010048 00480000 FFE13572 45786966 00004D4D
002A0000 0008000B 01060003 00000001 00020000 010F0002 00000012 00000092
Test 1, 2 and 3 MD5: 7f0a012239d9fde5a46071640d2d8c83
PHP MD5: 06eb0c71d8839a4ac91ee42c129b8ba3
PHP Code: echo md5($_FILES["file"]["tmp_name"])
The simple answer to your question is:
String(data: someData, encoding: .utf8)
returns nil if someData is not properly UTF8 encoded data. If you try to unwrap nil like this:
String(data: someData, encoding: .utf8)!
you get:
fatal error: unexpectedly found nil while unwrapping an Optional value
So at its core, it's got nothing to do with hashing or crypto.
Both the input and the output of MD5 (or any hash algorithm for that matter) are binary data (and not text or strings). So the output of MD5 is not UTF-8 encoded data, which is why the above String initializer always failed.
If you want to display binary data in your console, you need to convert it to a readable representation. The most common ones are hexadecimal digits or Base 64 encoding.
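Both representations are available with plain Foundation. A small sketch, where the four bytes are just a made-up stand-in for a real 16-byte digest:

```swift
import Foundation

// Hypothetical stand-in for a digest; any Data value works the same way.
let digest = Data([0x09, 0x8f, 0x6b, 0xcd])

// Hexadecimal: two lowercase hex digits per byte.
let hex = digest.map { String(format: "%02x", $0) }.joined()

// Base64: built into Foundation.
let base64 = digest.base64EncodedString()

// Interpreting arbitrary bytes as UTF-8 fails here because 0xFF never occurs in valid UTF-8.
let asText = String(data: Data([0xFF, 0xD8, 0xFF, 0xE0]), encoding: .utf8)

print(hex, base64, asText ?? "nil")
```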
Note: Some crypto libraries allow you to feed a string into their hash functions. They silently convert the string to a binary representation using some character encoding. If the encodings do not match, the hash values will not match across systems and programming languages. So you had better try to understand what they really do in the background.
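To see why the encoding matters, it is enough to look at the bytes a hash function would receive. A small illustration with an arbitrarily chosen character:

```swift
import Foundation

let text = "é"

// The same string becomes different bytes under different encodings...
let utf8Bytes = Array(text.utf8)                                         // UTF-8
let latin1Bytes = text.data(using: .isoLatin1).map { [UInt8]($0) } ?? [] // ISO 8859-1

// ...so a hash computed over one byte sequence cannot match the other.
print(utf8Bytes, latin1Bytes)
```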
I use a library called 'CryptoSwift' for creating hashes, as well as encrypting data before I send it/store it. It's very easy to use.
It can be found here https://github.com/krzyzanowskim/CryptoSwift and you can even install it with CocoaPods by adding pod 'CryptoSwift' to your podfile.
Once installed, hashing a Data object is as simple as calling Data.md5()! It really is that easy. It also supports other hashing algorithms such as SHA.
You can then just print the MD5 object and CryptoSwift will convert it to a String for you.
The full docs on creating digests can be found here: https://github.com/krzyzanowskim/CryptoSwift#calculate-digest
Thanks to Jacob King I tried a much simpler MD5 framework called CryptoSwift.
The user Codo inspired me to look deeper into my PHP script, as he suggested that I was not in fact hashing the content of my data but the filename instead, which is correct.
The original question, however, was not about which framework to use, nor about why my app and my PHP script return different MD5 values.
The question was originally about why I get the error
fatal error: unexpectedly found nil while unwrapping an Optional value
at the line of code saying
print(" md5 result: " + String(data: myData.md5, encoding: .utf8)!)
So the answer is that I should not try to convert the 16 bytes of output from the md5() function to a UTF-8 string, but instead call toHexString() on that result.
So the proper line of code should look like the following:
print("md5 result: " + myData.md5().toHexString())
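For readers who would rather avoid a third-party dependency, the same result can probably be had with Apple's CryptoKit. This is a swapped-in alternative, not the CryptoSwift call used above, and it assumes iOS 13 / macOS 10.15 or later; md5Hex(of:) is a made-up helper name:

```swift
import Foundation
import CryptoKit

// Hash the raw bytes, then render the 16-byte digest as hex for printing.
func md5Hex(of data: Data) -> String {
    let digest = Insecure.MD5.hash(data: data)
    return digest.map { String(format: "%02x", $0) }.joined()
}

// Hashing the UTF-8 bytes of "test", the same input as the PHP check in the question.
print("md5 result: " + md5Hex(of: Data("test".utf8)))
```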
BONUS
My PHP script now contains the following code:
move_uploaded_file($_FILES["file"]["tmp_name"], $target_dir); //save data to disk
$md5_of_data = md5_file ($target_dir); //get MD5 of saved data
BONUS-BONUS
The problem and solution is part of a small framework called AssetManager that I'm working on, which can be found here: https://github.com/aidv/AssetManager
I'm using Camel 2.14.1 and splitting a huge XML file containing Chinese/Japanese characters using group=10000 within the tokenize tag.
The files are created successfully based on the grouping, but the Chinese/Japanese text is converted to junk characters.
I tried enforcing UTF-8 before creating the new XML using "ConvertBodyTo", but the issue persists.
Can someone help me?
I had run into a similar issue while trying to split a csv file using tokenize with grouping.
Sample csv file: (with Delimiter - '|')
CandidateNumber|CandidateLastName|CandidateFirstName|EducationLevel
CAND123C001|Wells|Jimmy|Bachelor's Degree (±16 years)
CAND123C002|Wells|Tom|Bachelor's Degree (±16 years)
CAND123C003|Wells|James|Bachelor's Degree (±16 years)
CAND123C004|Wells|Tim|Bachelor's Degree (±16 years)
The ± character is corrupted after tokenize with grouping. I was initially under the assumption that the problem was with not setting the proper File Encoding for split, but the exchange seems to have the right value for property CamelCharsetName=ISO-8859-1.
from("file://<dir with csv files>?noop=true&charset=ISO-8859-1")
.split(body().tokenize("\n",2,true)).streaming()
.log("body: ${body}");
The same route works fine without grouping.
from("file://<dir with csv files>?noop=true&charset=ISO-8859-1")
.split(body().tokenize("\n")).streaming()
.log("body: ${body}");
This post confirmed that the issue only occurs with grouping.
Looking at GroupTokenIterator in the Camel code base, the problem seems to be the way the TypeConverter is used to convert a String to an InputStream:
// convert to input stream
InputStream is =
camelContext.getTypeConverter().mandatoryConvertTo(InputStream.class, data);
...
Note: the mandatoryConvertTo() has an overloaded method with exchange
<T> T mandatoryConvertTo(Class<T> type, Exchange exchange, Object value)
Because the exchange is not passed as an argument, the conversion always falls back to the default charset, which is set through the system property "org.apache.camel.default.charset".
Potential Fix:
// convert to input stream
InputStream is =
camelContext.getTypeConverter().mandatoryConvertTo(InputStream.class, exchange, data);
...
As this fix has to be made in camel-core, another option is to split without grouping and use an AggregationStrategy with completionSize() and completionTimeout().
It would still be great to get this fixed in camel-core.
How to determine string encoding in cocoa?
Recently I've been working on a radio player. Sometimes the ID3 tag text is garbled.
Here is my code:
CFDictionaryRef audioInfoDictionary;
UInt32 size = sizeof(audioInfoDictionary);
result = AudioFileGetProperty(fileID, kAudioFilePropertyInfoDictionary, &size, &audioInfoDictionary);
The ID3 info is in audioInfoDictionary. Sometimes the ID3 tag doesn't use UTF-8 encoding, and the title and artist name are garbled.
Is there any way to determine what encoding a string uses?
Special thx!
While it's an NSString object, there's no specific encoding since it's guaranteed to represent whatever is put into it using the encoding determined when it was created. See the Working With Encodings section of the docs.
From where are you getting the ID3 tags? The time you "receive" this information is the best time to determine its encoding. See Creating and Initializing Strings and the next few sections (for file and url creation) for a list of initializers. Some of them let you set the encoding and others pass back (by reference) the "best guess" encoding the system determined when creating the string. Look for methods with "usedEncoding:" for the system's reported guess.
All of this really depends on exactly what is handing you that string. Are you reading it from a file (an MP3) or a web service (Internet Radio)? If the latter, the server's response should include the encoding and if that's wrong, there's not much to do but guess.
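As a rough illustration of the "usedEncoding:" family mentioned above, here is a Swift sketch that lets Foundation report its guess while reading a file. It assumes you have the raw text in a file of your own (the temporary sample file is made up); it does not help once AudioFileGetProperty has already decoded the tag for you:

```swift
import Foundation

// Write a sample file as UTF-16 (Foundation adds a byte order mark),
// then let Foundation report which encoding it detected on the way back in.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("tag-sample.txt")
var detected = String.Encoding.utf8   // placeholder, overwritten on success
var roundTripped: String? = nil

do {
    try "Título de prueba".write(to: url, atomically: true, encoding: .utf16)
    roundTripped = try String(contentsOf: url, usedEncoding: &detected)
    print("text: \(roundTripped ?? ""), detected encoding: \(detected)")
} catch {
    print("Could not read the file or detect its encoding: \(error)")
}
```

The detection is a best guess. It is dependable when a byte order mark is present and much less so for legacy single-byte encodings.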