How to decode UTF-8 knowing character count but not byte count? - swift

I need to decode a UTF-8-encoded string I don’t know the byte count for. I do know the character count.
With the byte count, I would do this:
NSString(bytes: UnsafePointer<Byte>(bytes),
         length: byteCount,
         encoding: String.Encoding.utf8.rawValue)
How can I use the character count instead?

A possible solution is to use the UTF-8 UnicodeCodec to decode
bytes until the wanted number of characters is reached
(or an error occurs):
func decodeUTF8<S: Sequence>(bytes: S, numCharacters: Int) -> String
    where S.Iterator.Element == UInt8 {
    var iterator = bytes.makeIterator()
    var utf8codec = UTF8()
    var string = ""
    while string.characters.count < numCharacters {
        switch utf8codec.decode(&iterator) {
        case let .scalarValue(val):
            string.unicodeScalars.append(val)
        default:
            // Error or out of bytes:
            return string
        }
    }
    return string
}
(You could also return nil or throw an error in the error case.)
Example:
let bytes = "H€llo".utf8
let dec = decodeUTF8(bytes: bytes, numCharacters: 3)
print(dec) // H€l
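If you prefer the throwing variant mentioned above, a minimal sketch could look like this (the error enum is hypothetical and not part of the original answer):

enum UTF8DecodeError: Error {
    case invalidUTF8
    case notEnoughBytes
}

func decodeUTF8OrThrow<S: Sequence>(bytes: S, numCharacters: Int) throws -> String
    where S.Iterator.Element == UInt8 {
    var iterator = bytes.makeIterator()
    var utf8codec = UTF8()
    var string = ""
    while string.characters.count < numCharacters {
        switch utf8codec.decode(&iterator) {
        case let .scalarValue(val):
            string.unicodeScalars.append(val)
        case .error:
            throw UTF8DecodeError.invalidUTF8
        case .emptyInput:
            throw UTF8DecodeError.notEnoughBytes
        }
    }
    return string
}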

Related

How to fix the Integer literal '2147483648' overflows when stored into 'Int' exception?

let stringData = "84121516" // this is 4 bytes of data
let value = self.checkHexToInt(stringData: stringData)

func checkHexToInt(stringData: String) -> Int? {
    guard let num = Int(stringData, radix: 16) else {
        return nil
    }
    return Int(num)
}

// value is 2215777558, but I need the most significant bit only
let checkEngineLightOn = ((value! & 0x80000000) > 0);
When I do this, I get the error "Integer literal '2147483648' overflows when stored into 'Int'".
I am expecting to get either true or false. Is there any other way to get the most significant bit out of an Int value?
As @OOPer noted in the comments, on a 32-bit system Int is 32 bits and your value is larger than Int32.max. Since you are decoding 4 bytes, you can use UInt32:
func checkHexToUInt32(stringData: String) -> UInt32? {
    return UInt32(stringData, radix: 16)
}

let stringData = "84121516" // this is 4 bytes of data
let value = self.checkHexToUInt32(stringData: stringData)
let checkEngineLightOn = ((value! & 0x80000000) > 0)
Note: UInt32(_:radix:) returns a UInt32?, which is nil if the conversion fails, so there is no need for the guard and return nil; just return the result of the conversion.
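If you would rather avoid force-unwrapping, a small usage sketch of the function above (illustration only):

if let value = checkHexToUInt32(stringData: "84121516") {
    // 0x84121516 has its most significant bit set
    let checkEngineLightOn = (value & 0x80000000) > 0
    print(checkEngineLightOn) // true
}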

Decode nsData to String Array

I want to decode my nsData to a String Array. I have this code right now:
func nsDataToStringArray(data: NSData) -> [String] {
    var decodedStrings = [String]()
    var stringTerminatorPositions = [Int]()
    var currentPosition = 0
    data.enumerateBytes() { buffer, range, stop in
        let bytes = UnsafePointer<UInt8>(buffer)
        for i in 0 ..< range.length {
            if bytes[i] == 0 {
                stringTerminatorPositions.append(currentPosition)
            }
            currentPosition += 1
        }
    }

    var stringStartPosition = 0
    for stringTerminatorPosition in stringTerminatorPositions {
        let encodedString = data.subdata(with: NSMakeRange(stringStartPosition, stringTerminatorPosition - stringStartPosition))
        let decodedString = NSString(data: encodedString, encoding: String.Encoding.utf8.rawValue)! as String
        decodedStrings.append(decodedString)
        stringStartPosition = stringTerminatorPosition + 1
    }
    return decodedStrings
}
But I get an error on this line: let bytes = UnsafePointer<UInt8>(buffer)
Cannot invoke initializer for type 'UnsafePointer' with an
argument list of type '(UnsafeRawPointer)'
Do I need to convert the buffer to a UnsafePointer? If so, how can I do that?
buffer in the enumerateBytes() closure is an UnsafeRawPointer
and you have to "rebind" it to a UInt8 pointer in Swift 3:
// let bytes = UnsafePointer<UInt8>(buffer)
let bytes = buffer.assumingMemoryBound(to: UInt8.self)
But why so complicated? You can achieve the same result with
func nsDataToStringArray(nsData: NSData) -> [String] {
    let data = nsData as Data
    return data.split(separator: 0).flatMap { String(bytes: $0, encoding: .utf8) }
}
How does this work?
Data is a Sequence of UInt8, therefore
split(separator: 0) can be called on it, returning an array of
"data slices" (which are views into the source data, not copies).
Each "data slice" is again a Sequence of UInt8, from which a
String can be created with String(bytes: $0, encoding: .utf8).
This is a failable initializer (because the data may be invalid UTF-8).
flatMap { ... } returns an array with all non-nil results,
i.e. an array with all strings which could be created from
valid UTF-8 code sequences between zero bytes.
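A quick usage sketch, with made-up sample bytes containing two zero-terminated strings:

// "foo" and "bar", each terminated by a zero byte
let sampleBytes: [UInt8] = Array("foo".utf8) + [0] + Array("bar".utf8) + [0]
let nsData = NSData(bytes: sampleBytes, length: sampleBytes.count)
print(nsDataToStringArray(nsData: nsData)) // ["foo", "bar"]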

Convert hex-encoded String to String

I want to convert the following hex-encoded String in Swift 3:
dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4
to its equivalent String:
Ü°J:\ص7cï ¼f«à)ýë®^A~/ü*¤
The following websites do the job fine:
http://codebeautify.org/hex-string-converter
http://string-functions.com/hex-string.aspx
But I am unable to do the same in Swift 3. The following code doesn't do the job either:
func convertHexStringToNormalString(hexString: String) -> String! {
    if let data = hexString.data(using: .utf8) {
        return String.init(data: data, encoding: .utf8)
    } else {
        return nil
    }
}
Your code doesn't do what you think it does. This line:
if let data = hexString.data(using: .utf8){
means "encode these characters into UTF-8." That means that "01" doesn't encode to 0x01 (1), it encodes to 0x30 0x31 ("0" "1"). There's no "hex" in there anywhere.
This line:
return String.init(data:data, encoding: .utf8)
just takes the encoded UTF-8 data, interprets it as UTF-8, and returns it. These two methods are symmetrical, so you should expect this whole function to return whatever it was handed.
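You can verify the round trip with the function from the question (illustration only):

let roundTrip = convertHexStringToNormalString(hexString: "dcb04a9e")
print(roundTrip!) // "dcb04a9e" – the same text, not the decoded bytes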
Pulling together Martin and Larme's comments into one place here. This appears to be encoded in Latin-1. (This is a really awkward way to encode this data, but if it's what you're looking for, I think that's the encoding.)
import Foundation

extension Data {
    // From http://stackoverflow.com/a/40278391:
    init?(fromHexEncodedString string: String) {

        // Convert 0 ... 9, a ... f, A ... F to their decimal value,
        // return nil for all other input characters
        func decodeNibble(u: UInt16) -> UInt8? {
            switch u {
            case 0x30 ... 0x39:
                return UInt8(u - 0x30)
            case 0x41 ... 0x46:
                return UInt8(u - 0x41 + 10)
            case 0x61 ... 0x66:
                return UInt8(u - 0x61 + 10)
            default:
                return nil
            }
        }

        self.init(capacity: string.utf16.count / 2)
        var even = true
        var byte: UInt8 = 0
        for c in string.utf16 {
            guard let val = decodeNibble(u: c) else { return nil }
            if even {
                byte = val << 4
            } else {
                byte += val
                self.append(byte)
            }
            even = !even
        }
        guard even else { return nil }
    }
}
let d = Data(fromHexEncodedString: "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4")!
let s = String(data: d, encoding: .isoLatin1)
You want to use the hex encoded data as an AES key, but the
data is not a valid UTF-8 sequence. You could interpret
it as a string in ISO Latin encoding, but the AES(key: String, ...)
initializer converts the string back to its UTF-8 representation,
i.e. you'll get different key data from what you started with.
Therefore you should not convert it to a string at all. Use the
extension Data {
init?(fromHexEncodedString string: String)
}
method from hex/binary string conversion in Swift
to convert the hex encoded string to Data and then pass that
as an array to the AES(key: Array<UInt8>, ...) initializer:
let hexkey = "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4"
let key = Array(Data(fromHexEncodedString: hexkey)!)
let encrypted = try AES(key: key, ....)
There is still a way to convert the key from hex to a readable string by adding the extension below:
extension String {
    func hexToString() -> String {
        var finalString = ""
        let chars = Array(self)
        for count in stride(from: 0, to: chars.count - 1, by: 2) {
            let firstDigit = Int("\(chars[count])", radix: 16) ?? 0
            let lastDigit = Int("\(chars[count + 1])", radix: 16) ?? 0
            let decimal = firstDigit * 16 + lastDigit
            let decimalString = String(format: "%c", decimal) as String
            finalString.append(Character(decimalString))
        }
        return finalString
    }

    func base64Decoded() -> String? {
        guard let data = Data(base64Encoded: self) else { return nil }
        return String(data: data, encoding: .init(rawValue: 0))
    }
}
Example of use:
let hexToString = secretKey.hexToString()
let base64ReadableKey = hexToString.base64Decoded() ?? ""

Fit Swift string in database VARCHAR(255)

I'm trying to get a valid substring of at most 255 UTF-8 code units from a Swift string (the idea is to be able to store it in a database VARCHAR(255) field).
The standard way of getting a substring is this:
let string: String = "Hello world!"
let startIndex = string.startIndex
let endIndex = string.startIndex.advancedBy(255, limit: string.endIndex)
let databaseSubstring1 = string[startIndex..<endIndex]
But obviously that would give me a string of 255 characters that may require more than 255 bytes in UTF8 representation.
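For example, a single accented character already needs two UTF-8 code units (illustration only):

let e = "é"
print(e.characters.count) // 1
print(e.utf8.count)       // 2 – one character, two UTF-8 code units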
For UTF8 I can write this:
let utf8StartIndex = string.utf8.startIndex
let utf8EndIndex = utf8StartIndex.advancedBy(255, limit: string.utf8.endIndex)
let databaseSubstringUTF8View = string.utf8[utf8StartIndex..<utf8EndIndex]
let databaseSubstring2 = String(databaseSubstringUTF8View)
But I run the risk of having half a character at the end, which means my UTF8View would not be a valid UTF8 sequence.
And as expected databaseSubstring2 is an optional string because the initializer can fail (it is defined as public init?(_ utf8: String.UTF8View)).
So I need some way of stripping invalid UTF8 code points at the end, or – if possible – a builtin way of doing what I'm trying to do here.
EDIT
Turns out that databases understand characters, so I should not try to count UTF8 code units, but rather how many characters the database will count in my string (which will probably depend on the database).
According to @OOPer, MySQL counts characters as UTF-16 code units. I have come up with the following implementation:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
    // Start by clipping to 255 characters
    let startIndex = string.startIndex
    let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
    var string = string[startIndex..<endIndex]

    // Remove characters from the end one by one until we have less than
    // the maximum number of UTF-16 code units
    while (string.utf16.count > maxLength) {
        let startIndex = string.startIndex
        let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
        string = string[startIndex..<endIndex]
    }
    return string
}
The idea is to count UTF-16 code units, but to remove whole characters from the end (characters in the sense of what Swift considers a Character).
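A small usage sketch of this implementation (example chosen for illustration; 𝄢 is one Character but two UTF-16 code units):

let clipped = databaseStringForString("abc\u{1D122}", maxLength: 4)
print(clipped) // "abc" – keeping the 𝄢 would exceed 4 UTF-16 code units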
EDIT 2
Still according to @OOPer, PostgreSQL counts characters as Unicode scalars, so this should probably work:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
    // Start by clipping to 255 characters
    let startIndex = string.startIndex
    let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
    var string = string[startIndex..<endIndex]

    // Remove characters from the end one by one until we have less than
    // the maximum number of Unicode scalars
    while (string.unicodeScalars.count > maxLength) {
        let startIndex = string.startIndex
        let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
        string = string[startIndex..<endIndex]
    }
    return string
}
As I wrote in my comment, you may need databaseStringForString(_:maxLength:) to truncate your string to match the length limit of your DBMS: PostgreSQL with utf8, or MySQL with utf8mb4.
And I would write the same functionality as your EDIT 2:
func databaseStringForString(string: String, maxUnicodeScalarLength: Int = 255) -> String {
    let start = string.startIndex
    for index in start..<string.endIndex {
        if string[start..<index.successor()].unicodeScalars.count > maxUnicodeScalarLength {
            return string[start..<index]
        }
    }
    return string
}
This may be less efficient, but a little bit shorter.
let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" //->"abc𝄢🇯🇵"
let dbus = databaseStringForString(s, maxUnicodeScalarLength: 5) //->"abc𝄢"(=="abc\u{1D122}")
So, someone who works with MySQL with utf8(=utf8mb3) needs something like this:
func databaseStringForString(string: String, maxUTF16Length: Int = 255) -> String {
    let start = string.startIndex
    for index in start..<string.endIndex {
        if string[start..<index.successor()].utf16.count > maxUTF16Length {
            return string[start..<index]
        }
    }
    return string
}
let dbu16 = databaseStringForString(s, maxUTF16Length: 4) //->"abc"

Swift - converting from UnsafePointer<UInt8> with length to String

I considered a lot of similar questions, but still can't get the compiler to accept this.
Socket Mobile API (in Objective-C) passes ISktScanDecodedData into a delegate method in Swift (the data may be binary, which I suppose is why it's not provided as a string):
func onDecodedData(device: DeviceInfo?, DecodedData d: ISktScanDecodedData?) {
    let symbology: String = d!.Name()
    let rawData: UnsafePointer<UInt8> = d!.getData()
    let rawDataSize: UInt32 = d!.getDataSize()
    // want a String (UTF8 is OK) or Swifty byte array...
}
In C#, this code converts the raw data into a string:
string s = Marshal.PtrToStringAuto(d.GetData(), d.GetDataSize());
In Swift, I can get as far as UnsafeArray, but then I'm stuck:
let rawArray = UnsafeArray<UInt8>(start: rawData, length: Int(rawDataSize))
Alternatively I see String.fromCString and NSString.stringWithCharacters, but neither will accept the types of arguments at hand. If I could convert from UnsafePointer<UInt8> to UnsafePointer<()>, for example, then this would be available (though I'm not sure if it would even be safe):
NSData(bytesNoCopy: UnsafePointer<()>, length: Int, freeWhenDone: Bool)
Is there an obvious way to get a string out of all this?
This should work:
let data = NSData(bytes: rawData, length: Int(rawDataSize))
let str = String(data: data, encoding: NSUTF8StringEncoding)
Update for Swift 3:
let data = Data(bytes: rawData, count: Int(rawDataSize))
let str = String(data: data, encoding: String.Encoding.utf8)
The resulting string is nil if the data does not represent
a valid UTF-8 sequence.
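If you prefer not to force-unwrap, a minimal sketch for handling that optional (illustration only):

if let str = String(data: data, encoding: String.Encoding.utf8) {
    print("Decoded string:", str)
} else {
    print("Data is not valid UTF-8")
}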
How about this, 'pure' Swift 2.2 instead of using NSData:
public extension String {
    static func fromCString(cs: UnsafePointer<CChar>, length: Int!) -> String? {
        if length == .None { // no length given, use \0 standard variant
            return String.fromCString(cs)
        }
        let buflen = length + 1
        let buf = UnsafeMutablePointer<CChar>.alloc(buflen)
        memcpy(buf, cs, length)
        buf[length] = 0 // zero terminate
        let s = String.fromCString(buf)
        buf.dealloc(buflen)
        return s
    }
}
and Swift 3:
public extension String {
    static func fromCString(cs: UnsafePointer<CChar>, length: Int!) -> String? {
        if length == nil { // no length given, use \0 standard variant
            return String(cString: cs)
        }
        let buflen = length + 1
        let buf = UnsafeMutablePointer<CChar>.allocate(capacity: buflen)
        memcpy(buf, cs, length)
        buf[length] = 0 // zero terminate
        let s = String(cString: buf)
        buf.deallocate(capacity: buflen)
        return s
    }
}
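A quick usage sketch of the Swift 3 version, using withCString just to obtain a pointer for the example (not part of the original answer):

"Hello".withCString { ptr in
    print(String.fromCString(cs: ptr, length: 5) ?? "nil") // "Hello"
}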
Admittedly it's a bit stupid to alloc a buffer and copy the data just to add the zero terminator.
Obviously, as mentioned by Zaph, you need to make sure your assumptions about the string encoding are going to be right.
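If the extra allocation bothers you, one hedged alternative sketch (Swift 3, assuming the length is known and UTF-8 data is expected) is to decode the bytes directly with String(bytes:encoding:):

func stringFromBytes(cs: UnsafePointer<CChar>, length: Int) -> String? {
    let buffer = UnsafeBufferPointer(start: cs, count: length)
    // Reinterpret the signed CChar values as UInt8 and decode as UTF-8
    return String(bytes: buffer.map { UInt8(bitPattern: $0) }, encoding: .utf8)
}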