How to read a null terminated String from Data? - swift

An iOS/Swift library delivers a Data object containing a null terminated string.
When converting it to a String by calling String(data: dataInstance, encoding: .utf8), the returned String ends with "\0".
Question now is, how do you convert the Data instance to a String without having "\0" appended? I.e. how can you omit the null terminating character at the end?
Calling .trimmingCharacters(in: .whitespacesAndNewlines) has no effect, so your advice is very much appreciated. Thank you.

A possible solution: Determine the index of the terminating zero, and convert only the preceding part of the data:
let data = Data([65, 66, 0, 67, 0])
let end = data.firstIndex(where: { $0 == 0 }) ?? data.endIndex
if let string = String(data: data[..<end], encoding: .utf8) {
    print(string.debugDescription) // "AB"
}
If the data does not contain a null byte then everything will be converted.

You can do some pointer operations and use the init(cString:) initialiser, which takes a null terminated string.
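// The explicit "-> String" keeps the multi-statement closure compiling on older Swift 5 toolchains.
let resultString = data.withUnsafeBytes { buffer -> String in
    guard let pointer = buffer.baseAddress?.assumingMemoryBound(to: CChar.self) else {
        return ""
    }
    return String(cString: pointer)
}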
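One caveat with the pointer approach: String(cString:) keeps reading until it meets a zero byte, so it relies on the Data really containing a terminator. A bounds-safe variant could look like the minimal sketch below. The helper name stringBeforeNull is made up for illustration, and String(decoding:as:) replaces invalid UTF-8 with U+FFFD rather than returning nil.

```swift
import Foundation

// Hypothetical helper (not from the answers above): decode the bytes before
// the first zero byte, or all bytes if there is no terminator.
func stringBeforeNull(_ data: Data) -> String {
    let payload = data.prefix(while: { $0 != 0 })
    // String(decoding:as:) only looks at `payload`; invalid UTF-8 is
    // replaced with U+FFFD instead of producing nil.
    return String(decoding: payload, as: UTF8.self)
}

print(stringBeforeNull(Data([65, 66, 0, 67, 0])).debugDescription) // "AB"
print(stringBeforeNull(Data([65, 66])).debugDescription)           // "AB"
```

If a failed decode should be reported instead of repaired, String(data:encoding:) on the same prefix returns nil for invalid input.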

Related

Decoding strings including utf8-literals like '\xc3\xa6' in Swift?

Follow-up question to my former thread about UTF-8 literals:
It was established that a string consisting exclusively of UTF-8 literals can be decoded like this:
let s = "\\xc3\\xa6"
let bytes = s
    .components(separatedBy: "\\x")
    // components(separatedBy:) produces an empty string as the first element
    // because the string starts with "\x". We drop it.
    .dropFirst()
    .compactMap { UInt8($0, radix: 16) }
if let decoded = String(bytes: bytes, encoding: .utf8) {
    print(decoded)
} else {
    print("The UTF8 sequence was invalid!")
}
However this only works if the string only contains UTF-8 literals. As I am fetching a Wi-Fi list of names that has these UTF-8 literals within, how do I go about decoding the entire string?
Example:
let s = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
With the expected result:
print(s)
> This is a WiFi Name æ including UTF-8 literals ø
In Python there is a simple solution to this:
contents = source_file.read()
uni = contents.decode('unicode-escape')
enc = uni.encode('latin1')
dec = enc.decode('utf-8')
Is there a similar way to decode these strings in Swift 5?
To start with, add the decoding code to a String extension as a computed property (or create a function):
extension String {
    var decodeUTF8: String {
        let bytes = self.components(separatedBy: "\\x")
            .dropFirst()
            .compactMap { UInt8($0, radix: 16) }
        return String(bytes: bytes, encoding: .utf8) ?? self
    }
}
Then use a regular expression in a while loop to replace every match (string must be declared with var, since replaceSubrange mutates it):
while let range = string.range(of: #"(\\x[a-f0-9]{2}){2}"#, options: [.regularExpression, .caseInsensitive]) {
    string.replaceSubrange(range, with: String(string[range]).decodeUTF8)
}
As far as I know there's no native Swift solution to this. To make it look as compact as the Python version at the call site, you can build an extension on String to hide the complexity:
extension String {
    func replacingUtf8Literals() -> Self {
        // Match runs of \xNN escapes, where N is a hexadecimal digit
        let regex = #"(\\x[a-fA-F0-9]{2})+"#
        var str = self
        while let range = str.range(of: regex, options: .regularExpression) {
            let literalBytes = str[range]
                .components(separatedBy: "\\x")
                .dropFirst()
                .compactMap { UInt8($0, radix: 16) }
            guard let actuals = String(bytes: literalBytes, encoding: .utf8) else {
                fatalError("Regex error")
            }
            str.replaceSubrange(range, with: actuals)
        }
        return str
    }
}
This lets you call
print(s.replacingUtf8Literals())
// prints: This is a WiFi Name æ including UTF-8 literals ø
For convenience I'm trapping a failed conversion with fatalError. It fires when the matched bytes do not form valid UTF-8, so you may want to handle this in a better way in production code. Whatever you choose, the failure path has to break out of the loop or throw; otherwise the same range matches again and the loop never ends.
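The loop-based versions above rescan the string after every replacement and need a failure path for invalid bytes. An alternative is to walk the string once, collecting raw bytes, and decode them at the end. The following is only a sketch under assumptions: the name decodingHexEscapes is hypothetical, only \xNN escapes are recognised, and invalid UTF-8 becomes U+FFFD instead of trapping.

```swift
import Foundation

extension String {
    // Hypothetical helper: single pass over the text, no regular expression.
    func decodingHexEscapes() -> String {
        var bytes: [UInt8] = []
        var rest = Substring(self)
        while let first = rest.first {
            // Candidate hex digits that would follow a "\x" prefix.
            let digits = rest.dropFirst(2).prefix(2)
            if rest.hasPrefix("\\x"), digits.count == 2,
               digits.allSatisfy({ $0.isHexDigit }),
               let byte = UInt8(digits, radix: 16) {
                bytes.append(byte)            // escaped byte
                rest = rest.dropFirst(4)
            } else {
                // Ordinary character: keep its UTF-8 bytes unchanged.
                bytes.append(contentsOf: String(first).utf8)
                rest = rest.dropFirst()
            }
        }
        return String(decoding: bytes, as: UTF8.self)
    }
}

let wifiName = "This is a WiFi Name \\xc3\\xa6 including UTF-8 literals \\xc3\\xb8"
print(wifiName.decodingHexEscapes())
// prints: This is a WiFi Name æ including UTF-8 literals ø
```

Because every iteration consumes at least one character, the loop always terminates, even for malformed input.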

Using Swift to write a character to a file

I'm trying to write a Swift program that writes a single character to a file. I've researched this but so far haven't figured out how to do this (note, I'm new to Swift). Note that the text file I'm reading and writing to can contain a series of characters, one per line. I want to read the last character and update the file so it only contains that last character.
Here's what I have so far:
let will_file = "/Users/willf/Dropbox/foo.txt"
do {
    let statusStr = try String(contentsOfFile: will_file, encoding: .utf8)
    // find the last character in the string
    var strIndex = statusStr.index(statusStr.endIndex, offsetBy: -1)
    if statusStr[strIndex] == "\n" {
        // I need to access the character just before the last \n
        strIndex = statusStr.index(statusStr.endIndex, offsetBy: -2)
    }
    if statusStr[strIndex] == "y" {
        print("yes")
    } else if statusStr[strIndex] == "n" {
        print("no")
    } else {
        // XXX deal with error here
        print("The char isn't y or n")
    }
    // writing
    // I get "cannot invoke 'write' with an argument list of type '(to: String)'"
    try statusStr[strIndex].write(to: will_file)
}
I would appreciate advice on how to write the character returned by statusStr[strIndex].
I will further point out that I have read "Read and write a String from text file", but I am still confused as to how to write to a text file under my Dropbox folder. I was hoping for a write method that takes an absolute path as a string argument, but I have not found any documentation or code sample showing how to do this that compiles in Xcode 9.2. I have also tried the following code, which will not compile:
let dir = FileManager.default.urls(for: .userDirectory, in: .userDomainMask).first
let fileURL = dir?.appendingPathComponent("willf/Dropbox/foo.txt")
// The compiler complains about extra argument 'atomically' in call
try statusStr[strIndex].write(to: fileURL, atomically: false, encoding: .utf8)
I have figured out how to write a character as a string to a file thanks to a couple of answers on Stack Overflow. The key is to convert the Character to a String, because String supports the write method I want to use. Note that I used both the answers in "Read and write a String from text file" and in "Swift Converting Character to String" to come up with the solution. Here is the Swift code:
import Cocoa
let will_file = "/Users/willf/Dropbox/foo.txt"
do {
    // Read data from will_file into String object
    let statusStr = try String(contentsOfFile: will_file, encoding: .utf8)
    // find the last character in the string
    var strIndex = statusStr.index(statusStr.endIndex, offsetBy: -1)
    if statusStr[strIndex] == "\n" {
        // I need to access the character just before the last \n
        strIndex = statusStr.index(statusStr.endIndex, offsetBy: -2)
    }
    if statusStr[strIndex] != "n" && statusStr[strIndex] != "y" {
        // XXX deal with error here
        print("The char isn't y or n")
    }
    // Update file so it contains only the last status char
    do {
        // String(statusStr[strIndex]) converts the statusStr[strIndex] character to a string for writing
        try String(statusStr[strIndex]).write(toFile: will_file, atomically: false, encoding: .utf8)
    } catch {
        print("There was a write error")
    }
} catch {
    print("there is an error!")
}
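The index arithmetic above assumes the file holds at least one character (two if it ends with a newline) and traps otherwise. A variant that lets the standard library find the last non-newline character might look like this sketch. The function name keepLastCharacter and the scratch file path are assumptions made for the demo, not part of the original code.

```swift
import Foundation

// Hypothetical helper: rewrite the file so it holds only its last
// non-newline character. Returns nil (and leaves the file alone) if the
// file contains no such character.
func keepLastCharacter(ofFileAtPath path: String) throws -> Character? {
    let contents = try String(contentsOfFile: path, encoding: .utf8)
    guard let last = contents.last(where: { !$0.isNewline }) else {
        return nil
    }
    try String(last).write(toFile: path, atomically: true, encoding: .utf8)
    return last
}

// Usage against a scratch file in the temporary directory.
let scratchPath = NSTemporaryDirectory() + "status-demo.txt"
do {
    try "y\nn\ny\n".write(toFile: scratchPath, atomically: true, encoding: .utf8)
    let kept = try keepLastCharacter(ofFileAtPath: scratchPath)
    print(kept.map { String($0) } ?? "empty file")
} catch {
    print("File error: \(error)")
}
```

Checking for "y" or "n" can then be done on the returned Character before deciding what to do next.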

How to decode UTF-8 knowing character count but not byte count?

I need to decode a UTF-8-encoded string I don’t know the byte count for. I do know the character count.
With the byte count, I would do this:
NSString(bytes: UnsafePointer<Byte>(bytes),
length: byteCount,
encoding: String.Encoding.utf8.rawValue)
How can I use the character count instead?
A possible solution is to use the UTF-8 UnicodeCodec to decode bytes until the wanted number of characters is reached (or an error occurs):
func decodeUTF8<S: Sequence>(bytes: S, numCharacters: Int) -> String
    where S.Element == UInt8 {
    var iterator = bytes.makeIterator()
    var utf8codec = UTF8()
    var string = ""
    // string.count replaces string.characters.count from Swift 4 on
    while string.count < numCharacters {
        switch utf8codec.decode(&iterator) {
        case let .scalarValue(val):
            string.unicodeScalars.append(val)
        default:
            // Error or out of bytes:
            return string
        }
    }
    return string
}
(You could also return nil or throw an error in the error case.)
Example:
let bytes = "H€llo".utf8
let dec = decodeUTF8(bytes: bytes, numCharacters: 3)
print(dec) // H€l

Why is my variable returning a nil after being converted to an int?

I'm taking data from an API and storing it in a string variable. When I print the variable it shows what I'm looking for, but when I try to convert it to an Int using the toInt() method it returns nil. Why?
func getWeather() {
    let url = NSURL(string: "http://api.openweathermap.org/data/2.5/weather?q=London&mode=xml")
    let task = NSURLSession.sharedSession().dataTaskWithURL(url!) {
        (data, response, error) in
        if error == nil {
            var urlContent = NSString(data: data, encoding: NSUTF8StringEncoding) as NSString!
            var urlContentArray = urlContent.componentsSeparatedByString("temperature value=\"")
            var temperatureString = (urlContentArray[1].substringWithRange(NSRange(location: 0, length:6))) as String
            println(temperatureString) // returns 272.32
            var final = temperatureString.toInt()
            println(final) //returns nil
            println(temperatureString.toInt())
            self.temperature.text = "\(temperatureString)"
        }
    }
    task.resume()
}
Even simpler, though slightly a trick, you can use integerValue:
temperatureString.integerValue
Unlike toInt, integerValue stops converting when it finds a non-digit (it also throws away leading spaces).
If temperatureString is a String (rather than an NSString), you'll need to push it over:
(temperatureString as NSString).integerValue
That's because 272.32 is not an integer.
You could convert it to a Float.
272.32 isn't an Integer. If your temperature string is an NSString, you can do this to convert it:
Int("272.32".floatValue)
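The snippets above target early Swift, where toInt() still existed; from Swift 2 on it was replaced by the failable Int(_:) initializer, which likewise returns nil for "272.32". In current Swift the same idea (parse as floating point, then convert) can be sketched without NSString. The helper name wholeNumber(from:) is made up for illustration, and rounding to the nearest integer is an assumption about what is wanted.

```swift
// Hypothetical helper: parse decimal text and round to the nearest Int.
func wholeNumber(from text: String) -> Int? {
    guard let value = Double(text) else { return nil }
    // Int(exactly:) returns nil for NaN, infinity, or out-of-range values
    // instead of trapping like Int(_:) would.
    return Int(exactly: value.rounded())
}

print(Int("272.32") as Any)               // nil
print(wholeNumber(from: "272.32") as Any) // Optional(272)
```

Use Int(value) on the parsed Double instead if truncation toward zero is preferred over rounding.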

Swift - converting from UnsafePointer<UInt8> with length to String

I considered a lot of similar questions, but still can't get the compiler to accept this.
Socket Mobile API (in Objective-C) passes ISktScanDecodedData into a delegate method in Swift (the data may be binary, which I suppose is why it's not provided as string):
func onDecodedData(device: DeviceInfo?, DecodedData d: ISktScanDecodedData?) {
    let symbology: String = d!.Name()
    let rawData: UnsafePointer<UInt8> = d!.getData()
    let rawDataSize: UInt32 = d!.getDataSize()
    // want a String (UTF8 is OK) or Swifty byte array...
}
In C#, this code converts the raw data into a string:
string s = Marshal.PtrToStringAuto(d.GetData(), d.GetDataSize());
In Swift, I can get as far as UnsafeArray, but then I'm stuck:
let rawArray = UnsafeArray<UInt8>(start: rawData, length: Int(rawDataSize))
Alternatively I see String.fromCString and NSString.stringWithCharacters, but neither will accept the types of arguments at hand. If I could convert from UnsafePointer<UInt8> to UnsafePointer<()>, for example, then this would be available (though I'm not sure if it would even be safe):
NSData(bytesNoCopy: UnsafePointer<()>, length: Int, freeWhenDone: Bool)
Is there an obvious way to get a string out of all this?
This should work:
let data = NSData(bytes: rawData, length: Int(rawDataSize))
let str = String(data: data, encoding: NSUTF8StringEncoding)
Update for Swift 3:
let data = Data(bytes: rawData, count: Int(rawDataSize))
let str = String(data: data, encoding: String.Encoding.utf8)
The resulting string is nil if the data does not represent a valid UTF-8 sequence.
How about this, 'pure' Swift 2.2 instead of using NSData:
public extension String {
    static func fromCString(cs: UnsafePointer<CChar>, length: Int!) -> String? {
        if length == .None { // no length given, use \0 standard variant
            return String.fromCString(cs)
        }
        let buflen = length + 1
        let buf = UnsafeMutablePointer<CChar>.alloc(buflen)
        memcpy(buf, cs, length)
        buf[length] = 0 // zero terminate
        let s = String.fromCString(buf)
        buf.dealloc(buflen)
        return s
    }
}
and Swift 3:
public extension String {
    static func fromCString(cs: UnsafePointer<CChar>, length: Int!) -> String? {
        if length == nil { // no length given, use \0 standard variant
            return String(cString: cs)
        }
        let buflen = length + 1
        let buf = UnsafeMutablePointer<CChar>.allocate(capacity: buflen)
        memcpy(buf, cs, length)
        buf[length] = 0 // zero terminate
        let s = String(cString: buf)
        buf.deallocate(capacity: buflen)
        return s
    }
}
Admittedly it's a bit stupid to alloc a buffer and copy the data just to add the zero terminator.
Obviously, as mentioned by Zaph, you need to make sure your assumptions about the string encoding are going to be right.
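In current Swift the manually allocated, zero-terminated copy can be avoided by wrapping the pointer and length in an UnsafeBufferPointer and handing that to String(bytes:encoding:). This is only a sketch: the helper name makeString(from:count:) is hypothetical, the sample bytes stand in for the scanner data, and UTF-8 is assumed as the encoding.

```swift
import Foundation

// Hypothetical helper: view `count` bytes starting at `pointer` and decode
// them as UTF-8 without building a zero-terminated copy by hand.
func makeString(from pointer: UnsafePointer<UInt8>, count: Int) -> String? {
    let bytes = UnsafeBufferPointer(start: pointer, count: count)
    return String(bytes: bytes, encoding: .utf8) // nil if not valid UTF-8
}

let sample: [UInt8] = [72, 105, 0xE2, 0x82, 0xAC] // "Hi€" encoded as UTF-8
let text = sample.withUnsafeBufferPointer { buffer in
    makeString(from: buffer.baseAddress!, count: buffer.count)
}
print(text ?? "invalid UTF-8") // Hi€
```

Because the length is passed explicitly, embedded zero bytes in binary scanner data do not cut the string short.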