I am in the process of converting some iOS code (Swift) to Android (Kotlin) that is used to control a Bluetooth (BLE) device.
I believe there are some differences between Swift and Kotlin around unsigned ints etc., but I can't seem to get the same output.
iOS code: outputs 13365
print ("IMPORTANT \(fourthData as NSData)") // Value is 0x3534
var fourth = Int32()
_ = Swift.withUnsafeMutableBytes(of: &fourth, { fourthData.copyBytes(to: $0) } )
print ("IMPORTANT \(fourth)") // 13365
Android code: output is 13620
@ExperimentalUnsignedTypes // just to make it clear that the experimental unsigned types are used
fun ByteArray.toHexString() = asUByteArray().joinToString("") { it.toString(16).padStart(2, '0') }
Log.i("Decode", fourthData.toHexString()) // 3534
Log.i("Decode", "${fourthData.toHexString().toUInt(16)}") //13620
I have tried Int, UInt, BigInteger and Long. What am I missing?
As commenters already pointed out, values 13620 and 13365 are 0x3534 and 0x3435 respectively. In other words, the values differ by the byte ordering.
The decimal 32-bit number 13620 is equal to 0x00003534 in hexadecimal, so it is represented by the four bytes 00-00-35-34.
However, computers don't always store a value in that order. Two commonly used representations are Big Endian and Little Endian. Big Endian stores the bytes in the natural order 00-00-35-34, while Little Endian stores them reversed, 34-35-00-00. With the two payload bytes in the question, that swap is exactly the difference between 0x3534 (13620) and 0x3435 (13365).
Java (and therefore Kotlin on the JVM) always uses Big Endian representation. On the other hand, if you just take the memory layout of an Int in Swift, you get the machine representation. The machine representation is usually Little Endian, but that can differ between architectures, so you should always be careful when directly reinterpreting numeric values as bytes or vice versa.
In this specific case, if you are sure the Data contains a Big Endian integer, you can use Int32(bigEndian: fourth) to convert it from Big Endian to the machine representation.
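For example, here is a minimal Swift sketch of that conversion. The 4-byte payload below is made up for illustration (the fourthData in the question holds only the two bytes 0x35 0x34, so match the integer width to what the peripheral actually sends):
import Foundation
// Hypothetical 4-byte big-endian payload 0x00003534.
let fourthData = Data([0x00, 0x00, 0x35, 0x34])
// Copy the raw bytes into an Int32, exactly like the iOS code in the question.
var raw = Int32()
_ = withUnsafeMutableBytes(of: &raw) { fourthData.copyBytes(to: $0) }
// raw holds the machine (usually little-endian) reading of those bytes;
// Int32(bigEndian:) reinterprets them as big endian, matching the Kotlin output.
let value = Int32(bigEndian: raw)
print(value) // 13620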
The descriptions seem to be the same to me. "required" vs. "needed": what is the actual difference?
// Returns the number of bytes required to store the String in a given encoding.
lengthOfBytes(using:)
// Returns the maximum number of bytes needed to store the receiver in a given encoding.
maximumLengthOfBytes(using:)
The lengthOfBytes(using:) returns the exact number, while maximumLengthOfBytes(using:) returns an estimate, which "may be considerably greater than the actual length" (in Apple's own words).
The main difference between these methods is given by the Discussion sections of their documentation.
lengthOfBytes(using:):
The result is exact and is returned in O(n) time.
maximumLengthOfBytes(using:):
The result is an estimate and is returned in O(1) time; the estimate may be considerably greater than the actual length needed.
An example where they may differ: the UTF-8 string encoding requires between 1 and 4 bytes to represent a code point, and the exact number depends on which code point is being represented. lengthOfBytes(using:) goes through the string, calculating the exact number of bytes for every single character, while maximumLengthOfBytes(using:) is allowed to assume a worst case for every code unit without looking at the actual values in the string. In this case, the maximum returned is 3× as much as is actually needed:
import Foundation
let str = "Hello, world!"
print(str.lengthOfBytes(using: .utf8)) // => 13
print(str.maximumLengthOfBytes(using: .utf8)) // => 39
maximumLengthOfBytes(using:) can give you an immediate answer with little to no computation, at the cost of overestimating, sometimes greatly. The tradeoff of which to use depends on your specific use-case.
I was trying to implement the Boyer-Moore algorithm in a Swift playground and I used Swift String.Index a lot, and something started to bother me: why are indexes kept 4 times bigger than what it seems they should be?
For example:
let why = "is s on 4th position not 1st".index(of: "s")
This code in a Swift playground produces a _compoundOffset of 4, not 1. I'm sure there is a reason for doing this, but I couldn't find an explanation anywhere.
It's not a duplicate of any question that explains how to get the index of a char in Swift; I know that, and I used the index(of:) function just to illustrate the question. What I wanted to know is why the value for the 2nd char is 4 and not 1 when using String.Index.
I guess the way it keeps indexes is private and I don't need to know the internal implementation; it's probably connected with UTF-16 and UTF-32 encoding.
First of all, don’t ever assume _compoundOffset to be anything other than an implementation detail. _compoundOffset is an internal property of String.Index that uses bit masking to store two values in one number:
The encodedOffset, which is the index's byte offset in terms of UTF-16 code units. This one is public and can be relied on. In your case encodedOffset is 1 because that's the offset for that character, as measured in UTF-16 code units. Note that the encoding of the string in memory doesn't matter! encodedOffset is always UTF-16.
The transcodedOffset, which stores the index's offset inside the current UTF-16 code unit. This is also an internal property that you can't access. The value is usually 0 for most indices, unless you have an index into the string's UTF-8 view that refers to a code unit which doesn't fall on a UTF-16 boundary. In that case, the transcodedOffset will store the offset in bytes from the encodedOffset.
Now why is _compoundOffset == 4? Because it stores the transcodedOffset in the two least significant bits and the encodedOffset in the 62 most significant bits. So the bit pattern for encodedOffset == 1, transcodedOffset == 0 is 0b100, which is 4.
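To see the public half of this in code, here is a small sketch (Swift 4 era, where the question's index(of:) and encodedOffset are both available; encodedOffset has since been deprecated in favor of utf16Offset(in:)):
let str = "is s on 4th position not 1st"
if let idx = str.index(of: "s") {
    // The UTF-16 offset that occupies the upper bits of _compoundOffset.
    print(idx.encodedOffset) // 1
}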
You can verify all this in the source code for String.Index.
In Swift 3, you can count the characters in a String with:
str.characters.count
I need to do this frequently, and that line above looks like it could be O(N). Is there a way to get a string length, or a length of something — maybe the underlying unicode buffer — with an operation that is guaranteed to not have to walk the entire string? Maybe:
str.utf16.count
I ask because I'm checking the length of some text every time the user types a character, to limit the size of a UITextView. The call doesn't need to be an exact count of the glyphs, like characters.count.
This is a good question. The answer is... complicated. Converting from UTF-8 to UTF-16, or vice-versa, or converting to or from some other encoding, will all require examining the string, since the characters can be made up of more than one code unit. So if you want to get the count in constant time, it's going to come down to what the internal representation is. If the string is using UTF-16 internally, then it's a reasonable assumption that string.utf16.count would be in constant time, but if the internal representation is UTF-8 or something else, then the string will need to be analyzed to determine what the length in UTF-16 would be. So what's String using internally? Well:
https://github.com/apple/swift/blob/master/stdlib/public/core/StringCore.swift
/// The core implementation of a highly-optimizable String that
/// can store both ASCII and UTF-16, and can wrap native Swift
/// _StringBuffer or NSString instances.
This is discouraging. The internal representation could be ASCII or UTF-16, or it could be wrapping a Foundation NSString. Hrm. We do know that NSString uses UTF-16 internally, since this is actually documented, so that's good. So the main outlier here is when the string stores ASCII. The saving grace is that since the first 128 Unicode code points have the same values as the ASCII character set, any ASCII character 0xXX corresponds to the UTF-16 code unit 0x00XX, so the UTF-16 length in bytes is simply the ASCII length times two (and the number of UTF-16 code units equals the number of ASCII characters), and thus calculable in constant time. Is this the case in the implementation? Let's look.
In the UTF16View source, there is no implementation of count. It appears that count is inherited from Collection's implementation, which is implemented via distance():
public var count: IndexDistance {
  return distance(from: startIndex, to: endIndex)
}
UTF16View's implementation of distance() looks like this:
public func distance(from start: Index, to end: Index) -> IndexDistance {
  // FIXME: swift-3-indexing-model: range check start and end?
  return start.encodedOffset.distance(to: end.encodedOffset)
}
And in the String.Index source, encodedOffset looks like this:
public var encodedOffset : Int {
  return Int(_compoundOffset >> _Self._strideBits)
}
where _compoundOffset appears to be a simple 64-bit integer:
internal var _compoundOffset : UInt64
and _strideBits appears to be a straight integer as well:
internal static var _strideBits : Int { return 2 }
So it... looks... like you should get constant time from string.utf16.count, since unless I'm making a mistake somewhere, you're just bit-shifting a couple of integers and then comparing the results (I'd probably still run some tests to be sure). The caveat is, of course, that this isn't documented, and thus could change in the future—particularly since the documentation for String does claim that it needs to iterate through the string:
Unlike with isEmpty, calculating a view’s count property requires iterating through the elements of the string.
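If you do want to run those tests, a throwaway sketch along these lines works (timings vary by machine and Swift version; str.count is the Swift 4+ spelling of str.characters.count):
import Foundation
let str = String(repeating: "a", count: 5_000_000)
func time(_ label: String, _ body: () -> Int) {
    let start = Date()
    let result = body()
    print(label, result, Date().timeIntervalSince(start))
}
time("utf16.count:") { str.utf16.count }
time("count:      ") { str.count } // walks grapheme clusters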
With all that said, you're using a UITextView, which is implemented in Objective-C via NSAttributedString. If you're willing to incur the Objective-C message-passing overhead (which, let's be honest, is probably occurring behind the scenes anyway to generate the String), you can just call its length property, which, since NSAttributedString is built on top of NSString, which does guarantee that it uses UTF-16 internally, is almost certain to be in constant time.
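For instance, a rough sketch of that check in a text view delegate (the 200-unit cap and the class name are made up; both NSRange and NSString's length count UTF-16 code units, which is what keeps this consistent):
import UIKit
final class LimitedTextViewDelegate: NSObject, UITextViewDelegate {
    let maxLength = 200 // measured in UTF-16 code units, like NSRange
    func textView(_ textView: UITextView,
                  shouldChangeTextIn range: NSRange,
                  replacementText text: String) -> Bool {
        // textStorage.length is NSAttributedString's length, i.e. UTF-16 code units.
        let newLength = textView.textStorage.length - range.length + (text as NSString).length
        return newLength <= maxLength
    }
}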
How do you use NSCoding to code (and decode) an array of ten values of primitive type int? Encode each integer individually (in a for-loop)? But what if my array held one million integers? Is there a more satisfying alternative to using a for-loop here?
Edit (after first answer): And decode? (@Justin: I'll then tick your answer.)
If performance is your concern here: CFData/NSData is NSCoding compliant, so just wrap your serialized representation of the array in an NSData.
Edit, to detail encoding/decoding:
Your array of ints will need to be converted to a fixed endian format, independent of the machine's endianness - e.g. always store it as little or as big endian. During encoding, convert the values to the chosen endianness and build the NSData object from them; then pass that NSData representation to the NSCoder instance. At decode time you'll receive an NSData object for the key, and you conditionally convert it back to the native endianness of the machine. One set of byte-swapping routines available on OS X and iOS begins with OSSwap*.
Alternatively, see -[NSCoder encodeBytes:length:forKey:]. This routine also requires the client to swap endianness.
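In Swift terms, a sketch of that pack/unpack idea might look like this (the helper names are made up; big endian is chosen as the stored order):
import Foundation
// Pack the whole array into one Data blob in a fixed (big-endian) byte order.
func pack(_ values: [Int32]) -> Data {
    let bigEndian = values.map { $0.bigEndian }
    return bigEndian.withUnsafeBufferPointer { Data(buffer: $0) }
}
// Rebuild each Int32 from its big-endian bytes (no alignment assumptions needed).
func unpack(_ data: Data) -> [Int32] {
    let bytes = [UInt8](data)
    return stride(from: 0, to: bytes.count, by: 4).map { i in
        let u = UInt32(bytes[i]) << 24 | UInt32(bytes[i + 1]) << 16 |
                UInt32(bytes[i + 2]) << 8 | UInt32(bytes[i + 3])
        return Int32(bitPattern: u)
    }
}
// In encode(with:) you would call coder.encode(pack(values), forKey: "values"),
// and in init(coder:) call unpack on the Data decoded for that key.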
I see NSInteger is used quite often and the typedef for it on the iPhone is a long, so technically I could use it where I expect int64 values. But should I be more explicit and use something like int64_t or long directly? What would be the downside of just using long?
IIRC, long on the iPhone/ARM is 32 bits. If you want a guaranteed 64-bit integer, you should (indeed) use int64_t.
Integer Data Type Sizes

Type        ILP32     LP64
short       2 bytes   2 bytes
int         4 bytes   4 bytes
long        4 bytes   8 bytes
long long   8 bytes   8 bytes
It may be useful to know that:
The compiler defines the __LP64__ macro when compiling for the 64-bit runtime.
NSInteger is a typedef of int on 32-bit and of long on 64-bit, so it will be 32 bits in a 32-bit environment and 64 bits in a 64-bit environment.
When converting to 64-bit you can simply replace all your ints and longs with NSInteger and you should be good to go.
Important: pay attention to data alignment. LP64 uses natural alignment for all integer data types, but ILP32 uses 4-byte alignment for all integer data types with a size equal to or greater than 4 bytes.
You can read more about 32 to 64 bit conversion in the Official 64-Bit Transition Guide for Cocoa Touch.
Answering your questions:
How should I declare a long in Objective-C? Is NSInteger appropriate?
You can use either long or NSInteger but NSInteger is more idiomatic IMHO.
But should I be more explicit and use something like int64_t or long directly?
If you expect consistent 64-bit sizes, neither long nor NSInteger will do; you'll have to use int64_t (as Wevah said).
What would be the downside of just using long?
It's not idiomatic and you may have problems if Apple rolls out a new architecture again.
If you need a type of known specific size, use the type that has that known specific size: int64_t.
If you need a generic integer type and the size is not important, go ahead and use int or NSInteger.
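The same point can be seen quickly from Swift, where Int is the pointer-sized type playing the role of NSInteger and Int64 corresponds to int64_t (just an illustration):
print(MemoryLayout<Int>.size)   // 8 on 64-bit, 4 on 32-bit (pointer-sized, like NSInteger)
print(MemoryLayout<Int32>.size) // always 4
print(MemoryLayout<Int64>.size) // always 8 (like int64_t)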
NSInteger's width depends on whether you are compiling for 32-bit or 64-bit: it's defined as long for 64-bit (and iPhone) and as int for 32-bit.
So on iPhone the width of NSInteger is the same as the width of a long, which is compiler dependent. Most compilers make long the same length as the native word, i.e. 32 bits on 32-bit architectures and 64 bits on 64-bit architectures.
Given the uncertainty over the width of NSInteger, I use it only for variables passed to Cocoa APIs where NSInteger is specified. If I need a fixed-width type, I go for the ones defined in stdint.h. If I don't care about the width, I use the C built-in types.
If you want to declare something as long, declare it as long. Be aware that long can be 32 or 64 bit, depending on the compiler.
If you want to declare something to be as efficient as possible, and big enough to count items, use NSInteger or NSUInteger. Note that both can be 32 or 64 bits, and can actually be different types (int or long) depending on the compiler, which protects you from mixing up types in some cases.
If you want 32 or 64 bits, and nothing else, use int32_t, uint32_t, int64_t, uint64_t. Be aware that either type can be unnecessarily inefficient on some compilers.