Data <-> MutableRandomAccessSlice - swift

I am really struggling with the fact that someData[start...stop] returns a MutableRandomAccessSlice. My someData was a let to begin with, so why would I want a Mutable thing? Why don't I get just a RandomAccessSlice. What's really frustrating though, is that it returns a thing that is pretty API incompatible with the original source. With a Data, I can use .withUnsafeBytes, but not so with this offspring. And how you turn the Slice back into a Data isn't clear either. There is no init that takes one of those.
I could use the subdata(in:) method instead of subscripting, but then, what's the point of the subscript if I only ever want a sub collection representation that behaves like the original collection. Furthermore, the subdata method can only do open subranges, why the subscript can do both closed and open. Is this just something they haven't quite finished up for Swift3 final yet?

Remember that the MutableRandomAccessSlice you get back is a value type, not a reference type. It just means you can modify it if you like, but it has nothing to do with the thing you sliced it out of:
let x = Data(bytes: [1,2,3]) // <010203>
var y = x[0...1]
y[0] = 2
x // <010203>
If you look in the code, you'll note that the intent is to return a custom slice type:
public subscript(bounds: Range<Index>) -> MutableRandomAccessSlice<Data> {
get {
return MutableRandomAccessSlice(base: self, bounds: bounds)
}
set {
// Ideally this would be:
// replaceBytes(in: bounds, with: newValue._base)
// but we do not have access to _base due to 'internal' protection
// TODO: Use a custom Slice type so we have access to the underlying data
let arrayOfBytes = newValue.map { $0 }
arrayOfBytes.withUnsafeBufferPointer {
let otherData = Data(buffer: $0)
replaceBytes(in: bounds, with: otherData)
}
}
}
That said, a custom slice will still not be acceptable to a function that takes a Data. That is consistent with other types, though, like Array, which slices to an ArraySlice which cannot be passed where an Array is expected. This is by design (and likely is for Data as well for the same reasons). The concern is that a slice "pins" all of the memory that backs it. So if you took a 3 byte slice out of a megabyte Data and stored it in an ivar, the entire megabyte has to hang around. The theory (according to Swift devs I spoke with) is that Arrays could be massive, so you need to be careful with slicing them, while Strings are usually much smaller, so it's ok for a String to slice to a String.
In my experience so far, you generally want subdata(in:). My experimentation with it is that it's very similar in speed to slicing, so I believe it's still copy on write (but it doesn't seem to pin the memory either in my initial tests). I've only tested on Mac so far, though. It's possible that there are more significant performance differences on iOS devices.

Based on Rob's comments, I just added the following pythonesque subscript extension:
extension Data {
subscript(start:Int?, stop:Int?) -> Data {
var front = 0
if let start = start {
front = start < 0 ? Swift.max(self.count + start, 0) : Swift.min(start, self.count)
}
var back = self.count
if let stop = stop {
back = stop < 0 ? Swift.max(self.count + stop, 0) : Swift.min(stop, self.count)
}
if front >= back {
return Data()
}
let range = Range(front..<back)
return self.subdata(in: range)
}
}
That way I can just do
let input = Data(bytes: [0x60, 0x0D, 0xF0, 0x0D])
input[nil, nil] // <600df00d>
input[1, 3] // <0df0>
input[-2, nil] // <f00d>
input[nil, -2] // <600d>

Related

Swift string vs [Character] and performance

From the very beginning Swift strings were tricky since they work properly with UTF and there is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison has some implicit logic and it's not a simple comparison of two memory areas are the same. I used to see recommendations to flat out strings into [Character] since when you do this all unicode-related conversions take place once and then all operations are faster. Additionally strings are not necessarily use continuous memory area which makes it more expensive to compare them than character arrays.
Long story short, I solved this problem on leetcode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays and strings. To my surprise strings are the fastest.
How is it so? KMP has some prework and it is less efficient in general but why strings are faster than [Character]? Is it new for some recent Swift version or do I miss something conceptually?
Code that I used for reference:
[Character], 8ms, 15mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
guard !needle.isEmpty else { return 0 }
guard haystack.count >= needle.count else { return -1 }
var result: Int = -1
let str = Array(haystack)
let pattern = Array(needle)
for i in 0...(str.count - pattern.count) {
if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
result = i
break
}
}
return result
}
Strings, 4ms(!!!), 14.5mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
guard !needle.isEmpty else { return 0 }
guard haystack.count >= needle.count else { return -1 }
var result: Int = -1
for i in 0...(haystack.count - needle.count) {
var hIdx = haystack.index(haystack.startIndex, offsetBy: i)
if haystack[hIdx] == needle[needle.startIndex] {
var hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
if haystack[hIdx...hEndIdx] == needle {
result = i
break
}
}
}
return result
}
First, I think there may be some misunderstandings on your part:
flat out strings into [Character] since when you do this all unicode-related conversions take place once and then all operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It still may be made of composed or decomposed UnicodeScalars that need special handling for equality.
Additionally strings are not necessarily use continuous memory area
This is equally true of Array. Nothing in Array promises that memory is contiguous. That's why ContiguousArray exists.
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
To the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero memory (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This is forcing a memory allocation and copy to create a new Array out of str. You would probably do much better if you used Slice for this work rather than making full Array copies. You'd almost certainly find in that case that you're exactly matching String's implementations (using SubStr).
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating stdlib for arbitary strings, why would stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?

Efficiency of using filter{where:} vs. removeAll{where:} when modifying a parameter value

Swift 4.2 introduced a new removeAll {where:} function. From what I have read, it is supposed to be more efficient than using filter {where:}. I have several scenarios in my code like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
return list.filter{ $0.date == nil }
.sorted { $0.account?.name < $1.account?.name }
}
However, I cannot use removeAll{where:} with a param because it is a constant. So I would need to redefine it like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
var list = list
list.removeAll { $0.date == nil }
return list.sorted { $0.account?.name < $1.account?.name }
}
Is using the removeAll function still more efficient in this scenario? Or is it better to stick with using the filter function?
Thank you for this question 🙏🏻
I've benchmarked both functions using this code on TIO:
let array = Array(0..<10_000_000)
do {
let start = Date()
let filtering = array.filter { $0 % 2 == 0 }
let end = Date()
print(filtering.count, filtering.last!, end.timeIntervalSince(start))
}
do {
let start = Date()
var removing = array
removing.removeAll { $0 % 2 == 0 }
let end = Date()
print(removing.count, removing.last!, end.timeIntervalSince(start))
}
(To have the removing and filtering identical, the closure passed to removeAll should have been { $0 % 2 != 0 }, but I didn't want to give an advantage to either snippet by using a faster or slower comparison operator.)
And indeed, removeAll(where:) is faster when the probability of removing elements (let's call it Pr)is 50%! Here are the benchmark results :
filter : 94ms
removeAll : 74ms
This is the same case when Pr is less than 50%.
Otherwise, filtering is faster for a higher Pr.
One thing to bear in mind is that in your code list is mutable, and that opens the possibility for accidental modifications.
Personally, I would choose performance over old habits, and in a sense, this use case is more readable since the intention is clearer.
Bonus : Removing in-place
What's meant by removing in-place is the swap the elements in the array in such a way that the elements to be removed are placed after a certain pivot index. The elements to keep are the ones before the pivot element :
var inPlace = array
let p = inPlace.partition { $0 % 2 == 0 }
Bear in mind that partition(by:) doesn't keep the original order.
This approach clocks better than removeAll(where:)
Beware of premature optimization. The efficiency of a method often depends on your specific data and configuration, and unless you're working with a large data set or performing many operations at once, it's not likely to have a significant impact either way. Unless it does, you should favor the more readable and maintainable solution.
As a general rule, just use removeAll when you want to mutate the original array and filter when you don't. If you've identified it as a potential bottleneck in your program, then test it to see if there's a performance difference.

How to Define Generic “Invalid“ Values for Different Types

In my app, I am using Integers, Doubles, Floats, and CGFloats to represent a number of different values. According to my app’s semantic, these values may become “invalid“, a state which I represent using a reserved value, i. e. -1. The simplest approach to make this usable in code would be this:
anIntVariable = -1
aFloatVariable = -1
aDoubleVariable = -1.0
...
To get away from this convention driven approach and increase readability and adaptability I defined a number of extensions:
extension Int {
static var invalid = -1
}
extension Float {
static var invalid = -1.0
}
extension Double {
static var invalid = -1.0
}
...
So the above code would now read:
anIntVariable = .invalid
aFloatVariable = .invalid
aDoubleVariable = .invalid
...
It does work. However, I’m not really happy with this approach. Does anyone of you have an idea for a better way of expressing this?
To add some complexity, in addition to simple types like Int, Float, or Double, I also use Measurement based types like this:
let length = Measurement(value: .invalid, unit: UnitLength.baseUnit())
Extra bonus point if you find a way to include “invalid“ measurements in your solution as well...
Thanks for helping!
Some Additional Thoughts
I know I could use optionals with nil meaning “invalid”. In this case, however, you’d have additional overhead with conditional unwrapping... Also, using nil as “invalid” is yet another convention.
It isn’t better or worse, just different. Apple uses “invalid” values in its own APIs, i. e. the NSTableViewmethod row(for:) will return -1 if the view is not in the table view. I agree, however, that this very method perfectly illustrates that returning an optional would make a lot of sense...
I'd use optionals for that.
If you want lack of value and invalid value to be different states in your app, i'd suggest creating a wrapper for your values:
enum Validatable<T> {
case valid(T)
case invalid
}
And use it like that:
let validValue : Validatable<Int> = .valid(5)
let invalidValue : Validatable<Int> = .invalid
var validOptionalDouble : Validatable<Double?> = .valid(nil)
validOptionalDouble = .valid(5.0)
let measurement : Validatable<Measurement> = .invalid
etc.
You can then check for value by switch on that enum to access the associated value like this:
switch validatableValue {
case .valid(let value):
//do something with value
case .invalid:
//handle invalid state
}
or
if case .valid(let value) = validatableValue {
//handle valid state
}
etc

How to convert UnsafePointer<[Float]> or fit Float Array data to Unsafe[Mutable]Pointer<Float> in Swift 3?

I have an array of float value (floatArray) coming from HDF5 file, and I'd like to use this data as an input for the function that accepts Unsafe[Mutable]Pointer (inputForFunction). I'm new to UnsafePointer, so I need someone's help.
One way I have in mind is, to store the array to a file, then to open it and to map the file descriptor to the memory of the variable for the input of the function:
// read the data from HDF5 file
var floatArray: [Float] = try originalHDF5DataSet.read()
// store the data
let dataToStore = NSData(bytes: &floatArray, length: sizeOfArray * MemoryLayout<Float>.size)
let tmpPath = URL(fileURLWithPath: NSTemporaryDirectory())
let filePath = tmpPath.appendingPathComponent("floatArrayData").appendingPathExtension("dat")
do{
try data.write(to: filePath, options: .atomic)
let fileManager : FileManager = FileManager.default
if fileManager.fileExists(atPath:filePath.path){
print("Success")
}else{
print("Fail")
}
// open the data
let dataPath = filePath.path
let fd = open(dataPath, O_RDONLY, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)
assert(fd != -1, "Error: failed to open output file at \""+dataPath+"\" errno = \(errno)\n")
// memory map the parameters
let hdr = mmap(nil, Int(sizeOfArray), PROT_READ, MAP_FILE | MAP_SHARED, fd, 0)
// cast Void pointers to Float
let inputForFunction = UnsafePointer<Float>(hdr!.assumingMemoryBound(to: Float.self))
} catch{
print("failed")
}
However, it may not be the best implementation because the process to store the data seems time-consuming.
That's why I'm considering to use UnsafePointer to pass the pointer from floatArray to inputForFunction, but I'm not sure how to implement it. Just for your information, one way I'm thinking of is to use withUnsafePointermethod directly:
var floatArray: [Float] = try originalHDF5DataSet.read()
var inputForFunction = UnsafeMutablePointer<Float>.allocate(capacity: Int(sizeOfArray))
withUnsafePointer(to: &floatArray[0]) { (ptr: UnsafePointer<Float>) in
inputForFunction = UnsafeMutablePointer(mutating: ptr)
}
or the other way is to use withMemoryRebound (but the following didn't work):
var floatArray: [Float] = try originalHDF5DataSet.read()
var inputForFunction = UnsafeMutablePointer<Float>.allocate(capacity: Int(sizeOfArray))
withUnsafePointer(to: &floatArray) { (ptrDataArray: UnsafePointer<[Float]>) in
inputForFunction = ptrDataArray.withMemoryRebound(to: inputForFunction.self, capacity: Int(sizeOfArray), { (ptr: UnsafePointer<[Float]>) -> UnsafeMutablePointer<Float> in
return ptr[0]
})
}
My question is
How to pass the address of floatArray's data to inputForFunction?
If I need to convert from UnsafePointer<[Float]> to Unsafe[Mutable]Pointer, how to do it?
If needed, how to use withMemoryRebound for this purpose? Can I assign
the address/value to inputForFunction inside the closure?
When I checked the addresses of floatArray and floatArray[0], they were different (This seems different from C/C++). Which address should be passed to inputForFunction?
You can find many articles explaining how you can get Unsafe(Mutable)Pointer<Float> from [Float]. Some of them answer some of your questions. Try find them later.
How to pass the address of floatArray's data to inputForFunction?
You usually use the Array's methods withUnsafeBufferPointer or withUnsafeMutableBufferPointer. You can get an Unsafe(Mutable)BufferPointer<Element> inside the closure parameter, which contains an Unsafe(Mutable)Pointer<Element> as its baseAddress property.
var floatArray: [Float] = try originalHDF5DataSet.read()
floatArray.withUnsafeBufferPointer {unsafeBufferPointer in
let inputForFunction = unsafeBufferPointer.baseAddress
//`unsafeBufferPointer` (including its `baseAddress`) is valid only in this closure.
//So, do not pass `inputForFunction` outside of the closure, use it inside.
//...
}
If I need to convert from UnsafePointer<[Float]> to Unsafe[Mutable]Pointer, how to do it?
When you find something like UnsafePointer<[Float]>, then you are going the wrong way. In Swift, Array is a hidden struct which may contain (multi-level) indirect reference to its elements. You usually have no need to use UnsafePointer<Array<Float>>.
If needed, how to use withMemoryRebound for this purpose? Can I assign the address/value to inputForFunction inside the closure?
When you have an Array of AType and want to pass it as an Unsafe(Mutable)Pointer<AnotherType>, you may utilize withMemoryRebound. But in your case, you only need to work with Float, so withMemoryRebound may not be needed for your purpose.
When I checked the addresses of floatArray and floatArray[0], they were different (This seems different from C/C++). Which address should be passed to inputForFunction?
None. As I wrote above, address of floatArray is just pointing somewhere which contains a hidden struct, so it's completely useless for your purpose. And address of floatArray[0] is neither guaranteed to be the address of the first element of the whole content. Swift may create a temporal region which contains only one element, and passing the address of the temporal region.
By the way, do you really need to convert your [Float] to Unsafe(Mutable)Pointer<Float>?
If your functions' signature is something like this:
func takingUnsafePointer(_ pointer: UnsafePointer<Float>)
You can call it like:
takingUnsafePointer(floatArray)
//In this case `floatArray` can be a `let` constant.
Or else, if the function signature is:
func takingUnsafeMutablePointer(_ pointer: UnsafeMutablePointer<Float>)
Call it as:
takingUnsafeMutablePointer(&floatArray)
//In this case `floatArray` needs to be a `var`, not `let`.
Remember, in both cases, the passed pointer is valid only while the function call. You should not export the pointer somewhere else.
Array type have withUnsafeBufferPointer method.
This method accept closure with UnsafeBufferPointer which is like UnsafePointer<[Float]> you're asking for.
In you need UnsafePointer to the start of Array you can use baseAddress property of UnsafeBufferPointer.
Example code:
let bufferPointer: UnsafeBufferPointer<Float> = floatArray.withUnsafeBufferPointer { bufferPointer in
return bufferPointer
}
or
let baseAddress: UnsafePointer<Float> = floatArray.withUnsafeBufferPointer { bufferPointer in
return bufferPointer.baseAddress
}
EDIT: Or if it's function just passing &floatArray may work.

How to create a type that either hold an `Array<Int>` or `UnsafePointer<UInt8>`

I'm doing some performance testing of Swift vs Objective-C.
I created a Mac OS hybrid Swift/Objective-C project that creates large arrays of prime numbers using either Swift or Objective-C.
It's got a decent UI and shows the results in a clear display. You can check out the project on Github if you're interested. It's called SwiftPerformanceBenchmark.
The Objective-C code uses a malloc'ed C array of ints, and the Swift code uses an Array object.
The Objective C code is therefore a lot faster.
I've read about creating an Array-like wrapper around a buffer of bytes using code like this:
let size = 10000
var ptr = UnsafePointer<Int>malloc(size)
var bytes = UnsafeBufferPointer<Int>(start: ptr, count: data.length)
I'd like to modify my sample program so I can switch between my Array<Int> storage and using an UnsafeBufferPointer<Int> at runtime with a checkbox in the UI.
Thus I need a base type for my primes array that will hold either an Array<Int> or an UnsafeBufferPointer<Int>. I'm still too weak on Swift syntax to figure out how to do this.
For my Array- based code, I'll have to use array.append(value), and for the UnsafeBufferPointer<Int>, which is pre-filled with data, I'll use array[index]. I guess if I have to I could pre-populate my Array object with placeholder values so I could use array[index] syntax in both cases.
Can somebody give me a base type that can hold either an Array<Int> or an UnsafeBufferPointer<Int>, and the type-casts to allocate either type at runtime?
EDIT:
Say, for example, I have the following:
let count = 1000
var swiftArray:[Int]?
let useSwiftArrays = checkbox.isChecked
typealias someType = //A type that lets me use either unsafeArray or swiftArray
var primesArray: someType?
if useSwiftArrays
{
//Create a swift array version
swiftArray [Int](count: count, repeatedValue: 0)
primesArray = someType(swiftArray)
}
else
{
var ptr = UnsafePointer<Int>malloc(count*sizeof(Int))
var unsafeArray = UnsafeBufferPointer<Int>(start: ptr, count: data.length)
primesArray = someType(unsafeArray)
}
if let requiredPrimes = primesArray
{
requiredPrimes[0] = 2
}
#MartinR's suggestion should help get code that can switch between the two. But there's a shortcut you can take to prove whether the performance difference is between Swift arrays and C arrays, and that's to switch the Swift compiler optimization to -Ounchecked. Doing this eliminates the bounds checks on array indices etc that you would be doing manually by using unsafe pointers.
If I download your project from github and do that, I find that the Objective-C version is twice as fast as the Swift version. But... that’s because sizeof(int) is 4, but sizeof(Int) is 8. If you switch the C version to use 8-byte arithmetic as well...
p.s. it works the other way around as well, if I switch the Swift code to use UInt32, it runs at 2x the speed.
OK, it’s not pretty but here is a generic function that will work on any kind of collection, which means you can pass in either an Array, or an UnsafeMutableBufferPointer, which means you can use it on a malloc’d memory range, or using the array’s .withUnsafeMutableBufferPointer.
Unfortunately, some of the necessities of the generic version make it slightly less efficient than the non-generic version when used on an array. But it does show quite a nice performance boost over arrays in -O when used with a buffer:
func storePrimes<C: MutableCollectionType where C.Generator.Element: IntegerType>(inout store: C) {
if isEmpty(store) { return }
var candidate: C.Generator.Element = 3
var primeCount = store.startIndex
store[primeCount++] = 2
var isPrime: Bool
while primeCount != store.endIndex {
isPrime = true
var oldPrimeCount = store.startIndex
for oldPrime in store {
if oldPrimeCount++ == primeCount { break }
if candidate % oldPrime == 0 { isPrime = false; break }
if candidate < oldPrime &* oldPrime { isPrime = true; break }
}
if isPrime { store[primeCount++] = candidate }
candidate = candidate.advancedBy(2)
}
}
let totalCount = 2_000_000
var primes = Array<CInt>(count: totalCount, repeatedValue: 0)
let startTime = CFAbsoluteTimeGetCurrent()
storePrimes(&primes)
// or…
primes.withUnsafeMutableBufferPointer { (inout buffer: UnsafeMutableBufferPointer<CInt>) -> Void in
storePrimes(&buffer)
}
let now = CFAbsoluteTimeGetCurrent()
let totalTime = now - startTime
println("Total time: \(totalTime), per second: \(Double(totalCount)/totalTime)")
I am not 100% sure if I understand your problem correctly, but perhaps
this goes into the direction that you need.
Both Array and UnsafeMutablePointer conform to MutableCollectionType (which requires a subscript getter and setter).
So this function would accept both types:
func foo<T : MutableCollectionType where T.Generator.Element == Int, T.Index == Int>(inout storage : T) {
storage[0] = 1
storage[1] = 2
}
Example with buffer pointer:
let size = 2
var ptr = UnsafeMutablePointer<Int>(malloc(UInt(size * sizeof(Int))))
var buffer = UnsafeMutableBufferPointer<Int>(start: ptr, count: size)
foo(&buffer)
for elem in buffer {
println(elem)
}
Example with array:
var array = [Int](count: 2, repeatedValue: 0)
foo(&array)
for elem in array {
println(elem)
}
For non-mutating functions you can use CollectionType
instead of MutableCollectionType.