Swift strings have been tricky from the very beginning, because they handle Unicode properly, and there is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
This means that comparison involves some implicit logic; it is not a simple check that two memory regions are identical. I have often seen recommendations to flatten strings into [Character], the idea being that all Unicode-related conversion happens once, after which every operation is faster. Additionally, strings do not necessarily use a contiguous memory area, which supposedly makes them more expensive to compare than character arrays.
Long story short, I solved this problem on LeetCode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays, and strings. To my surprise, strings were the fastest.
How can that be? KMP requires some precomputation and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8 ms, 15 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4 ms (!!!), 14.5 MB memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flat out strings into [Character] since when you do this all unicode-related conversions take place once and then all operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String: it may still be made of composed or decomposed UnicodeScalars that need special handling for equality.
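For example (a quick demonstration, not from the original answer):
let e1: Character = "e\u{301}"   // 'e' plus a combining acute accent
let e2: Character = "é"          // the precomposed form
print(e1 == e2)                  // true: the same canonical comparison String does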
Additionally strings are not necessarily use continuous memory area
This is equally true of Array. Nothing in Array promises that memory is contiguous. That's why ContiguousArray exists.
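If you actually need the contiguity guarantee, you have to ask for it explicitly (a one-liner for illustration, not part of the original answer):
let chars = ContiguousArray<Character>("hello")   // contiguous storage guaranteed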
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
To the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero memory (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This is forcing a memory allocation and copy to create a new Array out of str. You would probably do much better if you used a slice for this work rather than making full Array copies. You'd almost certainly find in that case that you're closely matching String's implementation (which uses Substring).
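For instance (a sketch, not the answer's original code), the inner comparison could avoid the copy like this:
// elementsEqual compares the ArraySlice against the pattern directly,
// without allocating a new Array:
if str[i..<(i + pattern.count)].elementsEqual(pattern) {
    result = i
    break
}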
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating the stdlib for arbitrary strings, why wouldn't the stdlib just implement what you're doing (and beat you using knowledge of the internal details of String)?
I am converting a program written in Pascal to Swift, and some Pascal features have no direct Swift equivalents, such as variant records and sets defined as types. A variant record in Pascal lets you assign different field types to the same area of memory in a record; in other words, one particular location in a record could be either of type A or of type B. This can be useful in either/or cases, where a record can have either one field or the other, but not both. What are the Swift equivalents of a variant record and of a set type like setty in the Pascal fragment?
The Pascal code fragment to be converted is:
const
    strglgth = 16;
    sethigh = 47;
    setlow = 0;
type
    setty = set of setlow..sethigh;
    cstclass = (reel, pset, strg);
    csp = ^constant; (* pointer to constant type *)
    constant = record case cclass: cstclass of
        reel: (rval: packed array [1..strglgth] of char);
        pset: (pval: setty);
        strg: (slgth: 0..strglgth;
               sval: packed array [1..strglgth] of char)
    end;
var
    lvp: csp
My partial Swift code is
let strglgth = 16
let sethigh = 47
let setlow = 0
enum cstclass: Int { case reel = 0, pset, strg }
var lvp: csp
Any advice is appreciated. Thanks in advance.
Variant records in Pascal are very similar to unions in C.
So this link will probably be helpful:
https://developer.apple.com/documentation/swift/imported_c_and_objective-c_apis/using_imported_c_structs_and_unions_in_swift
In case the link ever goes dead, here's the relevant example:
union SchroedingersCat {
    bool isAlive;
    bool isDead;
};
In Swift, it’s imported like this:
struct SchroedingersCat {
    var isAlive: Bool { get set }
    var isDead: Bool { get set }
    init(isAlive: Bool)
    init(isDead: Bool)
    init()
}
That would be more of a functional port. It does not take care of the fact that variant records are actually meant to use the same piece of memory in different ways, so if you have some low-level code that reads from a stream, or you have a pointer to such a structure, this might not help you.
In that case you might want to try just reserving some bytes and writing different getters/setters to access them; a rough sketch of that idea follows. That would even work if you had to port more complex structures, like nested variant types.
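Here is a rough sketch of that approach (with an assumed 16-byte layout and hypothetical names, not the answer's code): reserve raw storage and expose each variant as a computed property over the same bytes, mimicking Pascal's overlay.
import Foundation

struct RawConstant {
    private var bytes = Data(count: 16)   // the shared memory area

    // "strg" variant: the bytes viewed as characters
    var sval: String {
        get { String(decoding: bytes, as: UTF8.self) }   // padding NULs included; a real port would trim via slgth
        set {
            var utf8 = Array(newValue.utf8.prefix(16))
            utf8.append(contentsOf: repeatElement(0, count: 16 - utf8.count))
            bytes = Data(utf8)   // pad to the fixed record size
        }
    }

    // "pset" variant: the first 8 bytes viewed as a bit set
    var pval: UInt64 {
        get { bytes.prefix(8).reduce(UInt64(0)) { $0 << 8 | UInt64($1) } }
        set {
            for i in 0..<8 {
                bytes[i] = UInt8(truncatingIfNeeded: newValue >> (8 * (7 - i)))
            }
        }
    }
}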
But overall, if possible, I'd recommend avoiding a too-literal port of such structures and using idioms that match Swift better.
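In particular, when the either/or semantics matter more than the exact memory layout, the natural Swift idiom for a variant record is an enum with associated values. A sketch with assumed field types (not a drop-in port of the fragment above):
enum Constant {
    case reel(rval: String)     // packed array [1..strglgth] of char
    case pset(pval: Set<Int>)   // set of setlow..sethigh
    case strg(sval: String)     // slgth becomes sval.count
}

let lvp = Constant.strg(sval: "hello")
switch lvp {
case .reel(let rval):
    print("reel:", rval)
case .pset(let pval):
    print("pset:", pval)
case .strg(let sval):
    print("strg of length", sval.count)
}
The compiler then enforces that a value holds exactly one variant at a time, which is the guarantee the Pascal case clause only implies.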
I'm parsing the first two characters on a line of text and doing lots of comparisons against possible patterns:
In my Card class:
static let ourTypes = ["PL", "SY", "XT"]
In lots of other places:
if Card.ourTypes.contains(line[0..<2]) { continue }
Swift 4 (3?) changed the subscript to return a Substring. I know I can convert it back with String(line[0..<2]), but I suspect that's the wrong solution... is there a better way?
One way would be to make your ourTypes array a [Substring]; then you wouldn't have to convert your Substring to make contains work:
static let ourTypes: [Substring] = ["PL", "SY", "XT"]
if Card.ourTypes.contains(line.prefix(2)) { continue }
@matt's observation that searching with contains is better with a Set (because it's more efficient) can be accomplished with:
static let ourTypes: Set<Substring> = ["PL", "SY", "XT"]
The String cast, while a bit jarring, is not expensive. Deriving a truly independent substring from a string is simply a two-step process: access the slice, then unlink its indices and storage from the original. That is all that String() means here. So I think your original approach is actually correct and nonproblematic.
If you really want to stay in the String world, though, you can, by calling removeSubrange instead of taking a slice. You give up the convenience of slice notation and slice-related methods, but everything depends on your priorities. And by the way, if contains is your main test here, use a Set, not an Array:
let ourTypes = Set(["PL", "SY", "XT"])
var line = "PLARF"
line.removeSubrange(line.index(line.startIndex, offsetBy: 2)...)
ourTypes.contains(line) // true
Swift 4.2 introduced a new removeAll(where:) method. From what I have read, it is supposed to be more efficient than using filter. I have several scenarios in my code like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
    return list.filter { $0.date == nil }
        .sorted { ($0.account?.name ?? "") < ($1.account?.name ?? "") }
}
However, I cannot use removeAll(where:) on the parameter because it is a constant. So I would need to redefine it like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
    var list = list
    list.removeAll { $0.date == nil }
    return list.sorted { ($0.account?.name ?? "") < ($1.account?.name ?? "") }
}
Is using the removeAll function still more efficient in this scenario? Or is it better to stick with using the filter function?
Thank you for this question 🙏🏻
I've benchmarked both functions using this code on TIO:
import Foundation

let array = Array(0..<10_000_000)

do {
    let start = Date()
    let filtering = array.filter { $0 % 2 == 0 }
    let end = Date()
    print(filtering.count, filtering.last!, end.timeIntervalSince(start))
}

do {
    let start = Date()
    var removing = array
    removing.removeAll { $0 % 2 == 0 }
    let end = Date()
    print(removing.count, removing.last!, end.timeIntervalSince(start))
}
(To have the removing and filtering identical, the closure passed to removeAll should have been { $0 % 2 != 0 }, but I didn't want to give an advantage to either snippet by using a faster or slower comparison operator.)
And indeed, removeAll(where:) is faster when the probability of removing an element (let's call it Pr) is 50%! Here are the benchmark results:
filter: 94 ms
removeAll: 74 ms
The same holds when Pr is less than 50%.
Otherwise, for a higher Pr, filtering is faster.
One thing to bear in mind is that in your code list is mutable, and that opens the possibility for accidental modifications.
Personally, I would choose performance over old habits, and in a sense, this use case is more readable since the intention is clearer.
Bonus: removing in-place
By removing in-place, I mean swapping the elements of the array so that the elements to be removed end up after a certain pivot index, while the elements to keep end up before it:
var inPlace = array
let p = inPlace.partition { $0 % 2 == 0 }
// to actually drop them afterwards: inPlace.removeSubrange(p...)
Bear in mind that partition(by:) doesn't keep the original order.
This approach clocks better than removeAll(where:)
Beware of premature optimization. The efficiency of a method often depends on your specific data and configuration, and unless you're working with a large data set or performing many operations at once, it's not likely to have a significant impact either way. Unless it does, you should favor the more readable and maintainable solution.
As a general rule, just use removeAll when you want to mutate the original array and filter when you don't. If you've identified it as a potential bottleneck in your program, then test it to see if there's a performance difference.
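To make that rule of thumb concrete (a minimal sketch, not from the original answer):
var scores = [3, 0, 7, 0, 5]
scores.removeAll { $0 == 0 }                // mutates: scores is now [3, 7, 5]

let original = [3, 0, 7, 0, 5]
let nonZero = original.filter { $0 != 0 }   // original stays untouched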
I am really struggling with the fact that someData[start...stop] returns a MutableRandomAccessSlice. My someData was a let to begin with, so why would I want a Mutable thing? Why don't I get just a RandomAccessSlice? What's really frustrating, though, is that it returns a thing that is pretty API-incompatible with the original source. With a Data, I can use .withUnsafeBytes, but not so with this offspring. And how you turn the slice back into a Data isn't clear either; there is no init that takes one of those.
I could use the subdata(in:) method instead of subscripting, but then what's the point of the subscript if I only ever want a sub-collection that behaves like the original collection? Furthermore, the subdata method only accepts half-open ranges, while the subscript accepts both closed and half-open ranges. Is this just something they haven't quite finished up for Swift 3 final yet?
Remember that the MutableRandomAccessSlice you get back is a value type, not a reference type. It just means you can modify it if you like, but it has nothing to do with the thing you sliced it out of:
let x = Data(bytes: [1,2,3]) // <010203>
var y = x[0...1]
y[0] = 2
x // <010203>
If you look in the code, you'll note that the intent is to return a custom slice type:
public subscript(bounds: Range<Index>) -> MutableRandomAccessSlice<Data> {
    get {
        return MutableRandomAccessSlice(base: self, bounds: bounds)
    }
    set {
        // Ideally this would be:
        //   replaceBytes(in: bounds, with: newValue._base)
        // but we do not have access to _base due to 'internal' protection
        // TODO: Use a custom Slice type so we have access to the underlying data
        let arrayOfBytes = newValue.map { $0 }
        arrayOfBytes.withUnsafeBufferPointer {
            let otherData = Data(buffer: $0)
            replaceBytes(in: bounds, with: otherData)
        }
    }
}
That said, a custom slice will still not be acceptable to a function that takes a Data. That is consistent with other types, though, like Array, which slices to an ArraySlice which cannot be passed where an Array is expected. This is by design (and likely is for Data as well for the same reasons). The concern is that a slice "pins" all of the memory that backs it. So if you took a 3 byte slice out of a megabyte Data and stored it in an ivar, the entire megabyte has to hang around. The theory (according to Swift devs I spoke with) is that Arrays could be massive, so you need to be careful with slicing them, while Strings are usually much smaller, so it's ok for a String to slice to a String.
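To illustrate the pinning concern with the Array analogy (a sketch, not from the original answer):
let big = [UInt8](repeating: 0, count: 1_000_000)
let tiny = big[0..<3]        // ArraySlice: still references the whole 1 MB buffer
let detached = Array(tiny)   // copying detaches it, so the big buffer can be
                             // released once `big` itself goes away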
In my experience so far, you generally want subdata(in:). My experimentation with it is that it's very similar in speed to slicing, so I believe it's still copy on write (but it doesn't seem to pin the memory either in my initial tests). I've only tested on Mac so far, though. It's possible that there are more significant performance differences on iOS devices.
Based on Rob's comments, I just added the following Pythonesque subscript extension:
extension Data {
    subscript(start: Int?, stop: Int?) -> Data {
        var front = 0
        if let start = start {
            front = start < 0 ? Swift.max(self.count + start, 0) : Swift.min(start, self.count)
        }
        var back = self.count
        if let stop = stop {
            back = stop < 0 ? Swift.max(self.count + stop, 0) : Swift.min(stop, self.count)
        }
        if front >= back {
            return Data()
        }
        return self.subdata(in: front..<back)
    }
}
That way I can just do
let input = Data(bytes: [0x60, 0x0D, 0xF0, 0x0D])
input[nil, nil] // <600df00d>
input[1, 3] // <0df0>
input[-2, nil] // <f00d>
input[nil, -2] // <600d>
I have researched and looked into many different random functions, but the downside to them is that they only work for one data type. I want one function that works for Double, Float, Int, CGFloat, etc. I know I could just convert between data types, but I would like to know if there is a way to do it using generics. Does someone know a way to make a generic random function in Swift? Thanks.
In theory you could port code such as this answer (which is not necessarily a great solution, depending on your needs, since UInt32.max is relatively small):
extension FloatingPointType {
    static func rand() -> Self {
        let num = Self(arc4random())
        let denom = Self(UInt32.max)
        // this line won’t compile:
        return num / denom
    }
}
// then you could do
let d = Double.rand()
let f = Float.rand()
let c = CGFloat.rand()
Except… FloatingPointType doesn’t conform to a protocol that guarantees operators like /, + etc (unlike IntegerType which conforms to _IntegerArithmeticType).
You could do this manually though:
protocol FloatingPointArithmeticType: FloatingPointType {
    func /(lhs: Self, rhs: Self) -> Self
    // etc
}

extension Double: FloatingPointArithmeticType { }
extension Float: FloatingPointArithmeticType { }
extension CGFloat: FloatingPointArithmeticType { }

extension FloatingPointArithmeticType {
    static func rand() -> Self {
        // same as before
    }
}
// so these now work
let d = Double.rand() // etc
but this probably isn’t quite as out-of-the-box as you were hoping (and no doubt contains some subtle invalid assumption about floating-point numbers on my part).
As for something that works across both integers and floats: the Swift team have mentioned on their mailing lists that the lack of protocols spanning both kinds of types is a conscious decision, since it's rarely correct to write the same code for both, and that's certainly the case for generating random numbers, which need two very different approaches.
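(For what it's worth, this is essentially the split the standard library itself adopted later, in Swift 4.2, with separate generic machinery for the two kinds of types:)
let i = Int.random(in: 0..<10)      // generic over FixedWidthInteger
let d = Double.random(in: 0..<1)    // generic over BinaryFloatingPoint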
Before you get too far into this, think about what you want the behavior of a type-agnostic random function to be, and whether that's something that you want. It sounds like you're proposing something like this:
// signature only
func random<T>() -> T

// example call sites, with specialization by inference from the declared result type
let randInt: Int = random()
let randFloat: Float = random()
let randDouble: Double = random()
let randInt64: Int64 = random()
(Note this syntax is sort of fake: with no parameter involving T, the body of random<T>() has no good way to decide what kind of value to produce for each requested type.)
What do you expect the possible values in each of these to be? Is randInt always a value between zero and Int.max? (Or maybe between zero and UInt32.max?) Is randFloat always between 0.0 and 1.0? Should randDouble actually have a larger count of possible values than randFloat (per the increased resolution between 0.0 and 1.0 of the Double type)? How do you account for the fact that Int is actually Int32 on 32-bit systems and Int64 on 64-bit hardware?
Are you sure it makes sense to have two (or more) calls that look identical but return values in different ranges?
Second, do you really want "arbitrarily random" random number generation everywhere in your app/game? Most use of RNGs is in game design, where typically there are a couple of important things you want in your RNG before you get your product past the prototyping stage:
Independence: I've seen games where you could learn to predict the next "random" enemy encounter based on recent "random" NPC chitchat/emotes. If you're using random elements in multiple gameplay systems, you don't want those systems drawing from the same sequence and influencing each other.
Determinism: If you want to be able to reproduce a sequence of game events — either for testing/debugging or for consistent results between clients in a networked game — you don't want to be using a random function where you can't control that sequence. arc4random doesn't let you control the initial seed, and you have no control over the sequence because you don't know what other code in your process (library code, or just other code of your own that you forgot about) is also pulling numbers from the generator.
(If you're not making a game... much of this still applies, though it may be less important. You still don't want to be re-running your test case until the heat death of the universe trying to randomly find the same bug that one of your users reported.)
In iOS 9 / OS X 10.11 / tvOS, GameplayKit provides a suite of randomization tools to address these issues.
import GameplayKit

let randA = GKARC4RandomSource()
let someNumbers = (0..<1000).map { _ in randA.nextInt() }

let randB = GKARC4RandomSource()
let someMoreNumbers = (0..<1000).map { _ in randB.nextInt() }

let randC = GKARC4RandomSource(seed: randA.seed)
let evenMoreNumbers = (0..<1000).map { _ in randC.nextInt() }
Here, someMoreNumbers is always independent of someNumbers, no matter what happens in the generation of someNumbers or what other randomization activity happens on the system. And evenMoreNumbers is the exact same sequence of numbers as someNumbers.
Okay, so the GameplayKit syntax isn't quite what you want? Well, some of that is a natural consequence of having to manage RNGs as objects so that you can keep them independent and deterministic. But if you really want to have a super-simple random() call that you can slot in wherever needed, regardless of return type, and be independent and deterministic...
Here's one possible recipe for that. It implements random() as a static function on a type extension, so that you can use type-inference shorthand to write it; e.g.:
// func somethingThatTakesAnInt(a: Int, andAFloat b: Float)
somethingThatTakesAnInt(.random(), andAFloat: .random())
The first parameter automatically calls Int.random() and the second calls Float.random(). (This is the same mechanism that lets you use shorthand for referring to enum cases, or .max instead of Int.max, etc.)
And it makes the type extensions private, with the idea that different source files will want independent RNGs.
// EnemyAI.swift
import GameplayKit

private extension Int {
    static func random() -> Int {
        return EnemyAI.die.nextInt()
    }
}

class EnemyAI: NSObject {
    static let die = GKARC4RandomSource()
    // ...
}

// ProceduralWorld.swift
import GameplayKit

private extension Int {
    static func random() -> Int {
        return ProceduralWorld.die.nextInt()
    }
}

class ProceduralWorld: NSObject {
    static let die = GKARC4RandomSource()
    // ...
}
Repeat with extensions for more types as desired.
If you add some breakpoints or logging to the different random() functions you'll see that the two implementations of Int.random() are specific to the file they're in.
Anyway, that's a lot of boilerplate, but perhaps it's good food for thought...
Personally, I'd probably write individual implementations for each thing you wanted. There just aren't that many, and it's a lot safer and more flexible. But… sure, you can do this. Just create a random bit pattern and say "that's a number."
func randomValueOfType<T>(type: T.Type) -> T {
    let size = sizeof(type)
    let bits = UnsafeMutablePointer<T>.alloc(1)
    defer { bits.dealloc(1) }   // don't leak the scratch allocation
    arc4random_buf(bits, size)
    return bits.memory
}
(This is technically "that's a something" but if you pass it something other than number-like types, you'll probably crash because most random bit patterns aren't a legal object.)
I believe every possible bit pattern will generate a legal IEEE 754 number, but "legal" may be more complex than you're thinking. A "totally random" float would rightly include NaN (not-a-number) which will show up reasonably often in your results.
There are some other special cases, like the infinities and negative zero, but in practice those should never occur: the probability of any single bit pattern showing up in a random choice of 32 bits is effectively zero. There are lots of NaN bit patterns, though, so NaN shows up a lot.
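To put a rough number on that for a 32-bit Float: NaN is encoded as an all-ones exponent with a nonzero significand, which gives 2 × (2^23 − 1) ≈ 16.8 million bit patterns out of 2^32, so roughly 0.4% of uniformly random 32-bit patterns are NaNs, while positive infinity, negative infinity, and negative zero are a single bit pattern each.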
And that's the problem with this whole approach. If your random generator can accept that NaN shows up, then it's probably testing real floating point. But if you're testing real floating point, you really want to be checking the edge cases like infinity and negative zero. But if you don't want to include NaN, then you don't really mean "a random Float" you mean "a random Real number that can be expressed as a Float." There's no type for that, so you would need to write specific logic to handle it, and if you're doing that, you might as well write a specialized random generator for each type.
But this function is probably still a useful foundation for building that. You could just generate values until one is a legal number (NaN doesn't show up all that often, so you'll almost certainly get a usable value within a couple of tries).
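A sketch of that retry loop (building on the randomValueOfType above, in the same Swift 2-era syntax; randomFiniteFloat is a hypothetical helper):
func randomFiniteFloat() -> Float {
    while true {
        let f = randomValueOfType(Float.self)
        if f.isFinite { return f }   // rejects NaN and the infinities
    }
}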
This kind of emphasizes the point Airspeed Velocity made about why there's no generic "number" protocol. You usually can't just treat floating-point numbers like integers; they work differently, and you very often need to think about that fact.