Efficiency of using filter(_:) vs. removeAll(where:) when modifying a parameter value - Swift

Swift 4.2 introduced a new removeAll(where:) method. From what I have read, it is supposed to be more efficient than using filter(_:). I have several scenarios in my code like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
    return list.filter { $0.date == nil }
        .sorted { $0.account?.name < $1.account?.name }
}
However, I cannot call removeAll(where:) on the parameter directly because it is a constant, so I would need to shadow it like this:
private func getListOfNullDates(list: [MyObject]) -> [MyObject] {
    var list = list
    list.removeAll { $0.date != nil }   // note the inverted predicate: discard the non-nil dates
    return list.sorted { $0.account?.name < $1.account?.name }
}
Is using the removeAll function still more efficient in this scenario? Or is it better to stick with using the filter function?

Thank you for this question 🙏🏻
I've benchmarked both functions using this code on TIO:
let array = Array(0..<10_000_000)

do {
    let start = Date()
    let filtering = array.filter { $0 % 2 == 0 }
    let end = Date()
    print(filtering.count, filtering.last!, end.timeIntervalSince(start))
}

do {
    let start = Date()
    var removing = array
    removing.removeAll { $0 % 2 == 0 }
    let end = Date()
    print(removing.count, removing.last!, end.timeIntervalSince(start))
}
(To make removing and filtering produce identical results, the closure passed to removeAll should have been { $0 % 2 != 0 }, but I didn't want to give an advantage to either snippet by using a faster or slower comparison operator.)
And indeed, removeAll(where:) is faster when the probability of removing elements (let's call it Pr) is 50%! Here are the benchmark results:
filter: 94 ms
removeAll: 74 ms
The same holds when Pr is less than 50%; for a higher Pr, filtering becomes faster.
One thing to bear in mind is that in your code list is mutable, and that opens the possibility for accidental modifications.
Personally, I would choose performance over old habits, and in a sense, this use case is more readable since the intention is clearer.
Bonus: removing in-place
What's meant by removing in-place is to swap the elements of the array so that the elements to be removed end up after a certain pivot index, while the elements to keep are the ones before it:
var inPlace = array
let p = inPlace.partition { $0 % 2 == 0 }
Bear in mind that partition(by:) doesn't keep the original order.
This approach clocks in faster than removeAll(where:).
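For completeness (my addition, not part of the original benchmark), the pivot p returned by partition(by:) can then be used to actually drop the unwanted elements in one go:

inPlace.removeSubrange(p...)   // elements matching the predicate sit at indices p..., remove them in a single O(n) pass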

Beware of premature optimization. The efficiency of a method often depends on your specific data and configuration, and unless you're working with a large data set or performing many operations at once, it's not likely to have a significant impact either way; when it doesn't, favor the more readable and maintainable solution.
As a general rule, just use removeAll when you want to mutate the original array and filter when you don't. If you've identified it as a potential bottleneck in your program, then test it to see if there's a performance difference.
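Applied to the asker's example, that rule of thumb looks roughly like this (note that the predicate is inverted when switching from filter to removeAll, since one describes what to keep and the other what to discard):

// Non-mutating: build a new array, leave the original untouched.
let nullDates = list.filter { $0.date == nil }

// Mutating: requires a var copy of the parameter.
var mutableList = list
mutableList.removeAll { $0.date != nil }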

Related

How can I compare two Arrays in Swift and mutate one of the arrays if two Items are the same

I want to compare two arrays with each other, meaning each single item of them. I need to run some code if two items in these arrays are the same.
I've done that so far with two for loops, but someone in the comments said that's not a good approach (and I get an error, too).
Does anybody have an idea what code I can use to achieve that?
By the way, this is the code I used before:
var index = 0
for Item1 in Array1 {
    for Item2 in Array2 {
        if (Item1 == Item2) {
            // Here I want to put some code if the items are the same
        }
    }
    // index is the counter that tracks which position in Array1 I am at.
    index += 1
}
Okay, I'll try again to describe what I mean: I want to compare two arrays. If some items are the same, I want to delete that item from array A / array 1.
Once that is working, I want to add a few more statements that allow me to delete an item only if a property of that item has a specific value, but I think I can do that step on my own.
If you want to compare items from different arrays, you need to make your Item conform to the Equatable protocol. For example:
struct Item: Equatable {
    let name: String

    static func ==(l: Item, r: Item) -> Bool {
        return l.name == r.name
    }
}
You need to decide by which attributes you want to compare your Item. In my case I compare by name.
let array1: [Item] = [
    .init(name: "John"),
    .init(name: "Jack"),
    .init(name: "Soer"),
    .init(name: "Kate")
]

let array2: [Item] = [
    .init(name: "John"),
]

for item1 in array1 {
    if array2.contains(item1) {
        // array2 contains an item from array1, so you can perform your code here.
    }
}
If you also want to support the follow-up requirement you described (removing the matching items from array 1 / array A), I guess this can work for you.
You need to make your array1 mutable:
var array1: [Item] = ...
Then filter array1 like this:
let filteredArray1 = array1.filter { (item) -> Bool in
    return !array2.contains(item)
}
And reassign array1 to the filtered array:
array1 = filteredArray1
Let me know if it works for you.
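Another approach is to collect the matching items first and then remove them from array1 one at a time, running any conditional code as each match is handled: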
let itemsToRemove = array1.filter { array2.contains($0) }

for item in itemsToRemove {
    if let i = array1.firstIndex(of: item) {
        // Run your conditional code here.
        array1.remove(at: i)
    }
}
Update
The question has been restated: elements are to be removed from one of the arrays.
You don't provide the actual code where you get the index out of bounds error, but it's almost certainly because you don't account for having removed elements when using the index, so you're probably indexing into a region that was valid at the start, but isn't anymore.
My first advice is: don't do the actual removal inside the search loop. There is a solution that achieves the same result, which I'll give, but apart from having to be very careful about indexing into the shortened array, there is also a performance issue: every deletion requires Array to shift all the later elements down by one. That means every deletion is an O(n) operation, which makes the overall algorithm O(n^3).
So what do you do instead? The simplest method is to create a new array containing only the elements you wish to keep. For example, let's say you want to remove from array1 all elements that are also in array2:
array1 = array1.filter { !array2.contains($0) }
I should note that one of the reasons I kept my original answer below is because you can use those methods to replace array2.contains($0) to achieve better performance in some cases, and the original answer describes those cases and how to achieve it.
Also, even though the closure is used to determine whether or not to keep the element, and so has to return a Bool, there is nothing preventing you from putting additional code in it to do any other work you might want:
array1 = array1.filter
{
    if array2.contains($0)
    {
        doSomething(with: $0)
        return false // don't keep this element
    }
    return true
}
The same applies to all of the closures below.
You could just use the removeAll(where:) method of Array.
array1.removeAll { array2.contains($0) }
In this case, if the closure returns true, it means "remove this element" which is the opposite sense of the closure used in filter, so you have to take that into account if you do additional work in it.
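For example, a minimal sketch of the same side-effect pattern from above, adapted for removeAll(where:) with the return values flipped (doSomething(with:) is the same placeholder as before):

array1.removeAll
{
    if array2.contains($0)
    {
        doSomething(with: $0)
        return true  // remove this element
    }
    return false     // keep this element
}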
I haven't looked up how the Swift library implements removeAll(where:) but I'm assuming it does its job the efficient way rather than the naive way. If you find the performance isn't all that good you could roll your own version.
extension Array where Element: Equatable
{
    mutating func efficientRemoveAll(where condition: (Element) -> Bool)
    {
        guard count > 0 else { return }

        var i = startIndex
        var iEnd = endIndex

        while i < iEnd
        {
            if condition(self[i])
            {
                iEnd -= 1
                swapAt(i, iEnd)
                continue // note: skips incrementing i; iEnd has been decremented
            }
            i += 1
        }

        removeLast(endIndex - iEnd)
    }
}
array1.efficientRemoveAll { array2.contains($0) }
Instead of actually removing the elements inside the loop, this works by swapping them with the end element, and handling when to increment appropriately. This collects the elements to be removed at the end, where they can be removed in one go after the loop finishes. So the deletion is just one O(n) pass at the end, which avoids increasing the algorithmic complexity that removing inside the loop would entail.
Because of the swapping with the current "end" element, this method doesn't preserve the relative order of the elements in the mutating array.
If you want to preserve the order you can do it like this:
extension Array where Element: Equatable
{
    mutating func orderPreservingRemoveAll(where condition: (Element) -> Bool)
    {
        guard count > 0 else { return }

        var i = startIndex
        var j = i

        repeat
        {
            swapAt(i, j)
            if !condition(self[i]) { i += 1 }
            j += 1
        } while j != endIndex

        removeLast(endIndex - i)
    }
}
array1.orderPreservingRemoveAll { array2.contains($0) }
If I had to make a bet, this would be very close to how standard Swift's removeAll(where:) for Array is implemented. It keeps two indices, one for the current last element to be kept, i, and one for the next element to be examined, j. This has the effect of accumulating elements to be removed at the end (those past i).
Original answer
The previous solutions are fine for small (yet still surprisingly large) arrays, but they are O(n^2) solutions.
If you're not mutating the arrays, the search can be expressed more succinctly:
array1.filter { array2.contains($0) }.forEach { doSomething($0) }
where doSomething doesn't actually have to be a function call - just do whatever you want to do with $0 (the common element).
You'll normally get better performance by putting the smaller array inside the filter.
Special cases for sorted arrays
If one of your arrays is sorted, then you might get better performance from a binary search instead of contains, though that will require that your elements conform to Comparable. There isn't a binary search in the Swift Standard Library, and I also couldn't find one in the new swift-algorithms package, so you'd need to implement it yourself:
extension RandomAccessCollection where Element: Comparable, Index == Int
{
    func sortedContains(_ element: Element) -> Bool
    {
        var range = self[...]
        while range.count > 0
        {
            let midPoint = (range.startIndex + range.endIndex) / 2
            if range[midPoint] == element { return true }

            // Narrow the search window on `range` itself so that both bounds
            // keep shrinking; slicing `self` here could re-widen the window.
            range = range[midPoint] > element
                ? range[..<midPoint]
                : range[index(after: midPoint)...]
        }
        return false
    }
}
unsortedArray.filter { sortedArray.sortedContains($0) }.forEach { doSomething($0) }
This will give O(n log n) performance. However, you should test it for your actual use case if you really need performance, because binary search is not especially friendly for the CPU's branch predictor, and it doesn't take that many mispredictions to result in slower performance than just doing a linear search.
If both arrays are sorted, you can get even better performance by exploiting that, though again you have to implement the algorithm because it's not supplied by the Swift Standard Library.
extension Array where Element: Comparable
{
    // Assumes no duplicates
    func sortedIntersection(_ sortedOther: Self) -> Self
    {
        guard self.count > 0, sortedOther.count > 0 else { return [] }

        var common = Self()
        common.reserveCapacity(Swift.min(count, sortedOther.count))

        var i = self.startIndex
        var j = sortedOther.startIndex
        var selfValue = self[i]
        var otherValue = sortedOther[j]

        while true
        {
            if selfValue == otherValue
            {
                common.append(selfValue)
                i += 1
                j += 1
                if i == self.endIndex || j == sortedOther.endIndex { break }
                selfValue = self[i]
                otherValue = sortedOther[j]
                continue
            }

            if selfValue < otherValue
            {
                i += 1
                if i == self.endIndex { break }
                selfValue = self[i]
                continue
            }

            j += 1
            if j == sortedOther.endIndex { break }
            otherValue = sortedOther[j]
        }

        return common
    }
}
array1.sortedIntersection(array2).forEach { doSomething($0) }
This is O(n) and is the most efficient of any of the solutions I'm including, because it makes exactly one pass through both arrays. That's only possible because both of the arrays are sorted.
Special case for large array of Hashable elements
However, if your arrays meet some criteria, you can get O(n) performance by going through Set. Specifically if
The arrays are large
The array elements conform to Hashable
You're not mutating the arrays
Make the larger of the two arrays a Set:
let largeSet = Set(largeArray)
smallArray.filter { largeSet.contains($0) }.forEach { doSomething($0) }
For the sake of analysis, if we assume both arrays have n elements, the initializer for the Set will be O(n). The intersection will involve testing each of smallArray's elements for membership in largeSet. Each one of those tests is O(1) because Swift implements Set as a hash table. There will be n of those tests, so that's O(n). Then the worst case for forEach will be O(n). So it's O(n) overall.
There is, of course, overhead in creating the Set, the worst of which is the memory allocation involved, so it's not worth it for small arrays.
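If you do need to mutate array1 and the element type also conforms to Hashable (the Item in the earlier example would need that conformance added), the same idea combines naturally with removeAll(where:); a sketch:

let lookup = Set(array2)                  // O(n) to build
array1.removeAll { lookup.contains($0) }  // O(1) average per membership test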

Swift string vs [Character] and performance

From the very beginning, Swift strings have been tricky, since they handle Unicode properly, and there is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison involves some implicit logic and is not a simple check that two memory areas are identical. I used to see recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once and then all operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes them more expensive to compare than character arrays.
Long story short, I solved this problem on leetcode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays and strings. To my surprise strings are the fastest.
How is that so? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8 ms, 15 MB memory

func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }

    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)

    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4 ms (!!!), 14.5 MB memory

func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }

    var result: Int = -1

    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character] since when you do this all Unicode-related conversions take place once and then all operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It still may be made of composed or decomposed UnicodeScalars that need special handling for equality.
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array. Nothing in Array promises that memory is contiguous. That's why ContiguousArray exists.
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
As for the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero memory (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This is forcing a memory allocation and copy to create a new Array out of str. You would probably do much better if you used a slice (ArraySlice) for this work rather than making full Array copies. You'd almost certainly find in that case that you're closely matching String's own implementation (which uses Substring).
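For instance, a minimal tweak along those lines to the [Character] version from the question (my sketch, not part of the original answer) compares against an ArraySlice instead of materializing a new Array for every candidate window:

for i in 0...(str.count - pattern.count) {
    // elementsEqual compares the slice to the pattern without allocating a copy
    if str[i] == pattern[0] && str[i..<(i + pattern.count)].elementsEqual(pattern) {
        result = i
        break
    }
}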
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating stdlib for arbitrary strings, why would stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?

Does initializing an array from a set have a complexity and if so what is it?

Option 1

var dictionaryWithoutDuplicates = [Int: Int]()
for item in arrayWithDuplicates {
    if dictionaryWithoutDuplicates[item] == nil {
        dictionaryWithoutDuplicates[item] = 1
    }
}
print(dictionaryWithoutDuplicates.keys)
// [1, 2, 3, 4]
Option 2
let arrayWithDuplicates = [1,2,3,3,2,4,1]
let arrayWithoutDuplicates = Array(Set(arrayWithDuplicates))
print(arrayWithoutDuplicates)
// e.g. [1, 2, 3, 4] (the order is not guaranteed, since Set is unordered)
For the first option there might be a more elegant way to do it, but that's not my point; I just wanted to show an example that has O(n) complexity.
Both options return an array without duplicates. Since the first option has a complexity of O(n), I was wondering whether the second option even has a defined complexity and, if so, what it is.
What you did is pretty much exactly what Set does. A Set<T> is pretty much just a [T: Void] (a.k.a. Dictionary<T, Void>).
Both examples have O(arrayWithDuplicates.count) time and space complexity.
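As a rough sketch of why (an assumption about the implementation detail, but consistent with Set's documented hashing behavior), the Set-based option boils down to something like this:

let arrayWithDuplicates = [1, 2, 3, 3, 2, 4, 1]

// Roughly what Array(Set(arrayWithDuplicates)) does:
var seen = Set<Int>()
for item in arrayWithDuplicates {
    seen.insert(item)                      // O(1) average per insertion, thanks to hashing
}
let arrayWithoutDuplicates = Array(seen)   // O(n) copy of the unique elements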

Improving performance of higher order functions vs loops in Swift (3.0)

import XCTest

class testTests: XCTestCase {
    static func makearray() -> [Int] {
        var array: [Int] = []
        for x in 0..<1000000 {
            array.append(x)
        }
        return array
    }

    let array = testTests.makearray()

    func testPerformanceExample() {
        self.measure {
            var result: [String] = []
            for x in self.array {
                let tmp = "string\(x)"
                if tmp.hasSuffix("1234") {
                    result.append(tmp)
                }
            }
            print(result.count)
        }
    }

    func testPerformanceExample2() {
        self.measure {
            let result = self.array.map { "string\($0)" }
                .filter { $0.hasSuffix("1234") }
            print(result.count)
        }
    }

    func testPerformanceExample3() {
        self.measure {
            let result = self.array.flatMap { int -> String? in
                let tmp = "string\(int)"
                return tmp.hasSuffix("1234") ? tmp : nil
            }
            print(result.count)
        }
    }
}
In this code I am trying to see how the higher order functions perform with respect to processing a large array.
The three tests produce the same results, with times of around 0.75 s for the loop, 1.38 s for map/filter, and 1.21 s for flatMap.
Assuming HOFs are, more or less, functions wrapping loops, this makes sense: in the map/filter case, it loops through the whole array for map, then loops through the result of that for filter.
In the case of flatmap, is it doing the map first, and then able to do a simpler filter operation?
Is my understanding of what is happening under the hood (roughly) correct?
If so, would it be fair to say that the compiler is not able to do much optimisation of this?
And finally, is there a better way of doing this? The HOF versions are definitely easier for me to understand, but for performance critical areas, it looks like for-loops are the way to go?
The flatMap approach is likely to be nearly equivalent to the loop approach. Algorithmically, it is equivalent. I would add that even the map/filter approach, in this instance, should be "nearly" as fast because the bulk of the running time is taken by the operations on strings.
For good performance, one wants to avoid working with temporary strings. We can achieve the desired result as follows...
func fastloopflatmap(_ test_array: [Int]) -> [String] {
    var result: [String] = []
    for x in test_array {           // iterate over the parameter, not an outer `array`
        if x % 10000 == 1234 {      // equivalent to "string\(x)".hasSuffix("1234") for this input
            result.append("string\(x)")
        }
    }
    return result
}
Here are my timings:
loop: 632 ns/element
filter/map: 650 ns/element
flatMap: 632 ns/element
fast loop: 1.2 ns/element
Thus, as you can see, the bulk (99%) of the running time of the slow functions is due to the operation over temporary strings.
Source code : https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/flatmap
Technically speaking, these HOFs could perform better if the compiler substituted an implementation that used multiple cores in parallel. However, Swift does not do this at this time. A step further would be to use granularity control to choose between an iterative or a parallel implementation, weighing the overhead cost of parallelization against the input size:
https://en.wikipedia.org/wiki/Granularity_(parallel_computing)

Data <-> MutableRandomAccessSlice

I am really struggling with the fact that someData[start...stop] returns a MutableRandomAccessSlice. My someData was a let to begin with, so why would I want a Mutable thing? Why don't I just get a RandomAccessSlice? What's really frustrating, though, is that it returns a thing that is pretty API-incompatible with the original source. With a Data, I can use .withUnsafeBytes, but not so with this offspring. And how to turn the slice back into a Data isn't clear either; there is no init that takes one of those.
I could use the subdata(in:) method instead of subscripting, but then what's the point of the subscript if I only ever want a sub-collection representation that behaves like the original collection? Furthermore, the subdata method can only take half-open ranges, while the subscript can take both closed and half-open ranges. Is this just something they haven't quite finished up for Swift 3 final yet?
Remember that the MutableRandomAccessSlice you get back is a value type, not a reference type. It just means you can modify it if you like, but modifying it has no effect on the thing you sliced it out of:
let x = Data(bytes: [1,2,3]) // <010203>
var y = x[0...1]
y[0] = 2
x // <010203>
If you look in the code, you'll note that the intent is to return a custom slice type:
public subscript(bounds: Range<Index>) -> MutableRandomAccessSlice<Data> {
    get {
        return MutableRandomAccessSlice(base: self, bounds: bounds)
    }
    set {
        // Ideally this would be:
        //   replaceBytes(in: bounds, with: newValue._base)
        // but we do not have access to _base due to 'internal' protection
        // TODO: Use a custom Slice type so we have access to the underlying data
        let arrayOfBytes = newValue.map { $0 }
        arrayOfBytes.withUnsafeBufferPointer {
            let otherData = Data(buffer: $0)
            replaceBytes(in: bounds, with: otherData)
        }
    }
}
That said, a custom slice will still not be acceptable to a function that takes a Data. That is consistent with other types, though, like Array, which slices to an ArraySlice which cannot be passed where an Array is expected. This is by design (and likely is for Data as well for the same reasons). The concern is that a slice "pins" all of the memory that backs it. So if you took a 3 byte slice out of a megabyte Data and stored it in an ivar, the entire megabyte has to hang around. The theory (according to Swift devs I spoke with) is that Arrays could be massive, so you need to be careful with slicing them, while Strings are usually much smaller, so it's ok for a String to slice to a String.
In my experience so far, you generally want subdata(in:). My experimentation with it is that it's very similar in speed to slicing, so I believe it's still copy on write (but it doesn't seem to pin the memory either in my initial tests). I've only tested on Mac so far, though. It's possible that there are more significant performance differences on iOS devices.
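One note on the other half of the question (turning a slice back into a Data): there is no slice-specific initializer, but the generic sequence-of-bytes initializer should work, since the slice is a sequence of UInt8. A sketch:

let x = Data(bytes: [0x01, 0x02, 0x03, 0x04])
let slice = x[1...2]     // MutableRandomAccessSlice<Data> in Swift 3
let sub = Data(slice)    // copies the sliced bytes into a standalone Data: <0203>
// sub now supports the full Data API again, e.g. sub.withUnsafeBytes { ... }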
Based on Rob's comments, I just added the following pythonesque subscript extension:
extension Data {
    subscript(start: Int?, stop: Int?) -> Data {
        var front = 0
        if let start = start {
            front = start < 0 ? Swift.max(self.count + start, 0) : Swift.min(start, self.count)
        }

        var back = self.count
        if let stop = stop {
            back = stop < 0 ? Swift.max(self.count + stop, 0) : Swift.min(stop, self.count)
        }

        if front >= back {
            return Data()
        }

        let range = Range(front..<back)
        return self.subdata(in: range)
    }
}
That way I can just do
let input = Data(bytes: [0x60, 0x0D, 0xF0, 0x0D])
input[nil, nil] // <600df00d>
input[1, 3] // <0df0>
input[-2, nil] // <f00d>
input[nil, -2] // <600d>