I'd like to learn more about Swift's Collection Types by creating a custom collection.
The problem is that I can't find any examples of "custom" collection types that don't just use an internal array / dictionary.
These aren't helpful to me, because when it comes time to conform to the Collection protocol, the examples just forward the required methods to the array / dictionary.
That said, after looking through Wikipedia's List of Data Structures, I can't find any that meet the performance characteristics of collection types, that aren't just specialized arrays.
Does anyone know of a data structure that could be implemented with a custom collection type, without using an internal collection type?
EDIT
Collection protocol conformance requires that accessing the startIndex, the endIndex, and the elements of the collection be done in constant time - O(1).
EDIT 2
The consensus in the comments seems to be that a LinkedList is a data structure that satisfies these characteristics. My LinkedList is defined as follows:
indirect enum LinkedList<T> {
case value(element: T, next: LinkedList<T>)
case end
}
extension LinkedList: Sequence {
func makeIterator() -> LinkedListIterator<T> {
return LinkedListIterator(current: self)
}
}
struct LinkedListIterator<T>: IteratorProtocol {
var current: LinkedList<T>
mutating func next() -> T? {
switch current {
case let .value(element, next):
current = next
return element
case .end:
return nil
}
}
}
What I still don't understand, is how subscript can be returned in constant time. For the LinkedList:
let data = LinkedList<Int>.value(element: 0, next: LinkedList<Int>.value(element: 1, next: LinkedList<Int>.value(element: 2, next: LinkedList<Int>.value(element: 3, next: LinkedList<Int>.end))))
Assume that I want access to the 3rd element in the Collection:
let example = data[2]
Currently, this is how I have implemented subscript:
subscript (position: Index) -> Element {
precondition(position < endIndex && position >= startIndex)
var iterator = makeIterator()
for i in 0 ..< position {
iterator.next()
if i + 1 == position {
return iterator.next()!
}
}
var zero = makeIterator()
return zero.next()!
}
Because the method's completion time depends on position, it finishes in linear rather than constant time. How could such a constant-time method be implemented?
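One point that may help here: Collection's subscript takes an Index, not an integer offset, and Index can be any Comparable type. The O(1) expectation applies to going from an index to its element; moving n positions with index(_:offsetBy:) is allowed to take O(n) for a non-random-access collection. Below is a minimal sketch of that idea, assuming a hypothetical LinkedListIndex type (not part of the original code) that stores the remaining list. The offset field is an assumption added only so that indices can be compared.

```swift
// Same shape as the LinkedList in the question.
indirect enum LinkedList<T> {
    case value(element: T, next: LinkedList<T>)
    case end
}

// Hypothetical index type: it stores the remaining list, so no walking is
// needed when subscripting. `offset` exists only to make indices comparable.
struct LinkedListIndex<T>: Comparable {
    let node: LinkedList<T>
    let offset: Int

    private var isEnd: Bool {
        if case .end = node { return true }
        return false
    }

    static func == (lhs: LinkedListIndex, rhs: LinkedListIndex) -> Bool {
        if lhs.isEnd || rhs.isEnd { return lhs.isEnd && rhs.isEnd }
        return lhs.offset == rhs.offset
    }

    static func < (lhs: LinkedListIndex, rhs: LinkedListIndex) -> Bool {
        if lhs.isEnd { return false }   // nothing comes after the end
        if rhs.isEnd { return true }    // every element precedes the end
        return lhs.offset < rhs.offset
    }
}

extension LinkedList: Collection {
    typealias Index = LinkedListIndex<T>

    var startIndex: Index { return Index(node: self, offset: 0) }
    var endIndex: Index { return Index(node: .end, offset: 0) }

    // O(1): step to the node that the current one points at.
    func index(after i: Index) -> Index {
        guard case let .value(_, next) = i.node else {
            preconditionFailure("cannot advance past endIndex")
        }
        return Index(node: next, offset: i.offset + 1)
    }

    // O(1): the index already holds the node, so just read its element.
    subscript(position: Index) -> T {
        guard case let .value(element, _) = position.node else {
            preconditionFailure("index out of bounds")
        }
        return element
    }
}

let data: LinkedList<Int> = .value(element: 0, next: .value(element: 1,
    next: .value(element: 2, next: .value(element: 3, next: .end))))

// Walking to the third position is O(n); the subscript itself is O(1).
let third = data[data.index(data.startIndex, offsetBy: 2)]
```

With this design, data[2] is intentionally not available: callers obtain an index first and then subscript with it, which keeps the linear cost visible at the call site.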
Related
I have seen this question on SO but the answers there appear to be talking about functions that return self.
I am creating a class extension that starts like this
extension Sequence where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
}
}
I need to get the number of elements (self.count) and in subsequent lines use the array elements, like self[i], but Swift complains that self has no member called count and will not let me use self in any context.
How do I do that?
In Swift, the count property is defined not on Sequence but on Collection, so you need to extend Collection instead.
extension Collection where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
}
}
If you also need to access a value of the collection by position (self[i]), note that Collection already provides a subscript that takes an Index. To jump to arbitrary positions efficiently, you should extend RandomAccessCollection instead, which guarantees that both count and index(_:offsetBy:) run in constant time.
extension RandomAccessCollection where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
let first = self[startIndex]
let second = self[index(startIndex, offsetBy: 1)]
return [first, second]
}
}
Note: As RandomAccessCollection indexes are not necessarily of type Int, you must use the index(_:offsetBy:) function to create an index that can be passed to the subscript.
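To illustrate the note above, here is a small sketch using ArraySlice, whose indices are Int but do not start at zero. The element(atOffset:) helper is a hypothetical name introduced only for this example.

```swift
extension RandomAccessCollection {
    /// Hypothetical helper: the element `offset` positions after startIndex,
    /// or nil when the offset is out of range.
    func element(atOffset offset: Int) -> Element? {
        guard offset >= 0, offset < count else { return nil }
        return self[index(startIndex, offsetBy: offset)]
    }
}

// A slice keeps the indices of its parent array, so its startIndex is 1 here.
let slice = [10, 20, 30, 40][1...]
let firstOfSlice = slice.element(atOffset: 0)   // the element at index 1 of the parent
let pastTheEnd = slice.element(atOffset: 3)     // out of range, so nil
```

Writing slice[0] instead would trap at runtime, because 0 is not a valid index of the slice.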
I'm trying to write a function that accepts a generic Collection and a single element, then returns the index or indices of this element as an array. However, I'm getting an error at the return statement.
func findAll<T: Collection, U: Equatable>(_ items: T, _ find: U) -> [U] where T.Iterator.Element == U {
var found = [Int]()
for (index, item) in items.enumerated() {
if item == find {
found.append(index)
}
}
return found
}
Ken's answer is good, but Collection can have an Index other than Int. If you try to use the indexes you get back, you may not be able to subscript with them.
Instead of [Int], you want [Self.Index]. So instead of enumerated, you want the more general zip:
extension Collection where Self.Element: Equatable {
func indicesOfElements(equalTo element: Self.Element) -> [Self.Index] {
return zip(self.indices, self) // Zip together the index and element
.filter { $0.1 == element } // Find the ones that match
.map { $0.0 } // Return the indices
}
}
[1,2,3,4,1].indicesOfElements(equalTo: 1) // => [0,4]
That said, a simpler approach would be:
extension Collection where Self.Element: Equatable {
func indicesOfElements(equalTo element: Self.Element) -> [Self.Index] {
return indices.filter { self[$0] == element }
}
}
To see the problem w/ subscripting, consider this:
let s = "abcabc"
let i = s.indicesOfElements(equalTo: "a").first!
s[i] // "a"
let j = findAll(s, "a").first!
s[j] // error: 'subscript' is unavailable: cannot subscript String with an Int, see the documentation comment for discussion
While a protocol extension is the preferred way to do this in Swift, this is directly convertible to the following generic function syntax:
func indicesOfElements<C: Collection>(in collection: C, equalTo element: C.Element) -> [C.Index]
where C.Element: Equatable {
return collection.indices.filter { collection[$0] == element }
}
The following style is also equivalent, and has a slightly shallower learning curve (no need for filter).
func simplerIndicesOfElements<C: Collection>(in collection: C, equalTo element: C.Element) -> [C.Index]
where C.Element: Equatable {
var results: [C.Index] = []
for index in collection.indices {
if collection[index] == element {
results.append(index)
}
}
return results
}
I believe even fairly new Swift developers should learn to read simple filter and map expressions (though these should always be kept simple, even by experts!). But there is absolutely nothing wrong with learning simple for iteration first.
If you're just getting started, note the naming style here as well. Your initial example included two unnamed parameters. In many (most) cases this is poor Swift. In Swift we generally try to name our parameters such that they read fairly naturally in English. In findAll(xs, x) it's unclear what the parameters are or what the return value will be. In indicesOfElements(in: xs, equalTo: x), all the information is available at the call site.
In Swift 3, this takes a lot more syntax than you probably expect:
func indicesOfElements<C: Collection>(in collection: C, equalTo element: C.Iterator.Element) -> [C.Index]
where C.Iterator.Element: Equatable, C.Indices.Iterator.Element == C.Index {
// ... Same body ...
}
In Swift 3, associated types can't have additional constraints put on them. This seems a small thing, but it's huge. It means that there's no way to say Collection.Indices actually contains Collection.Index. And there's no way to create a Collection.Element (the type returned by subscripting the collection) that is promised to be the same as Collection.Iterator.Element (the type returned by iterating over the collection). This leads to a lot of C.Iterator... and where clauses that seem obvious like C.Indices.Iterator.Element == C.Index.
If you're near the start of your Swift journey, I'd probably skip over the generics entirely, and write this in terms of [Int]. A very common mistake by Swift programmers (new and "old") is to jump to generic programming before they have any need of it. But if you're at the stage where you're tackling generics, this is how you often have to write them in Swift 3. (Swift 4 is a dramatic improvement for generic programming.)
Since you would like to return the indices of the found elements, you should return an array of Ints instead of [U]. Here's the updated code:
func findAll<T: Collection, U: Equatable>(_ items: T, _ find: U)
-> [Int] where T.Iterator.Element == U {
var found = [Int]()
for (index, item) in items.enumerated() {
if item == find {
found.append(index)
}
}
return found
}
And here's a test case:
let numbers = [1,0,1,0,0,1,1]
let indicesOfOnes = findAll(numbers, 1)
print(indicesOfOnes)
Which will print:
[0, 2, 5, 6]
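One caveat worth spelling out: enumerated() counts offsets from zero, which happen to equal the indices for an Array but not for every collection. A short sketch with an ArraySlice (the variable names are only for illustration):

```swift
let letters = ["a", "b", "c", "d"]
let tail = letters[2...]   // ArraySlice that keeps the parent's indices (2 and 3)

// Offsets produced by enumerated() always start at zero.
let offsets = tail.enumerated().map { $0.offset }

// The real indices of the slice, which are what the subscript expects.
let indices = Array(tail.indices)

let viaIndex = tail[indices[0]]   // valid: uses a real index of the slice
```

Here offsets and indices differ, so the values returned by findAll are only safe to use as subscripts when the collection is zero-based, as Array is.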
A while ago, I made a Binary Search Tree type in Swift that I wanted to conform to the Collection protocol. However, the endIndex requirement is a "past the end" index which isn't really appropriate for a tree because each index should hold a reference to its corresponding node for O(1) access. I ended up with an optional reference (being nil in the case of endIndex), but it involved a lot of boilerplate code that I'd rather avoid.
I decided to make a ValidIndexCollection protocol that looks like this:
/// A collection defined by valid indices only, rather than a
/// startIndex and a "past the end" endIndex.
protocol ValidIndexCollection: Collection {
associatedtype ValidIndex: Comparable
/// The first valid index if the collection is nonempty,
/// nil otherwise.
var firstValidIndex: ValidIndex? { get }
/// The last valid index if the collection is nonempty,
/// nil otherwise.
var lastValidIndex: ValidIndex? { get }
/// Returns the index right after the given index.
func validIndex(after index: ValidIndex) -> ValidIndex
/// Returns the element at the given index.
func element(at index: ValidIndex) -> Iterator.Element
}
Before I can extend this protocol to satisfy the Collection requirements, I have to introduce an appropriate index first:
enum ValidIndexCollectionIndex<ValidIndex: Comparable> {
case index(ValidIndex)
case endIndex
}
extension ValidIndexCollectionIndex: Comparable {
// ...
}
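The body of the Comparable conformance is elided above. As an assumption about what it might contain, one possible ordering treats endIndex as greater than every valid index:

```swift
enum ValidIndexCollectionIndex<ValidIndex: Comparable> {
    case index(ValidIndex)
    case endIndex
}

// Assumed ordering: valid indices compare by their wrapped value, and
// endIndex sorts after all of them.
extension ValidIndexCollectionIndex: Comparable {
    static func == (lhs: ValidIndexCollectionIndex, rhs: ValidIndexCollectionIndex) -> Bool {
        switch (lhs, rhs) {
        case let (.index(a), .index(b)): return a == b
        case (.endIndex, .endIndex): return true
        default: return false
        }
    }

    static func < (lhs: ValidIndexCollectionIndex, rhs: ValidIndexCollectionIndex) -> Bool {
        switch (lhs, rhs) {
        case let (.index(a), .index(b)): return a < b
        case (.index, .endIndex): return true
        default: return false
        }
    }
}
```

This matches what Collection expects: startIndex <= endIndex always holds, and the two are equal exactly when the collection is empty.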
Now I can extend ValidIndexCollection:
// Implementing the Collection protocol requirements.
extension ValidIndexCollection {
typealias _Index = ValidIndexCollectionIndex<ValidIndex>
var startIndex: _Index {
return firstValidIndex.flatMap { .index($0) } ?? .endIndex
}
var endIndex: _Index {
return .endIndex
}
func index(after index: _Index) -> _Index {
guard case .index(let validIndex) = index else { fatalError("cannot increment endIndex") }
return .index(self.validIndex(after: validIndex))
}
subscript(index: _Index) -> Iterator.Element {
guard case .index(let validIndex) = index else { fatalError("cannot subscript using endIndex") }
return element(at: validIndex)
}
}
All seems well, the compiler doesn't complain! However, I tried to implement this protocol for a custom type:
struct CollectionOfTwo<Element> {
let first, second: Element
}
extension CollectionOfTwo: ValidIndexCollection {
var firstValidIndex: Int? { return 0 }
var lastValidIndex: Int? { return 1 }
func validIndex(after index: Int) -> Int {
return index + 1
}
subscript(index: Int) -> Element {
return index == 0 ? first : second
}
}
Now the compiler complains that CollectionOfTwo doesn't conform to Collection, Sequence, and IndexableBase. The error messages are very unhelpful; they are mostly messages like:
Protocol requires nested type SubSequence; do you want to add it?
or
Default type DefaultIndices<CollectionOfTwo<Element>> for associated type Indices (from protocol Collection) does not conform to IndexableBase
Is there any way to make this work? As far as I can tell, ValidIndexCollection satisfies the Collection requirements just fine.
Some things to note:
I called the ValidIndexCollection protocol method validIndex(after:) that way because calling it index(after:) resulted in a segmentation fault when trying to implement this protocol. That probably has something to do with the index(after:) method from the Collection protocol.
For the same reason I used element(at:) instead of a subscript.
I used typealias _Index instead of typealias Index because the latter resulted in an error message saying "Index is ambiguous for type lookup in this context". Again, this probably has something to do with Collection's Index associated type.
Adding associatedtype Element to ValidIndexCollection and replacing all occurrences of Iterator.Element by Element fixed it.
Suppose I have some function that I want to populate my data structure using a multi-dimensional array (e.g. a Tensor class):
class Tensor {
init<A>(array:A) { /* ... */ }
}
While I could add in a shape parameter, I would prefer to automatically calculate the dimensions from the array itself. If you know the dimensions a priori, it's trivial to read them off:
let d1 = array.count
let d2 = array[0].count
However, it's less clear how to do it for an N-dimensional array. I was thinking there might be a way to do it by extending the Array class:
extension Int {
func numberOfDims() -> Int {
return 0
}
}
extension Array {
func numberOfDims() -> Int {
return 1+Element.self.numberOfDims()
}
}
Unfortunately, this won't (rightfully so) compile, as numberOfDims isn't defined for most types. However, I don't see any way of constraining Element, as Arrays-of-Arrays make things complicated.
I was hoping someone else might have some insight into how to solve this problem (or explain why this is impossible).
If you're looking to get the depth of a nested array (Swift's standard library doesn't technically provide you with multi-dimensional arrays, only jagged arrays) – then, as shown in this Q&A, you can use a 'dummy protocol' and typecasting.
protocol _Array {
var nestingDepth: Int { get }
}
extension Array : _Array {
var nestingDepth: Int {
return 1 + ((first as? _Array)?.nestingDepth ?? 0)
}
}
let a = [1, 2, 3]
print(a.nestingDepth) // 1
let b = [[1], [2, 3], [4]]
print(b.nestingDepth) // 2
let c = [[[1], [2]], [[3]], [[4], [5]]]
print(c.nestingDepth) // 3
(I believe this approach would've still worked when you had originally posted the question)
In Swift 3, this can also be achieved without a dummy protocol, but instead by casting to [Any]. However, as noted in the linked Q&A, this is inefficient as it requires traversing the entire array in order to box each element in an existential container.
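A minimal sketch of that [Any] variant follows; the property name depthViaAnyCast is made up here to avoid clashing with nestingDepth above. Like the protocol version, it inspects only the first element.

```swift
extension Array {
    /// Hypothetical name. Casting to [Any] succeeds only when the first
    /// element is itself an array, at the cost of boxing its elements.
    var depthViaAnyCast: Int {
        return 1 + ((first as? [Any])?.depthViaAnyCast ?? 0)
    }
}

let flat = [1, 2, 3]
let nested = [[1], [2, 3], [4]]
let deeper = [[[1], [2]], [[3]]]

let depths = [flat.depthViaAnyCast, nested.depthViaAnyCast, deeper.depthViaAnyCast]
```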
Also note that this implementation assumes that you're calling it on a homogeneous nested array. As Paul notes, it won't give a correct answer for [[[1], 2], 3].
If this needs to be accounted for, you could write a recursive method that iterates through each of the nested arrays and returns the minimum depth of the nesting.
protocol _Array {
func _nestingDepth(minimumDepth: Int?, currentDepth: Int) -> Int
}
extension Array : _Array {
func _nestingDepth(minimumDepth: Int?, currentDepth: Int) -> Int {
// for an empty array, the minimum depth is the current depth, as we know
// that _nestingDepth is called where currentDepth <= minimumDepth.
guard !isEmpty else { return currentDepth }
var minimumDepth = minimumDepth
for element in self {
// if current depth has exceeded minimum depth, then return the minimum.
// this allows for the short-circuiting of the function.
if let minimumDepth = minimumDepth, currentDepth >= minimumDepth {
return minimumDepth
}
// if element isn't an array, then return the current depth as the new minimum,
// given that currentDepth < minimumDepth.
guard let element = element as? _Array else { return currentDepth }
// get the new minimum depth from the next nesting,
// and incrementing the current depth.
minimumDepth = element._nestingDepth(minimumDepth: minimumDepth,
currentDepth: currentDepth + 1)
}
// the force unwrap is safe, as we know array is non-empty, therefore minimumDepth
// has been assigned at least once.
return minimumDepth!
}
var nestingDepth: Int {
return _nestingDepth(minimumDepth: nil, currentDepth: 1)
}
}
let a = [1, 2, 3]
print(a.nestingDepth) // 1
let b = [[1], [2], [3]]
print(b.nestingDepth) // 2
let c = [[[1], [2]], [[3]], [[5], [6]]]
print(c.nestingDepth) // 3
let d: [Any] = [ [[1], [2], [[3]] ], [[4]], [5] ]
print(d.nestingDepth) // 2 (the minimum depth is at element [5])
Great question that sent me off on a goose chase!
To be clear: I’m talking below about the approach of using the outermost array’s generic type parameter to compute the number of dimensions. As Tyrelidrel shows, you can recursively examine the runtime type of the first element, although this approach gives nonsensical answers for heterogeneous arrays like [[[1], 2], 3].
Type-based dispatch can’t work
As you note, your code as written doesn’t work because numberOfDims is not defined for all types. But is there a workaround? Does this direction lead somewhere?
No, it’s a dead end. The reason is that extension methods are statically dispatched for non-class types, as the following snippet demonstrates:
extension CollectionType {
func identify() {
print("I am a collection of some kind")
}
func greetAndIdentify() {
print("Hello!")
identify()
}
}
extension Array {
func identify() {
print("I am an array")
}
}
[1,2,3].identify() // prints "I am an array"
[1,2,3].greetAndIdentify() // prints "Hello!" and "I am a collection of some kind"
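The snippet above uses Swift 2 names (CollectionType). As a sketch, the same behavior can be reproduced in current Swift by extending Collection; the methods return strings here instead of printing so that the results are easy to compare.

```swift
extension Collection {
    // Not a protocol requirement, so calls from inside this extension are
    // bound statically to this implementation.
    func identify() -> String {
        return "a collection of some kind"
    }

    func greetAndIdentify() -> String {
        return "Hello! I am " + identify()
    }
}

extension Array {
    // Chosen only when the static type at the call site is Array.
    func identify() -> String {
        return "an array"
    }
}

let direct = [1, 2, 3].identify()
let indirectCall = [1, 2, 3].greetAndIdentify()
```

direct picks the Array version because the call site knows the concrete type, while indirectCall goes through the Collection extension and never sees the Array override.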
Even if Swift allowed you to extend Any (and it doesn’t), Element.self.numberOfDims() would always call the Any implementation of numberOfDims() even if the runtime type of Element.self were an Array.
This crushing static dispatch limitation means that even this promising-looking approach fails (it compiles, but always returns 1):
extension CollectionType {
var numberOfDims: Int {
return self.dynamicType.numberOfDims
}
static var numberOfDims: Int {
return 1
}
}
extension CollectionType where Generator.Element: CollectionType {
static var numberOfDims: Int {
return 1 + Generator.Element.numberOfDims
}
}
[[1],[2],[3]].numberOfDims // return 1 ... boooo!
This same constraint also applies to function overloading.
Type inspection can’t work
If there’s a way to make it work, it would be something along these lines, which uses a conditional instead of type-based method dispatch to traverse the nested array types:
extension Array {
var numberOfDims: Int {
return self.dynamicType.numberOfDims
}
static var numberOfDims: Int {
if let nestedArrayType = Generator.Element.self as? Array.Type {
return 1 + nestedArrayType.numberOfDims
} else {
return 1
}
}
}
[[1,2],[2],[3]].numberOfDims
The code above compiles — quite confusingly — because Swift takes Array.Type to be a shortcut for Array<Element>.Type. That completely defeats the attempt to unwrap.
What’s the workaround? There isn’t one. This approach can’t work because we need to say “if Element is some kind of Array,” but as far as I know, there’s no way in Swift to say “array of anything,” or “just the Array type regardless of Element.”
Everywhere you mention the Array type, its generic type parameter must be materialized to a concrete type or a protocol at compile time.
Cheating can work
What about reflection, then? There is a way. Not a nice way, but there is a way. Swift’s Mirror is currently not powerful enough to tell us what the element type is, but there is another reflection method that is powerful enough: converting the type to a string.
private let arrayPat = try! NSRegularExpression(pattern: "Array<", options: [])
extension Array {
var numberOfDims: Int {
let typeName = "\(self.dynamicType)"
return arrayPat.numberOfMatchesInString(
typeName, options: [], range: NSMakeRange(0, typeName.characters.count))
}
}
Horrid, evil, brittle, probably not legal in all countries — but it works!
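The snippet above is written against Swift 2 APIs (dynamicType, numberOfMatchesInString). A rough equivalent in current Swift, equally brittle and offered only as a sketch, counts occurrences of "Array<" in the type's description. The property name is hypothetical, and the approach assumes the standard library keeps describing arrays as Array<...>.

```swift
import Foundation

extension Array {
    /// Hypothetical name. Counts how many times "Array<" appears in the
    /// textual description of the static element nesting.
    var numberOfDimsFromTypeName: Int {
        let typeName = String(describing: type(of: self))
        // Splitting on the marker yields one more piece than there are matches.
        return typeName.components(separatedBy: "Array<").count - 1
    }
}

let oneDim = [1, 2, 3].numberOfDimsFromTypeName
let twoDims = [[1, 2], [2], [3]].numberOfDimsFromTypeName
```

Note that this would also count types such as ContiguousArray, whose name contains the same substring, so it inherits all the fragility of the original.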
Unfortunately, I was not able to do this with a Swift array, but you can easily convert a Swift array to an NSArray.
extension NSArray {
func numberOfDims() -> Int {
var count = 0
if let x = self.firstObject as? NSArray {
count += x.numberOfDims() + 1
} else {
return 1
}
return count
}
}
Given a struct-based generic CollectionType …
struct MyCollection<Element>: CollectionType, MyProtocol {
typealias Index = MyIndex<MyCollection>
subscript(i: Index) -> Element { … }
func generate() -> IndexingGenerator<MyCollection> {
return IndexingGenerator(self)
}
}
… how would one define an Index for it …
struct MyIndex<Collection: MyProtocol>: BidirectionalIndexType {
func predecessor() -> MyIndex { … }
func successor() -> MyIndex { … }
}
… without introducing a dependency cycle of death?
The generic nature of MyIndex is necessary because:
It should work with any type of MyProtocol.
MyProtocol references Self and thus can only be used as a type constraint.
If there were forward declarations (à la Objective-C) I would just[sic!] add one for MyIndex<MyCollection> to my MyCollection<…>. Alas, there is no such thing.
A possible concrete use case would be binary trees, such as:
indirect enum BinaryTree<Element>: CollectionType, BinaryTreeType {
typealias Index = BinaryTreeIndex<BinaryTree>
case Nil
case Node(BinaryTree, Element, BinaryTree)
subscript(i: Index) -> Element { … }
}
Which would require a stack-based Index:
struct BinaryTreeIndex<BinaryTree: BinaryTreeType>: BidirectionalIndexType {
let stack: [BinaryTree]
func predecessor() -> BinaryTreeIndex { … }
func successor() -> BinaryTreeIndex { … }
}
One cannot (yet?) nest structs inside generic structs in Swift.
Otherwise I'd just move BinaryTreeIndex<…> inside BinaryTree<…>.
Also I'd prefer to have one generic BinaryTreeIndex,
which'd then work with any type of BinaryTreeType.
You cannot nest structs inside structs because they are value types. They aren’t pointers to an object, instead they hold their properties right there in the variable. Think about if a struct contained itself, what would its memory layout look like?
Forward declarations work in Objective-C because they are then used as pointers. This is why the indirect keyword was added to enums - it tells the compiler to add a level of indirection via a pointer.
In theory the same keyword could be added to structs, but it wouldn’t make much sense. You could do what indirect does by hand instead though, with a class box:
// turns any type T into a reference type
final class Box<T> {
let unbox: T
init(_ x: T) { unbox = x }
}
You could then use this to box up a struct to create, e.g., a linked list:
struct ListNode<T> {
var box: Box<(element: T, next: ListNode<T>)>?
func cons(x: T) -> ListNode<T> {
return ListNode(node: Box(element: x, next: self))
}
init() { box = nil }
init(node: Box<(element: T, next: ListNode<T>)>?)
{ box = node }
}
let nodes = ListNode().cons(1).cons(2).cons(3)
nodes.box?.unbox.element // first element
nodes.box?.unbox.next.box?.unbox.element // second element
You could turn this node directly into a collection, by conforming it to both ForwardIndexType and CollectionType, but this isn’t a good idea.
For example, they need very different implementations of ==:
The index needs to know if two indices from the same list are at the same position. It does not need the elements to conform to Equatable.
The collection needs to compare two different collections to see if they hold the same elements. It does need the elements to conform to Equatable i.e.:
func == <T where T: Equatable>(lhs: List<T>, rhs: List<T>) -> Bool {
// once the List conforms to at least SequenceType:
return lhs.elementsEqual(rhs)
}
Better to wrap it in two specific types. This is “free” – the wrappers have no overhead, just help you build the right behaviours more easily:
struct ListIndex<T>: ForwardIndexType {
let node: ListNode<T>
func successor() -> ListIndex<T> {
guard let next = node.box?.unbox.next
else { fatalError("attempt to advance past end") }
return ListIndex(node: next)
}
}
func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
switch (lhs.node.box, rhs.node.box) {
case (nil,nil): return true
case (_?,nil),(nil,_?): return false
case let (x,y): return x === y
}
}
struct List<T>: CollectionType {
typealias Index = ListIndex<T>
var startIndex: Index
var endIndex: Index { return ListIndex(node: ListNode()) }
subscript(idx: Index) -> T {
guard let element = idx.node.box?.unbox.element
else { fatalError("index out of bounds") }
return element
}
}
(no need to implement generate() – you get an indexing generator “for free” in 2.0 by implementing CollectionType)
You now have a fully functioning collection:
// in practice you would add methods to List such as
// conforming to ArrayLiteralConvertible or init from
// another sequence
let list = List(startIndex: ListIndex(node: nodes))
list.first // 3
for x in list { print(x) } // prints 3 2 1
Now all of this code looks pretty disgusting for two reasons.
One is because box gets in the way, and indirect is much better as the compiler sorts it all out for you under the hood. But it’s doing something similar.
The other is that structs are not a good solution to this. Enums are much better. In fact the code is really using an enum – that’s what Optional is. Only instead of nil (i.e. Optional.None), it would be better to have an End case for the end of the linked list. This is what we are using it for.
For more of this kind of stuff you could check out these posts.
While Airspeed Velocity's answer applies to the most common cases, my question was asking specifically about the special case of generalizing CollectionType indexing in order to be able to share a single Index implementation for all thinkable kinds of binary trees (whose recursive nature makes it necessary to make use of a stack for index-based traversals (at least for trees without a parent pointer)), which requires the Index to be specialized on the actual BinaryTree, not the Element.
The way I solved this problem was to rename MyCollection to MyCollectionStorage, revoke its CollectionType conformity and wrap it with a struct that now takes its place as MyCollection and deals with conforming to CollectionType.
To make things a bit more "real" I will refer to:
MyCollection<E> as SortedSet<E>
MyCollectionStorage<E> as BinaryTree<E>
MyIndex<T> as BinaryTreeIndex<T>
So without further ado:
struct SortedSet<Element>: CollectionType {
typealias Tree = BinaryTree<Element>
typealias Index = BinaryTreeIndex<Tree>
subscript(i: Index) -> Element { … }
func generate() -> IndexingGenerator<SortedSet> {
return IndexingGenerator(self)
}
}
struct BinaryTree<Element>: BinaryTreeType {
}
struct BinaryTreeIndex<BinaryTree: BinaryTreeType>: BidirectionalIndexType {
func predecessor() -> BinaryTreeIndex { … }
func successor() -> BinaryTreeIndex { … }
}
This way the dependency graph turns from a directed cyclic graph into a directed acyclic graph.
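For readers wondering why such an index needs a stack at all, here is a minimal sketch in current Swift of a stack-based in-order walk over a tree without parent pointers. BinaryTreeCursor and its members are hypothetical names, and this is not the BinaryTreeIndex from the answer; it only shows the traversal that a successor() would have to perform.

```swift
indirect enum BinaryTree<Element> {
    case empty
    case node(BinaryTree, Element, BinaryTree)
}

/// Hypothetical cursor: the stack holds the nodes whose element has not been
/// visited yet, with the current position on top.
struct BinaryTreeCursor<Element> {
    private(set) var stack: [BinaryTree<Element>] = []

    init(root: BinaryTree<Element>) {
        pushLeftSpine(of: root)
    }

    // Descend along left children, remembering each node on the way down.
    private mutating func pushLeftSpine(of tree: BinaryTree<Element>) {
        var current = tree
        while case let .node(left, _, _) = current {
            stack.append(current)
            current = left
        }
    }

    /// The element at the current position, or nil once the walk is finished.
    var element: Element? {
        guard let top = stack.last, case let .node(_, value, _) = top else { return nil }
        return value
    }

    /// Moves to the in-order successor (the role successor() plays for an index).
    mutating func advance() {
        guard let top = stack.popLast(), case let .node(_, _, right) = top else { return }
        pushLeftSpine(of: right)
    }
}

let tree: BinaryTree<Int> = .node(.node(.empty, 1, .empty), 2, .node(.empty, 3, .empty))
var cursor = BinaryTreeCursor(root: tree)
var visited: [Int] = []
while let value = cursor.element {
    visited.append(value)
    cursor.advance()
}
```

Because the cursor depends only on the tree type, a wrapper such as SortedSet can reuse it without the tree having to know about the wrapper, which is the acyclic arrangement described above.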