Creating a ValidIndexCollection protocol in Swift 3

A while ago, I made a Binary Search Tree type in Swift that I wanted to conform to the Collection protocol. However, the endIndex requirement is a "past the end" index which isn't really appropriate for a tree because each index should hold a reference to its corresponding node for O(1) access. I ended up with an optional reference (being nil in the case of endIndex), but it involved a lot of boilerplate code that I'd rather avoid.
I decided to make a ValidIndexCollection protocol that looks like this:
/// A collection defined by valid indices only, rather than a
/// startIndex and a "past the end" endIndex.
protocol ValidIndexCollection: Collection {
associatedtype ValidIndex: Comparable
/// The first valid index if the collection is nonempty,
/// nil otherwise.
var firstValidIndex: ValidIndex? { get }
/// The last valid index if the collection is nonempty,
/// nil otherwise.
var lastValidIndex: ValidIndex? { get }
/// Returns the index right after the given index.
func validIndex(after index: ValidIndex) -> ValidIndex
/// Returns the element at the given index.
func element(at index: ValidIndex) -> Iterator.Element
}
Before I can extend this protocol to satisfy the Collection requirements, I have to introduce an appropriate index first:
enum ValidIndexCollectionIndex<ValidIndex: Comparable> {
case index(ValidIndex)
case endIndex
}
extension ValidIndexCollectionIndex: Comparable {
// ...
}
Now I can extend ValidIndexCollection:
// Implementing the Collection protocol requirements.
extension ValidIndexCollection {
typealias _Index = ValidIndexCollectionIndex<ValidIndex>
var startIndex: _Index {
return firstValidIndex.flatMap { .index($0) } ?? .endIndex
}
var endIndex: _Index {
return .endIndex
}
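func index(after index: _Index) -> _Index {
guard case .index(let validIndex) = index else { fatalError("cannot increment endIndex") }
// Advancing past the last valid index must yield endIndex, otherwise iteration never terminates.
if validIndex == lastValidIndex { return .endIndex }
return .index(self.validIndex(after: validIndex))
}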
subscript(index: _Index) -> Iterator.Element {
guard case .index(let validIndex) = index else { fatalError("cannot subscript using endIndex") }
return element(at: validIndex)
}
}
All seems well; the compiler doesn't complain. However, I then tried to implement this protocol for a custom type:
struct CollectionOfTwo<Element> {
let first, second: Element
}
extension CollectionOfTwo: ValidIndexCollection {
var firstValidIndex: Int? { return 0 }
var lastValidIndex: Int? { return 1 }
func validIndex(after index: Int) -> Int {
return index + 1
}
subscript(index: Int) -> Element {
return index == 0 ? first : second
}
}
Now the compiler complains that CollectionOfTwo doesn't conform to Collection, Sequence, and IndexableBase. The error messages are very unhelpful; they are mostly messages like:
Protocol requires nested type SubSequence; do you want to add it?
or
Default type DefaultIndices<CollectionOfTwo<Element>> for associated type Indices (from protocol Collection) does not conform to IndexableBase
Is there any way to make this work? As far as I can tell, ValidIndexCollection satisfies the Collection requirements just fine.
Some things to note:
I called the ValidIndexCollection protocol method validIndex(after:) that way because calling it index(after:) resulted in a segmentation fault when trying to implement this protocol. That probably has something to do with the index(after:) method from the Collection protocol.
For the same reason I used element(at:) instead of a subscript.
I used typealias _Index instead of typealias Index because the latter resulted in an error message saying "Index is ambiguous for type lookup in this context". Again, this probably has something to do with Collection's Index associated type.

Adding associatedtype Element to ValidIndexCollection and replacing all occurrences of Iterator.Element by Element fixed it.

Related

Why I cannot access self in this particular method but am allowed in others?

I have seen this question on SO but the answers there appear to be talking about functions that return self.
I am creating a class extension that starts like this
extension Sequence where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
}
}
I need to get the number of elements (self.count) and, in subsequent lines, use the array elements, like self[i], but Swift complains that self has no member called count and will not let me use self in any context.
How do I do that?
In Swift, the count property is defined on Collection, not on Sequence, so you need to extend Collection instead.
extension Collection where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
}
}
If you also need to access elements by position (self[i]), you should extend RandomAccessCollection instead. Collection already provides count and a subscript that takes an Index, but RandomAccessCollection additionally guarantees that count and index(_:offsetBy:) run in constant time.
extension RandomAccessCollection where Element: Comparable {
func normalize() -> [Element] {
let count = self.count
let first = self[startIndex]
let second = self[index(startIndex, offsetBy: 1)]
return [first, second]
}
}
Note: Because the indices of a RandomAccessCollection are not necessarily Int, you must use the index(_:offsetBy:) function to create an index that can be passed to the subscript.
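To make the note above concrete, here is a minimal sketch of offset-based access. The helper name element(atOffset:) is hypothetical and not part of the standard library; the example only illustrates why a raw Int should not be assumed to be a valid index.

```swift
extension RandomAccessCollection {
    /// Returns the element `offset` positions after the first one,
    /// or nil if the offset is out of range. (Hypothetical helper.)
    func element(atOffset offset: Int) -> Element? {
        guard offset >= 0, offset < count else { return nil }
        // Translate the integer offset into a real index of this collection.
        return self[index(startIndex, offsetBy: offset)]
    }
}

let numbers = [10, 20, 30, 40]
let slice = numbers[1...]   // ArraySlice<Int>; its startIndex is 1, not 0
let firstOfSlice = slice.element(atOffset: 0)   // 20
let pastTheEnd = slice.element(atOffset: 3)     // nil
```

An ArraySlice shares indices with its base array, so slice[0] would trap here even though the index type is Int.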

Custom Collections Without Internal Collection Types?

I'd like to learn more about Swift's Collection Types by creating a custom collection.
The problem is that I can't find any examples of "custom" collection types that don't just use an internal array / dictionary.
These aren't helpful to me, because when it comes time to conform to the collection protocol, the examples just propagate the required methods to the array / dictionary.
That said, after looking through Wikipedia's List of Data Structures, I can't find any that meet the performance characteristics of collection types, that aren't just specialized arrays.
Does anyone know of a data structure that could be implemented with a custom collection type, without using an internal collection type?
EDIT
Collection protocol conformance requires that accessing the startIndex, the endIndex, and the elements of the collection be done in constant time, O(1).
EDIT 2
The consensus in the comments seems to be that a LinkedList is a data structure that satisfies these characteristics. My LinkedList is defined as follows:
indirect enum LinkedList<T> {
case value(element: T, next: LinkedList<T>)
case end
}
extension LinkedList: Sequence {
func makeIterator() -> LinkedListIterator<T> {
return LinkedListIterator(current: self)
}
}
struct LinkedListIterator<T>: IteratorProtocol {
var current: LinkedList<T>
mutating func next() -> T? {
switch current {
case let .value(element, next):
current = next
return element
case .end:
return nil
}
}
}
What I still don't understand, is how subscript can be returned in constant time. For the LinkedList:
let data = LinkedList<Int>.value(element: 0, next: LinkedList<Int>.value(element: 1, next: LinkedList<Int>.value(element: 2, next: LinkedList<Int>.value(element: 3, next: LinkedList<Int>.end))))
Assume that I want access to the 3rd element in the Collection:
let example = data[2]
Currently, this is how I have implemented subscript:
subscript (position: Index) -> Element {
precondition(position < endIndex && position >= startIndex)
var iterator = makeIterator()
for i in 0 ..< position {
iterator.next()
if i + 1 == position {
return iterator.next()!
}
}
var zero = makeIterator()
return zero.next()!
}
Because the method's completion time depends on position, it finishes in linear rather than constant time. How could such a constant-time method be implemented?
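One common way to get a constant-time subscript is to stop using Int as the index and let the index itself carry a reference to its node. The sketch below assumes a class-based node rather than the indirect enum above, and all names (Node, NodeIndex, NodeList) are hypothetical. The offset stored in the index exists only so that indices can be compared.

```swift
// A singly linked node; a class so that indices can reference it cheaply.
final class Node<T> {
    let element: T
    let next: Node<T>?
    init(_ element: T, next: Node<T>?) {
        self.element = element
        self.next = next
    }
}

// The index holds the node it points at, so subscripting needs no traversal.
struct NodeIndex<T>: Comparable {
    let node: Node<T>?   // nil represents endIndex
    let offset: Int      // distance from the head, used only for comparison
    static func == (lhs: NodeIndex, rhs: NodeIndex) -> Bool { return lhs.offset == rhs.offset }
    static func < (lhs: NodeIndex, rhs: NodeIndex) -> Bool { return lhs.offset < rhs.offset }
}

struct NodeList<T>: Collection {
    let head: Node<T>?
    let count: Int
    var startIndex: NodeIndex<T> { return NodeIndex(node: head, offset: 0) }
    var endIndex: NodeIndex<T> { return NodeIndex(node: nil, offset: count) }
    func index(after i: NodeIndex<T>) -> NodeIndex<T> {
        return NodeIndex(node: i.node!.next, offset: i.offset + 1)
    }
    // O(1): the element is read directly from the node stored in the index.
    subscript(position: NodeIndex<T>) -> T {
        return position.node!.element
    }
}

var head: Node<Int>? = nil
for x in [3, 2, 1, 0] { head = Node(x, next: head) }   // builds 0, 1, 2, 3
let list = NodeList(head: head, count: 4)
let third = list.index(list.startIndex, offsetBy: 2)   // O(n) to reach...
let example = list[third]                              // ...but O(1) to read
```

Moving an index forward is still linear for a linked list; the constant-time requirement applies to subscripting with an index you already have.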

Declare a Swift protocol which has a property return value CollectionType<Int>?

Is something like
protocol A {
var intCollection: CollectionType<Int> { get }
}
or
protocol A {
typealias T: CollectionType where T.Generator.Element == Int
var intCollection: T
}
possible in Swift 2.1?
Update for Swift 4
Swift 4 now supports this feature! Read more here.
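As a rough sketch of what the Swift 4 update refers to, an associated type can carry a where clause that pins down its element type. The conforming type Evens and the helper total(_:) are made up for illustration.

```swift
protocol A {
    // Swift 4 and later: constrain the associated collection's Element directly.
    associatedtype IntCollection: Collection where IntCollection.Element == Int
    var intCollection: IntCollection { get }
}

// Hypothetical conforming type; IntCollection is inferred as [Int].
struct Evens: A {
    var intCollection: [Int] { return [2, 4, 6] }
}

// Generic code can rely on the elements being Int without repeating the constraint.
func total<T: A>(_ value: T) -> Int {
    return value.intCollection.reduce(0, +)
}

let evensTotal = total(Evens())   // 12
```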
Not as a nested protocol, but it's fairly straightforward using the type erasers (the "Any" structs).
protocol A {
var intCollection: AnyRandomAccessCollection<Int> { get }
}
This is actually often quite convenient for return values because the caller usually doesn't care so much about the actual type. You just have to throw a return AnyRandomAccessCollection(resultArray) at the end of your function and it all just works. Lots of stdlib now returns Any erasers. For the return value problem, it's almost always the way I recommend. It has the nice side effect of making A concrete, so it's much easier to work with.
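In current Swift syntax, the type-eraser approach might look like the following sketch. ArrayBacked is a hypothetical conforming type, and an array is used only as an example backing store.

```swift
protocol A {
    var intCollection: AnyRandomAccessCollection<Int> { get }
}

// Hypothetical conforming type that happens to store an array.
struct ArrayBacked: A {
    let storage: [Int]
    var intCollection: AnyRandomAccessCollection<Int> {
        // Wrapping erases the concrete collection type from the interface.
        return AnyRandomAccessCollection(storage)
    }
}

// Because A has no associated types, it can be used directly as a type.
let value: A = ArrayBacked(storage: [1, 2, 3])
let sum = value.intCollection.reduce(0, +)   // 6
```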
If you want to keep the CollectionType, then you need to restrict it at the point that you create a function that needs it. For example:
protocol A {
typealias IntCollection: CollectionType
var intCollection: IntCollection { get }
}
extension A where IntCollection.Generator.Element == Int {
func sum() -> Int {
return intCollection.reduce(0, combine: +)
}
}
This isn't ideal, since it means you can have A with the wrong kind of collection type. They just won't have a sum method. You also will find yourself repeating that "where IntCollection.Generator.Element == Int" in a surprising number of places.
In my experience, it is seldom worth this effort, and you quickly come back to Arrays (which are the dominant CollectionType anyway). But when you need it, these are the two major approaches. That's the best we have today.
You can't do this outright as in your question, and there are several threads here on SO on the subject of using protocols as type definitions when the protocol itself contains Self or associated type requirements (result: this is not allowed). See e.g. the link provided by Christik, or the thread Error using associated types and generics.
Now, for your example above, you could use the following workaround, which perhaps mimics the behaviour you're looking for:
protocol A {
typealias MyCollectionType
typealias MyElementType
func getMyCollection() -> MyCollectionType
func printMyCollectionType()
func largestValue() -> MyElementType?
}
struct B<U: Comparable, T: CollectionType where T.Generator.Element == U>: A {
typealias MyCollectionType = T
typealias MyElementType = U
var myCollection : MyCollectionType
init(coll: MyCollectionType) {
myCollection = coll
}
func getMyCollection() -> MyCollectionType {
return myCollection
}
func printMyCollectionType() {
print(myCollection.dynamicType)
}
func largestValue() -> MyElementType? {
guard var largestSoFar = myCollection.first else {
return nil
}
for item in myCollection {
if item > largestSoFar {
largestSoFar = item
}
}
return largestSoFar
}
}
So you can declare blueprints for your generic collection types in your protocol A, and implement these blueprints in the "interface type" B, which also contains the actual collection as a member property. I have taken the largestValue() method above from here.
Example usage:
/* Examples */
var myArr = B<Int, Array<Int>>(coll: [1, 2, 3])
var mySet = B<Int, Set<Int>>(coll: [10, 20, 30])
var myRange = B<Int, Range<Int>>(coll: 5...10)
var myStrArr = B<String, Array<String>>(coll: ["a", "c", "b"])
myArr.printMyCollectionType() // Array<Int>
mySet.printMyCollectionType() // Set<Int>
myRange.printMyCollectionType() // Range<Int>
myStrArr.printMyCollectionType() // Array<String>
/* generic T type constrained to protocol 'A' */
func printLargestValue<T: A>(coll: T) {
print(coll.largestValue() ?? "Empty collection")
}
printLargestValue(myArr) // 3
printLargestValue(mySet) // 30
printLargestValue(myRange) // 10
printLargestValue(myStrArr) // c

Circular dependencies between generic types (CollectionType and its Index/Generator, e.g.)

Given a struct-based generic CollectionType …
struct MyCollection<Element>: CollectionType, MyProtocol {
typealias Index = MyIndex<MyCollection>
subscript(i: Index) -> Element { … }
func generate() -> IndexingGenerator<MyCollection> {
return IndexingGenerator(self)
}
}
… how would one define an Index for it …
struct MyIndex<Collection: MyProtocol>: BidirectionalIndexType {
func predecessor() -> MyIndex { … }
func successor() -> MyIndex { … }
}
… without introducing a dependency cycle of death?
The generic nature of MyIndex is necessary because:
It should work with any type of MyProtocol.
MyProtocol references Self and thus can only be used as a type constraint.
If there were forward declarations (à la Objective-C) I would just[sic!] add one for MyIndex<MyCollection> to my MyCollection<…>. Alas, there is no such thing.
A possible concrete use case would be binary trees, such as:
indirect enum BinaryTree<Element>: CollectionType, BinaryTreeType {
typealias Index = BinaryTreeIndex<BinaryTree>
case Nil
case Node(BinaryTree, Element, BinaryTree)
subscript(i: Index) -> Element { … }
}
Which would require a stack-based Index:
struct BinaryTreeIndex<BinaryTree: BinaryTreeType>: BidirectionalIndexType {
let stack: [BinaryTree]
func predecessor() -> BinaryTreeIndex { … }
func successor() -> BinaryTreeIndex { … }
}
One cannot (yet?) nest structs inside generic structs in Swift.
Otherwise I'd just move BinaryTreeIndex<…> inside BinaryTree<…>.
Also I'd prefer to have one generic BinaryTreeIndex,
which'd then work with any type of BinaryTreeType.
You cannot nest structs inside structs because they are value types. They aren’t pointers to an object, instead they hold their properties right there in the variable. Think about if a struct contained itself, what would its memory layout look like?
Forward declarations work in Objective-C because they are then used as pointers. This is why the indirect keyword was added to enums - it tells the compiler to add a level of indirection via a pointer.
In theory the same keyword could be added to structs, but it wouldn’t make much sense. You could do what indirect does by hand instead though, with a class box:
// turns any type T into a reference type
final class Box<T> {
let unbox: T
init(_ x: T) { unbox = x }
}
You could then use this to box up a struct to create, e.g., a linked list:
struct ListNode<T> {
var box: Box<(element: T, next: ListNode<T>)>?
func cons(x: T) -> ListNode<T> {
return ListNode(node: Box(element: x, next: self))
}
init() { box = nil }
init(node: Box<(element: T, next: ListNode<T>)>?)
{ box = node }
}
let nodes = ListNode().cons(1).cons(2).cons(3)
nodes.box?.unbox.element // first element
nodes.box?.unbox.next.box?.unbox.element // second element
You could turn this node directly into a collection, by conforming it to both ForwardIndexType and CollectionType, but this isn’t a good idea.
For example, they need very different implementations of ==:
The index needs to know if two indices from the same list are at the same position. It does not need the elements to conform to Equatable.
The collection needs to compare two different collections to see if they hold the same elements. It does need the elements to conform to Equatable i.e.:
func == <T where T: Equatable>(lhs: List<T>, rhs: List<T>) -> Bool {
// once the List conforms to at least SequenceType:
return lhs.elementsEqual(rhs)
}
Better to wrap it in two specific types. This is “free” – the wrappers have no overhead, just help you build the right behaviours more easily:
struct ListIndex<T>: ForwardIndexType {
let node: ListNode<T>
func successor() -> ListIndex<T> {
guard let next = node.box?.unbox.next
else { fatalError("attempt to advance past end") }
return ListIndex(node: next)
}
}
func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
switch (lhs.node.box, rhs.node.box) {
case (nil,nil): return true
case (_?,nil),(nil,_?): return false
case let (x,y): return x === y
}
}
struct List<T>: CollectionType {
typealias Index = ListIndex<T>
var startIndex: Index
var endIndex: Index { return ListIndex(node: ListNode()) }
subscript(idx: Index) -> T {
guard let element = idx.node.box?.unbox.element
else { fatalError("index out of bounds") }
return element
}
}
(no need to implement generate() – you get an indexing generator “for free” in 2.0 by implementing CollectionType)
You now have a fully functioning collection:
// in practice you would add methods to List such as
// conforming to ArrayLiteralConvertible or init from
// another sequence
let list = List(startIndex: ListIndex(node: nodes))
list.first // 3
for x in list { print(x) } // prints 3 2 1
Now all of this code looks pretty disgusting for two reasons.
One is because box gets in the way, and indirect is much better as the compiler sorts it all out for you under the hood. But it’s doing something similar.
The other is that structs are not a good solution to this. Enums are much better. In fact the code is really using an enum: that's what Optional is. Only instead of nil (i.e. Optional.None), it would be better to have an End case for the end of the linked list, since that is what nil is being used for here.
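For comparison, here is a rough sketch of the enum-based version hinted at above, written in current Swift syntax rather than the Swift 2 syntax used in the rest of this answer. The case names and the use of AnyIterator are choices made for this illustration.

```swift
// indirect lets the enum contain itself; the compiler adds the boxing.
indirect enum LinkedList<T> {
    case end
    case node(T, next: LinkedList<T>)

    /// Returns a new list with `x` prepended.
    func cons(_ x: T) -> LinkedList<T> {
        return .node(x, next: self)
    }
}

extension LinkedList: Sequence {
    func makeIterator() -> AnyIterator<T> {
        var current = self
        return AnyIterator {
            // Stop at .end; otherwise step to the next node and yield the element.
            guard case let .node(element, next) = current else { return nil }
            current = next
            return element
        }
    }
}

let enumNodes = LinkedList<Int>.end.cons(1).cons(2).cons(3)
let elements = Array(enumNodes)   // [3, 2, 1]
```

There is no Box and no Optional here; the explicit end case plays the role that nil played in ListNode.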
For more of this kind of stuff you could check out these posts.
While Airspeed Velocity's answer applies to the most common cases, my question was asking specifically about generalizing CollectionType indexing so that a single Index implementation can be shared by all thinkable kinds of binary trees. Their recursive nature makes a stack necessary for index-based traversal (at least for trees without a parent pointer), which requires the Index to be specialized on the actual BinaryTree, not the Element.
The way I solved this problem was to rename MyCollection to MyCollectionStorage, revoke its CollectionType conformity and wrap it with a struct that now takes its place as MyCollection and deals with conforming to CollectionType.
To make things a bit more "real" I will refer to:
MyCollection<E> as SortedSet<E>
MyCollectionStorage<E> as BinaryTree<E>
MyIndex<T> as BinaryTreeIndex<T>
So without further ado:
struct SortedSet<Element>: CollectionType {
typealias Tree = BinaryTree<Element>
typealias Index = BinaryTreeIndex<Tree>
subscript(i: Index) -> Element { … }
func generate() -> IndexingGenerator<SortedSet> {
return IndexingGenerator(self)
}
}
struct BinaryTree<Element>: BinaryTreeType {
}
struct BinaryTreeIndex<BinaryTree: BinaryTreeType>: BidirectionalIndexType {
func predecessor() -> BinaryTreeIndex { … }
func successor() -> BinaryTreeIndex { … }
}
This way the dependency graph turns from a directed cyclic graph into a directed acyclic graph.

Extend CollectionType add indexOutOfRange function

I'm trying to add a function that tells me if an index is out of range in an array.
The startIndex and endIndex of CollectionType seem to be generic, so I'm trying to restrict the extension to the case where the index type is Int.
This code does not compile:
extension CollectionType where Index.Type is Int {
public func psoIndexOutOfRange(index: Index.Type) -> Bool{
return index < self.startIndex || index > self.endIndex
}
}
Is it possible? If so, what would be the correct way to add this?
Personally I think this would be better as an extension to Range, rather than to CollectionType:
extension Range where T: Comparable {
func contains(element: Generator.Element) -> Bool {
return element >= startIndex && element < endIndex
}
}
which you could call like so (indices returns the range from the collection’s start to end index):
[1,2,3].indices.contains(2)
Note, CollectionType (which Range conforms to) already has a contains method – but done via linear search. This overloads contains for ranges specifically to do it in constant time.
Also, if you're doing this in order to combine it with a subscript fetch, consider adding an optional fetch to make things easier:
extension CollectionType where Index: Comparable {
subscript(safe idx: Index) -> Generator.Element? {
guard indices.contains(idx) else { return nil }
return self[idx]
}
}
let a = [1,2,3]
a[safe: 4] // nil
How about:
extension CollectionType where Index: Comparable {
public func psoIndexOutOfRange(index: Index) -> Bool{
return index < self.startIndex || index >= self.endIndex
}
}
As @MartinR suggested, it's more general if you use Comparable instead of constraining Index to be of type Int.
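In Swift 3 and later, CollectionType became Collection and its Index is always Comparable, so no where clause is needed. A minimal sketch of both ideas from this thread, with the method name isIndexOutOfRange chosen for illustration:

```swift
extension Collection {
    /// True if `index` cannot be used to subscript this collection.
    func isIndexOutOfRange(_ index: Index) -> Bool {
        // endIndex is "past the end", so it counts as out of range too.
        return index < startIndex || index >= endIndex
    }

    /// Returns the element at `index`, or nil if the index is out of range.
    subscript(safe index: Index) -> Element? {
        return isIndexOutOfRange(index) ? nil : self[index]
    }
}

let a = [1, 2, 3]
let missing = a[safe: 4]   // nil
let present = a[safe: 1]   // 2
```

The check assumes the index was obtained from the same collection; it does not validate indices borrowed from an unrelated one.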