Strange behaviour for recursive enum in Swift (Beta 7) - swift

enum Tree{
case Leaf(String)
case Node(Tree)
} //compiler not happy!!
enum Tree{
case Leaf(String)
case Node([Tree])
} //compiler is happy in (arguably) a more complex recursive scenario?
How can the Swift compiler work for the second (more complex) scenario and not the first?

It is worth noting that Swift 2 beta 2 and further has indirect keyword for recursive enum - that means
enum Tree<T> {
case Leaf(T)
indirect case Node(Tree)
}
is valid language construct that doesn't break pattern matching in Swift 2.
TL;DR of the decision: "[…] we decided that the right solution is to simply not support general, non-obvious recursion through enums, and require the programmer to mediate that explicitly with indirect."

A value type (an enum) cannot contain itself as a direct member, since not matter how big a data structure is, it cannot contain itself. Apparently associated data of enum cases are considered direct members of the enum, so the associated data cannot be the type of the enum itself. (Actually, I wish that they would make recursive enums work; it would be so great for functional data structures.)
However, if you have a level of indirection, it is okay. For example, the associated data can be an object (instance of a class), and that class can have a member that is the enum. Since class types are reference types, it is just a pointer and does not directly contain the object (and thus the enum), so it is fine.
The answer to your question is: [Tree] does not contain Tree directly as a member. The fields of Array are private, but we can generally infer that the storage for the elements of the array are not stored in the Array struct directly, because this struct has a fixed size for a given Array<T>, but the array can have unlimited number of elements.

Chris Lattner (designer of Swift) says on the Apple Developer forums that autoclosure
has emerged as a way to "box" expression value in a reference (e.g.
working around limitations with recursive enums).
However, the following code (which works in Swift 1.1) does not work in Swift 1.2 that comes with Xcode Beta Version 6.3 (6D520o). The error message is "Attributes can only be applied to declarations, not types", however if this is intended, I don't know how to reconcile it with Lattner's statement about the behaviour he talks about in the previous quote as being "a useful thing, and we haven't removed it with Swift 1.2."
enum BinaryTree {
case Leaf(String)
case Node(#autoclosure () -> BinaryTree, #autoclosure () -> BinaryTree)
}
let l1 = BinaryTree.Leaf("A")
let l2 = BinaryTree.Leaf("B")
let l3 = BinaryTree.Leaf("C")
let l4 = BinaryTree.Leaf("D")
let n1 = BinaryTree.Node(l1, l2)
let n2 = BinaryTree.Node(l3, l4)
let t = BinaryTree.Node(n1, n2)

Related

When exactly do I need "indirect" with writing recursive enums?

I'm surprised to find out that this compiles:
enum Foo {
case one([Foo])
// or
// case one([String: Foo])
}
I would have expected the compiler to tell me to add indirect to the enum cases, as Foo contains more Foos. Arrays and Dictionarys are all value types, so there are no indirection here. To me, this is structurally similar to this:
enum Foo {
case one(Bar<Foo>)
}
// arrays and dictionaries are just a struct with either something or nothing in it
// right?
struct Bar<T> {
let t: T?
}
... which does require me to add indirect to the enum.
Then I tried tuples:
enum Foo {
case one((Foo, Foo))
}
and this is considered a recursive enum too. I thought arrays and tuples have the same memory layout, and that is why you can convert arrays to tuples using withUnsafeBytes and then binding the memory...
What rules does this follow? Are there actually something special about arrays and dictionaries? If so, are there other such "special" types that looks like it should require indirect, but actually doesn't?
The Swift Guide is not very helpful. It just says:
A recursive enumeration is an enumeration that has another instance of the enumeration as the associated value for one or more of the enumeration cases. You indicate that an enumeration case is recursive by writing indirect before it, which tells the compiler to insert the necessary layer of indirection.
What "has another instance of the enumeration" means is rather vague. It is even possible to interpret it to mean that case one(Bar<Foo>) is not recursive - the case only has an instance of Bar<Foo>, not Foo.
I think part of the confusion stems from this assumption:
I thought arrays and tuples have the same memory layout, and that is why you can convert arrays to tuples using withUnsafeBytes and then binding the memory...
Arrays and tuples don't have the same memory layout:
Array<T> is a fixed-size struct with a pointer to a buffer which holds the array elements contiguously* in memory
Contiguity is promised only in the case of native Swift arrays [not bridged from Objective-C]. NSArray instances do not guarantee that their underlying storage is contiguous, but in the end this does not have an effect on the code below.
Tuples are fixed-size buffers of elements held contiguously in memory
The key thing is that the size of an Array<T> does not change with the number of elements held (its size is simply the size of a pointer to the buffer), while a tuple does. The tuple is more equivalent to the buffer the array holds, and not the array itself.
Array<T>.withUnsafeBytes calls Array<T>.withUnsafeBufferPointer, which returns the pointer to the buffer, not to the array itself. *(In the case of a non-contiguous bridged NSArray, _ArrayBuffer.withUnsafeBufferPointer has to create a temporary contiguous copy of its contents in order to return a valid buffer pointer to you.)
When laying out memory for types, the compiler needs to know how large the type is. Given the above, an Array<Foo> is statically known to be fixed in size: the size of one pointer (to a buffer elsewhere in memory).
Given
enum Foo {
case one((Foo, Foo))
}
in order to lay out the size of Foo, you need to figure out the maximum size of all of its cases. It has only the single case, so it would be the size of that case.
Figuring out the size of one requires figuring out the size of its associated value, and the size of a tuple of elements is the sum of the size of the elements themselves (taking into account padding and alignment, but we don't really care about that here).
Thus, the size of Foo is the size of one, which is the size of (Foo, Foo) laid out in memory. So, what is the size of (Foo, Foo)? Well, it's the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo...
Where Array<Foo> had a way out (Array<T> is the same size regardless of T), we're stuck in an infinite loop with no base case.
indirect is the keyword required to break out of the recursion and give this infinite reference a base case. It inserts an implicit pointer by making a given case the fixed size of a pointer, regardless of what it contains or points to. That makes the size of one fixed, which allows Foo to have a fixed size.
indirect is less about Foo referring to Foo in any way, and more about allowing an enum case to potentially contain itself indirectly (because direct containment would lead to an infinite loop).
As an aside, this is also why a struct cannot contain a direct instance of itself:
struct Foo {
let foo: Foo // error: Value type 'Foo' cannot have a stored property that recursively contains it
}
would lead to infinite recursion, while
struct Foo {
let foo: UnsafePointer<Foo>
}
is fine.
structs don't support the indirect keyword (at least in a struct, you have more direct control over storage and layout), but there have been pitches for adding support for this on the Swift forums.
Indirect on enum cases can be used to tell the compiler to store the associated value of that enum as a pointer. Reference.
This is necessary when the compiler couldn't figure out the memory layout of the associated value without using a pointer, such as in the case of recursive enums.
The reason why you need to make the below case indirect is because by using a generic struct that holds your enum as an associated value on a case of the enum itself, you end up with a recursive enum, since you can nest Foo and Bar in each other infinitely many times.
enum MaybeRecursive {
case array([MaybeRecursive])
case dict([String: MaybeRecursive])
indirect case genericStruct(GenericStruct<MaybeRecursive>)
}
struct GenericStruct<T> {
let t: T?
}
// if you replaced `.array` with `.genericStruct` again, you could keep going forever
let associatedValue = GenericStruct(t: MaybeRecursive.genericStruct(.init(t: .array([]))))
MaybeRecursive.genericStruct(associatedValue)
As for the Array vs Tuple difference, Itai Ferber has expertly explained that part in their own answer.

Differences generic protocol type parameter vs direct protocol type

This is my playground code:
protocol A {
init(someInt: Int)
}
func direct(a: A) {
// Doesn't work
let _ = A.init(someInt: 1)
}
func indirect<T: A>(a: T) {
// Works
let _ = T.init(someInt: 1)
}
struct B: A {
init(someInt: Int) {
}
}
let a: A = B(someInt: 0)
// Works
direct(a: a)
// Doesn't work
indirect(a: a)
It gives a compile time error when calling method indirect with argument a. So I understand <T: A> means some type that conforms to A. The type of my variable a is A and protocols do not conform to themselfs so ok, I understand the compile time error.
The same applies for the compile time error inside method direct. I understand it, a concrete conforming type needs to inserted.
A compile time also arrises when trying to access a static property in direct.
I am wondering. Are there more differences in the 2 methods that are defined? I understand that I can call initializers and static properties from indirect and I can insert type A directly in direct and respectively, I can not do what the other can do. But is there something I missed?
The key confusion is that Swift has two concepts that are spelled the same, and so are often ambiguous. One of the is struct T: A {}, which means "T conforms to the protocol A," and the other is var a: A, which means "the type of variable a is the existential of A."
Conforming to a protocol does not change a type. T is still T. It just happens to conform to some rules.
An "existential" is a compiler-generated box the wraps up a protocol. It's necessary because types that conform to a protocol could be different sizes and different memory layouts. The existential is a box that gives anything that conforms to protocol a consistent layout in memory. Existentials and protocols are related, but not the same thing.
Because an existential is a run-time box that might hold any type, there is some indirection involved, and that can introduce a performance impact and prevents certain optimizations.
Another common confusion is understanding what a type parameter means. In a function definition:
func f<T>(param: T) { ... }
This defines a family of functions f<T>() which are created at compile time based on what you pass as the type parameter. For example, when you call this function this way:
f(param: 1)
a new function is created at compile time called f<Int>(). That is a completely different function than f<String>(), or f<[Double]>(). Each one is its own function, and in principle is a complete copy of all the code in f(). (In practice, the optimizer is pretty smart and may eliminate some of that copying. And there are some other subtleties related to things that cross module boundaries. But this is a pretty decent way to think about what is going on.)
Since specialized versions of generic functions are created for each type that is passed, they can in theory be more optimized, since each version of the function will handle exactly one type. The trade-off is that they can add code-bloat. Do not assume "generics are faster than protocols." There are reasons that generics may be faster than protocols, but you have to actually look at the code generation and profile to know in any particular case.
So, walking through your examples:
func direct(a: A) {
// Doesn't work
let _ = A.init(someInt: 1)
}
A protocol (A) is just a set of rules that types must conform to. You can't construct "some unknown thing that conforms to those rules." How many bytes of memory would be allocated? What implementations would it provide to the rules?
func indirect<T: A>(a: T) {
// Works
let _ = T.init(someInt: 1)
}
In order to call this function, you must pass a type parameter, T, and that type must conform to A. When you call it with a specific type, the compiler will create a new copy of indirect that is specifically designed to work with the T you pass. Since we know that T has a proper init, we know the compiler will be able to write this code when it comes time to do so. But indirect is just a pattern for writing functions. It's not a function itself; not until you give it a T to work with.
let a: A = B(someInt: 0)
// Works
direct(a: a)
a is an existential wrapper around B. direct() expects an existential wrapper, so you can pass it.
// Doesn't work
indirect(a: a)
a is an existential wrapper around B. Existential wrappers do not conform to protocols. They require things that conform to protocols in order to create them (that's why they're called "existentials;" the fact that you created one proves that such a value actually exists). But they don't, themselves, conform to protocols. If they did, then you could do things like what you've tried to do in direct() and say "make a new instance of an existential wrapper without knowing exactly what's inside it." And there's no way to do that. Existential wrappers don't have their own method implementations.
There are cases where an existential could conform to its own protocol. As long as there are no init or static requirements, there actually isn't a problem in principle. But Swift can't currently handle that. Because it can't work for init/static, Swift currently forbids it in all cases.

Swift dynamic type checking for structs?

I'm confused about dynamic type checking in Swift.
Specifically, I have a weirdo case where I want to, essentially, write (or find) a function:
func isInstanceOf(obj: Any, type: Any.Type) -> Bool
In Objective-C, this is isKindOfClass, but that won't work because Any.Type includes Swift structs, which are not classes (much less NSObject subclasses).
I can't use Swift is here, because that requires a hardcoded type.
I can't use obj.dynamicType == type because that would ignore subclasses.
The Swift book seems to suggest that this information is lost, and not available for structs at all:
Classes have additional capabilities that structures do not:
...
Type casting enables you to check and interpret the type of a class instance at runtime.
(On the Type Casting chapter, it says "Type casting in Swift is implemented with the is and as operators", so it seems to be a broader definition of "type casting" than in other languages.)
However, it can't be true that is/as don't work with structures, since you can put Strings and Ints into an [Any], and pull them out later, and use is String or is Int to figure out what they were. The Type Casting chapter of the Swift Book does exactly this!
Is there something that's as powerful as isKindOfClass but for any Swift instances? This information still must exist at runtime, right?
Actually you can use is operator.
Use the type check operator (is) to check whether an instance is of a certain subclass type. The type check operator returns true if the instance is of that subclass type and false if it is not.
Since struct can't be subclassed, is is guaranteed to be consistent when applied to instance of struct because it will check on it static type, and for classes it will query the dynamic type in runtime.
func `is`<T>(instance: Any, of kind: T.Type) -> Bool{
return instance is T;
}
This work for both, struct and class.
As already stated, is/as should work fine with structs. Other corner cases can usually be done with generic functions, something like:
let thing: Any = "hi" as Any
func item<T: Any>(_ item: Any, isType: T.Type) -> Bool {
return (item as? T) != nil
}
print(item(thing, isType: String.self)) // prints "true"
No idea if this fits your specific use case.
More generally, keep in mind that Swift (currently) requires knowing the type of everything at compile time. If you can't define the specific type that will be passed into a function or set as a variable, neither will the compiler.

indirect enums and structs

To start off, I want to say that I'm aware there are many articles and questions within SO that refer to the indirect keyword in Swift.
The most popular explanation for the usage of indirect is to allow for recursive enums.
Rather than just knowing about what indirect allows us to do, I would like to know how it allows us to use recursive enums.
Questions:
Is it because enums are value types and value types do not scale well if they are built in a recursive structure? Why?
Does indirect modify the value type behaviour to behave more like a reference type?
The following two examples compile just fine. What is the difference?
indirect enum BinaryTree<T> {
case node(BinaryTree<T>, T, BinaryTree<T>)
case empty
}
enum BinaryTree<T> {
indirect case node(BinaryTree<T>, T, BinaryTree<T>)
case empty
}
The indirect keyword introduces a layer of indirection behind the scenes.
You indicate that an enumeration case is recursive by writing indirect before it, which tells the compiler to insert the necessary layer of indirection.
From here
The important part of structs and enums is that they're a constant size. Allowing recursive structs or enums directly would violate this, as there would be an indeterminable number of recursions, hence making the size non constant and unpredictable. indirect uses a constant size reference to refer to a constant size enum instance.
There's a different between the two code snippets you show.
The first piece of code makes BinaryTree<T> stored by a reference everywhere it's used.
The second piece of code makes BinaryTree<T> stored by a reference only in the case of node. I.e. BinaryTree<T> generally has its value stored directly, except for this explicitly indirect node case.
Swift indirect enum
Since Swift v2.0
Swift Enum[About] is a value type[About], and we assign it the value is copied that is why the size of type should be calculated at compile time.
Problem with associated value
enum MyEnum { //Recursive enum <enum_name> is not marked
case case1(MyEnum) `indirect`
}
it is not possible to calculate the final size because of recursion
Indirect says to compiler to store the associated value indirectly - by reference(instead of value)
indirect enum - is stored as reference for all cases
indirect case - is stored as reference only for this case
Also indirect is not applied for other value types(struct)
You can use indirect enum. It's not exactly struct, but it is also a value type. I don't think struct has similar indirect keyword support.
From Hacking with Swift post:
Indirect enums are enums that need to reference themselves somehow, and are called “indirect” because they modify the way Swift stores them so they can grow to any size. Without the indirection, any enum that referenced itself could potentially become infinitely sized: it could contain itself again and again, which wouldn’t be possible.
As an example, here’s an indirect enum that defines a node in a linked list:
indirect enum LinkedListItem<T> {
case endPoint(value: T)
case linkNode(value: T, next: LinkedListItem)
}
Because that references itself – because one of the associated values is itself a linked list item – we need to mark the enum as being indirect.

Recursive Enumerations in Swift

I'm learning Swift 2 (and C, but also not for long) for not too long and I came to a point where I struggle a lot with recursive enumerations.
It seems that I need to put indirect before the enum if it is recursive. Then I have the first case which has Int between the parentheses because later in the switch it returns an Integer, is that right?
Now comes the first problem with the second case Addition. There I have to put ArithmeticExpression between the parentheses. I tried putting Int there but it gave me an error that is has to be an ArithmeticExpression instead of an Int. My question is why? I can't imagine anything what that is about. Why can't I just put two Ints there?
The next problem is about ArithmeticExpression again. In the func solution it goes in an value called expression which is of the type ArithmeticExpression, is that correct? The rest is, at least for now, completely clear. If anyone could explain that to me in an easy way, that'd be great.
Here is the full code:
indirect enum ArithmeticExpression {
case Number(Int)
case Addition(ArithmeticExpression, ArithmeticExpression)
}
func solution(expression: ArithmeticExpression) -> Int {
switch expression {
case .Number(let value1):
return value1;
case . Addition(let value1, let value2):
return solution(value1)+solution(value2);
}
}
var ten = ArithmeticExpression.Number(10);
var twenty = ArithmeticExpression.Number(20);
var sum = ArithmeticExpression.Addition(ten, twenty);
var endSolution = solution(sum);
print(endSolution);
PeterPan, I sometimes think that examples that are TOO realistic confuse more than help as it’s easy to get bogged down in trying to understand the example code.
A recursive enum is just an enum with associated values that are cases of the enum's own type. That's it. Just an enum with cases that can be set to associated values of the same type as the enum. #endof
Why is this a problem? And why the key word "indirect" instead of say "recursive"? Why the need for any keyword at all?
Enums are "supposed" to be copied by value which means they should have case associated values that are of predictable size - made up of cases with the basic types like Integer and so on. The compiler can then guess the MAXIMUM possible size of a regular enum by the types of the raw or associated values with which it could be instantiated. After all you get an enum with only one of the cases selected - so whatever is the biggest option of the associated value types in the cases, that's the biggest size that enum type could get on initialisation. The compiler can then set aside that amount of memory on the stack and know that any initialisation or re-assignment of that enum instance could never be bigger than that. If the user sets the enum to a case with a small size associated value it is OK, and also if the user sets it to a case with the biggest associated value type.
However as soon as you define an enum which has a mixture of cases with different sized associated types, including values that are also enums of the same type (and so could themselves be initialised with any of the enums cases) it becomes impossible to guess the maximum size of the enum instance. The user could keep initialising with a case that allows an associated value that is the same type as the enum - itself initialised with a case that is also the same type, and so on and so on: an endless recursion or tree of possibilities. This recursion of enums pointing to enums will continue until an enum is initialised with associated value of "simple" type that does not point to another enum. Think of a simple Integer type that would “terminate” the chain of enums.
So the compiler cannot set aside the correct sized chunk of memory on the stack for this type of enum. Instead it treats the case associated values as POINTERS to the heap memory where the associated value is stored. That enum can itself point to another enum and so on. That is why the keyword is "indirect" - the associated value is referenced indirectly via a pointer and not directly by a value.
It is similar to passing an inout parameter to a function - instead of the compiler copying the value into the function, it passes a pointer to reference the original object in the heap memory.
So that's all there is to it. An enum that cannot easily have its maximum size guessed at because it can be initialised with enums of the same type and unpredictable sizes in chains of unpredictable lengths.
As the various examples illustrate, a typical use for such an enum is where you want to build-up trees of values like a formula with nested calculations within parentheses, or an ancestry tree with nodes and branches all captured in one enum on initialisation. The compiler copes with all this by using pointers to reference the associated value for the enum instead of a fixed chunk of memory on the stack.
So basically - if you can think of a situation in your code where you want to have chains of enums pointing to each other, with various options for associated values - then you will use, and understand, a recursive enum!
The reason the Addition case takes two ArithmeticExpressions instead of two Ints is so that it could handle recursive situations like this:
ArithmeticExpression.Addition(ArithmeticExpression.Addition(ArithmeticExpression.Number(1), ArithmeticExpression.Number(2)), ArithmeticExpression.Number(3))
or, on more than one line:
let addition1 = ArithmeticExpression.Addition(ArithmeticExpression.Number(1), ArithmeticExpression.Number(2))
let addition2 = ArithmeticExpression.Addition(addition1, ArithmeticExpression.Number(3))
which represents:
(1 + 2) + 3
The recursive definition allows you to add not just numbers, but also other arithmetic expressions. That's where the power of this enum lies: it can express multiple nested addition operations.