Relation between an Existential Container and a struct instance which conforms to a protocol - swift

I'm trying to understand how Swift finds the implementation of a protocol method.
I know that Swift uses an Existential Container, a fixed-size piece of storage in stack memory, to describe an instance of a struct in memory, and that it has a Value Witness Table (VWT) and a Protocol Witness Table (PWT).
VWTs know how to manage the real value in an instance of a struct (its lifecycle), and PWTs know the implementation of the protocol's methods.
But I want to know the relation between an instance of a struct and the "existential container".
Does an instance of struct have a pointer which refers to an existential container?
How does an instance of a struct know its existential container?

Preface: I don't know how much background knowledge you have, so I might over-explain to make sure my answer is clear.
Also, I'm doing this to the best of my ability, from memory. I might mix up some details, but hopefully this answer can at least point you towards further reading.
See also:
https://stackoverflow.com/a/41490551/3141234
https://github.com/apple/swift/blob/main/docs/SIL.rst#id201
In Swift, protocols can be used "as a type", or as a generic constraint. The latter case looks like so:
protocol SomeProtocol {}
struct SomeConformerSmall: SomeProtocol {
    // No ivars
}
struct SomeConformerBig: SomeProtocol {
    // Lots of ivars (with default values, so that SomeConformerBig() compiles)
    var a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0
}
func fooUsingGenerics<T: SomeProtocol>(_: T) {}
let smallObject = SomeConformerSmall()
let bigObject = SomeConformerBig()
fooUsingGenerics(smallObject)
fooUsingGenerics(bigObject)
The protocol is used as a constraint for type-checking at compile time, but nothing particularly special happens at runtime (for the most part). Most of the time, the compiler will produce monomorphized variants of the fooUsingGenerics function, as if you had defined fooUsingGenerics(_: SomeConformerSmall) or fooUsingGenerics(_: SomeConformerBig) to begin with.
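To make the "known at compile time" point a bit more concrete, here is a minimal sketch in which a generic function reads the size of T directly. The helper name staticSize is made up for illustration, and the stored properties get default values only so the struct can be created without arguments.

```swift
protocol SomeProtocol {}
struct SomeConformerSmall: SomeProtocol {}
struct SomeConformerBig: SomeProtocol {
    var a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0
}

// Hypothetical helper: T is a concrete type at every call site,
// so its layout is available without any boxing.
func staticSize<T: SomeProtocol>(_ value: T) -> Int {
    return MemoryLayout<T>.size
}

let smallSize = staticSize(SomeConformerSmall())
let bigSize = staticSize(SomeConformerBig())
print(smallSize, bigSize)
```

Each call sees a different T, which is why the two calls can report different sizes even though they go through the same source-level function.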
When a protocol is "used like a type", it would look like this:
func fooUsingProtcolExistential(_: SomeProtocol) {}
fooUsingProtcolExistential(smallObject)
fooUsingProtcolExistential(bigObject)
As you see, this function can be called using both smallObject and bigObject. The problem is that these two objects have different sizes. This is a problem: how will the compiler know how much stack space is necessary to allocate for the arguments of this function, if the arguments can be different sizes? It must do something to help fooUsingProtcolExistential accommodate that.
Existential containers are the solution. When you pass a value where a protocol type is expected, the Swift compiler will generate code that automagically boxes that value into an "existential container" for you. As currently defined, an existential container for a single protocol is 5 words in size:
The first three words are inline storage for the value.
The next word is a pointer to the type's metadata.
The last word is a pointer to the Protocol Witness Table (more on this later).
When the value being stored fits in 3 words (e.g. SomeConformerSmall), the value is packed directly inline into that 3 word buffer. If the value is more than 3 words in size (e.g. SomeConformerBig), an ARC-managed box is allocated on the heap, and the value is copied into there. A pointer to this box is then copied into the first word of the buffer (the last 2 words are unused, IIRC).
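A small sketch of the size problem and its fix, using MemoryLayout: the concrete types have different sizes, but viewed through the protocol type every value occupies the same fixed-size container. The exact byte counts depend on the platform, so this only compares sizes and does not hard-code them.

```swift
protocol SomeProtocol {}
struct SomeConformerSmall: SomeProtocol {}
struct SomeConformerBig: SomeProtocol {
    var a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0
}

let word = MemoryLayout<Int>.size

// Concrete types: the sizes differ.
let smallSize = MemoryLayout<SomeConformerSmall>.size
let bigSize = MemoryLayout<SomeConformerBig>.size

// Protocol type: one fixed size, whatever ends up stored inside.
let containerSize = MemoryLayout<SomeProtocol>.size

print("small: \(smallSize), big: \(bigSize), container: \(containerSize)")
print("big value fits in the 3 word buffer: \(bigSize <= 3 * word)")
```

Since the seven-Int struct is larger than the container itself, it cannot possibly be stored inline, which is where the heap-allocated box comes in.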
This introduces a new issue: suppose that fooUsingProtcolExistential wanted to forward along its parameter to another function. How should it pass the EC? fooUsingProtcolExistential doesn't know whether the EC contains a value inline (in which case, passing the EC just entails copying its few words of memory), or heap-allocated (in which case, passing the EC also requires an ARC retain on that heap-allocated buffer).
To remedy this, the type metadata referenced by the EC points to a Value Witness Table (VWT). Each VWT defines a standard set of function pointers that define how the value can be allocated, copied, destroyed, etc. Whenever a protocol existential needs to be manipulated in some way, the VWT defines exactly how to do so.
So now we have a constant-size container (which solves our heterogeneously-sized parameter passing problem), and a way to move the container around. What can we actually do with it?
Well, at a minimum, values of this protocol type must define the required members (initializers, properties (stored or computed), functions and subscripts) that the protocol declares.
But each conforming type might implement these members in a different way. E.g. some struct might satisfy a method requirement by defining the method directly, but another class might satisfy it by inheriting the method from a superclass. Some might implement a property as a stored property, others as a computed property, etc.
Handling these incompatibilities is the primary purpose of the Protocol Witness Table. There's one of these tables per protocol conformance (e.g. one for SomeConformerSmall and one for SomeConformerBig). They contain a set of function pointers which point to the implementations of the protocol's requirements. While the pointed-to functions might be in different places, the PWT's layout is consistent for the protocol it conforms to. As a result, fooUsingProtcolExistential is able to look at the PWT of an EC, and use it to find the implementation of a protocol method, and call it.
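If it helps, the dispatch idea can be modelled by hand. This is only a rough sketch: every name in it (GreeterWitnessTable, GreeterBox, callGreet and so on) is hypothetical, and the real tables are generated by the compiler rather than written as closures.

```swift
// A hand-written model of witness-table dispatch (not the real thing).
struct GreeterWitnessTable {
    // One slot per protocol requirement, always in the same position.
    let greet: (Any) -> String
}

struct English { let name: String }
struct French { let name: String }

// One table per conformance, each pointing at a different implementation.
let englishTable = GreeterWitnessTable(greet: { "Hello, \(($0 as! English).name)" })
let frenchTable = GreeterWitnessTable(greet: { "Bonjour, \(($0 as! French).name)" })

// Stand-in for the existential container: a value plus its table.
struct GreeterBox {
    let value: Any
    let table: GreeterWitnessTable
}

// The caller never learns the concrete type; it only follows the table.
func callGreet(_ box: GreeterBox) -> String {
    return box.table.greet(box.value)
}

let boxes = [
    GreeterBox(value: English(name: "Ann"), table: englishTable),
    GreeterBox(value: French(name: "Luc"), table: frenchTable),
]
let greetings = boxes.map(callGreet)
print(greetings)
```

The point of the model is that callGreet is written once, against the table's layout, while the code it ends up running differs per conformance.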
So in short:
An EC contains a value (inline or indirect), a pointer to the type's metadata, and a PWT
The type metadata points to a VWT

My understanding:
A struct doesn't know where the existential container, value witness table, or protocol witness table is; the compiler does. Wherever they are needed, the compiler passes them along.

Related

In Swift, how to get the true size of an `Any` variable?

I want to be able to get the size of the underlying data type of an Any variable in Swift. I expected this to be possible by running MemoryLayout.size(ofValue: anyObject), but that expression always returns 32, regardless of the underlying data type of the Any object. I assume 32 is the size of the internal Any construct/type, which holds metadata about the object it stores.
How do I get the underlying data type's size?
let regularInt: Int = 1
let anyInt: Any = Int(2) as Any
MemoryLayout<Int>.size // 8 (on a 64-bit platform)
MemoryLayout<type(of: anyInt)>.size // Can't do that
MemoryLayout.size(ofValue: regularInt) // 8
MemoryLayout.size(ofValue: anyInt) // 32
// How do I get size "8" out of `anyInt`?
I'll begin with some technical details about the limitations of Any in this case.
So, what is Any? It's an empty protocol to which every type implicitly conforms.
And how does the compiler represent variables of protocol types? It's by wrapping the actual value in an existential container. So basically when you're referencing a variable of this kind, you're actually talking to the container (well, actually not you, but the compiler is :).
An existential container has a layout that can be represented like this C structure:
struct OpaqueExistentialContainer {
    void *fixedSizeBuffer[3];
    Metadata *type;
    WitnessTable *witnessTables[NUM_WITNESS_TABLES];
};
The container elements are explained in detail in this document; I'll also try to summarize them here:
fixedSizeBuffer either holds the whole value, if it fits in 24 bytes, or holds a pointer to a heap-allocated zone containing the value
type is a pointer to the type metadata
witnessTables is what makes this layout occupy various sizes, as the number of protocol witness tables can vary from zero to virtually any number of protocols.
So, with the above in mind:
Any needs no witness tables, thus it occupies 32 bytes
a single protocol variable occupies 40 bytes
a composed protocol variable occupies 32 + N*8 bytes, where N is the number of "independent" protocols involved in the composition
Note that the above is true if there are no class protocols involved. If a class protocol is involved, the existential container layout is somewhat simplified; this is also described in more detail in the linked document above.
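The arithmetic above can be checked with MemoryLayout. A minimal sketch, with P, Q and ClassBound as made-up placeholder protocols, and sizes expressed in words so that it isn't tied to 64-bit platforms:

```swift
protocol P {}
protocol Q {}
protocol ClassBound: AnyObject {}

let word = MemoryLayout<Int>.size  // 8 bytes on 64-bit platforms

let anySize = MemoryLayout<Any>.size                // buffer + type metadata
let singleSize = MemoryLayout<P>.size               // + 1 witness table
let composedSize = MemoryLayout<P & Q>.size         // + 2 witness tables
let classBoundSize = MemoryLayout<ClassBound>.size  // object reference + 1 witness table

print(anySize, singleSize, composedSize, classBoundSize)
```

On a 64-bit machine I'd expect this to print 32 40 48 16, which matches the 32 + N*8 rule and shows the simplified class-bound layout.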
Now, back to the problem from the question: it's the existential container created by the compiler that prevents you from accessing the actual type. The compiler doesn't make this structure available, and transparently translates any calls to protocol requirements to dispatches through the witness tables stored in the container.
But, might I ask, why are you passing Any around? I assume you don't want to handle all possible and future types in a generic manner. A marker protocol might help here:
protocol MemoryLayouted { }
extension MemoryLayouted {
    var memoryLayoutSize: Int { MemoryLayout.size(ofValue: self) }
}
Then all you have left to do is to add conformance for the types you want to support:
extension Int: MemoryLayouted { }
extension String: MemoryLayouted { }
extension MyAwesomeType: MemoryLayouted { }
With the above in mind, you can rewrite your initial code to something like this:
let regularInt: Int = 1
let anyInt: MemoryLayouted = 2
print(regularInt.memoryLayoutSize) // 8
print(anyInt.memoryLayoutSize) // 8
You get consistent behaviour and type safety, a type safety that might translate to a more stable application.
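If the value really does arrive as Any, the same marker protocol can still be reached with a conditional cast. A minimal sketch, where underlyingSize(of:) is a made-up helper name and only Int and String are opted in:

```swift
protocol MemoryLayouted {}
extension MemoryLayouted {
    // Inside the extension, self has the concrete conforming type.
    var memoryLayoutSize: Int { return MemoryLayout.size(ofValue: self) }
}
extension Int: MemoryLayouted {}
extension String: MemoryLayouted {}

// Hypothetical helper: nil means the type was never opted in.
func underlyingSize(of value: Any) -> Int? {
    return (value as? MemoryLayouted)?.memoryLayoutSize
}

let anyInt: Any = 2
let anyDouble: Any = 2.5
let intSize = underlyingSize(of: anyInt)
let doubleSize = underlyingSize(of: anyDouble)
print(String(describing: intSize), String(describing: doubleSize))
```

The nil result for Double is the trade-off: this only works for types you have explicitly extended.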
P.S. A hacky approach that allows you to use Any might be possible by unpacking the existential container via direct memory access. The Swift ABI is stable at this point, so the existential container layout is guaranteed not to change in the future; however, I don't recommend going that route unless absolutely necessary.
Maybe someone who stumbles upon this question and has experience in the ABI layout code can provide the code for it.
What I would do is cast Any to each of the supported types. Why would you cast an Int to Any when you know what type it is anyway?
let value: Any = Int(2)
switch value {
case is Int:
    print(MemoryLayout<Int>.size)
// ... other cases here
default:
    break
}

Differences generic protocol type parameter vs direct protocol type

This is my playground code:
protocol A {
    init(someInt: Int)
}
func direct(a: A) {
    // Doesn't work
    let _ = A.init(someInt: 1)
}
func indirect<T: A>(a: T) {
    // Works
    let _ = T.init(someInt: 1)
}
struct B: A {
    init(someInt: Int) {
    }
}
let a: A = B(someInt: 0)
// Works
direct(a: a)
// Doesn't work
indirect(a: a)
It gives a compile-time error when calling method indirect with argument a. So I understand <T: A> means some type that conforms to A. The type of my variable a is A, and protocols do not conform to themselves, so OK, I understand the compile-time error.
The same applies to the compile-time error inside method direct. I understand it: a concrete conforming type needs to be inserted.
A compile-time error also arises when trying to access a static property in direct.
I am wondering. Are there more differences in the 2 methods that are defined? I understand that I can call initializers and static properties from indirect and I can insert type A directly in direct and respectively, I can not do what the other can do. But is there something I missed?
The key confusion is that Swift has two concepts that are spelled the same, and so are often ambiguous. One of them is struct T: A {}, which means "T conforms to the protocol A," and the other is var a: A, which means "the type of variable a is the existential of A."
Conforming to a protocol does not change a type. T is still T. It just happens to conform to some rules.
An "existential" is a compiler-generated box that wraps up a protocol. It's necessary because types that conform to a protocol could be different sizes and different memory layouts. The existential is a box that gives anything that conforms to a protocol a consistent layout in memory. Existentials and protocols are related, but not the same thing.
Because an existential is a run-time box that might hold any type, there is some indirection involved, and that can introduce a performance impact and prevents certain optimizations.
Another common confusion is understanding what a type parameter means. In a function definition:
func f<T>(param: T) { ... }
This defines a family of functions f<T>() which are created at compile time based on what you pass as the type parameter. For example, when you call this function this way:
f(param: 1)
a new function is created at compile time called f<Int>(). That is a completely different function than f<String>(), or f<[Double]>(). Each one is its own function, and in principle is a complete copy of all the code in f(). (In practice, the optimizer is pretty smart and may eliminate some of that copying. And there are some other subtleties related to things that cross module boundaries. But this is a pretty decent way to think about what is going on.)
Since specialized versions of generic functions are created for each type that is passed, they can in theory be more optimized, since each version of the function will handle exactly one type. The trade-off is that they can add code-bloat. Do not assume "generics are faster than protocols." There are reasons that generics may be faster than protocols, but you have to actually look at the code generation and profile to know in any particular case.
So, walking through your examples:
func direct(a: A) {
    // Doesn't work
    let _ = A.init(someInt: 1)
}
A protocol (A) is just a set of rules that types must conform to. You can't construct "some unknown thing that conforms to those rules." How many bytes of memory would be allocated? What implementations would it provide to the rules?
func indirect<T: A>(a: T) {
    // Works
    let _ = T.init(someInt: 1)
}
In order to call this function, you must pass a type parameter, T, and that type must conform to A. When you call it with a specific type, the compiler will create a new copy of indirect that is specifically designed to work with the T you pass. Since we know that T has a proper init, we know the compiler will be able to write this code when it comes time to do so. But indirect is just a pattern for writing functions. It's not a function itself; not until you give it a T to work with.
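For comparison, here is a small sketch of the two ways an initializer requirement can actually be reached. The function names makeStatic and makeDynamic are made up, and B gets a stored property only so there is something to check. The generic version uses the statically known T; the existential version cannot write A.init, but it can ask the boxed value for its dynamic type and call init on that metatype.

```swift
protocol A {
    init(someInt: Int)
}
struct B: A {
    let someInt: Int
    init(someInt: Int) { self.someInt = someInt }
}

// Generic version: T is known statically at each call site.
func makeStatic<T: A>(like value: T) -> T {
    return T.init(someInt: 1)
}

// Existential version: go through the value's dynamic metatype.
func makeDynamic(like value: A) -> A {
    return type(of: value).init(someInt: 1)
}

let boxed: A = B(someInt: 0)
let fromStatic = makeStatic(like: B(someInt: 0))
let fromDynamic = makeDynamic(like: boxed)
print(fromStatic.someInt, type(of: fromDynamic))
```

Note that makeDynamic still hands back an existential, so the caller only gets a B by casting.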
let a: A = B(someInt: 0)
// Works
direct(a: a)
a is an existential wrapper around B. direct() expects an existential wrapper, so you can pass it.
// Doesn't work
indirect(a: a)
a is an existential wrapper around B. Existential wrappers do not conform to protocols. They require things that conform to protocols in order to create them (that's why they're called "existentials;" the fact that you created one proves that such a value actually exists). But they don't, themselves, conform to protocols. If they did, then you could do things like what you've tried to do in direct() and say "make a new instance of an existential wrapper without knowing exactly what's inside it." And there's no way to do that. Existential wrappers don't have their own method implementations.
There are cases where an existential could conform to its own protocol. As long as there are no init or static requirements, there actually isn't a problem in principle. But Swift can't currently handle that. Because it can't work for init/static, Swift currently forbids it in all cases.

What's the difference between using a generic where condition and specifying argument type? [duplicate]

What advantages are there to using generics with a where clause over specifying a protocol for an argument, as in the following function signatures?
func encode<T>(_ value: T) throws -> Data where T : Encodable {...}
func encode(value: Encodable) throws -> Data {...}
The first is a generic method that requires a concrete type that conforms to Encodable. That means for each call to encode with a different type, a completely new copy of the function may be created, optimized just for that concrete type. In some cases the compiler may remove some of these copies, but in principle encode<Int>() is a completely different function than encode<String>(). It's a (generic) system for creating functions at compile time.
In contrast, the second is a non-generic function that accepts a parameter of the "Encodable existential" type. An existential is a compiler-generated box that wraps some other type. In principle this means that the value will be copied into the box at run time before being passed, possibly requiring a heap allocation if it's too large for the box (again, it may not be because the compiler is very smart and can sometimes see that it's unnecessary).
This ambiguity between the name of the protocol and the name of the existential will hopefully be fixed in the future (and there's discussion about doing so). In the future, the latter function will hopefully be spelled (note "any"):
func encode(value: any Encodable) throws -> Data {...}
The former might be faster. It might also take more space for all the copies of the function. (But see above about the compiler. Do not assume you know which of these will be faster in an actual, optimized build.)
The former provides a real, concrete type. That means it can be used for things that require a real, concrete type, such as calling a static method, or init. This means it can be used when the protocol has an associated type.
The latter is boxed into an existential, meaning it can be stored into heterogeneous collections. The former can only be put into collections of its particular concrete type.
So they're pretty different things, and each has its purpose.
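A minimal sketch of the collection point, using a made-up Shape protocol instead of Encodable to keep it self-contained:

```swift
protocol Shape {
    var area: Double { get }
}
struct Square: Shape {
    let side: Double
    var area: Double { return side * side }
}
struct Rect: Shape {
    let width, height: Double
    var area: Double { return width * height }
}

// Generic: every element must be the same concrete T.
func totalAreaGeneric<T: Shape>(_ shapes: [T]) -> Double {
    return shapes.reduce(0) { $0 + $1.area }
}

// Existential: each element is boxed, so the types may differ.
func totalAreaExistential(_ shapes: [Shape]) -> Double {
    return shapes.reduce(0) { $0 + $1.area }
}

let squares = [Square(side: 1), Square(side: 2)]
let mixed: [Shape] = [Square(side: 2), Rect(width: 2, height: 3)]

let genericTotal = totalAreaGeneric(squares)
let mixedTotal = totalAreaExistential(mixed)
print(genericTotal, mixedTotal)
// totalAreaGeneric(mixed) is rejected: there is no single concrete T for [Shape].
```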
You can use multiple type constraints.
func encode<T>(encodable: T) -> Data where T: Encodable, T: Decodable {
    ...
}

Swift passing protocol variable to generic function

Can someone explain why is passing protocol var to a generic function an error in Swift?
protocol P {}
func f<T: P>(_: T) {}
func g(x: P) { f(x) } // Error
This, however, is not an error:
protocol P {}
func f(_: P) {}
func g(x: P) { f(x) }
I was just wondering what is the difference of the code generated by the compiler which makes it to reject the first example but in second case the generated code is good to go. Both seem to give the behavior I would expect.
Can someone explain why is passing protocol var to a generic function an error in Swift?
protocol P {}
func f<T: P>(_: T) {}
func g(x: P) { f(x) } // Error
It’s because currently non-@objc protocols don't conform to themselves. Therefore P cannot satisfy the generic placeholder T : P, as P is not a type that conforms to itself.
However in this particular example, that is, one where P doesn't have any static requirements, there's no fundamental limitation preventing P from conforming to itself (I explain this in more detail in the above linked Q&A). It's merely an implementation limitation.
What is the difference of the code generated by the compiler which makes it to reject the first example but in second case the generated code is good to go
Protocol-typed values (existentials) are implemented in a slightly different manner to generic-typed values constrained to a protocol.
A protocol-typed value P consists of:
An inline value buffer for the stored conforming value (currently 3 words in length, but is subject to change until ABI stability). If the value to store is more than 3 words in length, it's put into a heap allocated box, and a reference to this box is stored in the buffer.
A pointer to the conforming type's metadata.
A pointer to the protocol witness table for the conformance of the value to P, which lists the implementations to call for each of the protocol requirements.
On the flip side, a generic-typed value T where T : P consists of only the value buffer. The type metadata and witness table(s) are instead passed as implicit arguments to the generic function, and any member accesses or memory manipulations for values of type T can be done by consulting these arguments. Why? Because Swift's generics system ensures that two values of type T must be of the same type, so they must share the same conformance to the protocol constraint(s).
However this guarantee breaks down if we allow protocols to conform to themselves. Now, if T is a protocol type P, two values of type T could potentially have different underlying concrete types and therefore different conformances to P (so different protocol witness tables). We'd need to consult protocol witness tables on a per-value (rather than per-type) basis – just like we do with existentials.
So what we'd want is for generic-typed values to have the same layout as an existential of the protocol constraints. However this would make things pretty inefficient for the vast majority of the cases when the generic placeholder is not being satisfied by a protocol type, as values of type T would be carrying about redundant information.
The reason why @objc protocols are allowed to conform to themselves when they don't have static requirements is because they have a much simpler layout than non-@objc existentials – they just consist of a reference to the class instance, where protocol requirements are dispatched to via objc_msgSend. This layout is shared with that of a value typed as a placeholder T constrained to the protocol, which is why it's supported.
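As a practical aside, one workaround for the error in the question is to go through a protocol extension, where Self is the concrete conforming type and can therefore satisfy T : P. This is only a sketch: passToF is a made-up helper name, and f is changed to return the type it was called with so there is something to observe.

```swift
protocol P {}
struct S: P {}

func f<T: P>(_ value: T) -> Any.Type {
    return T.self  // report which T this call was made with
}

extension P {
    // Hypothetical helper: here self has the concrete type Self,
    // not the protocol type P, so the call to f type-checks.
    func passToF() -> Any.Type {
        return f(self)
    }
}

func g(x: P) -> Any.Type {
    return x.passToF()
}

let seenType = g(x: S())
print(seenType)
```

The generic function ends up being called with the underlying type S, with the metadata and witness table taken from the existential that g received.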

A guess that the Swift type alias mechanism is the automatic initializer Inheritance

A question popped into my head: what is happening when I define a Swift type alias? What is the mechanism behind it? I kept wondering until I read the Automatic Initializer Inheritance chapter of the official Swift documentation:
If your subclass doesn't define any designated initializer, it automatically inherits all of its superclass designated initializers
And here is my practice code for learning
class Vehicle {
    var numberOfWheels = 0
    var description: String {
        return "\(numberOfWheels) wheel(s)"
    }
}
class Auto: Vehicle {}
let VM = Auto()
VM.numberOfWheels
Wow! This works, or at least behaves, exactly like a Swift type alias. Auto is an alias of the type Vehicle.
Question: Am I understanding it right? Is this the mechanism behind type aliases?
Question: Am I understanding it right? Is this the mechanism behind type aliases?
No, typealiases and subclassing (with inheriting all methods and initializers) are different things, based on different semantics and mechanisms.
let v1 = Vehicle()
v1 is Auto //->false
typealias Norimono = Vehicle
v1 is Norimono //->true (with warning "'is' test is always true")
The last result (including the warning you may find) is exactly the same as v1 is Vehicle.
Typealias is literally an alias, it's giving another name for the same type.
One more thing: you can define a typealias for structs or enums, which you cannot subclass.
Not really, but if you've never seen object-oriented programming they could look somewhat similar, I agree.
Auto is a subclass that extends the original Vehicle and could add additional properties and methods to it, even if in that example it doesn't.
Auto and Vehicle are not the same thing: Vehicle is a base type and Auto is one of its subtypes. What you can do with a Vehicle you can do with an Auto, but not vice versa.
A typealias is just an alias, a way to give an additional "name" to a pre-existing type, just that. A type and its alias are the same thing.
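A quick way to see the difference, sketched with the types from the question (the description property is left out for brevity): comparing type identities shows that the alias is the very same type, while the subclass is a distinct one.

```swift
class Vehicle {
    var numberOfWheels = 0
}
class Auto: Vehicle {}
typealias Norimono = Vehicle

// An alias names the same type; a subclass is a new type.
let aliasIsSameType = ObjectIdentifier(Norimono.self) == ObjectIdentifier(Vehicle.self)
let subclassIsSameType = ObjectIdentifier(Auto.self) == ObjectIdentifier(Vehicle.self)

// Both spellings can be mixed freely, because Norimono is Vehicle.
let vehicles: [Norimono] = [Vehicle(), Auto()]
let autoCount = vehicles.filter { $0 is Auto }.count

print(aliasIsSameType, subclassIsSameType, autoCount)
```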