In Swift, from a technical standpoint, why does the compiler care if a protocol can only be used as a generic constraint? - swift

In Swift, from a technical standpoint, why does the compiler care if a protocol can only be used as a generic constraint?
Say I have:
protocol Fooable {
associated type Bar: Equatable
func foo(bar: Bar) {
bar==bar
}
}
Why can't I later declare a func that takes a Fooable object as an argument?
It seems to me that the compiler should only care that a given Fooable can be sent a message "foo" with an argument "bar" that is an Equatable and therefore responds to the message, "==".
I understand Swift is statically typed, but why should Swift even really care about type in this context, since the only thing that matters is whether or not a given message can be validly sent to an object?
I am trying to understand the why behind this, since I suspect there must be a good reason.

In your example above, if you wrote a function that takes a Fooable parameter, e.g.
func doSomething(with fooable:Fooable) {
fooable.foo(bar: ???) // what type is allowed to be passed here? It's not _any_ Equatable, it's the associated type Bar, which here would be...???
}
What type could be passed into fooable.foo(bar:)? It can't be any Equatable, it must be the specific associated type Bar.
Ultimately, it boils down the the problem that protocols that reference "Self" or have an associated type end up having different interfaces based on which concrete implementation has conformed (i.e. the specific type for Self, or the specific associated type). So these protocols can be considered as missing information about types and signatures needed to address them directly, but still serve as templates for conforming types and so can be used as generic constraints.
For example, the compiler would accept the function written like this:
func doSomething<T: Fooable>(with fooable:T, bar: T.Bar) {
fooable.foo(bar: bar)
}
In this scenario we aren't trying to address the Fooable protocol as a protocol. Instead, we are accepting any concrete type, T, that is itself constrained to conform to Fooable. But the compiler will know the exact concrete type of T each time you call the function, so it therefore will know the exact associated type Bar, and will know exactly what type may be passed as a parameter to fooable.foo(bar:)
More Details
It may help to think of "generic protocols" — i.e. protocols that have an associated type, including possibly the type Self — as something a little different than normal protocols. Normal protocols define messaging requirements, as you say, and can be used to abstract away a specific implementation and address any conforming type as the protocol itself.
Generic protocols are better understood as part of the generics system in Swift rather than as normal protocols. You can't cast to a generic protocol like Equatable (no if let equatable = something as? Equatable) because as part of the generic system, Equatable must be specialized and understood at compile time. More on this below.
What you do get from generic protocols that is the same as normal protocols is the concept of a contract that conforming types must adhere to. By saying associatedtype Bar: Equatable you are getting a contract that the type Bar will provide a way for you to call `func ==(left:Bar, right: Bar) -> Bool'. It requires conforming types to provide a certain interface.
The difference between generic protocols and normal protocols is that you can cast to and message normal protocols as the protocol type (not a concrete type), but you must always address conformers to generic protocols by their concrete type (just like all generics). This means normal protocols are a runtime feature (for dynamic casting) as well as a compile time feature (for type checking). But generic protocols are only a compile time feature (no dynamic casting).
Why can't we say var a:Equatable? Well, let's dig in a little. Equatable means that one instance of a specific type can be compared for equality with another instance of the same type. I.e. func ==(left:A, right:A) -> Bool. If Equatable were a normal protocol, you would say something more like: func ==(left:Equatable, right:Equatable) -> Bool. But if you think about that, it doesn't make sense. String is Equatable with other Strings, Int is Equatable with other Ints, but that doesn't in any way mean Strings are Equatable with Ints. If the Equatable protocol just required the implementation of func ==(left:Equatable, right:Equatable) -> Bool for your type, how could you possibly write that function to compare your type to every other possible Equatable type now and in the future?
Since that's not possible, Equatable requires only that you implement == for two instances of Self type. So if Foo: Equatable, then you must only define == for two instances of Foo.
Now let's look at the problem with var a:Equatable. This seems to make sense at first, but in fact, it doesn't:
var a: Equatable = "A String"
var b: Equatable = 100
let equal = a == b
Since both a and b are Equatable, we could be able to compare them for equality, right? But in fact, a's equality implementation is limited to comparing a String to a String and b's equality implementation is limited to comparing an Int to an Int. So it's best to think of generic protocols more like other generics to realize that Equatable<String> is not the same protocol as Equatable<Int> even though they are both supposedly just "Equatable".
As for why you can have a dictionary of type [AnyHashable: Any], but not [Hashable: Any], this is becoming more clear. The Hashable protocol inherits from Equatable, so it is a "generic protocol". That means for any Hashable type, there must be a func ==(left: Self, right:Self) -> Bool. Dictionaries use both the hashValue and equality comparisons to store and retrieve keys. But how can a dictionary compare a String key and an Int key for equality, even if they both conform to Hashable / Equatable? It can't. Therefore, you need to wrap your keys in a special "type eraser" called AnyHashable. How type erasers work is too detailed for the scope of this question, but suffice it to say that a type eraser like AnyHashable gets instantiated with some type T: Hashable, and then forward requests for a hashValue to its wrapped type, and implements ==(left:AnyHashable, right: AnyHashable) -> Bool in a way that also uses the wrapped type's equality implementation. I think this gist should give a great illustration of how you can implement an "AnyEquatable" type eraser.
https://gist.github.com/JadenGeller/f0d05a4699ddd477a2c1
Moving onward, because AnyHashable is a single concrete type (not a generic type like the Hashable protocol is), you can use it to define a dictionary. Because every single instance of AnyHashable can wrap a different Hashable type (String, Int, whatever), and can also produce a hashValue and be checked for equality with any other AnyHashable instance, it's exactly what a dictionary needs for its keys.
So, in a sense, type erasers like AnyHashable are a sort of implementation trick that turns a generic protocol into something like a normal protocol. By erasing / throwing away the generic associated type information, but keeping the required methods, you can effectively abstract the specific conformance of Hashable into general type "AnyHashable" that can wrap anything Hashable, but be used in non-generic circumstances.
This may all come together if you review that gist for creating an implementation of "AnyEquatable": https://gist.github.com/JadenGeller/f0d05a4699ddd477a2c1 and then go back an see how you can now turn this impossible / non-compiling code from earlier:
var a: Equatable = "A String"
var b: Equatable = 100
let equal = a == b
Into this conceptually similar, but actually valid code:
var a: AnyEquatable = AnyEquatable("A String")
var b: AnyEquatable = AnyEquatable(100)
let equal = a == b

Related

Is there a penalty for changing a generic function's argument based on `MyProtocol` to using the existential `any MyProtocol` or `some MyProtocol`?

Normally in the discussion of things like some, they are referring to return types. This question is specifically about using any or some in argument lists.
As an example, in the Swift documentation for String, you have this initializer...
init<T>(_ value: T, radix: Int = 10, uppercase: Bool = false)
where T : BinaryInteger
In Swift 5.6, they introduced the any keyword to let us work with existential types more easily. With that change, I understand you theoretically can rewrite the above like so...
init(_ value: any BinaryInteger, radix: Int = 10, uppercase: Bool = false)
Of course there's also this version based on the some keyword, which also works...
init(_ value: some BinaryInteger, radix: Int = 10, uppercase: Bool = false)
My question is... which one makes the most sense? Is there a down-side to using the existential type over the generic like that? What about the some version? I was originally thinking yes, the generic version is best because the compiler can determine at compile time what's being passed to it, but then again, so does the existential any and even the some version as it won't compile if you don't pass it a BinaryInteger and I'm not quite sure how to write a test to check this out.
When passing a value of a protocol type P to a method in Swift, there are 4 possible spellings you can currently use:
func f<T: P>(_ value: T) ("generic")
func f(_ value: some P) ("opaque parameter")
func f(_ value: P) ("'bare' existential")
func f(_ value: any P) ("'explicit' existential")
Of these spellings, (1) and (2) are synonyms, and currently, (3) and (4) are synonyms. Between using (1) and (2), there is no difference, and between (3) and (4) there is currently* no difference; but between (1)/(2) and (3)/(4), there is a difference.
f<T: P>(_: T) is the traditional way of taking a parameter of a concrete type T which is guaranteed to conform to protocol P. This way of taking a parameter:
Gives you access to the concrete type T both at compile time and at runtime, so you can perform operations on T itself
Has no overhead, as the type T is known at compile time, and the compiler knows the size and layout of the value and can set up the stack/registers appropriately; it can pass whatever parameter was given to it directly to the method
Can only be called when the type of the argument is statically known (at compile time); but as such, can be called with protocol types with Self- or associatedtype requirements
Introduced in SE-0341 (Opaque Parameter Declarations), the version of a method which takes some Protocol is exactly equivalent to the generic version spelled with angle brackets. I'll avoid repeating the content of the proposal, but the Introduction section spells out the desire for this syntax as a way to simplify the complexity of spelling generic parameters
f(_: P) is the traditional way of taking a parameter of an existential type which is guaranteed to conform to protocol P. This way of taking a parameter:
Does not give access to the concrete underlying type of the parameter at compile time; though this is accessible dynamically at runtime via type(of:)
Has runtime overhead both to pass an argument to the method, and when accessing the value inside of the method: because the type of the argument to the method might not be known statically (while the compiler still needs to know how to set up the stack and registers in order to call the method), the parameter must be boxed up in an "existential box", which has the interface of P and can dynamically pass methods along to the underlying concrete value. This both has a cost of allocating an additional "box" at runtime to hold the actual value inside of a consistently-sized and -laid-out container, as well as the cost of indirecting as method calls on the box must dynamically dispatch to the underlying type
Can be called whether or not the type of the argument is known statically; as such, cannot be called with protocol types with Self- or associatedtype requirements
Introduced in SE-0335 (Existential Any), the any keyword preceding a protocol type is a marker that helps indicate that the type is being used as an existential. It is exactly equivalent right now to using the bare name of the protocol (i.e. any P == P), but there has been some discussion about eventually using the bare name of the protocol to mean some P instead
So to address your specific example:
init(_ value: some BinaryInteger, radix: Int = 10, uppercase: Bool = false)
is exactly equivalent to the original
init<T>(_ value: T, radix: Int = 10, uppercase: Bool = false)
where T : BinaryInteger
while
init(_ value: any BinaryInteger, radix: Int = 10, uppercase: Bool = false)
is not. And, because BinaryInteger has associatedtype requirements, you also cannot use the any version as the existential type would not provide access to the underlying associated types. (If you try, you'll get the classic error: protocol 'BinaryInteger' can only be used as a generic constraint because it has Self or associated type requirements)
In general, generics are preferable to existential types when possible because of the lack of overhead and greater flexibility; but, they require knowing the static type of their parameter, which is not always possible.
Existentials are more accepting of input, but are significantly more limited in functionality and come at a cost.
Between preferring <T: P> and some P, or P and any P — the choice is currently subjective, but:
As part of SE-0335, it is currently planned that Swift 6 will require the use of the any keyword to denote existential protocol usage; the counterpart to using this is some, so if you want to start future-proofing your code and staying consistent, it might be a good idea to start migrating to any and some
Between opaque parameter syntax and generic syntax, the choice is up to you, but opaque parameters cannot currently cover all of the use cases that generics can, especially with more complicated constraints. Whether sticking with generics everywhere or preferring opaque parameters and using generics only when necessary is a code style choice that you'll need to make

Is there an equivalent in swift to Go's empty interface?

Is there an equivalent in Swift to pass as an argument an empty interface{} like in Go?
I am not very familiar with Go, but I think the Swift equivalent would be an argument of type Any or AnyObject. You can't do much with such an argument other than try to cast it to a more specific type.
Go's interfaces and Swift's protocols are quite different:
Golang's interfaces are "structural" (akin to C++'s "Concepts"), meaning that their identity is defined by their structure. If a go struct has the structure required by an interface, it implicitly implements that interface. That is, the struct doesn't say anywhere "I am strust S, and I implement interface I."
Swift's protocols are "nominal", meaning that their identity is defined their name. If a Swift struct fulfills all of the requirements of a protocol, it doesn't conform to that protocol, unless an explicit struct S: P { ... } declaration is made (the body of that declaration can even be empty, but it's still necessary for the conformance to exist).
The closest analogue to interface{} in Swift is Any. It's built-in protocol to which all type conform. It gets special treatment, it's defined by the compiler, and has hard-coded logic to make all other types conform to it. You won't see protocol Any {} or struct S: Any {} declared anywhere explicitly. AnyObject is similar, and also gets special treatment. But no other Swift protocols do. Their conformance need to be explicit.
If I am not mistaking Swift's equivalent would be:
protocol EmptyProtocol {
}
In swift interfaces are called protocols don't ask me :D
Are you trying to do type erasure or something? Just curious...

Differences generic protocol type parameter vs direct protocol type

This is my playground code:
protocol A {
init(someInt: Int)
}
func direct(a: A) {
// Doesn't work
let _ = A.init(someInt: 1)
}
func indirect<T: A>(a: T) {
// Works
let _ = T.init(someInt: 1)
}
struct B: A {
init(someInt: Int) {
}
}
let a: A = B(someInt: 0)
// Works
direct(a: a)
// Doesn't work
indirect(a: a)
It gives a compile time error when calling method indirect with argument a. So I understand <T: A> means some type that conforms to A. The type of my variable a is A and protocols do not conform to themselfs so ok, I understand the compile time error.
The same applies for the compile time error inside method direct. I understand it, a concrete conforming type needs to inserted.
A compile time also arrises when trying to access a static property in direct.
I am wondering. Are there more differences in the 2 methods that are defined? I understand that I can call initializers and static properties from indirect and I can insert type A directly in direct and respectively, I can not do what the other can do. But is there something I missed?
The key confusion is that Swift has two concepts that are spelled the same, and so are often ambiguous. One of the is struct T: A {}, which means "T conforms to the protocol A," and the other is var a: A, which means "the type of variable a is the existential of A."
Conforming to a protocol does not change a type. T is still T. It just happens to conform to some rules.
An "existential" is a compiler-generated box the wraps up a protocol. It's necessary because types that conform to a protocol could be different sizes and different memory layouts. The existential is a box that gives anything that conforms to protocol a consistent layout in memory. Existentials and protocols are related, but not the same thing.
Because an existential is a run-time box that might hold any type, there is some indirection involved, and that can introduce a performance impact and prevents certain optimizations.
Another common confusion is understanding what a type parameter means. In a function definition:
func f<T>(param: T) { ... }
This defines a family of functions f<T>() which are created at compile time based on what you pass as the type parameter. For example, when you call this function this way:
f(param: 1)
a new function is created at compile time called f<Int>(). That is a completely different function than f<String>(), or f<[Double]>(). Each one is its own function, and in principle is a complete copy of all the code in f(). (In practice, the optimizer is pretty smart and may eliminate some of that copying. And there are some other subtleties related to things that cross module boundaries. But this is a pretty decent way to think about what is going on.)
Since specialized versions of generic functions are created for each type that is passed, they can in theory be more optimized, since each version of the function will handle exactly one type. The trade-off is that they can add code-bloat. Do not assume "generics are faster than protocols." There are reasons that generics may be faster than protocols, but you have to actually look at the code generation and profile to know in any particular case.
So, walking through your examples:
func direct(a: A) {
// Doesn't work
let _ = A.init(someInt: 1)
}
A protocol (A) is just a set of rules that types must conform to. You can't construct "some unknown thing that conforms to those rules." How many bytes of memory would be allocated? What implementations would it provide to the rules?
func indirect<T: A>(a: T) {
// Works
let _ = T.init(someInt: 1)
}
In order to call this function, you must pass a type parameter, T, and that type must conform to A. When you call it with a specific type, the compiler will create a new copy of indirect that is specifically designed to work with the T you pass. Since we know that T has a proper init, we know the compiler will be able to write this code when it comes time to do so. But indirect is just a pattern for writing functions. It's not a function itself; not until you give it a T to work with.
let a: A = B(someInt: 0)
// Works
direct(a: a)
a is an existential wrapper around B. direct() expects an existential wrapper, so you can pass it.
// Doesn't work
indirect(a: a)
a is an existential wrapper around B. Existential wrappers do not conform to protocols. They require things that conform to protocols in order to create them (that's why they're called "existentials;" the fact that you created one proves that such a value actually exists). But they don't, themselves, conform to protocols. If they did, then you could do things like what you've tried to do in direct() and say "make a new instance of an existential wrapper without knowing exactly what's inside it." And there's no way to do that. Existential wrappers don't have their own method implementations.
There are cases where an existential could conform to its own protocol. As long as there are no init or static requirements, there actually isn't a problem in principle. But Swift can't currently handle that. Because it can't work for init/static, Swift currently forbids it in all cases.

Swift passing protocol variable to generic function

Can someone explain why is passing protocol var to a generic function an error in Swift?
protocol P {}
func f<T: P>(_: T) {}
func g(x: P) { f(x) } // Error
This, however, is not an error:
protocol P {}
func f(_: P) {}
func g(x: P) { f(x) }
I was just wondering what is the difference of the code generated by the compiler which makes it to reject the first example but in second case the generated code is good to go. Both seem to give the behavior I would expect.
Can someone explain why is passing protocol var to a generic function an error in Swift?
protocol P {}
func f<T: P>(_: T) {}
func g(x: P) { f(x) } // Error
It’s because currently non-#objc protocols don't conform to themselves. Therefore P cannot satisfy the generic placeholder T : P, as P is not a type that conforms to itself.
However in this particular example, that is, one where P doesn't have any static requirements, there's no fundamental limitation preventing P from conforming to itself (I explain this in more detail in the above linked Q&A). It's merely an implementation limitation.
What is the difference of the code generated by the compiler which makes it to reject the first example but in second case the generated code is good to go
Protocol-typed values (existentials) are implemented in a slightly different manner to generic-typed values constrained to a protocol.
A protocol-typed value P consists of:
An inline value buffer for the stored conforming value (currently 3 words in length, but is subject to change until ABI stability). If the value to store is more than 3 words in length, it's put into a heap allocated box, and a reference to this box is stored in the buffer.
A pointer to the conforming type's metadata.
A pointer to the protocol witness table for the conformance of the value to P, which lists the implementations to call for each of the protocol requirements.
On the flip side, a generic-typed value T where T : P consists of only the value buffer. The type metadata and witness table(s) are instead passed as implicit arguments to the generic function, and any member accesses or memory manipulations for values of type T can be done by consulting these arguments. Why? Because Swift's generics system ensures that two values of type T must be of the same type, so they must share the same conformance to the protocol constraint(s).
However this guarantee breaks down if we allow protocols to conform to themselves. Now, if T is a protocol type P, two values of type T could potentially have different underlying concrete types and therefore different conformances to P (so different protocol witness tables). We'd need to consult protocol witness tables on a per-value (rather than per-type) basis – just like we do with existentials.
So what we'd want is for generic-typed values to have the same layout as an existential of the protocol constraints. However this would make things pretty inefficient for the vast majority of the cases when the generic placeholder is not being satisfied by a protocol type, as values of type T would be carrying about redundant information.
The reason why #objc protocols are allowed to conform to themselves when they don't have static requirements is because they have a much simpler layout than non-#objc existentials – they just consist of a reference to the class instance, where protocol requirements are dispatched to via objc_msgSend. This layout is shared with that of a value typed as a placeholder T constrained to the protocol, which is why it's supported.

Array of structs conforming to protocol not recognized

I'm trying to create a method that takes an array of structs that conform to a protocol in Swift.
For the simplest example of this, I define an empty protocol and a method that takes an array of objects conforming to that protocol and just prints them
protocol SomeProtocol {}
func methodTakingProtocol(objects: [SomeProtocol]) {
// do something with the array of objects
print(objects)
}
When I try to feed this method an array of structs that conform to SomeProtocol, however, I get an error
struct SomeStruct: SomeProtocol {}
let arrayOfStructs = [ SomeStruct(), SomeStruct() ]
methodTakingProtocol(arrayOfStructs)
// ^ "Cannot convert value of type '[SomeStruct]' to expected argument type '[SomeProtocol]'"
Poking around a little, I've found that I can get around this problem by explicitly calling out SomeStruct's adoption of SomeProtocol
let arrayOfStructs: [SomeProtocol] = [ SomeStruct(), SomeStruct() ]
// This will work
methodTakingProtocol(arrayOfStructs)
Can someone tell me what's going on here? Is this a bug that I should file a radar for, or is there some reasoning as to why the compiler doesn't recognize this array of structs as conforming to the protocol they have been marked as adopting?
This is actually working as intended. In order to pass the array to the method you have to either cast it or explicitly declare it as the protocol:
protocol SomeProtocol {}
struct SomeStruct: SomeProtocol {}
// explicitly typed
let arrayOfStructs:[SomeProtocol] = [ SomeStruct(), SomeStruct() ]
func foo(bar:[SomeProtocol]) { }
foo(arrayOfStructs) // Works!
Here is an excellent article on this topic: Generic Protocols & Their Shortcomings
But that begs the question; Why can't we use generic protocols outside
of generic constraints?
The short answer is: Swift wants to be type-safe. Couple that with the
fact that it's an ahead-of-time compiled language, and you have a
language that NEEDS to be able to infer a concrete type at anytime
during compilation. I can't stress that enough. At compile time, every
one of your types that aren't function/class constraints need to be
concrete. Associated types within a protocol are abstract. Which means
they aren't concrete. They're fake. And no one likes a fake.
edit: It's still a great article but upon re-reading I realized that it doesn't exactly apply here since we're discussing "concrete protocols" and not "generic protocols".