Does passing struct into a function also using copy-on-write for optimisation, or actual copy does happen immediately? [duplicate] - swift

In Swift, when you pass a value type, say an Array to a function. A copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes placed when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why this is important? If I create a large immutable array and want to pass it in from function to function, I certainly do not want to keep making copies of it. Should I just use NSArrray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?

TL;DR:
So does it mean that the copying actually only takes placed when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArrray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll have a pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety that what you'd get with a NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scene.
Instead, you should just think about the semantics of value types,
which is that get a copy as soon as you assign or use them as parameters.
What's actually generated by Swift's compiler is the Swift's compiler business.
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever assigned or passed as a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also as making it easier for the compiler to guarantee the lifetime of values
stored in a another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has has a trick to handle these, namely copy-on-write (see later).
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely a huge syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous ones,
except for the fields that were mutated.
The following example illustrates this semantical equivalence:
struct S {
var foo: Int
var bar: Int
mutating func modify() {
foo = bar
}
}
var s1 = S(foo: 0, bar: 10)
s1.modify()
// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics has the obvious advantage to avoid copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
What about inout
An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other terms, this gives value types parameters a reference semantics in the context of the function:
func f(x: inout [Int]) {
x.append(12)
}
var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was a the address of the address of the object:
func f(x: inout NSArray) {
x = [12]
}
var a: NSArray = [0]
f(x: &a)
// Prints '(12)'
print(a)
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all Swift's built-in collections (i.e. array, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of the said array and actually uses a reference instead.
The copy will take place as soon as the your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of the its reference counter API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:
final class Wrapped<T> {
init(value: T) { self.value = value }
var value: T
}
struct CopyOnWrite<T> {
init(value: T) { self.wrapped = Wrapped(value: value) }
var wrapped: Wrapped<T>
var value: T {
get { return wrapped.value }
set {
if isKnownUniquelyReferenced(&wrapped) {
wrapped.value = newValue
} else {
wrapped = Wrapped(value: newValue)
}
}
}
}
var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
However, there is an import caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it'd wrongly assumes the wrapped object can be safely mutated,
while in fact such mutation would break invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have a thread safe copy-on-write semantics,
but the instance holding it would be subject to data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.
[2]
One could get confirmation of that by checking the LLVM byte code that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's a open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).

I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's a copy-on-write, so it doesn't copy it unless you modify it, and as you said if you pass it around as a constant you don't run that risk anyway.
p.s. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value-types too

Related

Passing variables by reference in Swift

I started learning c++ and now I am wondering if I can do some things in Swift as well.
I never actually thought about what happens when we pass a variable as an argument to a function in Swift.
Let's use a variable of type string for examples.
In c++ I can pass an argument to a function either by making a copy of it, or by passing a reference/pointer.
void foo(string s) or void foo (string& s);
In the 1st case the copy of my original variable will be created, and foo will receive a copy. In the 2nd case, I basically pass an address of the variable in memory without creating a copy.
Now in Swift I know that I can declare an argument to a function to be inout, which means I can modify the original object.
1) func foo(s:String)...
2) func testPassingByReference(s: inout String)...
I made an extension to String to print the address of the object:
extension String {
func address() -> String {
return String(format: "%p", self);
}
}
The result was not that I expected to see.
var str = "Hello world"
print(str.address())
0x7fd6c9e04ef0
func testPassingByValue(s: String) {
print("he address of s is: \(s.address())")
}
func testPassingByReference(s: inout String) {
print("he address of s is: \(s.address())")
}
testPassingByValue(s: str)
0x7fd6c9e05270
testPassingByReference(s: &str)
0x7fd6c9e7caf0
I understand why the address is different when we pass an argument by value, but it's not what I expected to see when we pass an argument as an inout parameter.
Apple developer website says that
In Swift, Array, String, and Dictionary are all value types.
So the question is, is there any way to avoid copying objects that we pass to functions (I can have a pretty big array or a dictionary) or Swift doesn't allow us do such things?
Copying arrays and strings is cheap (almost free) as long as you don't modify it. Swift implements copy-on-write for these collections in the stdlib. This isn't a language-level feature; it's actually implemented explicitly for arrays and strings (and some other types). So you can have many copies of a large array that all share the same backing storage.
inout is not the same thing as "by reference." It is literally "in-out." The value is copied in at the start of the function, and then copied back to the original location at the end.
Swift's approach tends to be performant for common uses, but Swift doesn't make strong performance promises like C++ does. (That said, this allows Swift to be faster in some cases than C++ could be, because Swift isn't as restricted in its choice of data structures.) As a general rule, I find it very difficult to reason about the likely performance of arbitrary Swift code. It's easy to reason about the worst-case performance (just assume copies always happen), but it's hard to know for certain when a copy will be avoided.
Even though inout parameters modify the variable that was used as an input parameter to the function, they don't exactly work like by reference in other languages. The behaviour in Swift is called copy-in copy-out or call by value result. It means that when you use an inout parameter, at the time of the function call, its value is copied and inside the function, a local copy of the variable is modified. At the time of the functions return, it overwrites the value at the inout parameters original memory location with the modified copy value.
You are printing the address of the variable inside the function, hence you are actually printing the location of the copied value. Try printing after the function returned and you will see that you are printing the original location with the modified value.
For more information, see the In-Out parameters part of the documentation.

Cannot retrieve associated object [duplicate]

The code below can be run in a Swift Playground:
import UIKit
func aaa(_ key: UnsafeRawPointer!, _ value: Any! = nil) {
print(key)
}
func bbb(_ key: UnsafeRawPointer!) {
print(key)
}
class A {
var key = "aaa"
}
let a = A()
aaa(&a.key)
bbb(&a.key)
Here's the result printed on my mac:
0x00007fff5dce9248
0x00007fff5dce9220
Why the results of two prints differs? What's more interesting, when I change the function signature of bbb to make it the same with aaa, the result of two prints are the same. And if I use a global var instead of a.key in these two function calls, the result of two prints are the same. Does anyone knows why this strange behavior happens?
Why the results of two prints differs?
Because for each function call, Swift is creating a temporary variable initialised to the value returned by a.key's getter. Each function is called with a pointer to their given temporary variable. Therefore the pointer values will likely not be the same – as they refer to different variables.
The reason why temporary variables are used here is because A is a non-final class, and can therefore have its getters and setters of key overridden by subclasses (which could well re-implement it as a computed property).
Therefore in an un-optimised build, the compiler cannot just pass the address of key directly to the function, but instead has to rely on calling the getter (although in an optimised build, this behaviour can change completely).
You'll note that if you mark key as final, you should now get consistent pointer values in both functions:
class A {
final var key = "aaa"
}
var a = A()
aaa(&a.key) // 0x0000000100a0abe0
bbb(&a.key) // 0x0000000100a0abe0
Because now the address of key can just be directly passed to the functions, bypassing its getter entirely.
It's worth noting however that, in general, you should not rely on this behaviour. The values of the pointers you get within the functions are a pure implementation detail and are not guaranteed to be stable. The compiler is free to call the functions however it wishes, only promising you that the pointers you get will be valid for the duration of the call, and will have pointees initialised to the expected values (and if mutable, any changes you make to the pointees will be seen by the caller).
The only exception to this rule is the passing of pointers to global and static stored variables. Swift does guarantee that the pointer values you get will be stable and unique for that particular variable. From the Swift team's blog post on Interacting with C Pointers (emphasis mine):
However, interaction with C pointers is inherently
unsafe compared to your other Swift code, so care must be taken. In
particular:
These conversions cannot safely be used if the callee
saves the pointer value for use after it returns. The pointer that
results from these conversions is only guaranteed to be valid for the
duration of a call. Even if you pass the same variable, array, or
string as multiple pointer arguments, you could receive a different
pointer each time. An exception to this is global or static stored
variables. You can safely use the address of a global variable as a
persistent unique pointer value, e.g.: as a KVO context parameter.
Therefore if you made key a static stored property of A or just a global stored variable, you are guaranteed to the get same pointer value in both function calls.
Changing the function signature
When I change the function signature of bbb to make it the same with aaa, the result of two prints are the same
This appears to be an optimisation thing, as I can only reproduce it in -O builds and playgrounds. In an un-optimised build, the addition or removal of an extra parameter has no effect.
(Although it's worth noting that you should not test Swift behaviour in playgrounds as they are not real Swift environments, and can exhibit different runtime behaviour to code compiled with swiftc)
The cause of this behaviour is merely a coincidence – the second temporary variable is able to reside at the same address as the first (after the first is deallocated). When you add an extra parameter to aaa, a new variable will be allocated 'between' them to hold the value of the parameter to pass, preventing them from sharing the same address.
The same address isn't observable in un-optimised builds due to the intermediate load of a in order to call the getter for the value of a.key. As an optimisation, the compiler is able to inline the value of a.key to the call-site if it has a property initialiser with a constant expression, removing the need for this intermediate load.
Therefore if you give a.key a non-determininstic value, e.g var key = arc4random(), then you should once again observe different pointer values, as the value of a.key can no longer be inlined.
But regardless of the cause, this is a perfect example of how the pointer values for variables (which are not global or static stored variables) are not to be relied on – as the value you get can completely change depending on factors such as optimisation level and parameter count.
inout & UnsafeMutable(Raw)Pointer
Regarding your comment:
But since withUnsafePointer(to:_:) always has the correct behavior I want (in fact it should, otherwise this function is of no use), and it also has an inout parameter. So I assume there are implementation difference between these functions with inout parameters.
The compiler treats an inout parameter in a slightly different way to an UnsafeRawPointer parameter. This is because you can mutate the value of an inout argument in the function call, but you cannot mutate the pointee of an UnsafeRawPointer.
In order to make any mutations to the value of the inout argument visible to the caller, the compiler generally has two options:
Make a temporary variable initialised to the value returned by the variable's getter. Call the function with a pointer to this variable, and once the function has returned, call the variable's setter with the (possibly mutated) value of the temporary variable.
If it's addressable, simply call the function with a direct pointer to the variable.
As said above, the compiler cannot use the second option for stored properties that aren't known to be final (but this can change with optimisation). However, always relying on the first option can be potentially expensive for large values, as they'll have to be copied. This is especially detrimental for value types with copy-on-write behaviour, as they depend on being unique in order to perform direct mutations to their underlying buffer – a temporary copy violates this.
To solve this problem, Swift implements a special accessor – called materializeForSet. This accessor allows the callee to either provide the caller with a direct pointer to the given variable if it's addressable, or otherwise will return a pointer to a temporary buffer containing a copy of the variable, which will need to be written back to the setter after it has been used.
The former is the behaviour you're seeing with inout – you're getting a direct pointer to a.key back from materializeForSet, therefore the pointer values you get in both function calls are the same.
However, materializeForSet is only used for function parameters that require write-back, which explains why it's not used for UnsafeRawPointer. If you make the function parameters of aaa and bbb take UnsafeMutable(Raw)Pointers (which do require write-back), you should observe the same pointer values again.
func aaa(_ key: UnsafeMutableRawPointer) {
print(key)
}
func bbb(_ key: UnsafeMutableRawPointer) {
print(key)
}
class A {
var key = "aaa"
}
var a = A()
// will use materializeForSet to get a direct pointer to a.key
aaa(&a.key) // 0x0000000100b00580
bbb(&a.key) // 0x0000000100b00580
But again, as said above, this behaviour is not to be relied upon for variables that are not global or static.

Why UnsafeRawPointer shows different result when function signatures differs in Swift?

The code below can be run in a Swift Playground:
import UIKit
func aaa(_ key: UnsafeRawPointer!, _ value: Any! = nil) {
print(key)
}
func bbb(_ key: UnsafeRawPointer!) {
print(key)
}
class A {
var key = "aaa"
}
let a = A()
aaa(&a.key)
bbb(&a.key)
Here's the result printed on my mac:
0x00007fff5dce9248
0x00007fff5dce9220
Why the results of two prints differs? What's more interesting, when I change the function signature of bbb to make it the same with aaa, the result of two prints are the same. And if I use a global var instead of a.key in these two function calls, the result of two prints are the same. Does anyone knows why this strange behavior happens?
Why the results of two prints differs?
Because for each function call, Swift is creating a temporary variable initialised to the value returned by a.key's getter. Each function is called with a pointer to their given temporary variable. Therefore the pointer values will likely not be the same – as they refer to different variables.
The reason why temporary variables are used here is because A is a non-final class, and can therefore have its getters and setters of key overridden by subclasses (which could well re-implement it as a computed property).
Therefore in an un-optimised build, the compiler cannot just pass the address of key directly to the function, but instead has to rely on calling the getter (although in an optimised build, this behaviour can change completely).
You'll note that if you mark key as final, you should now get consistent pointer values in both functions:
class A {
final var key = "aaa"
}
var a = A()
aaa(&a.key) // 0x0000000100a0abe0
bbb(&a.key) // 0x0000000100a0abe0
Because now the address of key can just be directly passed to the functions, bypassing its getter entirely.
It's worth noting however that, in general, you should not rely on this behaviour. The values of the pointers you get within the functions are a pure implementation detail and are not guaranteed to be stable. The compiler is free to call the functions however it wishes, only promising you that the pointers you get will be valid for the duration of the call, and will have pointees initialised to the expected values (and if mutable, any changes you make to the pointees will be seen by the caller).
The only exception to this rule is the passing of pointers to global and static stored variables. Swift does guarantee that the pointer values you get will be stable and unique for that particular variable. From the Swift team's blog post on Interacting with C Pointers (emphasis mine):
However, interaction with C pointers is inherently
unsafe compared to your other Swift code, so care must be taken. In
particular:
These conversions cannot safely be used if the callee
saves the pointer value for use after it returns. The pointer that
results from these conversions is only guaranteed to be valid for the
duration of a call. Even if you pass the same variable, array, or
string as multiple pointer arguments, you could receive a different
pointer each time. An exception to this is global or static stored
variables. You can safely use the address of a global variable as a
persistent unique pointer value, e.g.: as a KVO context parameter.
Therefore if you made key a static stored property of A or just a global stored variable, you are guaranteed to the get same pointer value in both function calls.
Changing the function signature
When I change the function signature of bbb to make it the same with aaa, the result of two prints are the same
This appears to be an optimisation thing, as I can only reproduce it in -O builds and playgrounds. In an un-optimised build, the addition or removal of an extra parameter has no effect.
(Although it's worth noting that you should not test Swift behaviour in playgrounds as they are not real Swift environments, and can exhibit different runtime behaviour to code compiled with swiftc)
The cause of this behaviour is merely a coincidence – the second temporary variable is able to reside at the same address as the first (after the first is deallocated). When you add an extra parameter to aaa, a new variable will be allocated 'between' them to hold the value of the parameter to pass, preventing them from sharing the same address.
The same address isn't observable in un-optimised builds due to the intermediate load of a in order to call the getter for the value of a.key. As an optimisation, the compiler is able to inline the value of a.key to the call-site if it has a property initialiser with a constant expression, removing the need for this intermediate load.
Therefore if you give a.key a non-determininstic value, e.g var key = arc4random(), then you should once again observe different pointer values, as the value of a.key can no longer be inlined.
But regardless of the cause, this is a perfect example of how the pointer values for variables (which are not global or static stored variables) are not to be relied on – as the value you get can completely change depending on factors such as optimisation level and parameter count.
inout & UnsafeMutable(Raw)Pointer
Regarding your comment:
But since withUnsafePointer(to:_:) always has the correct behavior I want (in fact it should, otherwise this function is of no use), and it also has an inout parameter. So I assume there are implementation difference between these functions with inout parameters.
The compiler treats an inout parameter in a slightly different way to an UnsafeRawPointer parameter. This is because you can mutate the value of an inout argument in the function call, but you cannot mutate the pointee of an UnsafeRawPointer.
In order to make any mutations to the value of the inout argument visible to the caller, the compiler generally has two options:
Make a temporary variable initialised to the value returned by the variable's getter. Call the function with a pointer to this variable, and once the function has returned, call the variable's setter with the (possibly mutated) value of the temporary variable.
If it's addressable, simply call the function with a direct pointer to the variable.
As said above, the compiler cannot use the second option for stored properties that aren't known to be final (but this can change with optimisation). However, always relying on the first option can be potentially expensive for large values, as they'll have to be copied. This is especially detrimental for value types with copy-on-write behaviour, as they depend on being unique in order to perform direct mutations to their underlying buffer – a temporary copy violates this.
To solve this problem, Swift implements a special accessor – called materializeForSet. This accessor allows the callee to either provide the caller with a direct pointer to the given variable if it's addressable, or otherwise will return a pointer to a temporary buffer containing a copy of the variable, which will need to be written back to the setter after it has been used.
The former is the behaviour you're seeing with inout – you're getting a direct pointer to a.key back from materializeForSet, therefore the pointer values you get in both function calls are the same.
However, materializeForSet is only used for function parameters that require write-back, which explains why it's not used for UnsafeRawPointer. If you make the function parameters of aaa and bbb take UnsafeMutable(Raw)Pointers (which do require write-back), you should observe the same pointer values again.
func aaa(_ key: UnsafeMutableRawPointer) {
print(key)
}
func bbb(_ key: UnsafeMutableRawPointer) {
print(key)
}
class A {
var key = "aaa"
}
var a = A()
// will use materializeForSet to get a direct pointer to a.key
aaa(&a.key) // 0x0000000100b00580
bbb(&a.key) // 0x0000000100b00580
But again, as said above, this behaviour is not to be relied upon for variables that are not global or static.

What are the benefits of an immutable struct over a mutable one?

I already know the benefit of immutability over mutability in being able to reason about code and introducing less bugs, especially in multithreaded code. In creating structs, though, I cannot see any benefit over creating a completely immutable struct over a mutable one.
Let's have as an example of a struct that keeps some score:
struct ScoreKeeper {
var score: Int
}
In this structure I can change the value of score on an existing struct variable
var scoreKeeper = ScoreKeeper(score: 0)
scoreKeeper.score += 5
println(scoreKeeper.score)
// prints 5
The immutable version would look like this:
struct ScoreKeeper {
let score: Int
func incrementScoreBy(points: Int) -> ScoreKeeper {
return ScoreKeeper(score: self.score + points)
}
}
And its usage:
let scoreKeeper = ScoreKeeper(score: 0)
let newScoreKeeper = scoreKeeper.incrementScoreBy(5)
println(newScoreKeeper.score)
// prints 5
What I don't see is the benefit of the second approach over the first, since structs are value types. If I pass a struct around, it always gets copied. So it does not seem to matter to me if the structure has a mutable property, since other parts of the code would be working on a separate copy anyway, thus removing the problems of mutability.
I have seen some people using the second example, though, which requires more code for no apparent benefit. Is there some benefit I'm not seeing?
Different approaches will facilitate different kinds of changes to the code. An immutable structure is very similar to an immutable class object, but a mutable structure and a mutable class object are very different. Thus, code which uses an immutable structure can often be readily adapted if for some reason it becomes necessary to use a class object instead.
On the flip side, use of an immutable object will often make the code to replace a variable with a modified version more brittle in case additional properties are added to the type in question. For example, if a PhoneNumber type includes methods for AreaCode, LocalExchange, and LocalNumber and a constructor that takes those parameters, and then adds an "optional" fourth property for Extension, then code which is supposed to change the area codes of certain phone numbers by passing the new area code, LocalExchange, and LocalNumber, to the three-argument constructor will erase the Extension property of every phone number, while code which could write to AreaCode directly wouldn't have had that problem.
Your remark about copying value types is very good. Maybe this doesn't make much sense in particular language (swift) and particular compiler implementation (current version) but in general if the compiler knows for sure that the data structure is immutable, it could e.g. use reference instead of a copy behind the scenes to gain some performance improvement. This could not be done with mutable type for obvious reasons.
Even more generally speaking, limitation means information. If you limit your data structure somehow, you gain some extra knowledge about it. And extra knowledge means extra possibilities ;) Maybe the current compiler does not take advantage of them but this does not mean they are not here :)
Good analysis, especially pointing out that structs are passed by value and therefore will not be altered by other processes.
The only benefit I can see is a stylistic one by making the immutability of the element explicit.
It is more of a style to make value based types be treated on par with object based types in object oriented styles. It is more of a personal choice, and I don't see any big benefits in either of them.
In general terms, immutable objects are less costly to the system than mutable ones. Mutable objects need to have infrastructure for taking on new values, and the system has to allow for the fact that their values can change at any time.
Mutable objects are also a challenge in concurrent code because you have to guard against the value changing out from under you from another thread.
However, if you are constantly creating and destroying unique immutable objects, the overhead of creating new ones becomes costly quite quickly.
In the foundation classes, NSNumber is an immutable object. The system maintains a pool of NSNumber objects that you've used before, and under the covers, gives you back an existing number if you ask for one with the same value as one you created before.
That's about the only situation in which I could see value in using static structs - where they don't change very much and you have a fairly small pool of possible values. In that case you'd probably want to se up your class with a "factory method" that kept recently used structs around and reused them if you asked for a struct with the same value again.
Such a scheme could simplify concurrent code, as mentioned above. In that case you wouldn't have to guard against the values of your structs changing in another thread. If you were using such a struct, you could know that it would never change.

When does the copying take place for swift value types

In Swift, when you pass a value type, say an Array to a function. A copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes placed when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why this is important? If I create a large immutable array and want to pass it in from function to function, I certainly do not want to keep making copies of it. Should I just use NSArrray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?
TL;DR:
So does it mean that the copying actually only takes placed when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArrray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll have a pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety that what you'd get with a NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scene.
Instead, you should just think about the semantics of value types,
which is that get a copy as soon as you assign or use them as parameters.
What's actually generated by Swift's compiler is the Swift's compiler business.
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever assigned or passed as a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also as making it easier for the compiler to guarantee the lifetime of values
stored in a another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has has a trick to handle these, namely copy-on-write (see later).
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely a huge syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous ones,
except for the fields that were mutated.
The following example illustrates this semantical equivalence:
struct S {
var foo: Int
var bar: Int
mutating func modify() {
foo = bar
}
}
var s1 = S(foo: 0, bar: 10)
s1.modify()
// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics has the obvious advantage to avoid copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
What about inout
An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other terms, this gives value types parameters a reference semantics in the context of the function:
func f(x: inout [Int]) {
x.append(12)
}
var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was a the address of the address of the object:
func f(x: inout NSArray) {
x = [12]
}
var a: NSArray = [0]
f(x: &a)
// Prints '(12)'
print(a)
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all Swift's built-in collections (i.e. array, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of the said array and actually uses a reference instead.
The copy will take place as soon as the your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of the its reference counter API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:
final class Wrapped<T> {
init(value: T) { self.value = value }
var value: T
}
struct CopyOnWrite<T> {
init(value: T) { self.wrapped = Wrapped(value: value) }
var wrapped: Wrapped<T>
var value: T {
get { return wrapped.value }
set {
if isKnownUniquelyReferenced(&wrapped) {
wrapped.value = newValue
} else {
wrapped = Wrapped(value: newValue)
}
}
}
}
var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
However, there is an import caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it'd wrongly assumes the wrapped object can be safely mutated,
while in fact such mutation would break invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have a thread safe copy-on-write semantics,
but the instance holding it would be subject to data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.
[2]
One could get confirmation of that by checking the LLVM byte code that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's a open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).
I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's a copy-on-write, so it doesn't copy it unless you modify it, and as you said if you pass it around as a constant you don't run that risk anyway.
p.s. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value-types too