According to the Swift Programming Language reference, Dictionary instances are copied whenever they are passed to a function/method or assigned to a constant or variable. This seems inefficient. Is there a way to efficiently share the contents of a dictionary between two methods without copying?
It's true the documentation says that but there are also various notes saying it won't affect the performance. The copying will be performed lazily - only when needed.
The descriptions below refer to the “copying” of arrays, dictionaries, strings, and other values. Where copying is mentioned, the behavior you see in your code will always be as if a copy took place. However, Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so. Swift manages all value copying to ensure optimal performance, and you should not avoid assignment to try to preempt this optimization.
Source: Classes & Collections
Meaning - don't try to optimize before you actually encounter performance problems!
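Also, don't forget that dictionaries are structures. When you pass them into a function, they are implicitly immutable, so there is no need for copying. To actually pass a mutable dictionary into a function, you can use an inout parameter. Semantically the value is copied in and written back when the function returns, but in practice the compiler can pass it by reference, so the dictionary's contents are not duplicated. In early versions of Swift the parameter could also be declared as var, which gave the function its own mutable copy; var parameters were removed in Swift 3, so today you get the same effect by assigning the parameter to a local var.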
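A minimal sketch of the two cases described above. The function and variable names are made up for illustration:

```swift
// Read-only parameter: the function sees an immutable value,
// so nothing has to be duplicated just to read from it.
func countEntries(in scores: [String: Int]) -> Int {
    return scores.count
}

// inout parameter: mutations are visible to the caller afterwards.
func addBonus(to scores: inout [String: Int], for name: String) {
    scores[name, default: 0] += 10
}

var scores = ["alice": 1, "bob": 2]
let before = countEntries(in: scores)
addBonus(to: &scores, for: "alice")
print(before, scores["alice"] ?? 0)   // prints "2 11"
```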
You always have the option to define a custom, generic class with a Dictionary attribute:
class SharedDictionary<K: Hashable, V> {
    // Dictionary keys must be Hashable; start with an empty dictionary
    var dict: Dictionary<K, V> = [:]
    // add the methods you need, including overloading operators
}
Instances of your SharedDictionary will be passed-by-reference (not copied).
I actually talked to someone on the Swift team today about "pass by reference" in Swift. Here is what I got:
As we all know, structs are passed by copy and classes are passed by
reference.
I quote: "It is extremely easy to wrap a struct in a class." (He was
pointing to GoZoner's answer.)
Even though a struct is copied, any class instances stored in
the struct will still be passed by reference.
If you want to do traditional pass by reference on a struct, use
inout. However, he specifically mentioned to "consider adding in
another return value instead of using inout" when saying this.
Since Dictionary defines KeyType and ValueType as generics:
struct Dictionary<KeyType : Hashable, ValueType>
I believe this means that if your KeyType and ValueType are class objects they will not be copied when the Dictionary itself is copied, and you shouldn't need to worry about it too much.
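As a small sketch of that point, the snippet below stores a class instance as a dictionary value. The Box class is hypothetical and exists only for this illustration:

```swift
// Hypothetical reference type used as a dictionary value.
final class Box {
    var count: Int
    init(count: Int) { self.count = count }
}

let original = ["a": Box(count: 1)]
var copy = original
copy["b"] = Box(count: 2)   // only `copy` gains the new key
copy["a"]?.count = 42       // mutates the Box that both dictionaries refer to

print(original.count, copy.count, original["a"]!.count)   // prints "1 2 42"
```

The dictionaries themselves are independent values, but the Box instances they hold are shared.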
Also, the NSDictionary class is still available to use!
As others have said, "Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so", so performance should not be a big problem here. However, you might still want to have a dictionary passed by reference for other reasons. In that case you can create a custom class like the one below and use it much as you would a normal dictionary object:
class SharedDictionary<K : Hashable, V> {
    var dict : Dictionary<K, V> = Dictionary()
    subscript(key : K) -> V? {
        get {
            return dict[key]
        }
        set(newValue) {
            dict[key] = newValue
        }
    }
}
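A short usage sketch of such a wrapper class. The register function and the sample keys are assumptions made for the example; the class is repeated so the snippet stands on its own:

```swift
final class SharedDictionary<K: Hashable, V> {
    var dict: [K: V] = [:]
    subscript(key: K) -> V? {
        get { return dict[key] }
        set { dict[key] = newValue }
    }
}

// Hypothetical function that mutates the dictionary it receives.
// No inout is needed because the parameter is a reference.
func register(_ name: String, in registry: SharedDictionary<String, Int>) {
    registry[name] = name.count
}

let registry = SharedDictionary<String, Int>()
register("swift", in: registry)

let alias = registry      // copies the reference, not the contents
alias["go"] = 2

print(registry.dict.count)   // prints "2"
```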
Trust the language designers: the compiler is usually smarter than you think in optimizing copies.
You can hack around this, but I don't frankly see a need before proving it's inefficient.
I'm really new to Swift and I just read that classes are passed by reference and arrays/strings etc. are copied.
Is the pass by reference the same way as in Objective-C or Java wherein you actually pass "a" reference or is it proper pass by reference?
Types of Things in Swift
The rule is:
Class instances are reference types (i.e. your reference to a class instance is effectively a pointer)
Functions are reference types
Everything else is a value type; "everything else" simply means instances of structs and instances of enums, because that's all there is in Swift. Arrays and strings are struct instances, for example. You can pass a reference to one of those things (as a function argument) by using inout and taking the address, as newacct has pointed out. But the type is itself a value type.
What Reference Types Mean For You
A reference type object is special in practice because:
Mere assignment or passing to function can yield multiple references to the same object
The object itself is mutable even if the reference to it is a constant (let, either explicit or implied).
A mutation to the object affects that object as seen by all references to it.
Those can be dangers, so keep an eye out. On the other hand, passing a reference type is clearly efficient because only a pointer is copied and passed, which is trivial.
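The three points above can be seen in a few lines. This is only a sketch, and the Tally class is invented for the illustration:

```swift
final class Tally {
    var value = 0
}

// The parameter is an implicit `let`, yet the object it refers to is mutable.
func increment(_ tally: Tally) {
    tally.value += 1
}

let first = Tally()
let second = first        // a second reference to the same object
increment(first)
second.value += 10        // visible through `first` as well

print(first.value, first === second)   // prints "11 true"
```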
What Value Types Mean For You
Clearly, passing a value type is "safer", and let means what it says: you can't mutate a struct instance or enum instance through a let reference. On the other hand, that safety is achieved by making a separate copy of the value, isn't it? Doesn't that make passing a value type potentially expensive?
Well, yes and no. It isn't as bad as you might think. As Nate Cook has said, passing a value type does not necessarily imply copying, because let (explicit or implied) guarantees immutability so there's no need to copy anything. And even passing into a var reference doesn't mean that things will be copied, only that they can be if necessary (because there's a mutation). The docs specifically advise you not to get your knickers in a twist.
Everything in Swift is passed by "copy" by default, so when you pass a value-type you get a copy of the value, and when you pass a reference type you get a copy of the reference, with all that that implies. (That is, the copy of the reference still points to the same instance as the original reference.)
I use scare quotes around the "copy" above because Swift does a lot of optimization; wherever possible, it doesn't copy until there's a mutation or the possibility of mutation. Since parameters are immutable by default, this means that most of the time no copy actually happens.
It is always pass-by-value when the parameter is not inout.
It is always pass-by-reference if the parameter is inout. However, this is somewhat complicated by the fact you need to explicitly use the & operator on the argument when passing to an inout parameter, so it may not fit the traditional definition of pass-by-reference, where you pass the variable directly.
Here is a small code sample for passing by reference.
Avoid doing this, unless you have a strong reason to.
func computeSomeValues(_ value1: inout String, _ value2: inout Int) {
    value1 = "my great computation 1"
    value2 = 123456
}
Call it like this:
var val1: String = ""
var val2: Int = -1
computeSomeValues(&val1, &val2)
The Apple Swift Developer blog has a post called Value and Reference Types that provides a clear and detailed discussion on this very topic.
To quote:
Types in Swift fall into one of two categories: first, “value types”,
where each instance keeps a unique copy of its data, usually defined
as a struct, enum, or tuple. The second, “reference types”, where
instances share a single copy of the data, and the type is usually
defined as a class.
The Swift blog post continues to explain the differences with examples and suggests when you would use one over the other.
When you use inout with an infix operator such as +=, you don't write the & symbol at the call site. The left operand is declared inout in the operator's definition, so the compiler passes it in-out without needing the marker:
extension Dictionary {
    static func += (left: inout Dictionary, right: Dictionary) {
        for (key, value) in right {
            left[key] = value
        }
    }
}
origDictionary += newDictionaryToAdd
And nicely this dictionary 'add' only does one write to the original reference too, so great for locking!
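For comparison, here is a self-contained sketch that runs the operator above next to the standard library's merge(_:uniquingKeysWith:), which gives the same result when the closure keeps the new value. The sample dictionaries are made up:

```swift
extension Dictionary {
    static func += (left: inout Dictionary, right: Dictionary) {
        for (key, value) in right {
            left[key] = value
        }
    }
}

let overrides = ["font": "serif", "size": "12"]

var viaOperator = ["theme": "dark", "font": "mono"]
viaOperator += overrides                       // no & needed on the left operand

var viaMerge = ["theme": "dark", "font": "mono"]
viaMerge.merge(overrides) { _, new in new }    // keep the value from `overrides`

print(viaOperator == viaMerge)   // prints "true"
```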
Classes and structures
One of the most important differences between structures and classes is that structures are always copied when they are passed around in your code, but classes are passed by reference.
Closures
If you assign a closure to a property of a class instance, and the closure captures that instance by referring to the instance or its members, you will create a strong reference cycle between the closure and the instance. Swift uses capture lists to break these strong reference cycles.
ARC(Automatic Reference Counting)
Reference counting applies only to instances of classes. Structures and enumerations are value types, not reference types, and are not stored and passed by reference.
Class instances are passed by reference, and everything else is passed by value by default.
You can pass by reference by using the inout keyword.
Swift assigns, passes, and returns a value by reference for reference types and by copy for value types.
[Value vs Reference type]
Compared with Java, the correspondence is:
Java reference types (all objects) correspond to Swift reference types.
Java primitive types (int, boolean, ...) correspond to Swift value types, which Swift extends to user-defined types through struct.
A struct is a value type, so it is always passed by value. Let's create a struct:
// Step 1: declare the properties
struct Person {
    var raw: String
    var name: String
    var age: Int
    var profession: String

    // Step 2: add a method that prints them
    func personInformation() {
        print("\(raw)")
        print("name : \(name)")
        print("age : \(age)")
        print("profession : \(profession)")
    }
}

// Sample values, chosen only for illustration
let A = Person(raw: "Person", name: "Ali", age: 30, profession: "Engineer")
// Assigning A to B copies the value; then call the function on both
var B = A
A.personInformation()
B.personInformation()
print(B.name)
Both calls print the same result. When we change the value of B, the change occurs only in B, because the value of A was copied:
B.name = "Zainab"
Only B's name changes. This is pass by value.
Pass By Reference
Classes always use pass by reference, in which only the address of the occupied memory is copied. If Person were a class and we changed B in the same way, both A and B would change, because only the reference is copied.
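A minimal side-by-side sketch of the two behaviours. The type names PersonObject and PersonValue and the name "Ali" are assumptions for the example:

```swift
// Reference type: assignment copies only the reference.
final class PersonObject {
    var name: String
    init(name: String) { self.name = name }
}

// Value type: assignment copies the value.
struct PersonValue {
    var name: String
}

let a = PersonObject(name: "Ali")
let b = a
b.name = "Zainab"         // a.name changes too, since a and b are the same object

let c = PersonValue(name: "Ali")
var d = c
d.name = "Zainab"         // c is untouched

print(a.name, c.name)     // prints "Zainab Ali"
```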
In Swift, when you pass a value type, say an Array, to a function, a copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes place when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why is this important? If I create a large immutable array and want to pass it from function to function, I certainly do not want to keep making copies of it. Should I just use NSArray in this case, or would the Swift Array work fine as long as I do not try to manipulate the passed-in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?
TL;DR:
So does it mean that the copying actually only takes place when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll get pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety than what you'd get with an NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scenes.
Instead, you should just think about the semantics of value types,
which is that you get a copy as soon as you assign them or use them as parameters.
What's actually generated by Swift's compiler is the compiler's business.
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1]
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever assigned or passed to a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also making it easier for the compiler to guarantee the lifetime of values
stored in another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive as it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has a trick to handle these, namely copy-on-write (see later).
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous one,
except for the fields that were mutated.
The following example illustrates this semantic equivalence:
struct S {
    var foo: Int
    var bar: Int
    mutating func modify() {
        foo = bar
    }
}
var s1 = S(foo: 0, bar: 10)
s1.modify()
// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics has the obvious advantage of avoiding copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
What about inout
An inout parameter behaves like a read-write pointer to the expected type
(formally, Swift specifies copy-in copy-out, which the compiler may optimize into a true pass by reference).
In the case of value types, it means the function won't get its own copy of the value,
but access to the caller's value,
so mutations inside the function will affect the argument that was passed (hence the inout keyword).
In other terms, this gives value-type parameters reference semantics in the context of the function:
func f(x: inout [Int]) {
    x.append(12)
}
var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was the address of the address of the object:
import Foundation
func f(x: inout NSArray) {
    x = [12]
}
var a: NSArray = [0]
f(x: &a)
// Prints the new array, whose only element is 12
print(a)
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
and which is implemented by all of Swift's built-in collections (i.e. arrays, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of said array and actually uses a reference instead.
The copy will take place as soon as your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of its reference-counting API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and checking whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:
final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}
struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            // Mutate in place only when no other value shares the wrapper
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}
// SomeLargeObject stands for any type of your own
var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
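As a quick check of the wrapper above, the following sketch repeats the two types and compares the identity of the wrapped object before and after a write. Using an array of Int as the payload is an assumption made so the snippet is self-contained:

```swift
final class Wrapped<T> {
    var value: T
    init(value: T) { self.value = value }
}

struct CopyOnWrite<T> {
    var wrapped: Wrapped<T>
    init(value: T) { wrapped = Wrapped(value: value) }
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

let a = CopyOnWrite(value: [1, 2, 3])
var b = a                                          // shares the wrapper with a

let sharedBeforeWrite = a.wrapped === b.wrapped
b.value = [4, 5, 6]                                // not uniquely referenced: new wrapper
let sharedAfterWrite = a.wrapped === b.wrapped

print(sharedBeforeWrite, sharedAfterWrite, a.value)   // prints "true false [1, 2, 3]"
```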
However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it would wrongly assume the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have thread-safe copy-on-write semantics,
but the instance holding it would be subject to a data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.
[2]
One could get confirmation of that by checking the LLVM byte code that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such a type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).
I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's a copy-on-write, so it doesn't copy it unless you modify it, and as you said if you pass it around as a constant you don't run that risk anyway.
p.s. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value-types too
I understand that structs in Swift are generally pass-by-value. I use a struct to encapsulate a few bits of information, add the struct to a set, and later change small bits of its values. However, I seem to have run into an issue whereby the structs are not updating correctly, even though I have sprinkled the keyword inout everywhere a parameter requires a struct. My gut instinct was to allocate memory for the struct and refer to it in the set by its pointer. Would it make sense to simply use a class, even though all I need is a list of values that can change?
If you need reference semantics, then absolutely use a class. If you want to be able to modify your object both in a data structure as well as other places, a class is what you need. It is perfectly reasonable to use a class just to get reference semantics.
Also so you know, inout on a function parameter does not actually mean pass by reference. What is actually happening is a copy of your struct is made by the function. This copy is then modified in the function and later copied back to the original variable.
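One way to observe this copy-in copy-out behaviour is to pass a property with an observer as the inout argument: the observer runs once, when the value is written back, however many times the function mutates its parameter. The sketch below uses a hypothetical Audio struct and raiseTwice function:

```swift
struct Audio {
    var writes = 0
    var volume = 0 {
        didSet { writes += 1 }   // counts how often volume is assigned
    }
}

func raiseTwice(_ level: inout Int) {
    level += 1
    level += 1
}

var audio = Audio()
raiseTwice(&audio.volume)

print(audio.volume, audio.writes)   // prints "2 1"
```

For plain stored variables the compiler is free to skip the temporary and operate on the original memory directly, so the observable result is the same either way.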
Without your code, I can't see what you're doing wrong, but it works for me - Playground minimal example:
struct Grimxn {
    var first: Int
    var second: Int
}
// Current Swift syntax: inout goes before the type, and ++/-- no longer exist
func modify(_ v: inout Grimxn) {
    v.first += 1
    v.second -= 1
}
var a = Grimxn(first: 1, second: 2)
print("\(a)") // "Grimxn(first: 1, second: 2)\n"
modify(&a)
print("\(a)") // "Grimxn(first: 2, second: 1)\n" - as required?
I certainly wouldn't want to use pointers - why do that in Swift - use C!
I'm really new to Swift and I just read that classes are passed by reference and arrays/strings etc. are copied.
Is the pass by reference the same way as in Objective-C or Java wherein you actually pass "a" reference or is it proper pass by reference?
Types of Things in Swift
The rule is:
Class instances are reference types (i.e. your reference to a class instance is effectively a pointer)
Functions are reference types
Everything else is a value type; "everything else" simply means instances of structs and instances of enums, because that's all there is in Swift. Arrays and strings are struct instances, for example. You can pass a reference to one of those things (as a function argument) by using inout and taking the address, as newacct has pointed out. But the type is itself a value type.
What Reference Types Mean For You
A reference type object is special in practice because:
Mere assignment or passing to function can yield multiple references to the same object
The object itself is mutable even if the reference to it is a constant (let, either explicit or implied).
A mutation to the object affects that object as seen by all references to it.
Those can be dangers, so keep an eye out. On the other hand, passing a reference type is clearly efficient because only a pointer is copied and passed, which is trivial.
What Value Types Mean For You
Clearly, passing a value type is "safer", and let means what it says: you can't mutate a struct instance or enum instance through a let reference. On the other hand, that safety is achieved by making a separate copy of the value, isn't it? Doesn't that make passing a value type potentially expensive?
Well, yes and no. It isn't as bad as you might think. As Nate Cook has said, passing a value type does not necessarily imply copying, because let (explicit or implied) guarantees immutability so there's no need to copy anything. And even passing into a var reference doesn't mean that things will be copied, only that they can be if necessary (because there's a mutation). The docs specifically advise you not to get your knickers in a twist.
Everything in Swift is passed by "copy" by default, so when you pass a value-type you get a copy of the value, and when you pass a reference type you get a copy of the reference, with all that that implies. (That is, the copy of the reference still points to the same instance as the original reference.)
I use scare quotes around the "copy" above because Swift does a lot of optimization; wherever possible, it doesn't copy until there's a mutation or the possibility of mutation. Since parameters are immutable by default, this means that most of the time no copy actually happens.
It is always pass-by-value when the parameter is not inout.
It is always pass-by-reference if the parameter is inout. However, this is somewhat complicated by the fact you need to explicitly use the & operator on the argument when passing to an inout parameter, so it may not fit the traditional definition of pass-by-reference, where you pass the variable directly.
Here is a small code sample for passing by reference.
Avoid doing this, unless you have a strong reason to.
func ComputeSomeValues(_ value1: inout String, _ value2: inout Int){
value1 = "my great computation 1";
value2 = 123456;
}
Call it like this
var val1: String = "";
var val2: Int = -1;
ComputeSomeValues(&val1, &val2);
The Apple Swift Developer blog has a post called Value and Reference Types that provides a clear and detailed discussion on this very topic.
To quote:
Types in Swift fall into one of two categories: first, “value types”,
where each instance keeps a unique copy of its data, usually defined
as a struct, enum, or tuple. The second, “reference types”, where
instances share a single copy of the data, and the type is usually
defined as a class.
The Swift blog post continues to explain the differences with examples and suggests when you would use one over the other.
When you use inout with an infix operator such as += then the &address symbol can be ignored. I guess the compiler assumes pass by reference?
extension Dictionary {
static func += (left: inout Dictionary, right: Dictionary) {
for (key, value) in right {
left[key] = value
}
}
}
origDictionary += newDictionaryToAdd
And nicely this dictionary 'add' only does one write to the original reference too, so great for locking!
Classes and structures
One of the most important differences between structures and classes is that structures are always copied when they are passed around in your code, but classes are passed by reference.
Closures
If you assign a closure to a property of a class instance, and the closure captures that instance by referring to the instance or its members, you will create a strong reference cycle between the closure and the instance. Swift uses capture lists to break these strong reference cycles
ARC(Automatic Reference Counting)
Reference counting applies only to instances of classes. Structures and enumerations are value types, not reference types, and are not stored and passed by reference.
Classes are passed by references and others are passed by value in default.
You can pass by reference by using the inout keyword.
Swift assign, pass and return a value by reference for reference type and by copy for Value Type
[Value vs Reference type]
If compare with Java you can find matches:
Java Reference type(all objects)
Java primitive type(int, bool...) - Swift extends it using struct
struct is a value type so it's always passed as a value. let create struct
//STEP 1 CREATE PROPERTIES
struct Person{
var raw : String
var name: String
var age: Int
var profession: String
// STEP 2 CREATE FUNCTION
func personInformation(){
print("\(raw)")
print("name : \(name)")
print("age : \(age)")
print("profession : \(profession)")
}
}
//allow equal values
B = A then call the function
A.personInformation()
B.personInformation()
print(B.name)
it have the same result when we change the value of 'B' Only Changes Occured in B Because A Value of A is Copied, like
B.name = "Zainab"
a change occurs in B's name. it is Pass By Value
Pass By Reference
Classes Always Use Pass by reference in which only address of occupied memory is copied, when we change similarly as in struct change the value of B , Both A & B is changed because of reference is copied,.
In Swift, when you pass a value type, say an Array to a function. A copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes placed when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why this is important? If I create a large immutable array and want to pass it in from function to function, I certainly do not want to keep making copies of it. Should I just use NSArrray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?
TL;DR:
So does it mean that the copying actually only takes place when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll get pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety than what you'd get with an NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
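As a rough illustration of that sharing, the sketch below compares the address of an array's storage as seen by the caller and by a function receiving it as a normal parameter. The helper names storageAddress and addressSeenByCallee are made up for this example, and comparing buffer addresses relies on how Array is currently implemented rather than on a documented guarantee.

```swift
// Hypothetical helper: returns the address of the array's element storage as an integer.
func storageAddress(_ array: [Int]) -> Int {
    return array.withUnsafeBytes { Int(bitPattern: $0.baseAddress) }
}

// Takes the array as a normal (immutable) parameter and reports
// where the storage it sees is located.
func addressSeenByCallee(_ numbers: [Int]) -> Int {
    return storageAddress(numbers)
}

let big = Array(0..<100_000)
let outside = storageAddress(big)
let inside = addressSeenByCallee(big)

// Prints whether the callee saw the caller's buffer rather than a fresh copy.
print(outside == inside)
```

If the two addresses match, the 100,000 elements were not duplicated for the call; only the small Array value referring to them was handed over.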
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scenes.
Instead, you should just think about the semantics of value types,
which is that you get a copy as soon as you assign them or use them as parameters.
What the Swift compiler actually generates is the compiler's business.
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever they are assigned or passed to a function.
These semantics have various advantages,
such as avoiding the creation of unintended aliases,
but also making it easier for the compiler to guarantee the lifetime of values
stored in another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive as it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has a trick to handle these, namely copy-on-write (see later).
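To make the size argument a bit more concrete, here is a small sketch using MemoryLayout from the standard library. The Point structure is a made-up example, and the fact that an Array value is one pointer wide is an implementation detail of current toolchains, not something the language promises.

```swift
// Hypothetical small structure made of two integers.
struct Point {
    var x: Int
    var y: Int
}

// A pointer has the same size as an Int.
print(MemoryLayout<UnsafeRawPointer>.size == MemoryLayout<Int>.size)

// Copying a Point copies two machine words, barely more than a pointer.
print(MemoryLayout<Point>.size == 2 * MemoryLayout<Int>.size)

// An Array value is itself only a reference to its storage,
// so copying the value (not the elements) copies a single word.
print(MemoryLayout<[Int]>.size == MemoryLayout<UnsafeRawPointer>.size)
```

This is why the cost concern only applies to the elements of large collections, which is exactly the part copy-on-write defers.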
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely a huge syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous one,
except for the fields that were mutated.
The following example illustrates this semantic equivalence:
struct S {
    var foo: Int
    var bar: Int
    mutating func modify() {
        foo = bar
    }
}

var s1 = S(foo: 0, bar: 10)
s1.modify()

// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function; only their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics have the obvious advantage of avoiding copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
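A minimal sketch of such an unintended alias, using a made-up Counter class and increment function purely for illustration:

```swift
// Hypothetical reference type used only for this illustration.
final class Counter {
    var count = 0
}

// The function receives the reference, not a copy of the object.
func increment(_ counter: Counter) {
    counter.count += 1
}

let original = Counter()
let alias = original          // copies the address, not the object
increment(alias)

print(original.count)         // 1: the change is visible through both names
print(original === alias)     // true: both names refer to the same object
```

Note that both variables are declared with let, yet the object they point to was still modified; the constant is the reference, not the instance behind it.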
What about inout
An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other terms, this gives value-type parameters reference semantics in the context of the function:
func f(x: inout [Int]) {
    x.append(12)
}

var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was the address of the address of the object:
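import Foundation  // NSArray lives in Foundation

func f(x: inout NSArray) {
    // Rebinds the caller's variable to a brand new NSArray.
    x = [12]
}

var a: NSArray = [0]
f(x: &a)
// Prints an array containing only 12
print(a)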
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
and which is implemented on all of Swift's built-in collections (i.e. arrays, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of said array and actually uses a reference instead.
The copy will take place as soon as your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
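Since the original question in this thread was about dictionaries, here is a small sketch of the same idea for Dictionary and Set. It only checks the observable value semantics (the original is never affected by changes to the copy), not storage addresses, and the variable names are arbitrary.

```swift
let original: [String: Int] = ["a": 1, "b": 2]
var copy = original            // logically a copy; nothing is duplicated yet
copy["c"] = 3                  // the mutation gives `copy` its own storage

print(original.count)          // 2
print(copy.count)              // 3

let letters: Set<Character> = ["x", "y"]
var more = letters
more.insert("z")

print(letters.contains("z"))   // false
print(more.contains("z"))      // true
```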
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of its reference-counting API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and checking whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated;
otherwise it should be copied:
final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}

struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
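To see the wrapper behave as described, the following sketch repeats the two types so that it runs on its own, and uses a plain array in place of SomeLargeObject (an arbitrary substitution for the sake of the example). The identity operator === tells us whether two values still share the same wrapped instance.

```swift
final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}

struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                // Nobody else holds the box: mutate it in place.
                wrapped.value = newValue
            } else {
                // The box is shared: give this value its own box.
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

let a = CopyOnWrite(value: [1, 2, 3])
var b = a

print(a.wrapped === b.wrapped)   // true: the assignment shared the box
b.value = [4, 5, 6]
print(a.wrapped === b.wrapped)   // false: the write gave b its own box
print(a.value)                   // [1, 2, 3]
print(b.value)                   // [4, 5, 6]
```

In a real type you would probably keep wrapped private; it is left visible here only so the sharing can be observed.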
However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it would wrongly assume the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have thread-safe copy-on-write semantics,
but the instance holding it would be subject to a data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
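One way to stay on the safe side is to make sure each thread mutates its own variable rather than a shared one. The sketch below is only an illustration under that assumption: appendConcurrently is a made-up function, written against the Swift 5 language mode, and the lock is there solely to protect the one variable that really is shared (the results array).

```swift
import Foundation

// Hypothetical helper: every worker appends its index to its own copy of `base`.
func appendConcurrently(to base: [Int], workers: Int) -> [[Int]] {
    let lock = NSLock()
    var results = [[Int]](repeating: [], count: workers)

    // Runs the closure `workers` times, possibly in parallel, and waits for all of them.
    DispatchQueue.concurrentPerform(iterations: workers) { index in
        var local = base       // this worker's own value
        local.append(index)    // the copy happens here; `base` is untouched

        // `results` is shared between workers, so access to it is synchronized.
        lock.lock()
        results[index] = local
        lock.unlock()
    }
    return results
}

let base = [1, 2, 3]
let results = appendConcurrently(to: base, workers: 4)

print(base)          // [1, 2, 3]
print(results[2])    // [1, 2, 3, 2]
```

Each worker owns local exclusively, so the uniqueness check behind copy-on-write is evaluated on a variable no other thread can touch.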
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.
[2]
One could confirm this by checking the LLVM IR that's produced
when dealing with value types rather than reference types,
but since the Swift compiler is very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).
I don't know if that's the same for every value type in Swift, but for arrays I'm pretty sure it's copy-on-write, so nothing is copied unless you modify it. And as you said, if you pass it around as a constant you don't run that risk anyway.
P.S. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value types too.