I already know the benefit of immutability over mutability in being able to reason about code and introducing fewer bugs, especially in multithreaded code. In creating structs, though, I cannot see any benefit in creating a completely immutable struct over a mutable one.
Let's take as an example a struct that keeps a score:
struct ScoreKeeper {
var score: Int
}
In this structure I can change the value of score on an existing struct variable:
var scoreKeeper = ScoreKeeper(score: 0)
scoreKeeper.score += 5
print(scoreKeeper.score)
// prints 5
The immutable version would look like this:
struct ScoreKeeper {
let score: Int
func incrementScoreBy(points: Int) -> ScoreKeeper {
return ScoreKeeper(score: self.score + points)
}
}
And its usage:
let scoreKeeper = ScoreKeeper(score: 0)
let newScoreKeeper = scoreKeeper.incrementScoreBy(points: 5)
print(newScoreKeeper.score)
// prints 5
What I don't see is the benefit of the second approach over the first, since structs are value types. If I pass a struct around, it always gets copied. So it does not seem to matter to me if the structure has a mutable property, since other parts of the code would be working on a separate copy anyway, thus removing the problems of mutability.
I have seen some people using the second example, though, which requires more code for no apparent benefit. Is there some benefit I'm not seeing?
Different approaches will facilitate different kinds of changes to the code. An immutable structure is very similar to an immutable class object, but a mutable structure and a mutable class object are very different. Thus, code which uses an immutable structure can often be readily adapted if for some reason it becomes necessary to use a class object instead.
On the flip side, use of an immutable object will often make the code to replace a variable with a modified version more brittle in case additional properties are added to the type in question. For example, if a PhoneNumber type includes methods for AreaCode, LocalExchange, and LocalNumber and a constructor that takes those parameters, and then adds an "optional" fourth property for Extension, then code which is supposed to change the area codes of certain phone numbers by passing the new area code, LocalExchange, and LocalNumber, to the three-argument constructor will erase the Extension property of every phone number, while code which could write to AreaCode directly wouldn't have had that problem.
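The brittleness described above can be sketched in Swift. The two types and the `movedToAreaCode` helper are hypothetical and written only for this illustration; the field names follow the PhoneNumber example in the answer.

```swift
// Hypothetical immutable type: every change means building a new value.
struct ImmutablePhoneNumber {
    let areaCode: String
    let localExchange: String
    let localNumber: String
    let extensionNumber: String?   // the "optional" fourth property, added later

    init(areaCode: String, localExchange: String, localNumber: String,
         extensionNumber: String? = nil) {
        self.areaCode = areaCode
        self.localExchange = localExchange
        self.localNumber = localNumber
        self.extensionNumber = extensionNumber
    }
}

// Hypothetical mutable counterpart with the same fields.
struct MutablePhoneNumber {
    var areaCode: String
    var localExchange: String
    var localNumber: String
    var extensionNumber: String?
}

// Written before `extensionNumber` existed: rebuilds the value from three fields.
func movedToAreaCode(_ number: ImmutablePhoneNumber, _ newAreaCode: String) -> ImmutablePhoneNumber {
    return ImmutablePhoneNumber(areaCode: newAreaCode,
                                localExchange: number.localExchange,
                                localNumber: number.localNumber)
}

let office = ImmutablePhoneNumber(areaCode: "212", localExchange: "555",
                                  localNumber: "0100", extensionNumber: "42")
let movedOffice = movedToAreaCode(office, "646")
// movedOffice.extensionNumber is nil: the extension was silently dropped.

var desk = MutablePhoneNumber(areaCode: "212", localExchange: "555",
                              localNumber: "0100", extensionNumber: "42")
desk.areaCode = "646"
// desk.extensionNumber is still "42": only the written field changed.
```

The old helper still compiles after the fourth property is added, which is exactly why the loss of the extension goes unnoticed.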
Your remark about copying value types is very good. Maybe this doesn't make much sense in this particular language (Swift) and this particular compiler implementation (the current version), but in general, if the compiler knows for sure that a data structure is immutable, it could, for example, use a reference instead of a copy behind the scenes to gain some performance. This could not be done with a mutable type, because a change made through one reference would be visible through the others.
Even more generally speaking, limitation means information. If you limit your data structure somehow, you gain some extra knowledge about it. And extra knowledge means extra possibilities ;) Maybe the current compiler does not take advantage of them, but this does not mean they are not there :)
Good analysis, especially pointing out that structs are passed by value and therefore will not be altered by other processes.
The only benefit I can see is a stylistic one by making the immutability of the element explicit.
It is more of a style to make value based types be treated on par with object based types in object oriented styles. It is more of a personal choice, and I don't see any big benefits in either of them.
In general terms, immutable objects are less costly to the system than mutable ones. Mutable objects need to have infrastructure for taking on new values, and the system has to allow for the fact that their values can change at any time.
Mutable objects are also a challenge in concurrent code because you have to guard against the value changing out from under you from another thread.
However, if you are constantly creating and destroying unique immutable objects, the overhead of creating new ones becomes costly quite quickly.
In the foundation classes, NSNumber is an immutable object. The system maintains a pool of NSNumber objects that you've used before, and under the covers, gives you back an existing number if you ask for one with the same value as one you created before.
That's about the only situation in which I could see value in using static structs - where they don't change very much and you have a fairly small pool of possible values. In that case you'd probably want to set up your class with a "factory method" that kept recently used structs around and reused them if you asked for a struct with the same value again.
Such a scheme could simplify concurrent code, as mentioned above. In that case you wouldn't have to guard against the values of your structs changing in another thread. If you were using such a struct, you could know that it would never change.
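A minimal sketch of such a "factory method", assuming a hypothetical immutable `Score` class and a hypothetical `ScorePool` that hands back an existing instance when the same value is requested again. The pool here is unbounded and not synchronized, so it is only meant to show the reuse idea, not to be a production cache.

```swift
// Hypothetical immutable value object; instances are only created by the pool.
final class Score {
    let points: Int
    fileprivate init(points: Int) { self.points = points }
}

// Hypothetical factory that keeps previously created values around.
final class ScorePool {
    private var cache: [Int: Score] = [:]

    func score(points: Int) -> Score {
        if let cached = cache[points] {
            return cached              // reuse the existing immutable instance
        }
        let fresh = Score(points: points)
        cache[points] = fresh
        return fresh
    }
}

let pool = ScorePool()
let firstFive = pool.score(points: 5)
let secondFive = pool.score(points: 5)
let six = pool.score(points: 6)
// firstFive and secondFive are the very same object; six is a different one.
```

Because `Score` has no mutable state, sharing one instance between all callers is safe from the callers' point of view.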
In Swift, when you pass a value type, say an Array, to a function, a copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes place when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why is this important? If I create a large immutable array and want to pass it from function to function, I certainly do not want to keep making copies of it. Should I just use NSArray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?
TL;DR:
So does it mean that the copying actually only takes place when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll have a pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety than what you'd get with an NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scenes.
Instead, you should just think about the semantics of value types,
which is that you get a copy as soon as you assign them or use them as parameters.
What's actually generated by Swift's compiler is the compiler's business.
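As a rough check of the "shared storage" claim for plain (non-inout) parameters, one can compare the address of the element storage seen by the caller with the one seen inside the callee. The helper `storageAddress(of:)` and the variable names are made up for this sketch; the pointer is used only for comparison and is never dereferenced.

```swift
// Returns the address of the array's element storage, for inspection only.
func storageAddress(of numbers: [Int]) -> UnsafeRawPointer? {
    return numbers.withUnsafeBytes { $0.baseAddress }
}

let large = Array(0..<100_000)

// Address of the storage as seen by the caller.
let addressInCaller = large.withUnsafeBytes { $0.baseAddress }

// Address of the storage as seen inside the function that received the array.
let addressInCallee = storageAddress(of: large)

// Expected to be equal: the callee reads the caller's buffer rather than a duplicate.
print(addressInCaller == addressInCallee)
```

This is an observation about the current implementation, not something the language guarantees, so it should not be relied on in program logic.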
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever they are assigned or passed to a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also making it easier for the compiler to guarantee the lifetime of values
stored in another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive as it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has a trick to handle these, namely copy-on-write (see later).
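The "no unintended aliases" property of value types can be seen in a few lines. `Point` is a hypothetical structure introduced only for this sketch.

```swift
// Hypothetical small value type.
struct Point {
    var x: Int
    var y: Int
}

let origin = Point(x: 0, y: 0)
var moved = origin       // semantically a copy of origin
moved.x = 10             // only the copy changes

print(origin.x, moved.x) // prints "0 10"
```

Since `origin` and `moved` are independent values, no later change to `moved` can be observed through `origin`.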
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous one,
except for the fields that were mutated.
The following example illustrates this semantic equivalence:
struct S {
var foo: Int
var bar: Int
mutating func modify() {
foo = bar
}
}
var s1 = S(foo: 0, bar: 10)
s1.modify()
// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or JavaScript.
This means they do not get copied when assigned or passed to a function; their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics has the obvious advantage of avoiding copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
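The aliasing hazard mentioned above looks like this in practice. `Counter` is a hypothetical class used only for illustration.

```swift
// Hypothetical reference type.
final class Counter {
    var count = 0
}

let original = Counter()
let alias = original     // copies the reference, not the object
alias.count += 1         // mutates the single shared object

print(original.count)    // prints "1"
```

Note that both names are declared with `let`: the constant is the reference, while the object it points to stays mutable.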
What about inout
An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other words, this gives value type parameters reference semantics in the context of the function:
func f(x: inout [Int]) {
x.append(12)
}
var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was the address of the address of the object:
import Foundation

func f(x: inout NSArray) {
x = [12]
}
var a: NSArray = [0]
f(x: &a)
// Prints '(12)'
print(a)
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all of Swift's built-in collections (i.e. arrays, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of the said array and actually uses a reference instead.
The copy will take place as soon as your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of its reference counting API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and checking whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:
final class Wrapped<T> {
init(value: T) { self.value = value }
var value: T
}
struct CopyOnWrite<T> {
init(value: T) { self.wrapped = Wrapped(value: value) }
var wrapped: Wrapped<T>
var value: T {
get { return wrapped.value }
set {
if isKnownUniquelyReferenced(&wrapped) {
wrapped.value = newValue
} else {
wrapped = Wrapped(value: newValue)
}
}
}
}
// SomeLargeObject stands in for any type that is expensive to copy.
var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it would wrongly assume the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have thread-safe copy-on-write semantics,
but the instance holding it would be subject to a data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
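Leaving threads aside, the single-threaded behaviour of this pattern can be checked with a small instrumented sketch. `Storage`, `IntBag` and the `copiesMade` counter are hypothetical names introduced here; unlike the generic wrapper above, this variant duplicates the storage inside a mutating method before changing it, which is one common way to apply the same uniqueness check.

```swift
// Hypothetical heap-allocated storage shared between copies of IntBag.
final class Storage {
    var elements: [Int]
    init(elements: [Int]) { self.elements = elements }
}

// Hypothetical value type with hand-written copy-on-write.
struct IntBag {
    private var storage: Storage
    private(set) var copiesMade = 0    // instrumentation only

    init(_ elements: [Int]) { storage = Storage(elements: elements) }

    var elements: [Int] { return storage.elements }

    mutating func append(_ element: Int) {
        // Duplicate the storage only if another IntBag still refers to it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(elements: storage.elements)
            copiesMade += 1
        }
        storage.elements.append(element)
    }
}

var bagA = IntBag([1, 2, 3])
bagA.append(4)       // storage is unique: mutated in place, no copy
var bagB = bagA      // both values now share one Storage object
bagB.append(5)       // storage is shared: duplicated before the mutation

print(bagA.elements) // prints "[1, 2, 3, 4]"
print(bagB.elements) // prints "[1, 2, 3, 4, 5]"
```

The counter is stored in the struct itself, so each value reports how many duplications happened along its own history.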
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are stored on the heap.
[2]
One could get confirmation of that by checking the LLVM IR that's produced
when dealing with value types rather than reference types,
but since the Swift compiler is very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).
I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's a copy-on-write, so it doesn't copy it unless you modify it, and as you said if you pass it around as a constant you don't run that risk anyway.
p.s. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value-types too
I used to do things like below:
class A {
var param1:String?
var param2:[B]?
}
class B {
var param1:String?
var param2:String?
var param3:[C]?
}
class C {
var param1:String?
var param2:String?
}
But recently I found that dictionaries are more flexible. Class A can be replaced by the following dictionary.
[
"param1":"some string",
"param2":[
"param1":"some string",
"param2":"some string",
"param3":[
"param1":"some string",
"param2":"some string"
],
[
...
...
]
],
[
...
...
],
...
]
If we want to add "param3" into class C, we need to modify a lot of associated code if using class. But if we use dictionaries, we can just use "param3" as if it already exists.
A dictionary is just like a runtime defined class. I am wondering whether we should use dictionaries to replace data storing classes (i.e. models in MVC pattern) in all situations.
It depends on how you use your model. Making small classes enables you to give each class specific additional behavior (for example, more specific, isolated accessor methods or helpers).
You can also test the model more easily by using only the piece you want and mocking the others.
In general splitting responsibility is better because of maintenance and testability and clear code.
If your dictionary grows out of control then it is going to be very difficult for a newcomer on your team to use and understand the giant blob of data, rather than handling a lot of small objects with relationships between themselves.
If you add a new parameter you might need to change a lot of initializers.
That is normal I would say.
Also it depends on how you manage the model initialization. Maybe you use a factory that hides this complexity for you inside the rest of your code.
Or maybe you will need just to change it in your dependency injection root.
It clearly depends on the approach and scope of the object you are creating.
But in my opinion isolated objects are more reusable than a big blob of data in a dictionary.
I agree that dictionaries are more extensible, but classes are safer.
One big unsafe thing about dictionaries is that you don't know whether a key exists or not at compile time. You have to put guard let or if let statements all over the place whenever you want to access something. If you force-unwrap instead, the app will crash at runtime when the key does not exist. Sure, you can fix it after it crashes, but you will have wasted a lot of time running your app to get that erroneous line of code to run and crash.
The other unsafe thing is type-unsafety. Since your dictionary contains different types of stuff, it must be a [String: Any]. Normally you can do this with classes:
someAObject.param2!.first!.param3!.first!.param1
If you use dictionaries you need:
(((dict["param2"]! as! [[String: Any]]).first!["param3"] as! [[String: Any]]).first! as! [String: Any])["param1"]
Just look at how much more code that is! Also, when you want a method to accept a parameter, you can write A or B or C if you are using classes and the method will only accept the type you specify. If you are using dictionaries, all you can write is [String: Any]. There is no compile time check whether that dictionary is of the acceptable type.
The third thing is about typos. If you type a property name wrong, Xcode will tell you that even before you run the app. If you type a dictionary key wrong, Xcode will not tell you that. You have to run that bit of code to know. Sure, you can put keys into constants, but that is very troublesome and the trouble definitely outweighs what you call "benefits" of dictionaries.
The fourth point is that dictionaries are value types. You might want some of the features of reference types.
And last but not least, you cannot add methods to dictionaries! A very important feature of classes is that they allow you to add methods and you can call them on instances of the class. If you made good use of this, you can write very readable code.
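The points about missing keys and typos can be made concrete with a short sketch. `TypedC` is a hypothetical stand-in for class C from the question, and the misspelled key is deliberate.

```swift
// Hypothetical typed model mirroring class C from the question.
struct TypedC {
    var param1: String?
    var param2: String?
}

let typed = TypedC(param1: "some string", param2: "some string")
let untyped: [String: Any] = ["param1": "some string", "param2": "some string"]

// A misspelled key still compiles; the lookup simply yields nil at runtime.
let misspelled = untyped["parma1"] as? String

// The equivalent typo on the struct, `typed.parma1`, would be rejected by the compiler.
let correct = typed.param1

print(misspelled as Any, correct as Any)
```

With the dictionary, the mistake only shows up when that line actually runs and someone notices the missing value.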
If we want to add "param3" into class C, we need to modify a lot of associated code if using class
Not if you designed your model well. I can't think of a reason why adding a new property to a class would require you to change lots of associated code.
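As a small illustration of that claim, here is class C from the question with the extra property added. Only `param3` is new; because every property is optional, the class keeps its implicit `init()` and existing call sites are unaffected. The variable name `existing` is made up for this sketch.

```swift
class C {
    var param1: String?
    var param2: String?
    var param3: String?   // newly added; starts out as nil
}

// Code written before `param3` existed keeps compiling unchanged.
let existing = C()
existing.param1 = "some string"

// Only code that cares about the new property needs to mention it.
existing.param3 = "new value"
```

Things are different when a model has a designated initializer that takes every property, which is where the advice about factories and default values comes in.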
In Swift, when you pass a value type, say an Array to a function. A copy of the array is made for the function to use.
However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:
The description above refers to the “copying” of strings, arrays, and
dictionaries. The behavior you see in your code will always be as if a
copy took place. However, Swift only performs an actual copy behind
the scenes when it is absolutely necessary to do so. Swift manages all
value copying to ensure optimal performance, and you should not avoid
assignment to try to preempt this optimization.
So does it mean that the copying actually only takes placed when the passed value type is modified?
Is there a way to demonstrate that this is actually the underlying behavior?
Why this is important? If I create a large immutable array and want to pass it in from function to function, I certainly do not want to keep making copies of it. Should I just use NSArrray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed in Array?
Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy? Granted that another thread can modify the original array elsewhere (only if it is mutable), making a copy at the moment the function is called necessary (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?
TL;DR:
So does it mean that the copying actually only takes placed when the passed value type is modified?
Yes!
Is there a way to demonstrate that this is actually the underlying behavior?
See the first example in the section on the copy-on-write optimization.
Should I just use NSArrray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?
If you pass your array as inout, then you'll have a pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety that what you'd get with a NSArray.
Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?
You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?
Exactly, hence the copy-on-write mechanism.
So what does Apple mean by the phrase above?
Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scene.
Instead, you should just think about the semantics of value types,
which is that get a copy as soon as you assign or use them as parameters.
What's actually generated by Swift's compiler is the Swift's compiler business.
Value types semantics
Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.
Values are copied whenever assigned or passed as a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also as making it easier for the compiler to guarantee the lifetime of values
stored in a another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.
One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...
Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has has a trick to handle these, namely copy-on-write (see later).
What about mutating
Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely a huge syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous ones,
except for the fields that were mutated.
The following example illustrates this semantical equivalence:
struct S {
var foo: Int
var bar: Int
mutating func modify() {
foo = bar
}
}
var s1 = S(foo: 0, bar: 10)
s1.modify()
// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
Reference types semantics
Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).
Such semantics has the obvious advantage to avoid copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
What about inout
An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other terms, this gives value types parameters a reference semantics in the context of the function:
func f(x: inout [Int]) {
x.append(12)
}
var a = [0]
f(x: &a)
// Prints '[0, 12]'
print(a)
In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was a the address of the address of the object:
func f(x: inout NSArray) {
x = [12]
}
var a: NSArray = [0]
f(x: &a)
// Prints '(12)'
print(a)
Copy-on-write
Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all Swift's built-in collections (i.e. array, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of the said array and actually uses a reference instead.
The copy will take place as soon as the your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):
let array1 = [1, 2, 3]
var array2 = array1
// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }
array2[0] = 1
// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }
Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.
This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):
var array1 = [[1, 2], [3, 4]]
var array2 = array1
// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }
array2[0] = []
// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }
Replicating copy-on-write
It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of the its reference counter API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:
final class Wrapped<T> {
init(value: T) { self.value = value }
var value: T
}
struct CopyOnWrite<T> {
init(value: T) { self.wrapped = Wrapped(value: value) }
var wrapped: Wrapped<T>
var value: T {
get { return wrapped.value }
set {
if isKnownUniquelyReferenced(&wrapped) {
wrapped.value = newValue
} else {
wrapped = Wrapped(value: newValue)
}
}
}
}
var a = CopyOnWrite(value: SomeLargeObject())
// This line doesn't copy anything.
var b = a
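To see the wrapper behave as described, here is a small self-contained sketch. It repeats the two types so it can run on its own; sharesStorage(with:) is a hypothetical helper added only to make the sharing observable, and an array stands in for the large object.

```swift
final class Wrapped<T> {
    var value: T
    init(value: T) { self.value = value }
}

struct CopyOnWrite<T> {
    private var wrapped: Wrapped<T>

    init(value: T) { wrapped = Wrapped(value: value) }

    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                // Nobody else holds the box: mutate it in place.
                wrapped.value = newValue
            } else {
                // The box is shared: give this instance a fresh one.
                wrapped = Wrapped(value: newValue)
            }
        }
    }

    // Hypothetical helper, for demonstration only:
    // true when both instances point at the same box.
    func sharesStorage(with other: CopyOnWrite<T>) -> Bool {
        return wrapped === other.wrapped
    }
}

let a = CopyOnWrite(value: [1, 2, 3])
var b = a
print(a.sharesStorage(with: b)) // true

b.value = [4, 5, 6]
print(a.sharesStorage(with: b)) // false
print(a.value) // [1, 2, 3]
print(b.value) // [4, 5, 6]
```

The assignment only copies a reference; the new box is allocated on the first write through b, and a keeps its original value.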
However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:
If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.
This means the implementation presented above isn't thread safe,
as we may encounter situations where it would wrongly assume the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have thread-safe copy-on-write semantics,
but the instance holding it would be subject to a data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.
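One way to provide the "appropriate thread synchronization" the documentation asks for is to funnel every access to the shared instance through a lock. The following is only a sketch, assuming Foundation is available and the Swift 5 language mode; Synchronized is a hypothetical helper, not a standard library type, and a plain array stands in for the shared value.

```swift
import Foundation

// Hypothetical helper: serializes every access to the wrapped value with a lock.
final class Synchronized<Value> {
    private var value: Value
    private let lock = NSLock()

    init(_ value: Value) { self.value = value }

    // Returns a copy of the current value.
    func read() -> Value {
        lock.lock()
        defer { lock.unlock() }
        return value
    }

    // Runs the mutation while holding the lock.
    func mutate(_ body: (inout Value) -> Void) {
        lock.lock()
        defer { lock.unlock() }
        body(&value)
    }
}

let shared = Synchronized([Int]())

// Many threads append to the same array, one at a time.
DispatchQueue.concurrentPerform(iterations: 100) { index in
    shared.mutate { $0.append(index) }
}

print(shared.read().count) // 100
```

The lock guarantees that only one thread owns the value during a mutation, so the uniqueness check inside the array's copy-on-write logic is never evaluated while another thread is touching the same owner.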
Value types and multithreading
It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):
One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.
Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.
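That mental model can be checked with a short sketch; Point and moved(_:) are hypothetical names used only for illustration.

```swift
struct Point {
    var x: Int
    var y: Int
}

// The parameter is a copy: changing it cannot affect the caller's value.
func moved(_ point: Point) -> Point {
    var copy = point
    copy.x += 10
    return copy
}

let origin = Point(x: 0, y: 0)
let shifted = moved(origin)
print(origin.x)  // 0
print(shifted.x) // 10

// The same holds for collections, whatever copy-on-write does underneath.
let numbers = [1, 2, 3]
var others = numbers
others.append(4)
print(numbers.count) // 3
print(others.count)  // 4
```

Whether a physical copy happens at the assignment or only at the first mutation is invisible to this code, which is the point of treating copy-on-write as an optimization.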
[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.
[2]
One could get confirmation of that by checking the LLVM IR that's produced
when dealing with value types rather than reference types,
but because the Swift compiler is very eager to perform constant propagation,
building a minimal example is a bit tricky.
[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such a type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.
[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.
[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).
I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's copy-on-write, so it isn't copied unless you modify it. And as you said, if you pass it around as a constant you don't run that risk anyway.
P.S. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value types too.
I just attended a Scala-lecture at a summer school. The lecturer got the following question:
- "Is there any way for the compiler to tell if a class is immutable?"
The lecturer responded
- "No, there isn't. It would be very nice if it could."
I was surprised. Isn't it enough to check whether the class contains any var members?
What is immutable?
Checking to see if the object only contains val fields is an overapproximation of immutability - the object may very well contain vars, but never assign different values to them. Or the segments of the program assigning values to vars may be unreachable.
According to the terminology of Chris Okasaki, there are immutable data structures and functional data structures.
An immutable data structure (or a class) is a data structure which, once constructed in memory, never changes its components and values - an example of this is a Scala tuple.
However, if you define the immutability of an object as the immutability of itself and all the objects reachable through references from the object, then a tuple may not be immutable - it depends on what you later instantiate it with. Sometimes there is not enough information about the program available at compile time to decide if a given data structure is immutable in the sense of containing only vals. And the information is missing due to polymorphism, whether parametric, subtyping or ad-hoc (type classes).
This is the first problem with deciding immutability - lack of static information.
A functional data structure is a data structure on which you can do operations whose outputs depend solely on the inputs for a given state. An example of such a data structure is a search tree which caches the last item looked up by storing it in a mutable field. Even though every lookup will write the last item searched into the mutable field, so that if the item is looked up again the search doesn't have to be repeated, the outputs of the lookup operation for such a data structure always remain the same given that nobody inserts new items into it. Another example of a functional data structure is the splay tree.
In a general imperative programming model, checking whether an operation is pure, that is, whether its outputs depend solely on its inputs, is undecidable. Again, one could use a technique such as abstract interpretation to provide a conservative answer, but this is not an exact answer to the question of purity.
This is the second problem with deciding if something having vars is immutable or functional (observably immutable) - undecidability.
I think the problem is that you need to ensure that all your vals don't have any var members either. And this you cannot do. Consider:
class Base
case class Immutable() extends Base { val immutable: Int = 0 }
case class Mutable() extends Base { var mutable: Int = _ }
case class Immutable_?(b: Base)
Even though Immutable_?(Immutable()) is indeed immutable, Immutable_?(Mutable()) is not.
If you save a mutable object in a val the object itself is still mutable. So you would have to check if each class you use in a val is immutable.
case class Mut(var mut:Int)
val m = Mut(1)
println(m.toString)
m.mut = 3
println(m.toString)
In addition to what others have said, take a look at effect systems and discussion about supporting one in Scala.
It is not quite as easy since you could have vals that are linked to other mutable classes or, even harder to detect, that call methods in other classes or objects that are mutable.
Also, you could very well have an immutable class that in fact has vars (to be more efficient for example...).
I guess you could have something that checks if a class looks like it is immutable or not though, but it sounds like it could be pretty confusing.
You can have a class, which can be instantiated to an object, and this object can be mutable or immutable.
Example: A class may contain a List[_], which, at runtime, can be a List[Int] or a List[StringBuffer]. So two different objects of a class could be either mutable, or immutable.
I'm fairly new to programming, and there's one thing I'm confused by. What is a class, and how do I use one? I understand a little bit, but I can't seem to find a full answer.
By the way, if this is language-specific, then I'm programming in PHP.
Edit: There's something else I forgot to say. Specifically, I meant to ask how defining functions are used in classes. I've seen examples of PHP code where functions are defined inside classes, but I can't really understand why.
To be as succinct as possible: a class describes a collection of data that can perform actions on itself.
For example, you might have a class that represents an image. An object of this class would contain all of the data necessary to describe the image, and then would also contain methods like rotate, resize, crop, etc. It would also have methods that you could use to ask the object about its own properties, like getColorPalette, or getWidth. This is as opposed to being able to directly access the color palette or width in a raw (non-object) data collection - by having data access go through class methods, the object can enforce constraints that maintain consistency (e.g. you shouldn't be able to change the width variable without actually changing the image data to be that width).
This is how object-oriented programming differs from procedural programming. In procedural programming, you have data and you have functions. The functions act on data, but there's no "ownership" of the data, and no fundamental connection between the data and the functions which make use of it.
In object-oriented programming, you have objects which are data in combination with actions. Each type of data has a defined set of actions that it can perform on itself, and a defined set of properties that it allows functions and other objects to read and write in a defined, constraint-respecting manner.
The point is to decouple parts of the program from each other. With an Image class, you can be assured that all of the code that manipulates the image data is within the Image class's methods. You can be sure that no other code is going to be mucking about with the internals of your images in unexpected ways. On the other hand, code outside your image class can know that there is a defined way to manipulate images (resize, crop, rotate methods, etc), and not have to worry about exactly how the image data is stored, or how the image functions are implemented.
Edit: And one more thing that is sometimes hard to grasp is the relationship between the terms "class" and "object". A "class" is a description of how to create a particular type of "object". An Image class would describe what variables are necessary to store image data, and give the implementation code for all of the Image methods. An Image object, called an "instance" of an image class, is a particular use of that description to store some actual data. For example, if you have five images to represent, you would have five different image "objects", all of the same Image "class".
"Class" is a term used in the object-oriented programming (OOP) paradigm. Classes provide abstraction, modularity and much more to your code. OOP is not language specific; other examples of languages supporting it are C++ and Java.
I suggest youtube to get an understanding of the basics. For instance this video and other related lectures.
Since you are using PHP I'll use it in my code examples but most everything should apply.
OOP treats everything as an object, which is a collection of methods (functions) and variables. In most languages objects are represented in code as classes.
Take the following code:
class person
{
// Properties must be declared with a visibility keyword.
public $gender = null;
public $weight = null;
public $height = null;
public $age = null;
public $firstName = null;
public $lastName = null;
public function __construct($firstName, $lastName)
{
// __construct is a special method that is called when an object of the class is created
$this->firstName = $firstName;
$this->lastName = $lastName;
}
}
This is a valid (if not perfect) class. To use this code, you'll first have to initialize an instance of the class, which is like making a copy of it in a variable:
$steve = new person('Steve', 'Jobs');
Then, when you want to change some property (by which I simply mean a variable that belongs to the object), you can access it like so:
$steve->age = 54;
Note: this assumes you are a little familiar with programming, which I guess you are.
A class is like a blueprint. Let's suppose you're making a game with houses in it. You'd have a "House" class. This class describes the house and says what it can do and what can be done to it. You can have attributes, like height, width, number of rooms, city where it is located, etc. You can also have "methods" (fancy name for functions inside a class). For example, you can have a "Clean()" method, which would tell all the people inside the house to clean it.
Now suppose someone is playing your game and clicks the "make new house" button. You would then create a new object from that class. In PHP, you'd write "$house = new House;", and now $house has all the attributes and methods of the class.
You can make as many houses as you want, and they will all have the same properties, which you can then change. For example, if the people living in a house decide to add one more room, you could write "$house->numberOfRooms++;". If the default number of rooms for a house was 4, this house would have 5 rooms, and all the others would have 4. As you can see, the attributes are independent from one instance to another.
This is the basics; there is a lot more stuff about classes, like inheritance, access modifiers, etc.
Now, you may ask yourself why this is useful. Well, the point of Object Oriented Programming (OOP) is to think of all the things in the program as independent objects, trying to design them so they can be used regardless of context. For example, your house may be a standalone variable, or may be inside an array of houses. If you have a "Person" class with a "residence" attribute, then your house may be that attribute.
This is the theory behind classes and objects. I suggest you look around for examples of code. If you want, you can look at the classes I made for a Pong game I programmed. It's written in Python and may use some stuff you don't understand, but you will get the basic idea. The classes are here.
A class is essentially an abstraction.
You have built-in datatypes such as "int" or "string" or "float", each of which have certain behavior, and operations that are possible.
For example, you can take the square root of a float, but not of a string. You can concatenate two strings, or you can add two integers. Each of these data types represents a general concept (integers, text, or numbers with a fixed number of significant digits, which may or may not be fractional).
A class is simply a user-defined datatype that can represent some other concept, including the operations that are legal on it.
For example, we could define a "password" class which implements the behavior expected of a password. That is, we should be able to take a text string and create a password from it. (If I type 'secret02', that is a legal password). It should probably perform some verification on this input string, making sure that it is at least N characters long, and perhaps that it is not a dictionary word. And it should not allow us to read the password. (A password is usually represented as ****** on the screen). Instead, it should simply allow us to compare the password to other passwords, to see if it is identical.
If the password I just typed is the same as the one I originally signed up with, I should be allowed to log in. But what the password actually is, is not something the application I'm logging in to should know. So our password class should define a comparison function, but not a "display" function.
A class basically holds some data, and defines which operations are legal on that data. It creates an abstraction.
In the password example, the data is obviously just a text string internally, but the class allows only a few operations on this data. It prevents us from using the password as a string, and instead only allows the specific operations that would make sense for a password.
In most languages, the members of a class can be either private or public. Anything that is private can only be accessed by other members of the class. That is how we would implement the string stored inside the password class. It is private, so it is still visible to the operations we define in the class, but code outside the class can not just access the string inside a password. They can only access the public members of the class.
A class is a form of structure, comparable to int, string and so forth, from which an instance can be made in an object-oriented programming language. Like a template or blueprint, the class defines the structure. You write this structure once, with everything that is associated with the class. Something from a class would then be used as an object instance in the Main() method, where all the programming steps take place.
This is why you see people write code like Car car = new Car(); to draw out a new object from a class. I personally do not like this type of code; it's very bad and circular and does not show which part is the class. Too bad many programmers use this syntax, and it is difficult for beginners to understand what they are looking at.
Think of this as,
CarClass theCar = new CarClass(); //
The class can essentially take on infinitely many forms. You can write properties that describe the CarClass, and every car generated will have these. To reach a property, which "gets" (reads) and "sets" (writes) data, you simply use the dot operator on the object instance created in Main() and name the property. The class is the noumenon (a word for something like math and numbers: you cannot perceive it with the senses, but it's a thought, like the number 1). Instead of writing each item as a variable, the class enables us to write a definition of the object to use.
With the ability to write infinitely many things there is great responsibility! Like "Hello World!" how this little first statement says much about our audience as programmers.
So
CarClass theCar = new CarClass(); //In a way this says this word "car" will be a car
theCar.Color = "Red"; //Given the instance of a car we can add that color detail.
Now these are only implementations of the CarClass, not how to build one.
You must be wondering about some other terms, such as field, constructor, and class-level method, and why we use them.
A field is the private storage behind a property. Fields tend to be declared private at class level so nothing from the outside affects them, and they serve the property itself. A field is usually declared in its own region, with an underscore in front of its name. Going through the property lets you add the constraints necessary to maintain data integrity, meaning that you prevent people from writing values that make no sense in the context. (Like real-life measurements in the negative... that is just not real.)
The Constructor
The easiest way to describe a constructor is as the place where you assign default values to the object's properties. For example, a car has a color, a max speed, a model and a company. But what should these values be, and should some be used in millions of copies from the CarClass or just a few? The constructor enables one to do this, to generate copies by establishing a basic quality. These values are the defaults assigned to a property in a constructor block. To generate a constructor block, type ctor[tab][tab]. Inside it, simply refer to the properties you wrote above and assign a value to each.
Color = "Red";
If you go to Main() and now use the car.Color property in any output component, such as the console window or a textbox, you should see the word "Red". The details are thus implicit and hidden. Instead of offering every word from a book, you simply refer to the book and the computer gets the remaining information. This makes code compact and easy to use.
A class-level method should explain how to do some process over and over. Typically you use one to format written information about the object, with placeholders in the text that are filled in from your class properties. This makes sense when you create an object instance and then need to display its details in .ToString() form. The object instance can, in a sense, contain information like a book or a box. When we call .ToString() on an object whose class overrides ToString, it runs your custom ToString method and prints whatever that method returns. You can also call .ToString() on a property and read the result. The line below, Color being a string, should read fine as it is...
Console.WriteLine(theCar.Color);
Once you get many objects, one at a time you can put them in a list that allows you to add or remove them. Just wait...
Here's a good page about Classes and Objects:
http://ficl.sourceforge.net/oo_in_c.html
This is a resource which I would kindly recommend
http://www.cplusplus.com/doc/tutorial/
I'm not sure why, but starting with C++ to learn OOP may feel more natural than starting with any other language. The above link helped me a lot when I started, at least.
Classes are a way programmers mark their territory on code.
They are supposedly necessary for writing big projects.
Linus and his team must have missed that memo when developing the Linux kernel.
However, they can be good for organizing and categorizing code, I guess.
It makes it easier to navigate code in an IDE such as Visual Studio with the object browsers.
Here are some usage demonstrations of classes in 31 languages on Rosetta Code.
First of all back to the definitions:
Class definition:
Abstract definition of something, a user-defined type, a blueprint;
Has States / Fields / Properties (what an object knows) and Methods / Behaviors / Member Functions (what an object does);
Defines the object's behavior and default values;
Object definition:
Instance of a Class, Repository of data;
Has a unique identity: the property of an object that distinguishes it from other objects;
Has its own states: describes the data stored in the object;
Exhibits some well defined behavior: follows the class’s description;
Instantiation:
Is the process of creating an object from a class;
Leaves the object in a valid state;
Performed by a constructor;
To use a class you must instantiate it through a constructor. In PHP a straightforward example could be:
<?php
class SampleClass {
function __construct() {
print "In SampleClass constructor\n";
}
}
// Prints: In SampleClass constructor
$obj = new SampleClass();
?>