How does an array in Swift deep copy itself when copied or assigned?

We all know that an array in Swift is a value type: after copying or assigning an array to another variable, modifying the new array will not affect the old one. For example:
var a = ["a", "b", "c", "d", "e"]
var b = a
b[0] = "1"
print(a[0]) // a
print(b[0]) // 1
But I'm wondering how an array can work like that. The length of a 'var' array is dynamic, so usually we must allocate some heap memory to hold all the values. I did peek at some of the source code for struct Array, and the underlying buffer for an array is implemented using a class. But when copying a struct that contains a class or memory-pointer member, the class instance and the allocated memory are not copied by default.
So how can an array copy its buffer when it is copied or assigned to another one?

Assignment of any struct (such as Array) causes a shallow copy of the structure's contents. There's no special behavior for Array. The buffer that stores the Array's elements is not actually part of the structure. Only a pointer to that buffer (which lives on the heap) is part of the Array structure, so upon assignment the buffer pointer is copied, but it still points to the same buffer.
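A small experiment can make the shared buffer visible. The helper storageAddress(of:) below is hypothetical: it reads the address of the element storage through withUnsafeBufferPointer. That address is an implementation detail rather than documented behaviour, so treat this as a sketch for experimentation, not a guarantee.

```swift
/// Hypothetical helper: returns the address of the array's element storage as an
/// integer (0 if there is none). Inspects an implementation detail; experiment only.
func storageAddress(of array: [String]) -> Int {
    array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

let a = ["a", "b", "c", "d", "e"]
var b = a                                      // copies the struct, i.e. the buffer reference
let sharedBefore = storageAddress(of: a) == storageAddress(of: b)

b[0] = "1"                                     // first write: buffer is shared, so it is copied
let sharedAfter = storageAddress(of: a) == storageAddress(of: b)
```

With current standard-library builds one would expect sharedBefore to be true and sharedAfter to be false, while a[0] is still "a".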
All mutating operations on Array do a check to see if the buffer is uniquely referenced. If so, then the algorithm proceeds. Otherwise, a copy of the buffer is made, and the pointer to the new buffer is saved to that Array instance, then the algorithm proceeds as previously. This is called Copy on Write (CoW). Notice that it's not an automatic feature of all value types. It is merely a manually implemented feature of a few standard library types (like Array, Set, Dictionary, String, and others). You could even implement it yourself for your own types.
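Since CoW is a manually implemented pattern, it can be sketched for a custom type. The names IntStorage and CoWInts are hypothetical; the only library piece is the standard library's isKnownUniquelyReferenced(_:). This is a minimal illustration of the technique, not how Array itself is written.

```swift
// Hypothetical reference type that owns the actual storage.
final class IntStorage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

// Hypothetical value type that implements copy-on-write by hand.
struct CoWInts {
    private var storage: IntStorage

    init(_ values: [Int]) { storage = IntStorage(values) }

    var values: [Int] { storage.values }

    // True if both values currently point at the same storage object.
    func sharesStorage(with other: CoWInts) -> Bool { storage === other.storage }

    mutating func set(_ value: Int, at index: Int) {
        // Copy the storage only when another CoWInts instance also references it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(storage.values)
        }
        storage.values[index] = value
    }
}

let x = CoWInts([1, 2, 3])
var y = x                                   // copies only the reference to the storage
let sharedBeforeWrite = y.sharesStorage(with: x)
y.set(99, at: 0)                            // storage is shared, so `set` copies it first
```

After the write, x still reads [1, 2, 3] and y reads [99, 2, 3]. A second y.set(...) would find the storage uniquely referenced and modify it in place.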
When CoW occurs, it does not do any deep copying. It will copy values, which means:
In the case of value types (struct, enum, tuples), the values are the struct/enum/tuples themselves. In this case, a deep and shallow copy are the same thing.
In the case of reference types (class), the value being copied is the reference. The referenced object is not copied. The same object is pointed to by both the old and copied reference. Thus, it's a shallow copy.
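The shallow-copy case for reference types can be checked directly. Box is a hypothetical class used only as the element type. Appending to the copy forces the buffer of references to be copied, but both arrays keep pointing at the same Box objects.

```swift
// Hypothetical class used as the element type.
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

let original = [Box(1), Box(2)]
var copy = original
copy.append(Box(3))      // copy-on-write: the buffer of references is copied
copy[0].value = 100      // both arrays still reference the same Box object
```

Here original.count stays 2, yet original[0].value is now 100, because original[0] and copy[0] are the same object.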

Related

Does String.init(cString: UnsafePointer<Int8>) copy the memory contents?

What are the inner workings?
Does it create a Swift string copy of the C string data?
Does it use it as a reference and return it as a Swift string, so the string returned uses the same data? How does it work?
Does it copy the C string into a newly allocated Swift string?
If String(cString: UnsafePointer<Int8>) indeed works by copying the C string into a newly allocated swift string, is there a way to convert C strings to swift by referencing the already existing data instead of copying it?
How does String(cString: UnsafePointer<Int8>) work, and how can I determine whether it copies, or whether it references the same memory as a Swift string?
The documentation clearly states that the data is copied:
init(cString:)
Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.
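One way to observe the copy is to overwrite the C buffer after the String has been created. This sketch keeps the C string in a Swift array of CChar (a typealias for Int8 on Apple platforms) so it stays self-contained. If String referenced the bytes instead of copying them, swiftString would change too.

```swift
// A NUL-terminated C string stored in memory this code controls.
var cBuffer: [CChar] = Array("hello".utf8CString)

// String(cString:) reads the bytes up to the NUL terminator and copies them.
let swiftString = cBuffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }

// Overwrite the first byte of the C buffer after the String has been created.
cBuffer[0] = CChar(UInt8(ascii: "J"))
let rebuilt = cBuffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
```

swiftString is still "hello", while a String built from the modified buffer reads "Jello".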
is there a way to convert C strings to swift by referencing the already existing data instead of copying it?
Nope. Strings are frequently copied and destroyed, which involves retain/release operations on the underlying buffer to do the necessary bookkeeping of the reference count. If the memory is not owned by the String, then there's no way to reliably deallocate it.
What are you trying to achieve by avoiding the copy?

Does Data's "copy constructor" copy its referenced bytes if inited with `freeWhenDone:false`?

If I allocate a Data object with bytesNoCopy:count:deallocator:.none, it should reference the given bytes, but in an unsafe manner, where I, as the programmer, promise the bytes will be available during the lifetime of the Data, rather than Data controlling that on its own.
That's all fine. What I wonder is... Since it's a value type rather than a reference type, what happens when I assign another Data variable from my nocopy-Data? Does it COPY THE DATA (against my explicit wishes)? Or does it create one more unsafe Data instance which I must track the lifetime of, or risk crashes?
Here's an illustration:
import Foundation
let unsafe = malloc(5)!  // malloc returns an optional pointer
func makeUnsafeData() -> Data
{
return Data(bytesNoCopy: unsafe, count: 5, deallocator: .none)
}
struct Foo
{
var d: Data
}
var foo = Foo(d: makeUnsafeData())
free(unsafe)
The question is: does foo.d contain a dangling pointer to the freed bytes that used to be in unsafe? Or does it contain its own copy of those bytes, and is safe to use?
This experiment gist seems to indicate that NSData crashes in the above scenario, as expected, but Data does not; so my tentative conclusion is that Data copies the data, and there's no way to use a Data instance to transport bytes between functions without copying the bytes. But I'd love a reference to any documentation refuting or confirming this theory.
Turns out, the answer is in the documentation after all... as long as all your Data instances are let constants and you never mutate them, ONLY the original bytes should be in memory, with every Data instance just referencing them.
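Whether an unmutated copy still points at the original bytes is an implementation detail, so this sketch only checks what value semantics guarantee: mutating a copy of a no-copy Data never writes through to the caller's buffer or to the original Data. The function name mutateCopyOfNoCopyData is hypothetical. The buffer is freed only after every Data access, so nothing is read after free.

```swift
import Foundation

/// Hypothetical helper: wraps caller-owned bytes with bytesNoCopy, mutates a copy of
/// the resulting Data, and reports the first byte seen through each handle.
func mutateCopyOfNoCopyData() -> (buffer: UInt8, original: UInt8, copy: UInt8) {
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: 5)
    bytes.initialize(repeating: 1, count: 5)
    defer { bytes.deallocate() }   // the caller, not Data, owns these bytes

    let original = Data(bytesNoCopy: UnsafeMutableRawPointer(bytes), count: 5, deallocator: .none)
    var copy = original            // value-semantics copy
    copy[0] = 99                   // must not be visible through `original` or `bytes`
    return (bytes[0], original[0], copy[0])
}

let result = mutateCopyOfNoCopyData()
```

Only the copy sees 99; both the raw buffer and the original Data still read 1.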
I might just use NSData instead though since it's a reference type and less magic going on...

Why storing substrings may lead to memory leak in Swift?

Apple's documentation on Substring says:
Don’t store substrings longer than you need them to perform a specific operation. A substring holds a reference to the entire storage of the string it comes from, not just to the portion it presents, even when there is no other reference to the original string. Storing substrings may, therefore, prolong the lifetime of string data that is no longer otherwise accessible, which can appear to be memory leakage.
I'm confused: String is a value type in Swift, so how can it lead to a memory leak?
Swift Arrays, Sets, Dictionaries and Strings have value semantics, but they're actually copy-on-write wrappers for reference types. In other words, they're all struct wrappers around a class. This allows the following to work without making a copy:
let foo = "ABCDEFG"
let bar = foo
When you write to a String, it checks whether there are multiple references to the backing object (using a uniqueness check like the standard library function isUniquelyReferencedNonObjC, which Swift 3 renamed to isKnownUniquelyReferenced). If so, it creates a copy before modifying it.
var foo = "ABCDEFG"
var bar = foo // no copy (yet)
bar += "HIJK" // backing object copied to keep foo and bar independent
When you use a Substring (or array slice), you get a reference to the entire backing object rather than just the bit that you want. This means that if you have a very large string and you have a substring of just 4 characters, as long as the substring is alive, you're holding the entire string's backing buffer in memory. This is the leak that the documentation warns you about.
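A short sketch of the situation and the usual fix. Substring's base property exposes the whole string the slice came from. Following Apple's guidance, converting to String before storing should give the result its own storage, so the large buffer is not kept alive by it.

```swift
let large = String(repeating: "x", count: 1_000_000) + "ABCD"

// A Substring is a view into the storage of the string it was sliced from.
let tail: Substring = large.suffix(4)
let baseLength = tail.base.count   // length of the whole original string

// Converting to String copies just these characters into new storage,
// so storing `stored` instead of `tail` does not keep `large`'s buffer alive.
let stored = String(tail)
```

tail compares equal to "ABCD" but its base is still the million-character string. stored holds only the four characters.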
Given the way Swift is often portrayed, your confusion is understandable. Types such as String, Array and Dictionary present value semantics but are library types constructed from a combination of value and reference types.
The implementations of these types use dynamically allocated storage. This storage can be shared between different values. However, library facilities are used to implement copy-on-write, so that such shared storage is copied as needed to maintain value semantics, that is, behaviour like that of value types.
HTH

Removing from set during enumeration in Swift?

Recently, I wrote this code without thinking about it very much:
myObject.myCollection.forEach { myObject.removeItem($0) }
where myObject.removeItem(_) removes an item from myObject.myCollection.
Looking at the code now, I am puzzled as to why this even works - shouldn't I get an exception along the lines of Collection was mutated while being enumerated?
The same code even works when using a regular for-in loop!
Is this expected behaviour or am I 'lucky' that it isn't crashing?
This is indeed expected behaviour – and is due to the fact that an Array in Swift (as well as many other collections in the standard library) is a value type with copy-on-write semantics. This means that its underlying buffer (which is stored indirectly) will be copied upon being mutated (and, as an optimisation, only when it's not uniquely referenced).
When you come to iterate over a Sequence (such as an array), be it with forEach(_:) or a standard for in loop, an iterator is created from the sequence's makeIterator() method, and its next() method is repeatedly applied in order to sequentially generate elements.
You can think of iterating over a sequence as looking like this:
let sequence = [1, 2, 3, 4]
var iterator = sequence.makeIterator()
// `next()` will return the next element, or `nil` if
// it has reached the end of the sequence.
while let element = iterator.next() {
// do something with the element
}
In the case of Array, an IndexingIterator is used as its iterator – which will iterate through the elements of a given collection by simply storing that collection along with the current index of the iteration. Each time next() is called, the base collection is subscripted with the index, which is then incremented, until it reaches endIndex (you can see its exact implementation here).
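The idea can be sketched with a simplified iterator. SimpleIndexingIterator is a hypothetical stand-in, not the standard library's actual IndexingIterator source. It stores the collection value plus the current index and subscripts the collection on each next() call, so mutating the original variable afterwards does not affect iteration.

```swift
// A simplified, hypothetical stand-in for the standard library's IndexingIterator.
struct SimpleIndexingIterator<Base: Collection>: IteratorProtocol {
    let base: Base            // the iterator holds its own copy of the collection value
    var position: Base.Index  // index of the next element to return

    init(_ base: Base) {
        self.base = base
        self.position = base.startIndex
    }

    mutating func next() -> Base.Element? {
        guard position != base.endIndex else { return nil }
        let element = base[position]
        position = base.index(after: position)
        return element
    }
}

var numbers = [10, 20, 30]
var iterator = SimpleIndexingIterator(numbers)
numbers.removeAll()           // mutates `numbers`; the iterator's copy is unaffected
var collected: [Int] = []
while let n = iterator.next() {
    collected.append(n)
}
```

collected ends up as [10, 20, 30] even though numbers is empty.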
Therefore, when you come to mutate your array in the loop, its underlying buffer is not uniquely referenced, as the iterator also has a view onto it. This forces a copy of the buffer – which myCollection then uses.
So, there are now two arrays – the one which is being iterated over, and the one you're mutating. Any further mutations in the loop won't trigger another copy, as long as myCollection's buffer remains uniquely referenced.
This therefore means that it's perfectly safe to mutate a collection with value semantics while enumerating over it. The enumeration will iterate over the full length of the collection – completely independent of any mutations you do, as they will be done on a copy.
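A minimal version of the question's pattern, using a local array instead of the hypothetical myObject so it stays self-contained. Every element is removed inside the loop, yet the loop still visits all four.

```swift
var numbers = [1, 2, 3, 4]
var visited: [Int] = []

// `for n in numbers` calls makeIterator() once on the current value of `numbers`,
// so removing elements inside the loop does not change which elements are visited.
for n in numbers {
    visited.append(n)
    numbers.removeAll { $0 == n }   // first removal copies the shared buffer
}
```

visited is [1, 2, 3, 4] and numbers is empty afterwards.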
I asked a similar question in the Apple Developer Forum and the answer is "yes, because of the value semantics of Array".
@originaluser2 said that already, but I would argue slightly differently:
When myObject.removeItem($0) is called, a new array is created and stored in myObject.myCollection, but the array that forEach() was called upon is not modified.
Here is a simpler example demonstrating the effect:
extension Array {
func printMe() {
print(self)
}
}
var a = [1, 2, 3]
let pm = a.printMe // The instance method as a closure.
a.removeAll() // Modify the variable `a`.
pm() // Calls the method on the value that it was created with.
// Output: [1, 2, 3]
The collection is effectively copied before the iteration starts: the code inside forEach is applied to the real collection, but the iteration happens over the copy, which is released after the last iteration.