Does Data's "copy constructor" copy its referenced bytes if inited with `freeWhenDone:false`? - swift

If I allocate a Data object with bytesNoCopy:count:deallocator:.none, it should reference the given bytes but in an unsafe manner, where I as programmer promise the bytes will be available during the lifetime of the Data, rather than Data controlling that on its own.
That's all fine. What I wonder is... Since it's a value type rather than a reference type, what happens when I assign another Data variable from my nocopy-Data? Does it COPY THE DATA (against my explicit wishes)? Or does it create one more unsafe Data instance which I must track the lifetime of, or risk crashes?
Here's an illustration:
import Foundation

let unsafe = malloc(5)! // malloc returns an optional pointer; unwrap it for bytesNoCopy
func makeUnsafeData() -> Data
{
    return Data(bytesNoCopy: unsafe, count: 5, deallocator: .none)
}
struct Foo
{
    var d: Data
}
var foo = Foo(d: makeUnsafeData())
free(unsafe)
The question is: does foo.d contain a dangling pointer to the freed bytes that used to be in unsafe? Or does it contain its own copy of those bytes, and is safe to use?
This experiment gist seems to indicate that NSData crashes in the above scenario, as expected, but Data does not; so my tentative conclusion is that Data copies the data, and there's no way to use a Data instance to transport bytes between functions without copying the bytes. But I'd love a reference to any documentation refuting or confirming this theory.

Turns out, the answer is in the documentation after all... as long as all your Data instances are declared with let and you never mutate them, ONLY the original bytes should be in memory, with all the Data values just referencing them.
I might just use NSData instead though since it's a reference type and less magic going on...
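A minimal sketch of how one might check this, comparing the base addresses that two Data values expose. The 1024-byte size and the helper baseAddress(of:) are my own choices for illustration; I picked a larger buffer on the assumption that some Foundation versions store very small payloads inline, which would hide the sharing.

```swift
import Foundation

let byteCount = 1024
let raw = malloc(byteCount)!
memset(raw, 0x41, byteCount)

// Wrap the malloc'ed bytes without taking ownership of them.
let original = Data(bytesNoCopy: raw, count: byteCount, deallocator: .none)
var second = original // plain assignment of the value type

// Hypothetical helper: the address the Data currently exposes (used for comparison only).
func baseAddress(of data: Data) -> UnsafeRawPointer? {
    return data.withUnsafeBytes { $0.baseAddress }
}

// Before any write, both values are expected to expose the same bytes.
let sharedBeforeWrite = baseAddress(of: original) == baseAddress(of: second)

// The first mutation forces `second` onto its own storage (copy-on-write).
second[0] = 0x42
let sharedAfterWrite = baseAddress(of: original) == baseAddress(of: second)
let originalFirstByte = original[0]

print(sharedBeforeWrite, sharedAfterWrite, originalFirstByte)

// With .none as the deallocator, freeing stays the caller's job,
// and `original` must not be read after this point.
free(raw)
```

So the lifetime promise applies to every non-mutated copy as well, not just to the first Data value.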

Related

Does String.init(cString: UnsafePointer<Int8>) copy the memory contents?

What are the inner workings?
Does it create a Swift string copy of the C string data?
Does it use it as a reference and return it as a Swift string, so the string returned uses the same data? How does it work?
Does it copy the C string into a newly allocated Swift string?
If String(cString: UnsafePointer<Int8>) indeed works by copying the C string into a newly allocated swift string, is there a way to convert C strings to swift by referencing the already existing data instead of copying it?
How does String(cString: UnsafePointer<Int8>) work, and how can I determine whether it copies, or whether it references the same memory as a Swift string?
The documentation clearly states that the data is copied:
Initializer
init(cString:)
Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.
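The copy can also be observed directly. In this small sketch (the buffer contents are just sample data), the C buffer is changed after the String has been created, and the String does not follow the change:

```swift
// A mutable, null-terminated C string: "abc\0" (sample data).
var cBuffer: [CChar] = [97, 98, 99, 0]

// Create a Swift String from the C string.
let before = cBuffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }

// Overwrite the first character of the C buffer with "x".
cBuffer[0] = 120

// A String created now sees the change; the earlier one does not.
let after = cBuffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }

print(before) // abc
print(after)  // xbc
```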
is there a way to convert C strings to swift by referencing the already existing data instead of copying it?
Nope. Strings are frequently copied and destroyed, which involves retain/release operations on the underlying buffer to do the necessary bookkeeping of the reference count. If the memory is not owned by the String, then there's no way to reliably deallocate it.
What are you trying to achieve by avoiding the copy?

Exception when using [] operator on a Data instance

This little Swift snippet crashes Xcode 9.2 playgrounds on the last assignment, even though buffer still holds 23 bytes. Any ideas why this is happening?
import Cocoa
var str = "01234567890123456789012345678901234567"
var buffer = Data()
if let data = str.data(using: .utf8) {
    buffer.append(data)
    buffer = buffer[15..<38]
    buffer = buffer[2..<23]
}
Looks like this is caused by either an SDK bug or a compiler optimization, both trying to avoid unnecessary data copies (aka copy-on-write). Setting a breakpoint on the problematic line and inspecting the buffer shows that, after the first slicing, it still points to the same data storage but to a different slice of it (indices 15..<38), and trying to access outside that slice causes the crash.
Changing the problematic line to buffer = buffer[17..<38] makes the crash go away.
I don't know why subscripting by range results in a Data that might lead to crashes if used in other (possibly unrelated) parts of the code that don't know they are dealing with a slice. Other collections, like Array, have dedicated slice types, for which you can expect a varying range of valid indices.
Here's a naive example to support the above thought:
func printFirstByte(of data: Data) {
    print(data[0])
}
let str = "01234567890123456789012345678901234567"
if let data = str.data(using: .utf8) {
    printFirstByte(of: data[15..<38]) // this call crashes
}
On the other hand, Data does have startIndex and endIndex properties, but this just makes Data instances harder to work with, because it's not obvious that a slice needs to be zero-based before it is sent to old code that doesn't know about this behaviour.
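For what it's worth, here is a small sketch of the two workarounds I would reach for: indexing relative to startIndex, or making a zero-based copy with subdata(in:). The variable names are only for illustration.

```swift
import Foundation

let data = Data("01234567890123456789012345678901234567".utf8)

// Slicing keeps the indices of the original Data.
let slice = data[15..<38]
print(slice.startIndex, slice.endIndex)

// Option 1: always index relative to startIndex.
let firstOfSlice = slice[slice.startIndex]

// Option 2: copy the range into a new Data whose indices start at 0.
let rebased = data.subdata(in: 15..<38)
let firstOfRebased = rebased[0]

print(firstOfSlice, firstOfRebased)
```

Option 2 costs a copy of the bytes, so it is mostly worth it at the boundary to code that assumes zero-based indices.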

Why storing substrings may lead to memory leak in Swift?

Apple's documentation on Substring says:
Don’t store substrings longer than you need them to perform a specific operation. A substring holds a reference to the entire storage of the string it comes from, not just to the portion it presents, even when there is no other reference to the original string. Storing substrings may, therefore, prolong the lifetime of string data that is no longer otherwise accessible, which can appear to be memory leakage.
I'm confused: String is a value type in Swift, so how can it lead to a memory leak?
Swift Arrays, Sets, Dictionaries and Strings have value semantics, but they're actually copy-on-write wrappers for reference types. In other words, they're all struct wrappers around a class. This allows the following to work without making a copy:
let foo = "ABCDEFG"
let bar = foo
When you write to a String, it uses the standard library function isKnownUniquelyReferenced (named isUniquelyReferencedNonObjC in earlier Swift versions) to check whether there are multiple references to the backing object. If so, it creates a copy before modifying it.
var foo = "ABCDEFG"
var bar = foo // no copy (yet)
bar += "HIJK" // backing object copied to keep foo and bar independent
When you use a Substring (or array slice), you get a reference to the entire backing object rather than just the bit that you want. This means that if you have a very large string and you have a substring of just 4 characters, as long as the substring is live, you're holding the entire string backing buffer in memory. This is the apparent leak that the documentation warns you about.
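The usual way around this is to convert the Substring to a String once you are done slicing. A minimal sketch, with a made-up large string:

```swift
// A large string with a short, interesting tail (sample data).
let big = String(repeating: "x", count: 1_000_000) + "TAIL"

// suffix(_:) returns a Substring that refers to big's storage.
let sub = big.suffix(4)

// String(sub) copies just those 4 characters into storage of its own,
// so keeping `owned` around does not keep the large buffer alive.
let owned = String(sub)

print(owned) // TAIL
```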
Given the way Swift is often portrayed, your confusion is understandable. Types such as String, Array and Dictionary present value semantics but are library types constructed from a combination of value and reference types.
The implementation of these types uses dynamically allocated storage. This storage can be shared between different values. However, library facilities are used to implement copy-on-write so that such shared storage is copied as needed to maintain value semantics, that is, behaviour like that of value types.
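To make the idea concrete, here is a rough sketch of a copy-on-write wrapper. Storage and Buffer are hypothetical names of mine, and this is not how the standard library types are actually written; it only shows the pattern of a struct wrapping a class and checking isKnownUniquelyReferenced before writing.

```swift
// Hypothetical reference type holding the actual bytes.
final class Storage {
    var bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
}

// Hypothetical value type with copy-on-write behaviour.
struct Buffer {
    private var storage: Storage

    init(_ bytes: [UInt8]) { storage = Storage(bytes) }

    var bytes: [UInt8] { return storage.bytes }

    // True while both values still point at the same Storage object.
    func sharesStorage(with other: Buffer) -> Bool {
        return storage === other.storage
    }

    mutating func append(_ byte: UInt8) {
        // Copy the storage only if some other value also references it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.bytes)
        }
        storage.bytes.append(byte)
    }
}

let a = Buffer([1, 2])
var b = a                                   // no copy yet, storage is shared
let sharedBefore = a.sharesStorage(with: b)
b.append(3)                                 // storage is copied here
let sharedAfter = a.sharesStorage(with: b)
print(sharedBefore, sharedAfter, a.bytes, b.bytes)
```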
HTH

Array pass by value by default & thread-safety

Say I have a class which has an Array of object Photo:
class PhotoManager {
    fileprivate var _photos: [Photo] = []
    var photos: [Photo] {
        return _photos
    }
}
I read one article which says the following:
By default in Swift class instances are passed by reference and
structs passed by value. Swift’s built-in data types like Array and
Dictionary, are implemented as structs.
Meaning that the above getter returns a copy of [Photo] array.
Then, that same article tries to make the getter thread-safe by refactoring the code to:
fileprivate let concurrentPhotoQueue = DispatchQueue(label: "com.raywenderlich.GooglyPuff.photoQueue",
                                                     attributes: .concurrent)
fileprivate var _photos: [Photo] = []
var photos: [Photo] {
    var photosCopy: [Photo]!
    concurrentPhotoQueue.sync {
        photosCopy = self._photos
    }
    return photosCopy
}
The above code explicitly makes a copy of self._photos in the getter.
My questions are:
If by default Swift already returns a copy (pass by value), as the article said in the first place, why does the article copy again into photosCopy to make it thread-safe? I feel I do not fully understand these two parts of the article.
Does Swift3 really pass by value by default for Array instance like the article says?
Could someone please clarify it for me? Thanks!
I'll address your questions in reverse:
Does Swift3 really pass by value by default for Array instance like the article says?
Simple Answer: Yes
But I'm guessing that is not what your concern is when asking "Does Swift3 really pass by value". Swift behaves as if the array is copied in its entirety but behind the scenes it optimises the operation and the whole array is not copied until, and if, it needs to be. Swift uses an optimisation known as copy-on-write (COW).
However, for the Swift programmer, how the copy is done is not as important as the semantics of the operation, which is that after an assignment/copy the two arrays are independent and mutating one does not affect the other.
If by default Swift already returns a copy (pass by value), as the article said in the first place, why does the article copy again into photosCopy to make it thread-safe? I feel I do not fully understand these two parts of the article.
What this code is doing is ensuring that the copy is done in a thread-safe way.
An array is not a trivial value: it is implemented as a multi-field struct, and some of those fields reference other structs and/or objects. This is needed to support an array's ability to grow in size, etc.
In a multi-threaded system one thread could try to copy the array while another thread is trying to change the array. If these are allowed to happen at the same time then things easily can go wrong, e.g. the array could change while the copy is in progress, resulting in an invalid copy - part old value, part new value.
Swift per se is not thread safe; and in particular it will not prevent an array from being changed while a copy is being performed. The code you have addresses this by using a GCD queue so that during any alteration to the array by one thread all other writes or reads to the array in any other thread are blocked until the alteration is complete.
You might also be concerned that there are multiple copies going on here: self._photos to photosCopy, then photosCopy to the return value. While semantically this is what happens, in practice there will probably be only one complex copy (and that will be thread safe), as the Swift system will optimise.
HTH
1) The code example you provided returns a copy of _photos.
As the article says:
The getter for this property is termed a read method as it’s reading
the mutable array. The caller gets a copy of the array and is protected
against mutating the original array inappropriately.
This means that you can read _photos from outside the class, but you can't change it from there. The contents of photos can only be changed inside the class, which protects the array from accidental changes.
2) Yes, Array is a value-type struct and it will be passed by value. You can easily check this in a Playground:
let arrayA = [1, 2, 3]
var arrayB = arrayA
arrayB[1] = 4 //change second value of arrayB
print(arrayA) //but arrayA didn't change
UPD #1
In the article they have a method func addPhoto(_ photo: Photo) that adds a new photo to the _photos array, which makes access to this property not thread-safe. This means that _photos could be changed on several threads at the same time, which will lead to issues.
They fixed it by performing the writes on concurrentPhotoQueue with the .barrier flag, which makes them thread-safe: the _photos array is changed by only one block at a time.
func addPhoto(_ photo: Photo) {
    concurrentPhotoQueue.async(flags: .barrier) { // 1
        self._photos.append(photo) // 2
        DispatchQueue.main.async { // 3
            self.postContentAddedNotification()
        }
    }
}
Now, to ensure thread safety, you also need to read the _photos array on the same queue. That's the only reason why they refactored the read method.
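Putting the read and the write side together, a self-contained sketch of the pattern could look like this. SafeStore, the queue label and the Int payload are placeholders of mine rather than code from the article.

```swift
import Foundation

// Hypothetical stand-in for PhotoManager, storing Ints instead of Photos.
final class SafeStore {
    private let queue = DispatchQueue(label: "example.safestore", attributes: .concurrent)
    private var _items: [Int] = []

    // Reads run synchronously on the queue and may overlap with other reads.
    var items: [Int] {
        return queue.sync { _items }
    }

    // Writes use a barrier, so each one runs alone on the queue.
    func add(_ item: Int) {
        queue.async(flags: .barrier) {
            self._items.append(item)
        }
    }
}

let store = SafeStore()

// Append from many threads at once.
DispatchQueue.concurrentPerform(iterations: 100) { index in
    store.add(index)
}

// This read is enqueued after all the barrier writes, so it sees every element.
let snapshot = store.items
print(snapshot.count) // 100
```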

If a function returns an UnsafeMutablePointer is it our responsibility to destroy and dealloc?

For example if I were to write this code:
var t = time_t()
time(&t)
let x = localtime(&t) // returns UnsafeMutablePointer<tm>
println("\(x.memory.tm_hour): \(x.memory.tm_min): \(x.memory.tm_sec)")
...would it also be necessary to also do the following?
x.destroy()
x.dealloc(1)
Or did we not allocate the memory and so therefore don't need to dismiss it?
Update #1:
If we imagine a function that returns an UnsafeMutablePointer:
func point() -> UnsafeMutablePointer<String> {
    let a = UnsafeMutablePointer<String>.alloc(1)
    a.initialize("Hello, world!")
    return a
}
Calling this function would result in a pointer to an object that will never be destroyed unless we do the dirty work ourselves.
The question I'm asking here: Is a pointer received from a localtime() call any different?
The simulator and the playground both enable us to send one dealloc(1) call to the returned pointer, but should we be doing this or is the deallocation going to happen for a returned pointer by some other method at a later point?
At the moment I'm erring towards the assumption that we do need to destroy and dealloc.
Update #1.1:
The last assumption was wrong. I don't need to release it, because I didn't create the object.
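For readers on a current Swift version, the localtime call might be written roughly as follows. This is a sketch using today's spelling (pointee instead of memory, print instead of println); the variable names are mine.

```swift
import Foundation

var now = time_t()
time(&now)

var hour: Int32 = -1

// localtime returns a pointer to storage owned by the C library.
if let parts = localtime(&now) {
    hour = parts.pointee.tm_hour
    print("\(parts.pointee.tm_hour):\(parts.pointee.tm_min):\(parts.pointee.tm_sec)")
    // We did not allocate this memory, so there is nothing to
    // deinitialize, deallocate or free here.
}
```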
Update #2:
I received some answers to the same query on the Apple dev forums.
In general, the answer to your question is yes. If you receive a pointer to memory which you would be responsible for freeing in C, then you are still responsible for freeing it when calling from swift ... [But] in this particular case you need do nothing. (JQ)
the routine itself maintains static memory for the result and you do not need to free them. (it would probably be a "bad thing" if you did) ... In general, you cannot know if you need to free up something pointed to by an UnsafePointer.... it depends on where that pointer obtains its value. (ST)
UnsafePointer's dealloc() is not compatible with free(). Pair alloc() with dealloc() and malloc and co. with free(). As pointed out previously, the function you're calling should tell you whether it's your responsibility to free the result ... destroy() is only necessary if you have non-trivial content* in the memory referred to by the pointer, such as a strong reference or a Swift struct or enum. In general, if it came from C, you probably don't need to destroy() it. (In fact, you probably shouldn't destroy() it, because it wasn't initialized by Swift.) ... * "non-trivial content" is not an official Swift term. I'm using it by analogy with the C++ notion of "trivially copyable" (though not necessarily "trivial"). (STE)
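As an illustration of that pairing in current Swift, where alloc, destroy and dealloc are spelled allocate(capacity:), deinitialize(count:) and deallocate(), the point() function from Update #1 and its cleanup could be sketched like this:

```swift
// Allocates and initializes one String; the caller owns the result.
func point() -> UnsafeMutablePointer<String> {
    let a = UnsafeMutablePointer<String>.allocate(capacity: 1)
    a.initialize(to: "Hello, world!")
    return a
}

let p = point()
let greeting = p.pointee // copy the value out before tearing the memory down

// String is non-trivial content, so it has to be deinitialized
// before the memory itself is handed back.
p.deinitialize(count: 1)
p.deallocate()

print(greeting) // Hello, world!
```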
Final Update:
I've now written a blogpost outlining my findings and assumptions with regard to the release of unsafe pointers taking onboard info from StackOverflow, Apple Dev Forums, Twitter and Apple's old documentation on allocating memory and releasing it, pre-ARC. See here.
From Swift library UnsafeMutablePointer<T>
A pointer to an object of type T. This type provides no automated
memory management, and therefore the user must take care to allocate
and free memory appropriately.
The pointer can be in one of the following states:
memory is not allocated (for example, pointer is null, or memory has
been deallocated previously);
memory is allocated, but value has not been initialized;
memory is allocated and value is initialized.
struct UnsafeMutablePointer<T> : RandomAccessIndexType, Hashable, NilLiteralConvertible { /**/}