MATLAB R2022b introduced dictionaries.
I was under the impression that dictionaries were already available, provided by containers.Map. Are dictionaries just a different name mapped to containers.Map? Or are there other differences? I was unable to find anything comparing them online.
From what I can gather, after reading this blog post and the comments under it, and the documentation (I haven’t yet had a chance to experiment with them, so feel free to correct me if I’m wrong):
dictionary is an actual primitive type, like double, cell or struct. containers.Map is a “custom class”; even though its code is nowadays built in, its functionality can never be as tightly integrated as that of a primitive type. Consequently, dictionary is significantly faster.
dictionary uses normal value semantics. If you make a copy you have two independent dictionaries (note MATLAB’s lazy copy mechanism). containers.Map is a handle class, so all copies point to the same data and modifying one copy modifies them all.
containers.Map can use char arrays (the old string format) or numbers as keys (string is implicitly converted to char when used as key). dictionary can use any type, as long as it overloads keyhash. This means you can use your own custom class objects as keys.
dictionary is vectorized: you can look up multiple values at once. With a containers.Map you can look up multiple values only through the values function, not with the normal lookup syntax.
dictionary has actual O(1) lookup. If I remember correctly, containers.Map doesn’t.*
containers.Map can store any array as value, dictionary stores only scalars. The scalar can be a cell, which can contain any array, but this leads to awkward semantics, since retrieving the value retrieves the cell, not its contents.
* No, it is also O(1), at least in R2022b.
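The difference between handle and value semantics in the list above can be illustrated with a rough Python analogy. This is not MATLAB code: a plain Python dict stands in for both types, with plain assignment playing the role of copying a handle and copy.deepcopy playing the role of copying a value.

```python
import copy

# A plain Python dict stands in for the MATLAB containers (assumption: this is
# only an analogy, not how either MATLAB type is implemented).
original = {"a": 1, "b": 2}

# Handle-like behaviour (containers.Map): both names refer to the same data,
# so a change made through one name is visible through the other.
handle_copy = original
handle_copy["a"] = 100

# Value-like behaviour (dictionary): the copy is independent of the original.
value_copy = copy.deepcopy(original)
value_copy["b"] = 200

print(original)    # {'a': 100, 'b': 2}
print(value_copy)  # {'a': 100, 'b': 200}
```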
print(hash('hello world'))
Result:
6266945022561323786
What is the mathematical algorithm used by Python's hash() function?
The Python documentation makes no guarantee about the particular algorithm that is used by hash() (or more precisely, object.__hash__(self)).
The documentation only says this:
object.__hash__(self)
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. The __hash__() method should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example:
def __hash__(self):
    return hash((self.name, self.nick, self.color))
The only thing that is guaranteed is this: "The only required property is that objects which compare equal have the same hash value".
There are a couple of desirable properties related to security, safety, and performance, but they are not required.
Every object can implement its own __hash__() however it wants, as long as it satisfies the property that two equal objects have the same hash value. And, in fact, many objects do provide their own implementations.
Even for built-in core objects such as strings, different implementations (and even different versions of different implementations) use different algorithms. CPython even uses a different seed value every time you run it (again, for security reasons).
So, the answer is: you can't know what the algorithm is. All you know is that if a.__eq__(b) is True, then a.__hash__().__eq__(b.__hash__()) is also True.
Most importantly, there is no guarantee that a.__hash__().__eq__(b.__hash__()) being True implies a and b are equal, nor does a.__eq__(b) being False imply that a.__hash__().__eq__(b.__hash__()) is False.
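A minimal sketch of both directions of that rule. The classes Person and ConstantHash are hypothetical and exist only for illustration; the constant 42 is arbitrary.

```python
class Person:
    """Hypothetical class: equality and hash are both based on (name, nick)."""

    def __init__(self, name, nick):
        self.name = name
        self.nick = nick

    def __eq__(self, other):
        return (isinstance(other, Person)
                and (self.name, self.nick) == (other.name, other.nick))

    def __hash__(self):
        # Mix the components that take part in comparison by hashing a tuple.
        return hash((self.name, self.nick))


class ConstantHash:
    """Hypothetical class whose instances all collide: legal, but slow in dicts."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, ConstantHash) and self.value == other.value

    def __hash__(self):
        return 42  # arbitrary constant, so every instance has the same hash


a = Person("Ada", "al")
b = Person("Ada", "al")
# Required: equal objects have equal hashes.
equal_objects_share_hash = (a == b) and (hash(a) == hash(b))

x = ConstantHash(1)
y = ConstantHash(2)
# Allowed: equal hashes do not imply equal objects.
same_hash_but_unequal = (hash(x) == hash(y)) and (x != y)

# Built-in numbers follow the same rule: 1 == 1.0, so their hashes must match.
numbers_agree = hash(1) == hash(1.0)

print(equal_objects_share_hash, same_hash_but_unequal, numbers_agree)
```

The actual hash values are deliberately not printed, since they can differ between runs and implementations.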
Does Swift have an ordered set type? And if not, what are my options if I want to use one?
The standard library's Set is unordered, as is made clear in the documentation:
Arrays are ordered collections of values. Sets are unordered collections of unique values. Dictionaries are unordered collections of key-value associations.
However, many data structures suitable for implementing ordered sets (and dictionaries) are known, in particular balanced binary trees such as Red-Black trees.
As an example of this, C++'s STL has ordered sets and maps, and allows range queries on them using lower and upper bounds.
I know that a set's members can be sorted into an array, but I am after a data structure with O(log(n)) insertion, removal and query.
Swift does not have a native ordered set type. If you use Foundation, you can use NSOrderedSet in Swift. If not, you have the opportunity to write your own ordered set data structure.
Update: Swift Package Manager includes an OrderedSet implementation that may be useful. It wraps both an array and a set and manages access to get ordered set behavior.
Update #2: Apple's Swift Collections repository contains an ordered set implementation.
On April 6th, 2021, a new Swift package was released: Swift Collections, which implements three more data structures (OrderedSet, OrderedDictionary, Deque).
However, this package is in its pre-1.0 release state. As a result, it might not be stable.
Swift blog: Release of Swift Collections
At the time of writing there is no ordered set in Swift. Besides using NSOrderedSet, which is available on all Apple platforms, you can simply combine a Set with an Array to get basically the same effect. The Set is used to avoid duplicate entries; the Array is used to store the order. Before adding a new element to the Array, check whether it is already in the Set. When removing an element from the Array, also remove it from the Set. To check whether an element exists, ask the Set, as that is faster. To retrieve an element by index, use the Array. You can change the order of elements in the Array (e.g. re-sort it) without having to touch the Set at all. To iterate over all elements, use the Array, as that way the elements are iterated in order.
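The Set plus Array combination described above could look roughly like this. The sketch is written in Python rather than Swift so that it stays short and runnable; the class name OrderedSet and its methods are hypothetical and do not mirror any Swift or Foundation API.

```python
class OrderedSet:
    """Sketch of the set + array combination (hypothetical class and methods)."""

    def __init__(self):
        self._members = set()  # fast membership tests, rejects duplicates
        self._order = []       # remembers the order of the elements

    def add(self, item):
        """Append item unless it is already present; return True if added."""
        if item in self._members:
            return False
        self._members.add(item)
        self._order.append(item)
        return True

    def remove(self, item):
        """Remove item from both containers; return True if it was present."""
        if item not in self._members:
            return False
        self._members.remove(item)
        self._order.remove(item)  # linear scan of the array
        return True

    def __contains__(self, item):
        return item in self._members  # ask the set, not the array

    def __getitem__(self, index):
        return self._order[index]     # indexed access goes to the array

    def __iter__(self):
        return iter(self._order)      # iteration follows the stored order

    def __len__(self):
        return len(self._order)


s = OrderedSet()
for word in ["pear", "apple", "pear", "fig"]:
    s.add(word)
s.remove("apple")
print(list(s))  # ['pear', 'fig']
```

Note that removal in this sketch scans the array and is therefore linear, so it does not meet the O(log(n)) requirement from the question; it only reproduces the ordering and uniqueness behaviour.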
We're encouraged to use struct over class in Swift.
This is because
The compiler can do a lot of optimizations
Instances are created on the stack which is a lot more performant than malloc/free calls
The downside to struct variables is that they are copied each time they are returned from a function, passed to one, or assigned. Obviously, this can become a bottleneck too.
E.g. imagine a 4x4 matrix. 16 Float values would have to be copied on every assignment or return, which is 512 bits (64 bytes), since Swift's Float is 32 bits wide; with Double it would be 1,024 bits.
One way you can avoid this is using inout when passing variables to functions, which is basically Swift's way of creating a pointer. But then we're also discouraged from using inout.
So to my question:
How should I handle large, immutable data structures in Swift?
Do I have to worry about creating a large struct with many members?
If yes, when am I crossing the line?
The accepted answer does not entirely answer the question you had: Swift always copies structs. The trick that Array, Dictionary, String, etc. use is that they are just wrappers around classes (which contain the actual stored properties). That way sizeof(Array) is just the size of the pointer to that class (MemoryLayout<Array<String>>.stride == MemoryLayout<UnsafeRawPointer>.stride).
If you have a really big struct, you might want to consider wrapping its stored properties in a class for efficient passing around as arguments, and checking isKnownUniquelyReferenced before mutating to give COW semantics.
Structs have other efficiency benefits: they don't need reference-counting and can be decomposed by the optimiser.
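The wrapper technique from this answer (a small value that points at shared storage and copies it only before a mutation) can be sketched as follows. The sketch is in Python, not Swift: the names _Storage and CowMatrix are hypothetical, and an explicit owners counter stands in for the runtime uniqueness check that Swift would provide.

```python
class _Storage:
    """Shared backing store (hypothetical); 'owners' counts the wrappers using it."""

    def __init__(self, values):
        self.values = list(values)
        self.owners = 1


class CowMatrix:
    """Sketch of a copy-on-write wrapper around a shared storage object."""

    def __init__(self, values):
        self._storage = _Storage(values)

    def copy(self):
        # Cheap copy: share the storage and record the extra owner.
        other = CowMatrix.__new__(CowMatrix)
        other._storage = self._storage
        self._storage.owners += 1
        return other

    def __getitem__(self, index):
        return self._storage.values[index]

    def __setitem__(self, index, value):
        # Assumption: this manual counter replaces a real uniqueness check.
        if self._storage.owners > 1:
            # Storage is shared, so detach with a private copy before writing.
            self._storage.owners -= 1
            self._storage = _Storage(self._storage.values)
        self._storage.values[index] = value


a = CowMatrix([0.0] * 16)
b = a.copy()
shared_before_write = a._storage is b._storage

b[0] = 1.0  # first write through b triggers the actual copy
shared_after_write = a._storage is b._storage

print(shared_before_write, shared_after_write, a[0], b[0])  # True False 0.0 1.0
```

The counter is only decremented when a wrapper detaches, not when a wrapper is discarded, so this sketch can copy more often than strictly necessary; it never shares storage that has been written to.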
In Swift, values keep a unique copy of their data. There are several
advantages to using value-types, like ensuring that values have
independent state. When we copy values (the effect of assignment,
initialization, and argument passing) the program will create a new
copy of the value. For some large values these copies could be time
consuming and hurt the performance of the program.
https://github.com/apple/swift/blob/master/docs/OptimizationTips.rst#the-cost-of-large-swift-values
Also the section on container types:
Keep in mind that there is a trade-off between using large value types
and using reference types. In certain cases, the overhead of copying
and moving around large value types will outweigh the cost of removing
the bridging and retain/release overhead.
From the very bottom of this page from the Swift Reference:
NOTE
The description above refers to the “copying” of strings, arrays, and dictionaries. The behavior you see in your code will always be as if a copy took place. However, Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so. Swift manages all value copying to ensure optimal performance, and you should not avoid assignment to try to preempt this optimization.
I hope this answers your question. Also, if you want to be sure that an array doesn't get copied, you can always declare the parameter as inout and pass it into the function with &array.
Also classes add a lot of overhead and should only be used if you really must have a reference to the same object.
Examples for structs:
Timezone
Latitude/Longitude
Size/Weight
Examples for classes:
Person
A View
Here's an example of two different dictionaries, yet they return the same hash code. Why?
https://gist.github.com/837861
(They aren't the same object)
Hashes aren't guaranteed to be distinct for distinct objects. In fact, hash collisions will happen. The only two properties the -hash method is supposed to guarantee are (both taken from the documentation):
If two objects are equal (as determined by the isEqual: method), they must have the same hash value.
If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection.
If you look here, you can see that the hash implementation on dictionaries simply returns the count and is likely the reason why you're getting the same code:
https://stackoverflow.com/a/11984624/59198
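The effect of a hash that simply returns the count can be reproduced in a few lines. The sketch below uses Python instead of Objective-C, and CountHashedDict is a hypothetical class written for illustration; it is not the Foundation implementation.

```python
class CountHashedDict(dict):
    """Hypothetical dict subclass whose hash is just its number of entries."""

    def __hash__(self):
        return len(self)


first = CountHashedDict(a=1, b=2)
second = CountHashedDict(x=10, y=20)

# Different contents, same number of entries, therefore the same hash.
print(hash(first) == hash(second))  # True
print(first == second)              # False
```

Such a hash still satisfies the first rule, because two equal dictionaries necessarily have the same number of entries.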
When adding data to a collection, which is the better practice, Dictionary or ArrayList? What is the performance impact of each, and why?
You should actually not use ArrayList at all, as you have the strongly typed List<T> to use.
Which you use depends on how you need to access the data. The List stores a sequential list of items, while a Dictionary stores items identified by a key. (You can still read the items from the Dictionary sequentially, but the order is not preserved.)
The performance is pretty much the same: both use arrays internally to store the actual data. When they reach their capacity, they allocate a new, larger array and copy the data to it. If you know how large the collection will get, you should specify the capacity when you create it, so that it doesn't have to resize itself.
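A rough sketch of why specifying the capacity helps. It is written in Python rather than C#, and GrowableArray is a hypothetical class; the initial capacity of 4 and the doubling growth factor are assumptions chosen for illustration, not a statement about the actual .NET implementation.

```python
class GrowableArray:
    """Sketch of an array-backed list that doubles its capacity when full."""

    def __init__(self, capacity=4):
        # Assumption: default capacity of 4, chosen only for this example.
        self._buffer = [None] * max(capacity, 1)
        self._count = 0
        self.reallocations = 0  # how often the data had to be copied

    def append(self, item):
        if self._count == len(self._buffer):
            # Full: allocate a larger array and copy the existing items over.
            new_buffer = [None] * (2 * len(self._buffer))
            new_buffer[:self._count] = self._buffer
            self._buffer = new_buffer
            self.reallocations += 1
        self._buffer[self._count] = item
        self._count += 1

    def __len__(self):
        return self._count


default_sized = GrowableArray()
presized = GrowableArray(capacity=1000)
for i in range(1000):
    default_sized.append(i)
    presized.append(i)

print(default_sized.reallocations, presized.reallocations)  # 8 0
```

With the assumed growth policy, the default-sized array copies its contents eight times on the way to 1000 items, while the pre-sized one never reallocates.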
They are not interchangeable classes. Apples and oranges. If you intend to look up items in the collection by a key, use Dictionary. Otherwise, use ArrayList (or preferably List<T>)