Why are Swift classes and structures set up the way they are?


In Swift, classes have inheritance and structures don't. At the same time, class instances live in the heap while structure instances live on the stack.
This means that only two of four possibilities are allowed: You can have things that support inheritance and which live in the heap, or you can have things that do not support inheritance and which live on the stack. You cannot have things that support inheritance and which live on the stack, and you cannot have things which do not support inheritance and which live on the heap.
Why is this? What makes these latter two possibilities undesirable?
For an encore, why do structures get a free initializer, but classes don't?

This is a reasonable design regarding value-type vs. reference-type.
A class is a reference type, so its inheritance is tied to identity. A struct is a value type: it has no identity beyond the values of the fields it contains, and so it can be freely copied. If an "inheriting struct" added sub-fields, it would change the fields, and therefore the values, that make up the struct, and the "is-a" relation of class inheritance would no longer hold: it makes no sense to say that one value type "is-a" another when the two have different fields.
The sub-fields added by the "inheriting struct" could be silently lost when copying (in C++, for example, copying a derived object into a base object invokes the base copy constructor), which would make characteristics like compatibility lose their significance. Classes, on the other hand, don't have this problem, since each instance of a class has a unique identity and only references to that instance are passed around.
Take, for example, a point on a 2D plane: a struct containing two fields, x and y. Suppose there is a sub-struct representing a point in a 3D world, which adds a sub-field z.
When we write:
point2D = point3D;
What should happen on the assignment line, given that the memory occupied by point2D is already fixed?
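The fixed-size point can be sketched in Swift. This is only an illustration under stated assumptions: Point2D and Point3D are hypothetical types, and because Swift structs cannot inherit, the 3D point is built by composition instead.

```swift
// Hypothetical types for illustration; they are not part of any library.
struct Point2D {
    var x: Double
    var y: Double
}

// Swift structs cannot inherit, so the 3D point contains a 2D point.
struct Point3D {
    var xy: Point2D
    var z: Double
}

// The storage size of a value is fixed by its type.
let size2D = MemoryLayout<Point2D>.size   // room for two Doubles
let size3D = MemoryLayout<Point3D>.size   // room for three Doubles

let p3 = Point3D(xy: Point2D(x: 1, y: 2), z: 3)

// Dropping z is an explicit, visible step rather than a silent slice.
var p2 = p3.xy
p2.x = 10   // p2 is an independent copy; p3 is left unchanged
```

With composition, the question of what `point2D = point3D` should do never arises: the narrower value has to be selected by name.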
As for your question, this follows directly from the characteristics of the heap and the stack themselves. When a program enters a function, its local variables are pushed onto the current stack frame, which is of fixed size, and they are popped (deallocated) when the function exits. This makes stack allocation relatively easy for the CPU to optimize.
The heap, on the other hand, has additional complexity. Memory is requested and released explicitly, with calls like malloc and free, which costs time. Allocations may also require additional memory upon initialization, for fields that may or may not contain a value at the beginning, which costs space. Mixing these characteristics up would just make optimization more complicated.

Related

Optimizing lazy collections

This question is about optimizing lazy collections. I will first explain the problem and then give some thoughts for a possible solution. Questions are in bold.
Problem
Swift expects operations on Collections to be O(1). Some operations, especially prefix and suffix-like types, deviate and are on the order of O(n) or higher.
Lazy collections can't iterate through the base collection during initialization since computation should be deferred for as long as possible until the value is actually needed.
So, how can we optimize lazy collections? And of course this begs the question, what constitutes an optimized lazy collection?
Thoughts
The most obvious solution is caching. This means that the first call to a collection's method has an unfavourable time complexity, but subsequent calls to the same or other methods can possibly be computed in O(1). We trade some space complexity to the order of O(n) for faster computation.
Attempting to optimize lazy collections on structs by using caching is impossible since subscript(_ position:) and all other methods that you'd need to implement to conform to LazyCollectionProtocol are non-mutating and structs are immutable by default. This means that we have to recompute all operations for every call to a property or method.
This leaves us with classes. Classes are mutable, meaning that all computed properties and methods can internally mutate state. When we use classes to optimize a lazy collection we have two options. First, if the properties of the lazy type are variables then we're bringing ourselves into a world of hurt. If we change a property it could potentially invalidate previously cached results. I can imagine managing the code paths to make properties mutable to be headache inducing. Second, if we use lets we're good; the state set during initialization can't be changed so a cached result doesn't need to be updated. Note that we're only talking about lazy collections with pure methods without side effects here.
But classes are reference types. What are the downsides of using reference types for lazy collections? The Swift standard library doesn't use them for starters.
Any thoughts, or ideas for different approaches?

I completely agree with Alexander here. If you're storing lazy collections, you're generally doing something wrong, and the cost of repeated accesses is going to constantly surprise you.
These collections already blow up their complexity requirements, it's true:
Note: The performance of accessing startIndex, first, or any methods that depend on startIndex depends on how many elements satisfy the predicate at the start of the collection, and may not offer the usual performance given by the Collection protocol. Be aware, therefore, that general operations on LazyDropWhileCollection instances may not have the documented complexity.
But caching won't fix that. They'll still be O(n) on the first access, so a loop like
for i in 0..<xs.count { print(xs[i]) }
is still O(n^2). Also remember that O(1) and "fast" are not the same thing. It feels like you're trying to get to "fast" but that doesn't fix the complexity promise (that said, lazy structures are already breaking their complexity promises in Swift).
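A small Swift sketch of the repeated cost, assuming a lazy filter over a range. The function name and the counter are hypothetical and exist only to make the re-evaluation visible.

```swift
// Counts how often the predicate runs when `first` is read twice
// from a lazy filter. The function name is hypothetical.
func predicateCallsForTwoReads() -> (afterFirst: Int, afterSecond: Int) {
    var calls = 0
    let xs = (0..<1_000).lazy.filter { value -> Bool in
        calls += 1
        return value >= 900
    }

    _ = xs.first            // scans from the start until a match is found
    let afterFirst = calls

    _ = xs.first            // scans again; nothing was remembered
    let afterSecond = calls

    return (afterFirst, afterSecond)
}

let counts = predicateCallsForTwoReads()
print(counts.afterFirst, counts.afterSecond)
```

The second read pays for the scan again, which is why indexing such a collection inside a loop multiplies the cost.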
Caching is a net-negative because it makes the normal (and expected) use of lazy data structures slower. The normal way to use lazy data structures is to consume them either zero or one times. If you were going to consume them more than one time, you should use a strict data structure. Caching something that you never use is a waste of time and space.
There are certainly conceivable use cases where you have a large data structure that will be sparsely accessed multiple times, and so caching would be useful, but this isn't the use case lazy was built to handle.
Attempting to optimize lazy collections on structs by using caching is impossible since subscript(_ position:) and all other methods that you'd need to implement to conform to LazyCollectionProtocol are non-mutating and structs are immutable by default. This means that we have to recompute all operations for every call to a property or method.
This isn't true. A struct can internally store a reference type to hold its cache and this is common. Strings do exactly this. They include a StringBuffer which is a reference type (for reasons related to a Swift compiler bug, StringBuffer is actually implemented as a struct that wraps a class, but conceptually it is a reference type). Lots of value types in Swift store internal buffer classes this way, which allows them to be internally mutable while presenting an immutable interface. (It's also important for CoW and lots of other performance and memory related reasons.)
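A minimal sketch of that pattern, not how the standard library implements it. CachedSquares and its nested Storage class are hypothetical names, and the sketch ignores thread safety and copy-on-write.

```swift
// Hypothetical type: a value type whose cache lives in a private class instance.
struct CachedSquares {
    // Reference-type storage; changing it needs no `mutating` members.
    private final class Storage {
        var cache: [Int: Int] = [:]
        var computeCount = 0
    }

    private let storage = Storage()

    // Non-mutating, yet it can fill the cache through the reference.
    subscript(position: Int) -> Int {
        if let hit = storage.cache[position] {
            return hit
        }
        storage.computeCount += 1
        let value = position * position
        storage.cache[position] = value
        return value
    }

    // Exposed only so the effect of the cache can be observed.
    var computeCount: Int { storage.computeCount }
}

let squares = CachedSquares()   // declared with `let`
let a = squares[12]             // computed and stored
let b = squares[12]             // served from the cache
```

Note that copies of such a struct share the same storage object. A real implementation would have to decide whether that sharing is acceptable or whether the storage should be copied on write.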
Note that adding caching today would also break existing use cases of lazy:
struct Massive {
let id: Int
// Lots of data, but rarely needed.
}
// We have lots of items that we look at occasionally
let ids = 0..<10_000_000
// `massives` is lazy. When we ask for something it creates it, but when we're
// done with it, it's thrown away. If `lazy` forced caching, then everything
// we accessed would be kept forever. Also, if the values in `Massive` change over
// time, I certainly may want it to be rebuilt at this point and not cached.
let massives = ids.lazy.map(Massive.init)
let aMassive = massives[10]
This isn't to say a caching data structure wouldn't be useful in some cases, but it certainly isn't always a win. It imposes a lot of costs and breaks some uses while helping others. So if you want those other use cases, you should build a data structure that provides them. But it's reasonable that lazy is not that tool.

Swift's lazy collections are intended to provide one-off access to elements. Subsequent accesses cause redundant computation (e.g. a lazy map sequence would recompute the transform closure).
In the case where you want repeated access to elements, it's best to just slice the portion of the lazy sequence/collection you care about, and create a proper Collection (e.g. an Array) out of it.
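As a rough sketch of that advice, assuming a lazy map over a large range (the variable names are made up for illustration):

```swift
// A lazy view: nothing is computed until an element is requested.
let lazySquares = (1...1_000_000).lazy.map { $0 * $0 }

// Take only the portion of interest and store it eagerly in an Array.
let firstFive = Array(lazySquares.prefix(5))

// Repeated reads now hit stored elements instead of re-running the closure.
let sumOfFirstFive = firstFive.reduce(0, +)
```

Only the five requested elements are ever computed; the rest of the range is never touched.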
The bookkeeping overhead of lazily evaluating and caching each element would probably be greater than the benefits.

When and why use anonymous classes instead of structs for simple objects

I read in this answer, "A generic list of anonymous class", how to load a list with anonymous class objects. My question is why and when it is advisable to do this instead of using a struct, considering performance and good practices.

An exposed-field structure is essentially a group of variables bound together with duct tape. It won't behave as an "object", and may thus be seen as evil by those who think everything should behave like an object; nonetheless, in cases where one doesn't really want an object, but rather a group of variables bound together with duct tape, an exposed-field structure may be a perfect fit.
Anonymous classes have only a few advantages over exposed-field structures:
1. The syntax to declare them is at least slightly smaller; depending upon coding standards, it may be a lot smaller. If coding standards will allow one to write internal struct WeightAndVolume { public double weight, volume;} and say that the struct is "self-explanatory" [it contains two public fields of type double, named weight and volume, each of which will hold whatever was last written to it by outside code], anonymous classes won't save much, but if coding standards would require that every named data type have many pages of associated documentation, including an analysis of required unit-test procedures, anonymous classes could avoid such hassle.
2. Copying class references is slightly cheaper than copying structures larger than 8 bytes, though unless a reference would be copied many times, the cost of creating the object will outweigh any savings in copying.
3. Casting an anonymous class to Object is much cheaper than casting a struct. The first time an anonymous class instance gets cast to Object will make up for the extra costs of creating it. Every additional time will represent a savings of that amount.
4. Passing a structure to a generic method will require the JITter to produce a specialized version of the code for that type; by contrast, the JITter would only have to produce one piece of code to handle all anonymous classes.
In general, structures will work better than anonymous classes. On the other hand, there are a few scenarios (mostly related to the third point above) where classes can end up being much better.

I wouldn't say it is ever recommended to use anonymous classes, in the sense that it's never wrong to not use them. But they typically get used when
it's a one-shot job, for which creating a proper named type would be cumbersome, and
the consumer of the objects is either compiler-generated code (you don't have access to the types backing those anonymous classes, but the compiler does) or uses reflection (in which case you don't need access to the types at compile time)
The most common scenario where this occurs is in LINQ queries.

Theoretical difference between classes and types

This question has been asked on here a few times, but none of the replies really answered it in the more abstract, theoretical sense that I am looking for.
Most answers are something along the lines of "A class has implementations for methods that its objects can respond to, while a type just specifies which methods can be responded to".
Well, this seems kind of like an odd definition to me. Take ints, floats, and chars in a language like C. It may never be explicitly located in the code, but there are definitely methods built in to the language for responding to the messages ("plus", "minus", etc.) that these types receive.
And as all interfaces must have methods defined somewhere, it seems to me that types are the same thing as classes, except the word "class" carries a mental image of a more substantial programming structure than a "type".
Which leads me to believe that the drawbacks that apply to any class-based language (the "expression problem" for example) would similarly apply to any language with types (Haskell, etc.)

There is no widely applicable, generally accepted definition of the term "class" that I'm aware of, not even wrt type systems. So your question pretty much depends on the context.
If you are talking about classes in object-oriented languages then the description you quote is relatively accurate. Types are specifications, descriptions (of objects or other values). Classes are implementations, definitions (of object factories).
However, in many OO languages, class definitions also introduce distinct type names, and these type names are often the only means to type objects. That's an unfortunate limitation and conflation of concepts, which also leads to the well-known confusion of subtyping and inheritance. At least some languages separate these concepts properly, e.g. OCaml.
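One way to picture the split between specification and implementation is with a Swift protocol. This is only a sketch: Shape, Square, Rectangle and totalArea are hypothetical names, and Swift protocols are just one of several ways a language can separate the two concepts.

```swift
// A type as a specification: it says what can be asked, not how it is answered.
protocol Shape {
    func area() -> Double
}

// Two unrelated implementations; neither inherits from the other.
final class Square: Shape {
    let side: Double
    init(side: Double) { self.side = side }
    func area() -> Double { side * side }
}

struct Rectangle: Shape {
    let width: Double
    let height: Double
    func area() -> Double { width * height }
}

// Code written against the type accepts any implementation of it.
func totalArea(of shapes: [Shape]) -> Double {
    shapes.reduce(0) { $0 + $1.area() }
}

let combinedArea = totalArea(of: [Square(side: 2), Rectangle(width: 3, height: 4)])
```

Here both values are usable as a Shape without any inheritance relationship between their implementations.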
In any case, the reason why the distinction is seemingly at odds with ints and floats in C is simple: those are not objects. Despite what OO ideology tries to preach, not everything is an object, and certainly not in every language.

Simply put, a class will often have methods that manipulate the data contained within an instance. A type will not; it only is meant to hold and return data.
Although it is true that there may be methods specified somewhere for the type, there will only be one way to change the data contained within an instance of a type - storing a new value in it. The methods are generally along the lines of presenting the data in different ways, instead of actually manipulating the data.
This rule can, of course, be broken; C is full of examples, due to how it is structured (or, rather, not structured). Generally speaking, though, you don't want to have a type with a function that does fancy logic internally.

"Class" and "type" mean different things in different languages and environments; I will try to show here a synthesis that helps me think about the issue.
Classes have objects, and types have values. I think it is easier to understand the difference between objects and values than between classes and types. An object has two independent properties: its identity, and its state/behaviour. So, you can have two different objects with the same class and state. This is not true for values: you cannot have two different values of a type that have the exact same state (or form, shape) and behaviour: you cannot have two "twos". A value of a type does not have an identity independent of its state and behaviour.
Mixing both concepts together, you might say that a value of a given type does not necessarily have a class, but an object of a given class necessarily has a type (e.g. object), and its value is given both by its state/behaviour and by its identity.
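The object/value distinction can be illustrated in Swift, where class instances can be compared by identity and struct values cannot. Counter and Amount are hypothetical types used only for this sketch.

```swift
// Hypothetical types: the same state, once as an object and once as a value.
final class Counter {
    var count: Int
    init(count: Int) { self.count = count }
}

struct Amount: Equatable {
    var count: Int
}

let first = Counter(count: 2)
let second = Counter(count: 2)
let alias = first

let sameState = first.count == second.count   // true: equal state
let sameObject = first === second             // false: two identities
let aliasIsFirst = alias === first            // true: one object, two references

// Values have no identity apart from their state.
let twoA = Amount(count: 2)
let twoB = Amount(count: 2)
let sameValue = twoA == twoB                  // true
```

The identity operator `===` is only available for class instances; for the struct there is nothing to compare except the state itself.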
Haskell has types, and definable ones if I am correct. It is from Haskell that I am taking the "type" concept I am using. Python has classes and types mixed into the same "type" system, with some primitive types and rich definable classes. The concept of object that I am using is that of the type system of Python, minus its primitive types: int, str, etc.
Another key difference between types and classes would be in their definition. Types are typically defined by a set of predicates or constraints that "give" all at once all of the values of the type. Therefore, you can use a literal value without first having to "create" it: 23438573. The definition of a class involves a procedure to create objects, and all objects of that class must be created before they are used.

Why are most of the objects we create on the iPhone pointers?

Why are most of the objects we create on the iPhone pointers?
For example, I create NSString *str, NSMutableDictionary *dict, etc.

Short Answer
Because Objective-C objects can only be allocated on the heap and manipulated as pointers.
Long Answer
Objective-C requires that class instances be allocated on the heap and manipulated as pointers, because polymorphism requires the use of pointers: a pointer to an interface always has the same size, while different implementations of the interface may have different sizes.
In C++, one can use both automatic (stack) and dynamic (heap) storage for classes. In the former case, however, one must beware of slicing (when a derived type is assigned to a base type, the object loses the content that makes it the derived type instead of the base type). Using only pointers, as in Obj-C, eliminates this potential pitfall.
Additionally, allowing stack-allocated objects would complicate the reference counting scheme that Objective-C has in place. Since stack-allocated objects live only in the scope in which they are created, one usually allocates objects on the heap anyway, so there would be only marginal benefit in supporting objects as stack-allocated values.
As a side note, I should also mention that in both Java and C#, class instances are similarly constrained to heap-only allocation.

NSDictionaries vs. custom objects with properties, what's your take?

I'm writing an app that basically uses 5 business entities: A, B, C, D and E
A has some properties and holds a list of B's
B has some other properties and a list of C's and a list of D's
C has some other properties and a list of D's and a list of E's
D has only a few properties
E has only a few properties
There is no inheritance between any of them.
There's no real business logic involved, the objects are created, populated, and then accessed read-only, no further manipulations.
My natural coding style would be to go object oriented and write classes for each of those entities, use NSArrays for the lists, and have the mentioned properties synthesized.
It would make the code readable.
But another approach seems obvious too: only use NSDictionaries and NSArrays, and working with keys/values instead of properties. This seems more efficient, and somehow "closer" to iPhone-style programming to me... but obviously leads to less readable code. Another advantage is there's no additional custom encoding/decoding for serialization required (persisting state to disk, using JSON, ...)
So on paper, this argues for the latter approach; on the other hand, it still feels somehow awkward NOT to use custom objects...
Is this really just a matter of taste question? Or are there maybe other arguments in favour/against one of the approaches? Is only using Dictionaries better memory/performance-wise? Is it the preferred "Apple Coding Style"? (I'm coming from Java/C#).

I don't see much difference between Java/C# and Cocoa in this area. Your question is equivalently applicable to those platforms as well (the same also applies to key-value stores and relational stores).
In an object-oriented environment, you have to make a trade-off between the flexibility of the key-value approach for storing data and the structured, object-oriented style. I'd go with the key-value approach only when I need the flexibility (e.g. the structure is dynamic and might be changed by the user, or is not known at compile time). Otherwise, taking that route might take you completely away from OOP conventions and benefits. (By the way, this is the important point: is the hassle of sticking to object-oriented principles worth it in that specific circumstance? I think your question reduces to this one, and to answer it you should analyze your specific situation.)

It largely depends on whether your objects are just collections of data (key/value pairs) or implement their own functionality.
If they're data I'd say go with NSDictionary, it's a lot less code and as you point out you won't have to write serialization routines for each class.

Use a hybrid approach. Store the dictionaries the objects are based on, but expose the most-used values as properties that are either filled when the object is initialized from a dictionary, or have the accessors look into the dictionary for values (less efficient).
Also provide a property to get at the dictionary. This way if you need to propagate a new value quickly to a specific area of the code from the dictionary (presumably a new value added by the server) you have that flexibility. Then if callers are making heavy use of a value you can migrate it to be a true property and get the completion and type checking of a property.
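A minimal sketch of the hybrid approach, written in Swift to match the other examples on this page rather than in Objective-C. The Entity type and the keys "name", "rating" and "addedByServer" are assumptions made up for illustration.

```swift
// Hypothetical entity; the dictionary keys used here are assumptions.
struct Entity {
    // The backing dictionary stays available to callers.
    let dictionary: [String: Any]

    // Heavily used value: read once, when the object is initialized.
    let name: String

    init?(dictionary: [String: Any]) {
        guard let name = dictionary["name"] as? String else { return nil }
        self.dictionary = dictionary
        self.name = name
    }

    // Less-used value: the accessor looks into the dictionary on each read.
    var rating: Int? { dictionary["rating"] as? Int }
}

let entity = Entity(dictionary: ["name": "A", "rating": 4, "addedByServer": true])

// A value without a dedicated property is still reachable through the dictionary.
let extra = entity?.dictionary["addedByServer"] as? Bool
```

If callers start using "addedByServer" heavily, it can later be promoted to a typed property without changing how the data is stored.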
