Theoretical difference between classes and types - class

This question has been asked on here a few times, but none of the replies really answered it in the more abstract, theoretical sense that I am looking for.
Most answers are something along the lines of "A class has implementations for methods that its objects can respond to, while a type just specifies which methods can be responded to".
Well, this seems kind of like an odd definition to me. Take ints, floats, and chars in a language like C. They may never appear explicitly in the code, but there are definitely methods built in to the language for responding to the messages ("plus", "minus", etc.) that these types receive.
And as all interfaces must have methods defined somewhere, it seems to me that types are the same thing as classes, except the word "class" carries a mental image of a more substantial programming structure than a "type".
Which leads me to believe that the drawbacks that apply to any class-based language (the "expression problem" for example) would similarly apply to any language with types (Haskell, etc.)

There is no widely applicable, generally accepted definition of the term "class" that I'm aware of, not even wrt type systems. So your question pretty much depends on the context.
If you are talking about classes in object-oriented languages then the description you quote is relatively accurate. Types are specifications, descriptions (of objects or other values). Classes are implementations, definitions (of object factories).
However, in many OO languages, class definitions also introduce distinct type names, and these type names are often the only means to type objects. That's an unfortunate limitation and conflation of concepts that also leads to the well-known confusion of subtyping and inheritance. At least some languages separate these concepts properly, e.g. OCaml.
In any case, the reason why the distinction is seemingly at odds with ints and floats in C is simple: those are not objects. Despite what OO ideology tries to preach, not everything is an object, and certainly not in every language.
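A minimal Python sketch of the specification/implementation split described above, assuming Python 3.8+ for typing.Protocol. The names Shape, Square and total_area are made up for illustration. Shape plays the role of a type (it only states which methods must exist), Square is a class (an object factory with an implementation), and Square is accepted as a Shape without inheriting from it, which is roughly the separation of subtyping from inheritance mentioned above.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class Shape(Protocol):
    """A type: a specification of what its values can respond to."""

    def area(self) -> float:
        ...


class Square:
    """A class: an implementation that creates objects.

    Note that Square does not inherit from Shape.
    """

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


def total_area(shapes: Iterable[Shape]) -> float:
    # Only the specification matters here, not which class built the object.
    return sum(shape.area() for shape in shapes)


# Square satisfies the Shape specification structurally.
squares = [Square(2.0), Square(3.0)]
result = total_area(squares)
```

The isinstance check enabled by runtime_checkable only verifies that the method exists; it does not check signatures or behaviour.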

Simply put, a class will often have methods that manipulate the data contained within an instance. A type will not; it is only meant to hold and return data.
Although it is true that there may be methods specified somewhere for the type, there will only be one way to change the data contained within an instance of a type - storing a new value in it. The methods are generally along the lines of presenting the data in different ways, instead of actually manipulating the data.
This rule can, of course, be broken; C is full of examples, due to how it is structured (or, rather, not structured). Generally speaking, though, you don't want to have a type with a function that does fancy logic internally.

"Class" and "type" mean different things in different languages and environments; I will try to show here a synthesis that helps me think about the issue.
Classes have objects, and types have values. I think it is easier to understand the difference between objects and values, than between classes and types. An object has 2 independent properties: its identity, and its state/behaviour. So, you can have two different objects with the same class and state. This is not true for values: you cannot have 2 different values of a type that have the exact same state (or form, shape) and behaviour: you cannot have 2 "twos". A value of a type does not have an identity independent of its state and behaviour.
Mixing both concepts together, you might say that a value of a given type does not necessarily have a class, but an object of a given class necessarily has a type (e.g. object), and its value is given both by its state/behaviour and by its identity.
Haskell has types, and definable ones if I am correct. It is from Haskell that I am taking the "type" concept I am using. Python has classes and types mixed into the same "type" system, with some primitive types and rich definable classes. The concept of object that I am using is that of the type system of Python, minus its primitive types: int, str, etc.
Another key difference between types and classes would be in their definition. Types are typically defined by a set of predicates or constraints that "give" all at once all of the values of the type. Therefore, you can use a literal value without first having to "create" it: 23438573. The definition of a class involves a procedure to create objects, and all objects of that class must be created before they are used.
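To make the object/value distinction concrete, here is a small Python sketch. Account and Money are hypothetical names chosen for the example. Account keeps Python's default identity-based equality, so two accounts with the same state are still different objects. Money is a frozen dataclass, which compares by state, so two "twos" are indistinguishable.

```python
from dataclasses import dataclass


class Account:
    """An object: identity is independent of state."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


@dataclass(frozen=True)
class Money:
    """A value: fully described by its state, immutable."""

    amount: int
    currency: str


# Same class, same state, yet two distinct objects.
first = Account(10)
second = Account(10)
accounts_equal = first == second          # default equality is identity-based

# Same state means the same value.
two_a = Money(2, "EUR")
two_b = Money(2, "EUR")
values_equal = two_a == two_b
distinct_values = len({two_a, two_b})     # equal values collapse in a set
```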

Related

What are the disadvantages of using records instead of classes?

C# 9 introduces record reference types. A record provides some synthesized methods like a copy constructor, clone operation, hash code calculation and comparison/equality operations. It seems convenient to me to use records instead of classes in general. Are there reasons not to do so?
It seems to me that currently Visual Studio as an editor does not support records as well as classes but this will probably change in the future.
Firstly, be aware that if it's possible for a class to contain circular references (which is true for most mutable classes) then many of the auto-generated record members can cause a stack overflow. So that's a pretty good reason to not use records for everything.
So when should you use a record?
Use a record when an instance of a class is entirely defined by the public data it contains, and has no unique identity of its own.
This means that the record is basically just an immutable bag of data. I don't really care about that particular instance of the record at all, other than that it provides a convenient way of grouping related bits of data together.
Why?
Consider the members a record generates:
Value Equality
Two instances of a record are considered equal if they have the same data (by default: if all fields are the same).
This is appropriate for classes with no behavior, which are just used as immutable bags of data. However this is rarely the case for classes which are mutable, or have behavior.
For example if a class is mutable, then two instances which happen to contain the same data shouldn't be considered equal, as that would imply that updating one would update the other, which is obviously false. Instead you should use reference equality for such objects.
Meanwhile if a class is an abstraction providing a service you have to think more carefully about what equality means, or if it's even relevant to your class. For example imagine a Crawler class which can crawl websites and return a list of pages. What would equality mean for such a class? You'd rarely have two instances of a Crawler, and if you did, why would you compare them?
with blocks
with blocks provide a convenient way to copy an object and update specific fields. However, this is only always safe if the object has no identity, as copying it doesn't lose any information. Copying a mutable class loses the identity of the original object, as updating the copy won't update the original. As such you have to consider whether this really makes sense for your class.
ToString
The generated ToString prints out the values of all public properties. If your class is entirely defined by the properties it contains, then this makes a lot of sense. However if your class is not, then that's not necessarily the information you are interested in. A Crawler for example may have no public fields at all, but the private fields are likely to be highly relevant to its behavior. You'll probably want to define ToString yourself for such classes.
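As an illustration only (this is Python, not C#, and Point is a hypothetical name), Python's frozen dataclasses offer rough analogues of the three generated members discussed above: value equality, a field-listing string representation, and dataclasses.replace in the role of a with expression. It is a sketch of the idea of an immutable bag of data, not a description of how C# records are implemented.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    """An immutable bag of data, entirely defined by its public fields."""

    x: int
    y: int


original = Point(1, 2)

# Rough analogue of `original with { y = 5 }`: copy, changing one field.
moved = replace(original, y=5)

# Value equality: same data means equal.
same_data = Point(1, 2) == original

# The generated repr lists the field values, much like a generated ToString.
text = repr(original)

# Mutation is rejected, so copying can never lose track of "the original".
try:
    original.x = 99
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
```

A class with identity or behaviour, like the Crawler example, would usually keep the default identity-based equality and define its own string representation instead.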
All properties of a record are public by default
All properties of a record are immutable by default
By default, I mean when using the simple record definition syntax.
Also, records can only derive from records and you cannot derive a regular class from a record.

When and why use anonymous classes instead of structs for simple objects

I read in this answer A generic list of anonymous class how to load a list with anonymous class objects. My question is why and when it is advisable to do it this way instead of using a struct, considering performance and good practices.
An exposed-field structure is essentially a group of variables bound together with duct tape. It won't behave as an "object", and may thus be seen as evil by those who think everything should behave like an object; nonetheless, in cases where one doesn't really want an object, but rather a group of variables bound together with duct tape, an exposed-field structure may be a perfect fit.
Anonymous classes have only a few advantages over exposed-field structures:
The syntax to declare them is at least slightly smaller; depending upon coding standards, it may be a lot smaller. If coding standards will allow one to write internal struct WeightAndVolume { public double weight, volume;} and say that the struct is "self-explanatory" [it contains two public fields of type double, named weight and volume, each of which will hold whatever was last written to it by outside code], anonymous classes won't save much, but if coding standards would require that every named data type have many pages of associated documentation, including an analysis of required unit-test procedures, anonymous classes could avoid such hassle.
Copying class references is slightly cheaper than copying structures larger than 8 bytes, though unless a reference would be copied many times, the cost of creating the object will outweigh any savings in copying.
Casting an anonymous class to Object is much cheaper than casting a struct. The first time an anonymous class instance gets cast to Object will make up for the extra costs of creating it. Every additional time will represent a savings of that amount.
Passing a structure to a generic method will require the JITter to produce a specialized version of the code for that type; by contrast, the JITter would only have to produce one piece of code to handle all anonymous classes.
In general, structures will work better than anonymous classes. On the other hand, there are a few scenarios (mostly related to the third point above) where classes can end up being much better.
I wouldn't say it is ever recommended to use anonymous classes, in the sense that it's never wrong to not use them. But they typically get used when
it's a one-shot job, for which creating a proper named type would be cumbersome, and
the consumer of the objects is either compiler-generated code (you don't have access to the types backing those anonymous classes, but the compiler does) or uses reflection (in which case you don't need access to the types at compile time)
The most common scenario where this occurs is in LINQ queries.

Are scala reflection API Names or Symbols adequate for use inside transfer objects?

Introduction
I am working on an API written in Scala. I use data transfer objects (DTOs) as parameters passed to the API's functions. The DTOs will be instantiated by the API's user.
As the API is pretty abstract / generic I want to specify the attributes of an object that the API should operate on. Example:
case class Person(name: String, birthdate: Date)
When an instance of Person "P" is passed to the API, the API needs to know the attributes of "P" it should operate on: either just name or birthdate, or both of them.
So I need to design a DTO that contains the instance of "P" itself, some kind of declaration of the attributes and maybe additional information on the type of "P".
String based approach
One way would be to use Strings to specify the attributes of "P" and maybe its type. This would be relatively simple, as Strings are pretty lightweight and well known. As there is a formal notation of packages, types and members as Strings, the declarations would be structured to a certain degree.
On the other hand, the String declarations must be validated, because a user might pass invalid Strings. I could imagine representing the attributes with dedicated types instead of String, which may have the benefit of increased structure, and maybe those types could even be designed so that only valid instances can exist.
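The idea of a dedicated type for which only valid instances can exist can be sketched independently of Scala's reflection API. The following is a Python illustration, not Scala, and AttributeSelection and its members are hypothetical names: the DTO carries the instance plus the attribute names, and rejects unknown names at construction time, so the API implementation never sees an invalid declaration.

```python
from dataclasses import dataclass, fields, is_dataclass
from datetime import date
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    birthdate: date


@dataclass(frozen=True)
class AttributeSelection:
    """Hypothetical DTO: an instance plus the attributes to operate on."""

    target: Any
    attributes: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Validate once, at construction, instead of inside the API.
        if not is_dataclass(self.target):
            raise TypeError("target must be a dataclass instance")
        known = {f.name for f in fields(self.target)}
        unknown = [a for a in self.attributes if a not in known]
        if unknown:
            raise ValueError(f"unknown attributes: {unknown}")

    def values(self) -> Dict[str, Any]:
        # Safe: every name was checked against the target's fields.
        return {a: getattr(self.target, a) for a in self.attributes}


p = Person("Ada", date(1815, 12, 10))
selection = AttributeSelection(p, ("name",))
selected = selection.values()
```

Because the DTO only holds plain strings and the instance itself, it stays as easy to serialize as the instance is.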
Reflection API approach
Of course the reflection API came to mind, and I am experimenting with declaring the attributes using types from the reflection API. Unfortunately the Scala 2.10.x reflection API is a bit unintuitive. There are names, symbols, mirrors, types and typetags, which can cause a bit of confusion.
Basically I see two alternatives to attribute declaration with Strings:
Attribute declaration with reflection API's "Names"
Attribute declaration with reflection API's "Symbols" (especially TermSymbol)
If I go this way, as far as I can see, the API's user, who constructs the DTOs, will have to deal with the reflection API and its Names / Symbols. Also the API's implementation will have to make use of the reflection API. So there are two places with reflective code and the user must have at least a little bit of knowledge of the reflection API.
Questions
However I don't know how heavyweight these approaches are:
Are Names or Symbols expensive to construct?
Does the reflection API do any caching of expensive operation results or should I take care about that?
Are Names and Symbols transferable to another JVM via network?
Are they serializable?
Main question: Are scala reflection API Names or Symbols adequate for use inside transfer objects?
It seems complicated to do this with the reflection API. Any hints are welcome. And any hints on other alternatives, too.
P.S.: I did not include my own code yet, because my API is complex and the reflection part is in a pretty experimental state. Maybe I can deliver something useful later.
1a) Names are easy to construct and are lightweight, as they are just a bit more than strings.
1b) Symbols can't be constructed by the user, but are created internally when one resolves names using APIs like staticClass or member. First calls to such APIs usually involve unpacking type signatures of symbol's owners from ScalaSignature annotations, so they might be costly. Subsequent calls use already loaded signatures, but still pay the cost of a by-name lookup in a sort of a hashtable (1). declaration costs less than member, because declaration doesn't look into base classes.
2) Type signatures (e.g. lists of members of classes, params + return type of methods, etc) are loaded lazily and therefore are cached. Mappings between Java and Scala reflection artifacts are cached as well (2). To the best of my knowledge, the rest (e.g. subtyping checks) is generally uncached with a few minor exceptions.
3-4) Reflection artifacts depend on their universe and at the moment can't be serialized (3).

Interface doubts

Are interfaces a layer between objects (different objects) and actions (different object types trying to perform the same action)? And does an interface check what kind of object it is and how it can perform a particular action?
I'd say that it's better to think of an interface as a promise. In Java there is the interface construct that allows for inheritance of an API, but doesn't specify behavior. In general though, an interface is comprised of the methods an object presents for interacting with the object.
In duck-typed languages, if an object presents a particular set of methods (the interface) specific to a particular class, then that object is like the specifying class.
Enforcement of an interface is complicated, since you need to specify some set of criteria for behavior. An interesting example would be the design-by-contract ideas in Eiffel.
Are you asking about the term "interface" as used in a specific language (such as Java or Objective-C), or the generic meaning of the term?
If the latter, then an "interface" can be almost anything. Pour oil on water -- the line between them is an "interface". An interface is any point where two separate things meet and interact.
The term does not have a rigorous definition in computing, but refers to any place where two relatively distinct domains interact.
To understand interfaces in .net or Java, one must first recognize that inheritance combines two concepts:
Implementations of the derived type will include all fields (including private ones) of the base type, and can access any and all public or protected members of the base type as if it were its own.
Objects of the derived type may be freely used in place of objects of the base type.
Allowing objects to use members of more than one base type as their own is complicated. Some languages provide ways of doing so, but there can often be confusion as to which portion of which base object is being referred to, especially if one is inheriting from two classes which independently inherit from a third. Consequently, many frameworks only allow objects to inherit from one base object.
On the other hand, allowing objects to be substitutable for more than one other type of object does not create these difficulties. An object representing a database table may, for example, allow itself to be passed to a routine that wants a "thing that can enumerate contents, which are of type T" (IEnumerable<T> in .net), or a routine that wants a "thing that can have things of type T added to it" (ICollection<T> in .net), or a routine that wants a "thing that wants to know when it's no longer needed" (IDisposable in .net). Note that there are some things that want notification when they're no longer needed that do not represent enumerable collections, and there are other things that represent enumerable collections that can be abandoned without notification. Thus, neither type of object could inherit from the other, but if one uses an interface to represent "things which can enumerate their contents, which are of type T", or "things that want to know when they are no longer needed", then there's no problem having classes implement both interfaces.
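The database table example can be sketched in Python, where the standard library's Iterable and AbstractContextManager play roles loosely comparable to IEnumerable<T> and IDisposable. This is an analogy under that assumption, not .net code, and Table, count_items and use_and_release are hypothetical names. The point is that one class can stand in for two unrelated interfaces without inheriting implementation from either.

```python
from collections.abc import Iterable
from contextlib import AbstractContextManager


class Table:
    """Can enumerate its contents AND wants to know when it is no longer needed."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.closed = False

    # "Thing that can enumerate contents"
    def __iter__(self):
        return iter(self._rows)

    # "Thing that wants to know when it's no longer needed"
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True
        return False  # do not swallow exceptions


def count_items(items: Iterable) -> int:
    # Only relies on the enumeration interface.
    return sum(1 for _ in items)


def use_and_release(resource: AbstractContextManager) -> None:
    # Only relies on the "tell me when you're done" interface.
    with resource:
        pass


table = Table(["a", "b", "c"])
row_count = count_items(table)
use_and_release(table)
```

A plain list could be passed to count_items but not to use_and_release, which mirrors the remark that neither interface could sensibly inherit from the other.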

NSDictionaries vs. custom objects with properties, what's your take?

I'm writing an App that basically uses 5 business entities, A, B, C, D and E
A has some properties and holds a list of B's
B has some other properties and a list of C's and a list of D's
C has some other properties and a list of D's and a list of E's
D has only a few properties
E has only a few properties
There is no inheritance between any of them.
There's no real business logic involved, the objects are created, populated, and then accessed read-only, no further manipulations.
My natural coding style would be to go object oriented and write classes for each of those entities, use NSArrays for the lists, and have the mentioned properties synthesized.
It would make the code readable.
But another approach seems obvious too: only use NSDictionaries and NSArrays, and work with keys/values instead of properties. This seems more efficient, and somehow "closer" to iPhone-style programming to me... but obviously leads to less readable code. Another advantage is that no additional custom encoding/decoding is required for serialization (persisting state to disk, using JSON, ...)
So on paper, the arguments favour the latter approach; on the other hand, it still feels somehow awkward NOT to use custom objects...
Is this really just a matter of taste question? Or are there maybe other arguments in favour/against one of the approaches? Is only using Dictionaries better memory/performance-wise? Is it the preferred "Apple Coding Style"? (I'm coming from Java/C#).
I don't see much difference between Java/C# and Cocoa in this area. Your question is equivalently applicable to those platforms as well (the same also applies to key-value stores and relational stores).
In an object oriented environment, you have to make a trade-off between the flexibility of the key-value approach for storing data and the structured, object oriented style. I'd go with the key-value approach only when I need the flexibility (e.g. the structure is dynamic and might be changed by the user, or is not known at compile time). Otherwise, taking that route might get you completely off the OOP conventions and benefits. (By the way, this is the important point: is the hassle of sticking to object oriented principles worth it in that specific circumstance? I think your question reduces to this one, and to answer it you should analyze your specific situation.)
It largely depends on whether your objects are just collections of data (key/value pairs) or implement their own functionality.
If they're data I'd say go with NSDictionary, it's a lot less code and as you point out you won't have to write serialization routines for each class.
Use a hybrid approach. Store the dictionaries the objects are based on, but expose the most-used values as properties that are either filled when the object is initialized from a dictionary, or have the accessors look into the dictionary for values (less efficient).
Also provide a property to get at the dictionary. This way if you need to propagate a new value quickly to a specific area of the code from the dictionary (presumably a new value added by the server) you have that flexibility. Then if callers are making heavy use of a value you can migrate it to be a true property and get the completion and type checking of a property.
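The hybrid approach is not specific to Cocoa, so here is a language-neutral sketch of it in Python rather than Objective-C. Entity, title, count and raw are hypothetical names. One value is copied into a real attribute at initialization, one is looked up in the dictionary on each access, and the backing dictionary stays reachable for values that have no property yet.

```python
from typing import Any, Dict


class Entity:
    """Object backed by the dictionary it was built from."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = dict(data)              # keep our own copy of the source dictionary
        # Most-used value: filled once when the object is initialized.
        self.title = self._data.get("title")

    @property
    def count(self) -> int:
        # Accessor that looks into the dictionary each time (less efficient,
        # but always reflects the current dictionary contents).
        return self._data.get("count", 0)

    @property
    def raw(self) -> Dict[str, Any]:
        # Escape hatch: values without a dedicated property are still reachable.
        return self._data


entity = Entity({"title": "A", "count": 2, "new_flag": True})
flag = entity.raw["new_flag"]     # new server-side value, no property yet
entity.raw["count"] = 5           # change is visible through the accessor
```

Serialization stays trivial in this design because the dictionary remains the single source of truth; a value such as new_flag can later be promoted to a real property without changing callers that already go through raw.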