Are Scala reflection API Names or Symbols adequate for use inside transfer objects? - scala

Introduction
I am working on an API written in Scala. I use data transfer objects (DTOs) as parameters passed to the API's functions. The DTOs will be instantiated by the API's user.
As the API is pretty abstract / generic, I want to specify the attributes of an object that the API should operate on. Example:
case class Person(name: String, birthdate: Date)
When an instance of Person "P" is passed to the API, the API needs to know the attributes of "P" it should operate on: either just name or birthdate, or both of them.
So I need to design a DTO that contains the instance of "P" itself, some kind of declaration of the attributes and maybe additional information on the type of "P".
String based approach
One way would be to use Strings to specify the attributes of "P" and maybe its type. This would be relatively simple, as Strings are pretty lightweight and well known. As there is a formal notation of packages, types and members as Strings, the declarations would be structured to a certain degree.
On the other hand, the String declarations must be validated, because a user might pass invalid Strings. I could imagine representing the attributes with dedicated types instead of Strings, which may have the benefit of increased structure; those types might even be designed so that only valid instances can exist.
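To make the trade-off concrete, here is a minimal sketch of the String-based variant. The names Selection and validate are hypothetical, and checking the names against the fields declared on the runtime class (via plain Java reflection) is just one possible validation strategy, not something prescribed by the question.

```scala
import java.util.Date

object StringSelectionSketch {
  final case class Person(name: String, birthdate: Date)

  // Hypothetical DTO: the instance plus the names of the attributes to operate on.
  final case class Selection[T <: AnyRef](instance: T, attributes: Set[String])

  // Checks the requested names against the fields declared on the instance's
  // runtime class. Returns the unknown names on the left, the DTO on the right.
  def validate[T <: AnyRef](sel: Selection[T]): Either[Set[String], Selection[T]] = {
    val known = sel.instance.getClass.getDeclaredFields.map(_.getName).toSet
    val unknown = sel.attributes -- known
    if (unknown.isEmpty) Right(sel) else Left(unknown)
  }
}
```

Because the DTO holds only the instance and plain Strings, it stays easy to construct, at the cost of a validation step on the receiving side.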
Reflection API approach
Of course the reflection API came to mind, and I am experimenting with declaring the attributes using types from the reflection API. Unfortunately the Scala 2.10.x reflection API is a bit unintuitive. There are names, symbols, mirrors, types and type tags, which can cause a bit of confusion.
Basically I see two alternatives to attribute declaration with Strings:
Attribute declaration with reflection API's "Names"
Attribute declaration with reflection API's "Symbols" (especially TermSymbol)
If I go this way, as far as I can see, the API's user, who constructs the DTOs, will have to deal with the reflection API and its Names / Symbols. Also the API's implementation will have to make use of the reflection API. So there are two places with reflective code and the user must have at least a little bit of knowledge of the reflection API.
Questions
However I don't know how heavyweight these approaches are:
Are Names or Symbols expensive to construct?
Does the reflection API cache the results of expensive operations, or should I take care of that myself?
Are Names and Symbols transferable to another JVM via network?
Are they serializable?
Main question: Are scala reflection API Names or Symbols adequate for use inside transfer objects?
It seems complicated to do this with the reflection API. Any hints are welcome. And any hints on other alternatives, too.
P.S.: I have not included my own code yet, because my API is complex and the reflection part is in a pretty experimental state. Maybe I can deliver something useful later.

1a) Names are easy to construct and are lightweight, as they are just a bit more than strings.
1b) Symbols can't be constructed by the user, but are created internally when one resolves names using APIs like staticClass or member. First calls to such APIs usually involve unpacking type signatures of symbol's owners from ScalaSignature annotations, so they might be costly. Subsequent calls use already loaded signatures, but still pay the cost of a by-name lookup in a sort of a hashtable (1). declaration costs less than member, because declaration doesn't look into base classes.
2) Type signatures (e.g. lists of members of classes, params + return type of methods, etc) are loaded lazily and therefore are cached. Mappings between Java and Scala reflection artifacts are cached as well (2). To the best of my knowledge, the rest (e.g. subtyping checks) is generally uncached with a few minor exceptions.
3-4) Reflection artifacts depend on their universe and at the moment can't be serialized (3).
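The points above can be illustrated with a small sketch. It assumes scala-reflect is on the classpath and uses the Scala 2.11+ spelling TermName (on 2.10 the equivalent is newTermName); the Person class and the value names are only for illustration.

```scala
import scala.reflect.runtime.universe._

object ReflectionSketch {
  final case class Person(name: String, birthdate: java.util.Date)

  // A Name is cheap to build: little more than a wrapped string.
  val attribute: TermName = TermName("name")

  // Symbols are not constructed directly; they come out of a lookup.
  // The first lookup on Person has to load its type signature.
  val resolved: Symbol = typeOf[Person].member(attribute)

  // An unknown name resolves to NoSymbol instead of throwing.
  val missing: Symbol = typeOf[Person].member(TermName("nickname"))

  // Since reflection artifacts cannot be serialized, a DTO that crosses
  // JVM boundaries would carry the plain string form instead.
  val wireForm: String = attribute.toString
}
```

One possible design is therefore to ship plain strings in the DTO and resolve them to Symbols only inside the API's implementation, where the lookups can also be cached.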

Related

Why annotation based libraries are not so popular in Scala?

When I write Java code, I find that annotation-based libraries are very popular, e.g. Hibernate, Jackson, Gson, Spring MVC. But in Scala, most of the popular libraries either do not provide annotations, or provide them but recommend non-annotation approaches, e.g. Squeryl, Slick, Argonaut, Unfiltered, etc.
Sometimes I find annotations easier to read and maintain, so why are people not so interested in them?
One reason is that annotations often have to be used at declaration-site. Hence, you have to "pollute" your domain models with code not relevant to your business logic. Solutions based on macros or type classes on the other hand are usually applied on use-site. This allows higher reusability of your domain models.
E.g., what if you need different serialization logic for different tasks? With annotations you usually have no other choice than implementing an additional representation of your model with modified annotations. With type classes (probably automatically derived through macros), you just have to implement another instance and inject it according to your needs.
Macros and implicits can often be used as a substitute for annotations and have the benefit of being statically checked.
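As a minimal sketch of the use-site argument, consider a hand-written type class with two instances for the same model. Encoder, render, User and both instances are hypothetical names, not taken from any of the libraries mentioned above.

```scala
object TypeClassSketch {
  // The domain model carries no serialization-specific code.
  final case class User(name: String, age: Int)

  // A type class describing how to turn an A into a String.
  trait Encoder[A] { def encode(value: A): String }

  // Two independent serialization strategies for the same model.
  val compact: Encoder[User] = new Encoder[User] {
    def encode(u: User): String = s"${u.name}:${u.age}"
  }
  val json: Encoder[User] = new Encoder[User] {
    def encode(u: User): String = s"""{"name":"${u.name}","age":${u.age}}"""
  }

  // The instance is chosen at the use site, implicitly or explicitly.
  def render[A](value: A)(implicit enc: Encoder[A]): String = enc.encode(value)

  val user = User("Ada", 36)
  val asCompact: String = { implicit val e: Encoder[User] = compact; render(user) }
  val asJson: String = render(user)(json)
}
```

Adding a third format would mean writing one more Encoder[User] instance; the User class itself stays untouched.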

When and why use anonymous class instead of structs for simple objects

I read in this answer, "A generic list of anonymous class", how to load a list with anonymous class objects. My question is why and when it is advisable to use this approach instead of a struct, considering performance and good practices.
An exposed-field structure is essentially a group of variables bound together with duct tape. It won't behave as an "object", and may thus be seen as evil by those who think everything should behave like an object; nonetheless, in cases where one doesn't really want an object, but rather a group of variables bound together with duct tape, an exposed-field structure may be a perfect fit.
Anonymous classes have only a few advantages over exposed-field structures:
The syntax to declare them is at least slightly smaller; depending upon coding standards, it may be a lot smaller. If coding standards will allow one to write internal struct WeightAndVolume { public double weight, volume;} and say that the struct is "self-explanatory" [it contains two public fields of type double, named weight and volume, each of which will hold whatever was last written to it by outside code], anonymous classes won't save much, but if coding standards would require that every named data type have many pages of associated documentation, including an analysis of required unit-test procedures, anonymous classes could avoid such hassle.
Copying class references is slightly cheaper than copying structures larger than 8 bytes, though unless a reference would be copied many times, the cost of creating the object will outweigh any savings in copying.
Casting an anonymous class to Object is much cheaper than casting a struct. The first time an anonymous class instance gets cast to Object will make up for the extra costs of creating it. Every additional time will represent a savings of that amount.
Passing a structure to a generic method will require the JITter to produce a specialized version of the code for that type; by contrast, the JITter would only have to produce one piece of code to handle all anonymous classes.
In general, structures will work better than anonymous classes. On the other hand, there are a few scenarios (mostly related to the third point above) where classes can end up being much better.
I wouldn't say it is ever recommended to use anonymous classes, in the sense that it's never wrong to not use them. But they typically get used when
it's a one-shot job, for which creating a proper named type would be cumbersome, and
the consumer of the objects is either compiler-generated code (you don't have access to the types backing those anonymous classes, but the compiler does) or uses reflection (in which case you don't need access to the types at compile time)
The most common scenario where this occurs is in LINQ queries.

Theoretical difference between classes and types

This question has been asked on here a few times, but none of the replies really answered it in the more abstract, theoretical sense that I am looking for.
Most answers are something along the lines of "A class has implementations for methods that its objects can respond to, while a type just specifies which methods can be responded to".
Well, this seems kind of like an odd definition to me. Take ints, floats, and chars in a language like C. It may never be explicitly located in the code, but there are definitely methods built in to the language for responding to the messages ("plus", "minus", etc.) that these types receive.
And as all interfaces must have methods defined somewhere, it seems to me that types are the same thing as classes, except the word "class" carries a mental image of a more substantial programming structure than a "type".
Which leads me to believe that the drawbacks that apply to any class-based language (the "expression problem" for example) would similarly apply to any language with types (Haskell, etc.)
There is no widely applicable, generally accepted definition of the term "class" that I'm aware of, not even wrt type systems. So your question pretty much depends on the context.
If you are talking about classes in object-oriented languages then the description you quote is relatively accurate. Types are specifications, descriptions (of objects or other values). Classes are implementations, definitions (of object factories).
However, in many OO languages, class definitions also introduce distinct type names, and these type names are often the only means to type objects. That's an unfortunate limitation and conflation of concepts, which also leads to the well-known confusion of subtyping and inheritance. At least some languages separate these concepts properly, e.g. OCaml.
In any case, the reason why the distinction is seemingly at odds with ints and floats in C is simple: those are not objects. Despite what OO ideology tries to preach, not everything is an object, and certainly not in every language.
Simply put, a class will often have methods that manipulate the data contained within an instance. A type will not; it only is meant to hold and return data.
Although it is true that there may be methods specified somewhere for the type, there will only be one way to change the data contained within an instance of a type - storing a new value in it. The methods are generally along the lines of presenting the data in different ways, instead of actually manipulating the data.
This rule can, of course, be broken; C is full of examples, due to how it is structured (or, rather, not structured). Generally speaking, though, you don't want to have a type with a function that does fancy logic internally.
"Class" and "type" mean different things in different languages and environments; I will try to show here a synthesis that helps me think about the issue.
Classes have objects, and types have values. I think it is easier to understand the difference between objects and values than between classes and types. An object has 2 independent properties: its identity, and its state/behaviour. So, you can have two different objects with the same class and state. This is not true for values: you cannot have 2 different values of a type that have the exact same state (or form, shape) and behaviour: you cannot have 2 "twos". A value of a type does not have an identity independent of its state and behaviour.
Mixing both concepts together, you might say that a value of a given type does not necessarily have a class, but an object of a given class necessarily has a type (e.g. object), and its value is given both by its state/behaviour and by its identity.
Haskell has types, and definable ones if I am correct. It is from Haskell that I am taking the "type" concept I am using. Python has classes and types mixed into the same "type" system, with some primitive types and rich definable classes. The concept of object that I am using is that of the type system of Python, minus its primitive types: int, str, etc.
Another key difference between types and classes would be in their definition. Types are typically defined by a set of predicates or constraints that "give" all at once all of the values of the type. Therefore, you can use a literal value without first having to "create" it: 23438573. The definition of a class involves a procedure to create objects, and all objects of that class must be created before they are used.
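The object/value distinction drawn in this answer can be sketched in Scala, used here only as an illustration language. The Account class is hypothetical, and the answer itself refers to Haskell and Python rather than Scala.

```scala
object IdentitySketch {
  // A plain class: every instance has an identity independent of its state,
  // and must be created before it can be used.
  final class Account(var balance: Int)

  val a = new Account(10)
  val b = new Account(10)

  // Same class and same state, yet two distinct objects.
  val sameObject: Boolean = a eq b

  // Values have no identity apart from their state: a literal needs no
  // creation step, and there is only one "two".
  val x = 2
  val y = 2
  val sameValue: Boolean = x == y
}
```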

Use cases of Scala collection forwarders and proxies

Scala's collection library contains the forwarders IterableForwarder, TraversableForwarder, SeqForwarder and proxies like IterableProxy, MapProxy, SeqProxy, SetProxy, TraversableProxy, etc. Forwarders and proxies both delegate collection methods to an underlying collection object. The main difference between the two is that forwarders don't forward calls that would create new collection objects of the same kind.
In which cases would I prefer one of these types over the other? Why and when are forwarders useful? And if they are useful why are there no MapForwarder and SetForwarder?
I assume proxies are most often used if one wants to build a wrapper for a collection with additional methods or to pimp the standard collections.
I think this answer provides some context about Proxy in general (and your assumption about wrapper and pimping would be correct).
As far as I can tell the subtypes of Proxy are more targeted to end users. When using Proxy the proxy object and the self object will be equal for all intents and purposes. I think that's actually the main difference. Don't use Proxy if that assumption does not hold.
The Forwarder traits seem to be used only to support ListBuffer and may be more appropriate if one needs to roll out their own collection class built on top of the CanBuildFrom infrastructure. So I would say they are more targeted to library writers where the library is based on the 2.8 collection design.

How do I implement a collection in Scala 2.8?

In trying to write an API I'm struggling with Scala's collections in 2.8(.0-beta1).
Basically what I need is to write something that:
adds functionality to immutable sets of a certain type
where all methods like filter and map return a collection of the same type without having to override everything (which is why I went for 2.8 in the first place)
where all collections you gain through those methods are constructed with the same parameters the original collection had (similar to how SortedSet hands through an ordering via implicits)
which is still a trait in itself, independent of any set implementations.
Additionally I want to define a default implementation, for example based on a HashSet. The companion object of the trait might use this default implementation. I'm not sure yet if I need the full power of builder factories to map my collection type to other collection types.
I read the paper on the redesign of the collections API but it seems like things have changed a bit since then and I'm missing some details in there. I've also dug through the collections source code but I'm not sure it's very consistent yet.
Ideally what I'd like to see is either a hands-on tutorial that tells me step-by-step just the bits that I need or an extensive description of all the details so I can judge myself which bits I need. I liked the chapter on object equality in "Programming in Scala". :-)
But I appreciate any pointers to documentation or examples that help me understand the new collections design better.
I'd have a look at the implementation of collection.immutable.BitSet. It's a bit spread out, reusing things from collection.BitSetLike and collection.generic.BitSetFactory. But it does exactly what you specified: implement an immutable set of a certain element type that adds new functionality.