how to get all instances of a given class/trait with scala reflect? all refs to a given instance? - scala

I know it's possible to get the members of a class, and of a given instance, but why is it hard to get all instances of a given class? Doesn't the JVM keep track of the instances of a class? This doesn't work in Java:
myInstance.getClass.getInstances()
Is this possible with the new scala reflect library? Are there possible workarounds?
Searched through the reflection scaladoc, on SO and google, but strangely couldn't find any info on this very obvious question.
I want to experiment with hacking together a hypergraph database, inspired by HypergraphDB, that queries the object graph directly, leaving serialization aside.
Furthermore, I'd need access to all references to a given object. Now this information certainly is there (GC), but is it accessible by reflection?
thanks
EDIT: this appears to be possible at least by "debugging" the JVM from another JVM, using com.sun.jdi.ReferenceType.instances

"Keeping track" of all instances of a class is hardly desirable, at least not by default. There's considerable cost to doing so and the mechanism must avoid hard references that would prevent reclaiming otherwise unreferenced instances. That means using one of the reference types and all the associated machinery involved.
Garbage Collection does not need to be class-aware. It only cares about whether instances are reachable or not.
That said, you can write code to track instantiations on a class-by-class basis. You'd have to use one of the reference classes in java.lang.ref to track them.
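That opt-in approach can be sketched in Scala. Everything here is invented for illustration (the registry, its names, the Tracked class); the point is that holding instances through java.lang.ref.WeakReference lets the GC reclaim them as usual:

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable.ListBuffer

// Hypothetical registry (all names invented for illustration):
// classes opt in by registering themselves at construction time.
// WeakReferences let the GC reclaim otherwise-unreachable instances.
object InstanceRegistry {
  private val refs = ListBuffer.empty[WeakReference[AnyRef]]

  def register(instance: AnyRef): Unit = synchronized {
    refs += new WeakReference[AnyRef](instance)
  }

  // Live instances of the given class; cleared references are skipped.
  def liveInstances[T <: AnyRef](clazz: Class[T]): List[T] = synchronized {
    refs.iterator
      .flatMap(r => Option(r.get()))
      .filter(clazz.isInstance)
      .map(clazz.cast)
      .toList
  }
}

// A class that opts in to tracking:
class Tracked(val name: String) {
  InstanceRegistry.register(this)
}
```

Note the cost this answer mentions: every construction now pays for a synchronized registry update, and queries must filter out references the GC has cleared.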

Related

What are the disadvantages of using records instead of classes?

C# 9 introduces record reference types. A record provides synthesized methods like a copy constructor, a clone operation, hash-code calculation, and comparison/equality operations. It seems convenient to me to use records instead of classes in general. Are there reasons not to do so?
It seems to me that Visual Studio currently does not support records as well as it supports classes, but that will probably change in the future.
Firstly, be aware that if it's possible for a class to contain circular references (which is true for most mutable classes), then many of the auto-generated record members can throw a StackOverflowError. So that's a pretty good reason not to use records for everything.
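Scala case classes synthesize structural equality and hashing much like C# records do, so the hazard can be sketched on the JVM. The Box type here is invented for illustration: a record-like type with a mutable member that can form a cycle.

```scala
// Assumption for illustration: a record-like type with a mutable
// member that can point back at itself, modeled as a case class.
final case class Box(var inner: Any)

val a = Box(null)
a.inner = a // circular reference

// The synthesized hashCode recurses through `inner` and never
// terminates on the cycle:
val overflowed =
  try { a.hashCode(); false }
  catch { case _: StackOverflowError => true }
```

The generated structural hashCode visits every field, so a cycle turns it into unbounded recursion.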
So when should you use a record?
Use a record when an instance of a class is entirely defined by the public data it contains and has no unique identity of its own.
This means that the record is basically just an immutable bag of data. I don't really care about that particular instance of the record at all, other than that it provides a convenient way of grouping related bits of data together.
Why?
Consider the members a record generates:
Value Equality
Two instances of a record are considered equal if they have the same data (by default: if all fields are the same).
This is appropriate for classes with no behavior, which are just used as immutable bags of data. However this is rarely the case for classes which are mutable, or have behavior.
For example if a class is mutable, then two instances which happen to contain the same data shouldn't be considered equal, as that would imply that updating one would update the other, which is obviously false. Instead you should use reference equality for such objects.
Meanwhile if a class is an abstraction providing a service you have to think more carefully about what equality means, or if it's even relevant to your class. For example imagine a Crawler class which can crawl websites and return a list of pages. What would equality mean for such a class? You'd rarely have two instances of a Crawler, and if you did, why would you compare them?
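The contrast described above can be sketched in Scala, whose case classes generate value equality much like C# records do (the question is about C#, but the semantics carry over):

```scala
// Synthesized value equality, comparable to a C# record:
case class Point(x: Int, y: Int)
// Default reference equality, comparable to a plain mutable class:
class MutPoint(var x: Int, var y: Int)

val p1 = Point(1, 2)
val p2 = Point(1, 2)
val m1 = new MutPoint(1, 2)
val m2 = new MutPoint(1, 2)

val valueEqual = p1 == p2 // true: entirely defined by its data
val refEqual   = m1 == m2 // false: two distinct identities
```

Updating m1 later would not affect m2, which is exactly why treating them as "equal" would be misleading.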
with expressions
with expressions provide a convenient way to copy an object while updating specific fields. This is only safe when the object has no identity, because copying it then loses no information. Copying a mutable class discards the identity of the original object, since updating the copy won't update the original. So you have to consider whether copying really makes sense for your class.
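C#'s `with` expression corresponds to `copy` on a Scala case class; a tiny sketch of the "copying loses no information" point:

```scala
case class Person(name: String, age: Int)

val alice = Person("Alice", 30)
// Non-destructive update: a new value, the original is untouched.
val older = alice.copy(age = 31)
```

Because Person is an identity-free bag of data, nothing is lost by producing a modified copy instead of mutating in place.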
ToString
The generated ToString prints out the values of all public properties. If your class is entirely defined by the properties it contains, then this makes a lot of sense. However if your class is not, then that's not necessarily the information you are interested in. A Crawler for example may have no public fields at all, but the private fields are likely to be highly relevant to its behavior. You'll probably want to define ToString yourself for such classes.
All properties of a record are public by default
All properties of a record are immutable by default
By default, I mean when using the simple record definition syntax.
Also, records can only derive from records and you cannot derive a regular class from a record.

Are scala reflection API Names or Symbols adequate for use inside transfer objects?

Introduction
I am working on an API written in Scala. I use data transfer objects (DTOs) as parameters passed to the API's functions. The DTOs will be instantiated by the API's user.
As the API is pretty abstract/generic, I want to specify the attributes of an object that the API should operate on. Example:
case class Person(name: String, birthdate: Date)
When an instance of Person "P" is passed to the API, the API needs to know the attributes of "P" it should operate on: either just name or birthdate, or both of them.
So I need to design a DTO that contains the instance of "P" itself, some kind of declaration of the attributes and maybe additional information on the type of "P".
String based approach
One way would be to use Strings to specify the attributes of "P", and maybe its type. This would be relatively simple, as Strings are lightweight and well known. Since there is a formal notation for packages, types and members as Strings, the declarations would be structured to a certain degree.
On the other hand, the String declarations must be validated, because a user might pass invalid Strings. Alternatively, I could represent the attributes with dedicated types instead of Strings, which would add structure and could even be designed so that only valid instances can exist.
Reflection API approach
Of course the reflection API came to mind, and I am experimenting with declaring the attributes using types from the reflection API. Unfortunately the Scala 2.10.x reflection API is a bit unintuitive: there are names, symbols, mirrors, types and type tags, which can cause some confusion.
Basically I see two alternatives to attribute declaration with Strings:
Attribute declaration with reflection API's "Names"
Attribute declaration with reflection API's "Symbols" (especially TermSymbol)
If I go this way, as far as I can see, the API's user, who constructs the DTOs, will have to deal with the reflection API and its Names / Symbols. Also the API's implementation will have to make use of the reflection API. So there are two places with reflective code and the user must have at least a little bit of knowledge of the reflection API.
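A minimal sketch of what the two alternatives look like for the Person example, assuming scala-reflect is on the classpath. This uses the 2.11+ spelling TermName; in 2.10 the factory is newTermName:

```scala
import java.util.Date
import scala.reflect.runtime.universe._

case class Person(name: String, birthdate: Date)

// A Name is constructed directly and is little more than a string
// (2.10 spelling: newTermName("name")):
val attrName: TermName = TermName("name")

// A Symbol cannot be constructed by hand; it is resolved from a type,
// which is where the lookup cost mentioned in the answer comes in:
val attrSymbol: Symbol = typeOf[Person].member(attrName)
```

The DTO could carry either: the Name alone is cheap but unvalidated (it may not resolve to anything), while the Symbol is already resolved against the type but ties the DTO to a reflection universe.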
Questions
However I don't know how heavyweight these approaches are:
Are Names or Symbols expensive to construct?
Does the reflection API do any caching of expensive operation results or should I take care about that?
Are Names and Symbols transferable to another JVM via network?
Are they serializable?
Main question: Are scala reflection API Names or Symbols adequate for use inside transfer objects?
It seems complicated to do this with the reflection API. Any hints are welcome. And any hints on other alternatives, too.
P.S.: I have not included my own code yet, because my API is complex and the reflection part is in a pretty experimental state. Maybe I can deliver something useful later.
1a) Names are easy to construct and are lightweight, as they are just a bit more than strings.
1b) Symbols can't be constructed by the user, but are created internally when one resolves names using APIs like staticClass or member. First calls to such APIs usually involve unpacking type signatures of symbol's owners from ScalaSignature annotations, so they might be costly. Subsequent calls use already loaded signatures, but still pay the cost of a by-name lookup in a sort of a hashtable (1). declaration costs less than member, because declaration doesn't look into base classes.
2) Type signatures (e.g. lists of members of classes, params + return type of methods, etc) are loaded lazily and therefore are cached. Mappings between Java and Scala reflection artifacts are cached as well (2). To the best of my knowledge, the rest (e.g. subtyping checks) is generally uncached with a few minor exceptions.
3-4) Reflection artifacts depend on their universe and at the moment can't be serialized (3).

OCM or Nodes in JCR?

We are developing a CMS based on JCR/Sling/JSP/Felix/etc.
What I have found so far is that using Nodes is very straightforward and flexible. But my concern is that over time it could become too hard to maintain and manage.
So, is it wise to invest in using an OCM? Would it just be an extra layer of complexity? What's the real benefit of OCM, if there is any? Or is it better for us to stick to Nodes instead?
And lastly, is Jackrabbit OCM the best option for us if we are to go down that path?
Thank you.
In my personal experience, whether OCM is a useful tool for your project depends strongly on your situation.
The real problem with OCM (in my personal experience) arises when the definition of a class whose instances are already persisted in the repository changes. For example, you find it necessary to change some members and methods of a class to match functionality changes, so the class definition behind the persisted data objects in the repository no longer matches the actual class. When a data object is persisted to the JCR repository, it is usually saved in a format Java understands in terms of serialization. That means that when the definition of the class changes, the saved data in the repository can no longer be correctly interpreted by Java. This tends to lead to complex deployments where you need to convert the old persisted data objects to the new definition and save them again in the repository, to make sure you can still use "old" but still required persisted data.
What does work (in my opinion) is using a framework that lets you map nodes and node properties to Java objects directly (for example by using annotations), and the other way around (persist a Java object to the repository as a JCR node whose member fields become actual node properties). This way you stick to the data representation of JCR (nodes with properties) and can still map it to the members of a Java class.
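The shape of that mapping can be sketched without a repository at hand. This is a toy illustration only: a real implementation would read javax.jcr.Node properties (and likely drive the mapping with annotations); here a plain Map stands in for the node's properties.

```scala
// Toy sketch of "node properties -> object" mapping. The Map stands
// in for a JCR node's property set; names are invented for the example.
case class PageContent(title: String, views: Long)

def fromNodeProperties(props: Map[String, Any]): Option[PageContent] =
  for {
    title <- props.get("title").collect { case s: String => s }
    views <- props.get("views").collect { case n: Long => n }
  } yield PageContent(title, views)
```

Because the stored data stays plain nodes-with-properties rather than a serialized object, a change to the class mostly means adjusting which property each field reads, and old content remains usable.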
I've used a framework like this before in a CMS called AEM (from Adobe), although I must mention this was in an OSGi context (but the principle still stands). That framework allowed maximum flexibility: it persists the Java object as a JCR node and the other way around. Because it mapped directly to the JCR definition, code changes in the class and its members meant just changing annotations, and old persisted data was still usable without much effort.

Consequences of Singletons

So I just delved into singleton classes and, yes, I find them quite helpful. I use my singletons mostly for data storage for multiple targets (views, tables, etc.). That being said, I can already see myself implementing a lot of singletons in my project.
But can a lot of singletons have a negative impact? From what I've read, you create one instance of each singleton per process. Other class instances get released from memory (assuming they get released properly); should singletons be released too?
So to narrow it down to one question: Is it harmful to have a lot of singletons?
Singletons don't scale. No matter what you think should be a singleton, when your system gets bigger, it turns out you needed more than one.
If you NEVER need more than one, a singleton is fine. However, as systems scale, you typically need more than one of anything within its own context.
Singletons are merely another way to say "global". It's not bad, but generally, it's not a good idea for systems that evolve and grow in complexity.
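In Scala the language itself provides singletons via object, which makes the "global" flavor easy to see in a few lines:

```scala
// A Scala `object` is a language-level singleton: one instance per
// classloader, created lazily on first access, and (like other
// singletons) normally never released while the classloader lives.
object Counter {
  private var count = 0
  def increment(): Int = { count += 1; count }
}

// Every caller shares the same state -- exactly the "global"
// property this answer warns about:
val first  = Counter.increment() // 1
val second = Counter.increment() // 2 (shared state)
```

The shared mutable state is convenient at first, and is precisely what stops scaling once two independent contexts each need their own counter.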
From GOF Book:
The Singleton pattern has several benefits:
Controlled access to sole instance. Because the Singleton class encapsulates its sole instance, it can have strict control over how and when clients access it.
Reduced name space. The Singleton pattern is an improvement over global variables. It avoids polluting the name space with global variables that store sole instances.
Permits refinement of operations and representation. The Singleton class may be subclassed, and it's easy to configure an application with an instance of this extended class. You can configure the application with an instance of the class you need at run-time.
Permits a variable number of instances. The pattern makes it easy to change your mind and allow more than one instance of the Singleton class. Moreover, you can use the same approach to control the number of instances that the application uses. Only the operation that grants access to the Singleton instance needs to change.
More flexible than class operations. Another way to package a singleton's functionality is to use class operations (that is, static member functions in C++ or class methods in Smalltalk). But both of these language techniques make it hard to change a design to allow more than one instance of a class. Moreover, static member functions in C++ are never virtual, so subclasses can't override them polymorphically.

How do I implement a collection in Scala 2.8?

In trying to write an API I'm struggling with Scala's collections in 2.8(.0-beta1).
Basically what I need is to write something that:
adds functionality to immutable sets of a certain type
where all methods like filter and map return a collection of the same type without having to override everything (which is why I went for 2.8 in the first place)
where all collections you gain through those methods are constructed with the same parameters the original collection had (similar to how SortedSet hands through an ordering via implicits)
which is still a trait in itself, independent of any set implementations.
Additionally I want to define a default implementation, for example based on a HashSet. The companion object of the trait might use this default implementation. I'm not sure yet if I need the full power of builder factories to map my collection type to other collection types.
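The requirement itself can be shown without the 2.8 builder machinery. This toy wrapper (all names invented) carries an extra parameter and threads it through its transformation methods, the way SortedSet threads its Ordering; the real 2.8 solution does this generically via SetLike and builder factories instead of hand-writing each method:

```scala
// Toy illustration: a set wrapper whose transformations return the
// wrapper type with its construction parameter (`tag`) preserved.
final case class TaggedSet[A](tag: String, underlying: Set[A]) {
  def filter(p: A => Boolean): TaggedSet[A] =
    TaggedSet(tag, underlying.filter(p))
  def map[B](f: A => B): TaggedSet[B] =
    TaggedSet(tag, underlying.map(f))
  def +(a: A): TaggedSet[A] =
    TaggedSet(tag, underlying + a)
}

val s     = TaggedSet("colors", Set("red", "green"))
val upper = s.map(_.toUpperCase) // still a TaggedSet, tag intact
```

The whole point of the 2.8 redesign is to get this "same type, same parameters" behavior from inherited implementations rather than overriding every method as done here.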
I read the paper on the redesign of the collections API, but it seems things have changed a bit since then, and I'm missing some details in it. I've also dug through the collections source code, but I'm not sure it's very consistent yet.
Ideally what I'd like to see is either a hands-on tutorial that tells me step-by-step just the bits that I need or an extensive description of all the details so I can judge myself which bits I need. I liked the chapter on object equality in "Programming in Scala". :-)
But I appreciate any pointers to documentation or examples that help me understand the new collections design better.
I'd have a look at the implementation of collection.immutable.BitSet. It's a bit spread out, reusing things from collection.BitSetLike and collection.generic.BitSetFactory. But it does exactly what you specified: implement an immutable set of a certain element type that adds new functionality.