Consider the following toy class:
class myGiantClass(){
val serializableElement = ...
// lots of other variables and methods here
}
// main program
val listOfGiantObjects: List[myGiantClass] = ....
I need to serialize and deserialize listOfGiantObjects. The problem is that myGiantClass contains many objects and variables that I don't want to, or can't, serialize. The only member of myGiantClass that I want to serialize is serializableElement, inside each object of listOfGiantObjects.
So after deserialization, listOfGiantObjects should contain myGiantClass objects in which only serializableElement is populated (the rest set to default).
Any ideas?
Of course there are two approaches (or defaults): all elements should be serialized by default, or none.
Within the "all" scenario, take a look at the @transient annotation, which marks fields that should not be serialized.
This may seem suboptimal when a large number of elements should not be serialized. However, it does communicate what you are trying to achieve. Moreover, you could arrange your code using composition or inner classes to better define the scope of serialization.
As a last resort, ad-hoc serialization with custom attributes is an option (e.g., to implement the none-by-default scenario).
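A minimal sketch of the @transient route, assuming plain Java serialization is the mechanism in use. GiantThing, junk and TransientDemo.roundTrip are hypothetical names standing in for myGiantClass and its unwanted state; they are not from the question.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for myGiantClass: one field we keep, one we skip.
class GiantThing(val serializableElement: String) extends Serializable {
  // @transient fields are skipped by Java serialization. The constructor is
  // not re-run on deserialization, so this comes back as null.
  @transient var junk: AnyRef = new Object
}

object TransientDemo {
  // Serialize to an in-memory byte array and read the value back.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

Note that transient fields are restored as null (or 0/false for primitives), not as whatever the constructor would have assigned, so "set to default" has to be handled separately if the defaults matter.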
In C++ I would just compare the memory addresses of both objects. How would I do something similar in MATLAB?
The worst case would be a static variable that is incremented in each constructor, with every object taking the current value as its ID. But is there a better solution?
Thank you in advance.
Edit:
I'd like to extend this question by assuming I have some given, unchangeable classes that inherit from handle and overload eq. If I want to compare two objects of such a class, can I somehow cast both instances to handle and use the superclass implementation of eq?
To test that two handle objects a and b refer to the same instance, you only need to use a == b. This is the same as eq(a, b). This is the defined behaviour of == for handle objects. I.e., for handle objects, == tests for equality of instances, not equality of the values within the instances. This is different from value objects.
For this to work you need to be using handle objects (classdef myObject < handle) because it doesn't make sense to test instances of value objects.
N.B. if you also need to get some kind of instance identifier for a handle object, then you need to do something like you describe using a persistent variable. Here's an example. In that case I would make that a base class for all your objects, so you wouldn't have to copy the same code into each class. But that's unnecessary if all you want to do is test two instances.
Scala case classes essentially capture a set of fields with helping methods.
How are case classes resolved? Are they expanded to different classes with fields or a generic class that contains, say, a HashMap<String, Field>?
If it's the latter, are case classes with single field more expensive than explicitly defined data objects?
Case classes are exactly the same as regular classes, except that they offer some additional convenience functions.
No, they are not backed by a map. What made you think that they are?
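One way to check this on the JVM is to list the fields the compiled class declares. This is only a sketch; Coord and CaseClassLayout are hypothetical names used for illustration.

```scala
// Hypothetical case class used only for this illustration.
case class Coord(x: Int, y: Int)

object CaseClassLayout {
  // Names of the JVM fields declared by the compiled class.
  // Each constructor parameter shows up as an ordinary field; there is no map.
  def fieldNames: Set[String] =
    classOf[Coord].getDeclaredFields.map(_.getName).toSet
}
```

CaseClassLayout.fieldNames contains "x" and "y", just as it would for a hand-written class with two fields.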
I want to change the format of my data, from RDD(Label:String,(ID:String,Data:Array[Double])) to an RDD Object with the label, id and data as components.
But when I print my RDD twice in a row, the object references change:
class Data_Object(private val id:String, private var vector:Vector) extends Serializable {
var label = ""
...
}
First print
(1,ms3.Data_Object@35062c11)
(2,ms3.Data_Object@25789aa9)
Second print
(2,ms3.Data_Object@6bf5d886)
(1,ms3.Data_Object@a4eb65)
I think that explains why the subtract method doesn't work. So can I use subtract with objects as values, or do I return to my classic model ?
Unless you specify otherwise, objects in Scala (and Java) are compared using reference equality (i.e. whether they are the very same object in memory). The default toString also prints an identity-based hash code, hence the Data_Object@6bf5d886 and so on.
Using reference equality means that two Data_Object instances with identical properties will NOT compare as equal unless they are exactly the same object. Also, their references will change from one run to the next.
Particularly in a distributed system like Spark, this is no good - we need to be able to tell whether two objects in two different JVMs are the same or not, according to their properties. Until this is fixed, RDD operations like subtract will not give the results you expect.
Fortunately, this is usually easy to fix in Scala/Spark: define your class as a case class. This automatically generates equals, hashCode and toString methods derived from all of the properties of the class. For example:
case class Data_Object(id:String, label:String, vector:Vector)
If you want to compare your objects according to only some of the properties, you'll have to define your own equals and hashCode methods, though. See Programming in Scala, for example.
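A sketch of both options without Spark on the classpath. Vector[Double] stands in for the Spark vector type, and DataObject and DataById are hypothetical names; comparing by id alone is just one possible choice of properties.

```scala
// Case class: equals, hashCode and toString are derived from all fields.
case class DataObject(id: String, label: String, vector: Vector[Double])

// Plain class compared by id only (an assumption made for this example).
class DataById(val id: String, var label: String) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: DataById => this.id == that.id
    case _              => false
  }
  // Must agree with equals: equal objects need equal hash codes.
  override def hashCode: Int = id.hashCode
}
```

With these definitions, DataObject("1", "x", Vector(1.0)) == DataObject("1", "x", Vector(1.0)) holds even for two separately created instances, which is what equality-based operations such as subtract depend on.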
I just attended a Scala-lecture at a summer school. The lecturer got the following question:
- "Is there any way for the compiler to tell if a class is immutable?"
The lecturer responded
- "No, there isn't. It would be very nice if it could."
I was surprised. Isn't it enough to check whether the class contains any var members?
What is immutable?
Checking to see if the object only contains val fields is an overapproximation of immutability: the object may very well contain vars, but never assign different values to them. Or the segments of the program assigning values to vars may be unreachable.
According to the terminology of Chris Okasaki, there are immutable data structures and functional data structures.
An immutable data structure (or a class) is a data structure which, once constructed in memory, never changes its components and values - an example of this is a Scala tuple.
However, if you define the immutability of an object as the immutability of itself and all the objects reachable through references from the object, then a tuple may not be immutable - it depends on what you later instantiate it with. Sometimes there is not enough information about the program available at compile time to decide if a given data structure is immutable in the sense of containing only vals. And the information is missing due to polymorphism, whether parametric, subtyping or ad-hoc (type classes).
This is the first problem with deciding immutability - lack of static information.
A functional data structure is a data structure on which you can do operations whose outputs depend solely on the inputs for a given state. An example of such a data structure is a search tree which caches the last item looked up by storing it in a mutable field. Even though every lookup writes the last item searched into the mutable field, so that a repeated lookup of the same item doesn't have to search again, the outputs of the lookup operation always remain the same as long as nobody inserts new items. Another example of a functional data structure is the splay tree.
In a general imperative programming model, checking whether an operation is pure, that is, whether its outputs depend solely on its inputs, is undecidable. Again, one could use a technique such as abstract interpretation to provide a conservative answer, but this is not an exact answer to the question of purity.
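A small sketch of the caching idea. It uses an immutable Map instead of a search tree to stay short, and CachedLookup is a hypothetical name. The class contains a var, yet callers cannot observe any change.

```scala
// Observably immutable lookup structure with a mutable one-entry cache.
final class CachedLookup[K, V](entries: Map[K, V]) {
  // Last key looked up together with its result; written on every cache miss.
  private var last: Option[(K, Option[V])] = None

  def get(key: K): Option[V] = last match {
    case Some((k, v)) if k == key => v // served from the cache
    case _ =>
      val v = entries.get(key)
      last = Some((key, v))
      v
  }
}
```

A check that only looks for var members would reject this class, even though get always returns the same result for the same key. (The cache is not thread-safe as written.)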
This is the second problem with deciding if something having vars is immutable or functional (observably immutable) - undecidability.
I think the problem is that you need to ensure that none of your vals have any var members either. And that you cannot do. Consider
class Base
case class Immutable() extends Base { val immutable: Int = 0 }
case class Mutable() extends Base { var mutable: Int = _ }
case class Immutable_?(b: Base)
Even though Immutable_?(Immutable()) is indeed immutable, Immutable_?(Mutable()) is not.
If you save a mutable object in a val the object itself is still mutable. So you would have to check if each class you use in a val is immutable.
case class Mut(var mut:Int)
val m = Mut(1)
println(m.toString)
m.mut = 3
println(m.toString)
In addition to what others have said, take a look at effect systems and discussion about supporting one in Scala.
It is not quite that easy, since you could have vals that refer to other mutable classes or, even harder to detect, that call methods in other classes or objects that are mutable.
Also, you could very well have an immutable class that in fact has vars (to be more efficient, for example).
I guess you could have something that checks if a class looks like it is immutable or not though, but it sounds like it could be pretty confusing.
You can have a class, which can be instantiated to an object, and this object can be mutable or immutable.
Example: A class may contain a List[_], which, at runtime, can be a List[Int] or a List[StringBuffer]. So two different objects of a class could be either mutable, or immutable.
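A sketch of that situation, with Holder and HolderDemo as hypothetical names. The class has only a val, but whether an instance is immutable depends on the element type chosen at the use site.

```scala
// Only a val field, yet immutability depends on what A turns out to be.
final class Holder[A](val items: List[A])

object HolderDemo {
  // Returns the contents before and after mutating through the val.
  def demo(): (String, String) = {
    val holder = new Holder(List(new StringBuffer("a")))
    val before = holder.items.head.toString
    holder.items.head.append("b") // the element itself is mutable
    (before, holder.items.head.toString)
  }
}
```

A Holder(List(1, 2, 3)) can never change, while the Holder above changes from "a" to "ab" without any var in sight.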
As we know, Scala generates getters and setters automatically for any public field and makes the actual field variable private. Why is this better than just making the field public?
For one, this allows swapping a public var/val for a (couple of) def(s) while still maintaining binary compatibility. Secondly, it allows overriding a var/val in derived classes.
First, keeping the field public allows a client to read and write the field. Since it's beneficial to have immutable objects, I'd recommend to make the field read only (which you can achieve in Scala by declaring it as "val" rather than "var").
Now back to your actual question. Scala allows you to define your own setters and getters if you need more than the trivial versions. This is useful to maintain invariants. For setters you might want to check the value the field is set to. If you keep the field itself public, you have no chance to do so.
This is also useful for fields declared as "val". Assume you have a field of type Array[X] to represent the internal state of your class. A client could now get a reference to this array and modify it--again you have no chance to ensure the invariant is maintained. But since you can define your own getter you can return a copy of the actual array.
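A sketch of both points, using hypothetical classes Account and Samples: a hand-written setter that checks an invariant, and a getter that hands out a defensive copy of an internal array.

```scala
// Custom setter: rejects values that would break the invariant.
class Account(initial: Int) {
  private var _balance: Int = initial
  def balance: Int = _balance
  def balance_=(value: Int): Unit = {
    require(value >= 0, "balance must not be negative")
    _balance = value
  }
}

// Custom getter: callers get a copy, never the internal array.
class Samples(values: Array[Double]) {
  private val data: Array[Double] = values.clone()
  def samples: Array[Double] = data.clone()
}
```

Client code still writes account.balance = 5 and s.samples(0), exactly as if these were plain public fields.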
The same argument applies when you make a field of a reference type "final public" in Java--clients can't reset the reference but still modify the object the reference points to.
On a related note: accessing a field via getters in Scala looks like accessing the field directly. The nice thing about this is that accessing a field and calling a parameterless method on the object look the same. So if you decide you no longer want to store a value in a field but calculate it on the fly, the client does not have to care, because it looks the same to them. This is known as the Uniform Access Principle.
In short: the Uniform Access Principle.
You can use a val to implement an abstract method from a superclass. Imagine the following definition from some imaginary graphics package:
abstract class Circle {
def bounds: Rectangle
def centre: Point
def radius: Double
}
There are two possible subclasses, one where the circle is defined in terms of a bounding box, and one where it's defined in terms of the centre and radius. Thanks to the UAP, details of the implementation can be completely abstracted away, and easily changed.
There's also a third possibility: lazy vals. These would be very useful to avoid recalculating the bounds of our circle again and again, but it's hard to imagine how lazy vals could be implemented without the uniform access principle.
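A sketch of the two subclasses plus the lazy val variant. Point, Rectangle, CentreRadiusCircle and BoundedCircle are hypothetical definitions made up for this example, since the graphics package is imaginary.

```scala
// Hypothetical supporting types.
final case class Point(x: Double, y: Double)
final case class Rectangle(topLeft: Point, width: Double, height: Double)

abstract class Circle {
  def bounds: Rectangle
  def centre: Point
  def radius: Double
}

// Stores centre and radius as vals; bounds is computed once, on first access.
class CentreRadiusCircle(val centre: Point, val radius: Double) extends Circle {
  lazy val bounds: Rectangle =
    Rectangle(Point(centre.x - radius, centre.y - radius), 2 * radius, 2 * radius)
}

// Stores the bounding box; centre and radius are derived on every call.
class BoundedCircle(val bounds: Rectangle) extends Circle {
  def radius: Double = bounds.width / 2
  def centre: Point = Point(bounds.topLeft.x + radius, bounds.topLeft.y + radius)
}
```

Callers write c.bounds, c.centre and c.radius in all cases and cannot tell whether a val, a lazy val or a def sits behind the name.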