I'd like to do serialization in Scala. I've seen the likes of sjson and the @serializable annotation, but I have been unable to see how to get them to deal with one major hurdle: type erasure and generics in libraries.
Take for example the Graph for Scala Library. I make heavy use of it in my code and would like to write several objects holding graphs to disk throughout my code for later analysis. However, many times the node and edge types are encapsulated in generic type arguments of another class I have. How can I properly serialize these classes without either modifying the library itself to deal with reflection or "dirtying" my code by importing a large number of Type Classes (serialization according to how an object is being viewed is wholly unsatisfying anyways...)?
Example,
class Container[N](val g: Graph[N,DiEdge]) {
...
}
// in another file
def myMethod[N](container: Container[N]): Unit = {
<serialize container somehow here>
}
To report on my findings, Java's XStream does a phenomenal job -- anything and everything, generics or otherwise, can be automatically serialized without any additional input. If you need a quick and no-work way to get serialization going, XStream is it!
However, it should be noted that the output XML will not be particularly concise without your own input. For example, every memory block used by Scala's HashMap will be recorded, even if most of them don't contain anything!
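As far as I can tell, the reason a reflection-based serializer copes with generics is that it walks the runtime object graph, where erased type parameters no longer matter. Here is a rough stdlib-only sketch of the same idea. Note that it swaps in plain java.io serialization rather than XStream, and that Container below is a hypothetical stand-in holding plain lists, not a real Graph. The point is only that the round trip needs no TypeTag or type class for N:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a generic container; this is NOT the Graph for Scala API.
final case class Container[N](nodes: List[N], edges: List[(N, N)])

object RuntimeSerializationSketch {
  // No TypeTag, ClassTag or type class is requested for N: the serializer
  // only looks at the objects that exist at runtime.
  def roundTrip[N](container: Container[N]): Container[N] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(container) finally out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    // The cast is unchecked because of erasure; it is safe here only because
    // we read back exactly what we wrote.
    try in.readObject().asInstanceOf[Container[N]] finally in.close()
  }
}
```

This only works if every node and edge value is itself java.io.Serializable, which is the restriction XStream avoids.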
If you are using Graph for Scala and JSON is your serialization format, you can use graph-json directly.
Here is the code and the doc.
Related
I want to use kryo to serialize and deserialize a hierarchy of classes, like this:
case class Apple(bananas: Map[String, Banana], color: Option[String])
case class Banana(cherries: Seq[Cherry], countryOfOrigin: String)
case class Cherry(name: Option[String], age: Int, isTomato: Boolean)
Sometimes I want to add and remove fields somewhere in this hierarchy, e.g. to Cherry.
I would like to write a unit test which looks at the type hierarchy starting at Apple and concludes that data previously serialized with kryo will not deserialize properly—i.e. the deserialized object would not be == to the serialized object, if I could have both in memory simultaneously.
In that case, I can update a namespace key in my Redis cache, forget all the old data and rebuild it from scratch. I just need an automated reminder so that I'll remember to do this when I need to.
Some false positives are acceptable; false negatives are not. I'm happy to hardcode something like a serial version UID into my test case and update it whenever I change the underlying class hierarchy. It's acceptable if the test only works on DAG-shaped hierarchies, but handling cycles is definitely welcome.
Is there some way of computing the bit I want by using e.g. the TypeTag machinery to walk a description of the type hierarchy? Exactly which aspects of source type declaration does kryo compatibility depend on, and how do I plop out a representation of those features using e.g. TypeTag?
I use io.altoo.akka.serialization.kryo.KryoSerializer to (de)serialize, see https://github.com/altoo-ag/akka-kryo-serialization.
One trick I've used in this area is to check in samples (ScalaCheck and its generators may prove useful here) of data serialized with "important" versions of the old serialization. Then you write tests that literally check that the new serialization properly deserializes.
You may run into a developer under pressure to get a change in who makes the deserialization test green by changing the serialized data (this happened to me). You can address that by checking in the checksums of the serialized test data and validating them at the start of CI: a change to those checksums should make it apparent in review that something questionable is going on.
I suspect that this approach will have a somewhat better return-on-effort than the alternative of reimplementing a portion of kryo's type system and figuring out a way to serialize a representation of that type system for comparison against future versions of the code.
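A minimal sketch of the checksum part, using only the JDK. The names FixtureChecksums, sha256Hex and tampered are made up for illustration, and the fixtures are passed in as byte arrays so the sketch does not assume any particular file layout or kryo setup:

```scala
import java.security.MessageDigest

object FixtureChecksums {
  // Hex-encoded SHA-256 of a serialized sample.
  def sha256Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map(b => "%02x".format(b)).mkString

  // Returns the names of fixtures whose bytes no longer match the checked-in
  // checksum (or that have no checked-in checksum at all).
  def tampered(fixtures: Map[String, Array[Byte]], expected: Map[String, String]): Set[String] =
    fixtures.collect {
      case (name, bytes) if !expected.get(name).contains(sha256Hex(bytes)) => name
    }.toSet
}
```

In CI you would load the checked-in sample bytes, fail the build if tampered(...) is non-empty, and only then run the actual deserialization tests against those samples.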
I've recently started programming in Scala, coming from Python and Java, and I was wondering what the correct or accepted way is to organize object/class definitions in Scala. Like Python, Scala allows several class or object definitions in a single file.
So purely from an accepted structure perspective, does every object need to be defined in its own file or are you allowed to choose this yourself?
There is a chapter in the official Scala Style Guide on this. It's pretty clear in itself, but I'll just leave some quotes here.
The core idea is:
As a rule, files should contain a single logical compilation unit. By “logical” I mean a class, trait or object.
There is, of course, an exception for companion objects:
One exception to this guideline is for classes or traits which have companion objects. Companion objects should be grouped with their corresponding class or trait in the same file.
There is also the fact that sealed only works within the same file.
Despite what was said above, there are some important situations which warrant the inclusion of multiple compilation units within a single file. One common example is that of a sealed trait and several sub-classes. Because of the nature of sealed superclasses (and traits), all subtypes must be included in the same file.
Most of the time, case classes are just simple data containers and can be grouped together.
Another case is when multiple classes logically form a single, cohesive group, sharing concepts to the point where maintenance is greatly served by containing them within a single file.
Finally, there is a naming convention for exempted multi-unit Scala files:
All multi-unit files should be given camelCase names with a lower-case first letter.
So: put your Scala classes and objects in separate files, unless they fall into one of the three mentioned exceptions.
In Scala, it is perfectly valid to have multiple classes within a single file AS LONG AS they are tightly related.
But not all languages encourage this convention, and I think it is worth considering the reason.
I personally dislike it when people put multiple classes into a single file because it makes it harder to find a class definition. This is magnified in code reviews where I want to be able to review code as quickly as possible without digging around.
Cons
Code reviews require me to do more searching to find a class
I don't like having to grep to find a file
A consistent naming convention allows me to use my text editor or IDE tools to quickly open a file by the class name
Pros
As Jesper pointed out, certain scenarios require it
Support classes/traits are kept hidden to minimize file structure "noise"
Sometimes you have to put several traits, classes or objects in one source file, particularly when you are using sealed traits. A sealed trait can only be extended inside the same source file.
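For illustration, a small sketch of such a file. The file name shapes.scala and the Shape hierarchy are made up; the lower-case file name just follows the multi-unit naming convention from the style guide:

```scala
// shapes.scala (hypothetical file name)

// Because Shape is sealed, every direct subtype must live in this file.
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape

object Shape {
  // The compiler knows all subtypes, so it can warn when a case is missing.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }
}
```

Trying to write `class Triangle extends Shape` in another file is rejected by the compiler, which is why the subtypes end up grouped together.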
I have read various old StackOverflow discussions on this general topic but there is still one part of the puzzle which appears, to me at least, to be missing.
It is simply this: what is the actual mechanism by which the anonymous function is serialized? And, where could we find its source code?
Or is it all just magic?
Other relevant SO articles (the third of these itself points to some useful articles outside StackOverflow):
Serialization of Scala Functions
Why Scala can serialize...
How to serialize functions in Scala
I'm going to answer my own question with what I believe is the correct answer. The reason I'm doing it this way is that it seems to me that this aspect of serialization is never explained and it does appear to work just by magic. I essentially confirmed (to my satisfaction) the answer as part of the research I was doing to ensure that my question above was indeed appropriate.
But the main reason I'm offering my own answer is that I invite knowledgeable users either to agree with it, to correct it, to expand upon it, or to destroy it. Here goes...
It's all magic. No, I'm just kidding. But essentially the mechanism, once Scala has taken the step of representing the anonymous function as a class, is entirely provided by Java. In addition, we, the programmers, need to ensure that an anonymous function is as close to pure code as possible: no references to any objects that might not be serializable. The secret sauce is to be found in the Java class ObjectStreamClass, which, in turn, is invoked by the Java serialization classes ObjectInputStream and ObjectOutputStream.
Essentially the serialized bytes contain the fully qualified name of the class, its serialVersionUID, and whatever other relevant information is necessary. When deserializing, the system will simply look up the class on the appropriate classpath and return a reference to it. This obviously assumes that the deserializing system has the class on its classpath. The mechanism for that is a little beyond the scope of my research but it's clear that in a system like Spark, it should be easy to arrange.
No (additional) compilation/decompilation of byte code is necessary as the classLoader has everything necessary. I'm slightly surprised to find the ObjectStreamClass in java.io rather than in the reflection package, but I suppose there's an argument for it being there, given the tight coupling with ObjectInputStream and ObjectOutputStream.
One thing to keep in mind is that while we think in terms of serializing/deserializing objects, rather than classes, what we are dealing with here is an object of type Class.
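To make the above a bit more concrete, here is a small sketch. AddOne is a hand-written class standing in for the kind of class the compiler generates for an anonymous function under the class-based encoding described above; it is not what any particular compiler version actually emits. The sketch shows what ObjectStreamClass reports for it and that a round trip through ObjectOutputStream/ObjectInputStream gives back a working function:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for a compiler-generated anonymous function class.
@SerialVersionUID(1L)
final class AddOne extends (Int => Int) with java.io.Serializable {
  def apply(x: Int): Int = x + 1
}

object FunctionSerializationSketch {
  // The class name and serialVersionUID that end up in the serialized stream.
  def descriptor(f: AnyRef): (String, Long) = {
    val desc = ObjectStreamClass.lookup(f.getClass) // null if the class is not serializable
    (desc.getName, desc.getSerialVersionUID)
  }

  // Writes the object and reads it back; the class itself is never shipped,
  // it is looked up by name via the class loader on the reading side.
  def roundTrip[A <: AnyRef](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

If the reading side did not have AddOne on its classpath, readObject would fail with a ClassNotFoundException, which is the classpath assumption mentioned above.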
One more thing to note is that in Scala 2.12, anonymous functions are implemented differently: as Java 8 lambdas. This has broken the mechanism described above in a rather serious way, so serious that Spark is currently having trouble supporting Scala 2.12. The holdup appears to be this issue: SPARK-14540.
I'm trying to add some reusability to a Java library which has some common methods across classes, but whose methods are not part of a common hierarchy. I'm pretty certain I've previously seen that Scala allows non-trait-based contracts for parameter classes, but for the life of me I cannot find this information anywhere at the moment.
Does my memory serve me correctly? Would anybody be able to point me in the right direction for documentation on said language feature (if I am not mistaken)?
For some added context, I'm trying to reduce duplicate code when using some Google Java libraries where things like getNextPageToken(), setPageToken(), etc. are common between many classes, but are not implemented further up in the hierarchy where I would have the option to specify a common parent class as the parameter type. So essentially I'd like to enforce that these methods exist and offload the duplicate request & pagination code to a common function using said method contracts.
You probably want to use structural types:
example:
def method(param: { def getNextPageToken(): Unit })
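param will be required to have a getNextPageToken method that takes no parameters and returns Unit. Calls made through a structural type are dispatched using reflection, so the compiler emits a feature warning unless you add import scala.language.reflectiveCalls.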
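A slightly fuller sketch of how this could be used for the pagination case from the question. Everything here is an assumption for illustration: FakeListResponse is a made-up class rather than a Google API type, getNextPageToken is assumed to return a String (unlike the Unit in the snippet above), and a null token is assumed to mean "no more pages":

```scala
import scala.language.reflectiveCalls

object StructuralPagingSketch {
  // Any class with this method qualifies, regardless of its parents.
  type HasNextPageToken = { def getNextPageToken(): String }

  // Keeps requesting pages until the returned token is null (assumed convention).
  // fetchPage receives None for the first page, Some(token) afterwards.
  def fetchAll[R <: HasNextPageToken](fetchPage: Option[String] => R): List[R] = {
    @annotation.tailrec
    def loop(token: Option[String], acc: List[R]): List[R] = {
      val page = fetchPage(token)
      Option(page.getNextPageToken()) match { // reflective call
        case next @ Some(_) => loop(next, page :: acc)
        case None           => (page :: acc).reverse
      }
    }
    loop(None, Nil)
  }
}

// Hypothetical response class standing in for a library type.
final class FakeListResponse(val items: List[Int], next: String) {
  def getNextPageToken(): String = next
}
```

Since each call goes through reflection, this is fine for network-bound pagination but would be a poor fit for a hot loop.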
The only ways I am aware of aren't "direct":
converting to ANTLR format and using its own visualizer
VISUALLANGLAB, which seems to require an entire "rewrite" by mouse clicks
implementing a converter by myself (which would be funny, but time-consuming)
second link below
Related:
comparison
wrapper
a 3rd party attempt
The second link suggests debugging by adding an implicit method to the parsers:
implicit def toLogged(name:String) = new {
def !!![T](p:Parser[T]) = log(p)(name)
}
Maybe an AST would be more feasible/useful, but the question remains similar.
I might have misunderstood your question.
Scala parser combinators are used to parse strings into instances of types that you can use (either custom or built-in). The result is a structure of Scala instances of your choosing; it could be anything.
You could create a parser that parses your arbitrary string into instances of a well-known Java structure, for example ECore.
Without a use case it's hard to suggest the best road for your problem. Maybe Xtext can help you: http://www.eclipse.org/Xtext/. Xtext has quite a few built-in features; however, it's an Eclipse plugin and you might need something else.