Writing a Custom Class to HDFS in Apache Flink - scala

I am trying to get familiar with the semantics of Flink after having started with Spark. I would like to write a DataSet[IndexNode] to persistent storage in HDFS so that it can be read later by another process. Spark has a simple ObjectFile API that provides such functionality, but I cannot find a similar option in Flink.
case class IndexNode(vec: Vector[IndexNode],
                     id: Int) extends Serializable {
  // Getters and setters etc. here
}
The built-in sinks tend to serialize the instance based on the toString method, which is not suitable here due to the nested structure of the class. I imagine the solution is to use a FileOutputFormat and provide a translation of the instances to a byte stream. However, I am not sure how to serialize the vector, which is of an arbitrary length and can be many levels deep.

You can achieve this by using SerializedOutputFormat and SerializedInputFormat.
Try the following steps:
Make IndexNode extend the IOReadableWritable interface from Flink. Mark non-serializable fields @transient. Implement the write(DataOutputView out) and read(DataInputView in) methods. The write method writes out all the data from IndexNode, and the read method reads it back and rebuilds all the internal data fields. For example, instead of serializing the arr field of the Result class directly, I write out the values it is built from and then rebuild the array in the read method.
import org.apache.flink.core.io.IOReadableWritable
import org.apache.flink.core.memory.{DataInputView, DataOutputView}

class Result(var name: String, var count: Int) extends IOReadableWritable {
  @transient
  var arr = Array(count, count)

  // Flink needs a no-arg constructor to instantiate the class before read()
  def this() {
    this("", 1)
  }

  override def write(out: DataOutputView): Unit = {
    out.writeInt(count)
    out.writeUTF(name)
  }

  override def read(in: DataInputView): Unit = {
    count = in.readInt()
    name = in.readUTF()
    arr = Array(count, count) // rebuild the transient field
  }

  override def toString: String = s"$name, $count, ${arr.mkString(",")}"
}
Write out data with
myDataSet.write(new SerializedOutputFormat[Result], "/tmp/test")
and read it back with
env.readFile(new SerializedInputFormat[Result], "/tmp/test")
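The same pattern can be applied to IndexNode itself, handling the arbitrarily deep vector recursively: write the number of children first, then recurse into each child, and rebuild the vector the same way on read. A minimal, untested sketch - note it swaps the case class for a plain class with vars, since read must mutate the instance:

import org.apache.flink.core.io.IOReadableWritable
import org.apache.flink.core.memory.{DataInputView, DataOutputView}

class IndexNode(var vec: Vector[IndexNode], var id: Int) extends IOReadableWritable {
  def this() = this(Vector.empty, 0) // no-arg constructor for Flink

  override def write(out: DataOutputView): Unit = {
    out.writeInt(id)
    out.writeInt(vec.length)  // record the number of children first
    vec.foreach(_.write(out)) // then recurse into each child
  }

  override def read(in: DataInputView): Unit = {
    id = in.readInt()
    val n = in.readInt()
    vec = Vector.fill(n) {    // rebuild the children in the same order
      val child = new IndexNode()
      child.read(in)
      child
    }
  }
}

Arbitrary depth is handled naturally, because each child serializes itself the same way.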

Related

Serializing message with protobuf for akka actor which contains serializable data

I have a persistent actor which can receive one type of command, Persist(event), where event is a subtype of the trait Event (there are numerous implementations of it). On success, this responds with Persisted(event) to the sender.
The event itself is serializable, since this is the data we store in the persistence storage, and the serialization is implemented with a custom serializer which internally uses classes generated from Google protobuf .proto files. This custom serializer is configured in application.conf and bound to the base trait Event. This is already working fine.
Note: The implementations of Event are not classes generated by protobuf. They are normal scala classes and they have a protobuf equivalent of them too, but that's mapped through the custom serializer that's bound to the base Event type. This was done by my predecessors for versioning (which probably isn't required because this can be handled with plain protobuf classes + custom serialization combi too, but that's a different matter) and I don't wish to change that atm.
We're now trying to implement cluster sharding for this actor which also means that my commands (viz. Persist and Persisted) also need to be serializable since they may be forwarded to other nodes.
This is the domain model:
sealed trait PersistenceCommand {
  def event: Event
}

final case class Persisted(event: Event) extends PersistenceCommand
final case class Persist(event: Event) extends PersistenceCommand
Problem is, I do not see an elegant way to make it serializable. Following are the options I have considered:
Approach 1. Define a new proto file for Persist and Persisted, but what do I use as the datatype for event? I didn't find a way to define something like this:
message Persist {
  "com.example.Event" event = 1 // this doesn't work
}
Such that I can use the existing Scala trait Event as a data type. If this works, I guess (it's far-fetched though) I could bind the generated code (after compiling this proto file) to Akka's built-in serializer for Google protobuf, and it may work. The note above explains why I cannot use the oneof construct in my proto file.
Approach 2. This is what I've implemented and it works (but I don't like it)
Basically, I wrote a new serializer for the commands and delegated serialization and deserialization of the event part of the command to the existing serializer.
import akka.serialization.SerializerWithStringManifest

class PersistenceCommandSerializer extends SerializerWithStringManifest {
  val eventSerializer: ManifestAwareEventSerializer = new ManifestAwareEventSerializer()
  val PersistManifest = Persist.getClass.getName
  val PersistedManifest = Persisted.getClass.getName
  val Separator = "~"

  override def identifier: Int = 808653986

  override def manifest(o: AnyRef): String = o match {
    case Persist(event)   => s"$PersistManifest$Separator${eventSerializer.manifest(event)}"
    case Persisted(event) => s"$PersistedManifest$Separator${eventSerializer.manifest(event)}"
  }

  override def toBinary(o: AnyRef): Array[Byte] = o match {
    case command: PersistenceCommand => eventSerializer.toBinary(command.event)
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    val (commandManifest, dataManifest) = splitIntoCommandAndDataManifests(manifest)
    val event = eventSerializer.fromBinary(bytes, dataManifest).asInstanceOf[Event]
    commandManifest match {
      case PersistManifest   => Persist(event)
      case PersistedManifest => Persisted(event)
    }
  }

  private def splitIntoCommandAndDataManifests(manifest: String) = {
    val commandAndDataManifests = manifest.split(Separator)
    (commandAndDataManifests(0), commandAndDataManifests(1))
  }
}
The problem with this approach is what I'm doing in def manifest and def fromBinary: I have to carry both the command's manifest and the event's manifest through serialization and deserialization, hence the ~ separator - in effect, my own ad-hoc serialization scheme for the manifest information.
Is there a better or perhaps, a right way, to implement this?
For context: I'm using ScalaPB for generating Scala classes from .proto files, and classic Akka actors.
Any kind of guidance is hugely appreciated!
If you delegate serialization of the nested object to whichever serializer you have configured, the nested field should contain not only the bytes of the serialized data but also an int32 with the id of the serializer used and bytes for the message manifest. This ensures that you will be able to version/replace the nested serializers, which is important for data that will be stored for a longer time period.
You can see how this is done internally in Akka for our own wire formats here: https://github.com/akka/akka/blob/6bf20f4117a8c27f8bd412228424caafe76a89eb/akka-remote/src/main/protobuf/WireFormats.proto#L48 and here https://github.com/akka/akka/blob/6bf20f4117a8c27f8bd412228424caafe76a89eb/akka-remote/src/main/scala/akka/remote/MessageSerializer.scala#L45
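Following that advice, here is a hedged sketch of what Approach 2's serializer might look like when the nested serializer's id and manifest are stored alongside the payload bytes. PersistProto stands in for a hypothetical ScalaPB-generated message with payload/serializerId/manifest fields (it is not from the question), and Serializers.manifestFor assumes a reasonably recent Akka 2.5+:

import akka.actor.ExtendedActorSystem
import akka.serialization.{SerializationExtension, Serializers, SerializerWithStringManifest}
import com.google.protobuf.ByteString

class PersistenceCommandSerializer(system: ExtendedActorSystem) extends SerializerWithStringManifest {
  private lazy val serialization = SerializationExtension(system)

  override def identifier: Int = 808653986
  override def manifest(o: AnyRef): String = o.getClass.getName

  override def toBinary(o: AnyRef): Array[Byte] = o match {
    case command: PersistenceCommand =>
      val event = command.event
      val ser = serialization.findSerializerFor(event) // whatever is bound to Event
      PersistProto(                                    // hypothetical generated class
        payload      = ByteString.copyFrom(ser.toBinary(event)),
        serializerId = ser.identifier,                 // lets you replace the Event serializer later
        manifest     = Serializers.manifestFor(ser, event)
      ).toByteArray
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    val proto = PersistProto.parseFrom(bytes)
    val event = serialization
      .deserialize(proto.payload.toByteArray, proto.serializerId, proto.manifest)
      .get.asInstanceOf[Event]
    if (manifest == classOf[Persist].getName) Persist(event) else Persisted(event)
  }
}

The command manifest stays a plain class name, and the event's serializer id and manifest travel inside the protobuf envelope instead of a ~-delimited string.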

Scala object constructor

I have to create a file loader object and I would like the file to be loaded only once at object creation.
What I did until now is create a trait with a readSource member that reads the file and outputs a list of String.
import scala.io.Source

trait Loader {
  protected val readSource: List[String] = {
    Source
      .fromInputStream(getClass.getResourceAsStream("filename"), "UTF-8")
      .getLines()
      .toList
  }

  def transform(delimiter: String): Vector[C] = {
    val lines = readSource
    // process the lines
  }
}
The trait is implemented by several objects, and the transform method can be called multiple times in the client code.
I would like to avoid re-reading the file each time the transform method is called. My first solution was to extract val lines = readSource from the transform method, make a function of it (def loadFile = readSource), and create an apply method in my objects that calls loadFile, like so:
object MyLoader extends Loader {
  def apply: List[String] = {
    loadFile
  }
}
I am wondering if this is the right way to do it. Thank you for your advice.
If you want the resource read once and for all, then you should do that in a singleton object, which will be initialized lazily, exactly once.
Clients should use that object. "Prefer composition over inheritance" is the mantra.
If you want a mix-in that makes it easy to use the object, you can use "self-types" to constrain clients:
trait HasResource { val resource: R = TheResource }
trait Client { self: HasResource => def getR: R = resource }
This is the "cake pattern" way of making stuff available.

transform anorm (play framework) value before binding

I have a case class representing my domain:
case class MyModel(rawValue: String, transformedValue: String)
rawValue maps to a value in the database and is properly parsed and bound. What I am trying to do is add transformedValue to my model: this value is just some arbitrary transformation that I perform on the rawValue. It does not map to any data in the database / query.
I have a parser (prior to adding transformedValue) that looks like this:
val parser = {
  get[String]("rawValue") map {
    case rawValue => MyModel(rawValue)
  }
}
Since MyModel is immutable and I can't insert transformedValue into it after it's been created, what is the best way to add this transformation (i.e. adding ad-hoc values to the model), preferably without using vars?
Coming from Java, I would probably just have added a getTransformedValue getter to the domain class that performs this transformation on the rawValue attribute.
Since transformedValue seems to be a derived property rather than something supplied in the constructor, you can define it as an attribute in the body of the class, possibly with the lazy qualifier so that it will only be computed on demand (and just once per instance):
case class MyModel(rawValue: String) {
  lazy val transformedValue: String = {
    // Transformation logic, e.g.
    rawValue.toLowerCase()
  }
}
val is evaluated when defined; def is evaluated when called. A lazy val is evaluated when it is accessed the first time.
It may be appropriate to use lazy in the case of an expensive computation that is rarely needed, or when logic requires it. But for most cases a regular val should be used. Quoting:
lazy val is not free (or even cheap). Use it only if you absolutely need laziness for correctness, not for optimization.
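A quick illustration of the three evaluation strategies; the println side effects show when each body runs:

val a = { println("val body"); 1 }       // prints immediately, at definition
def b = { println("def body"); 2 }       // prints on every call to b
lazy val c = { println("lazy body"); 3 } // prints once, on first access to c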
I don't see why you wouldn't just do the transformation in the parser itself:
def transformation(rawValue: String): String = ...

val parser = {
  get[String]("rawValue") map {
    case rawValue => MyModel(rawValue, transformation(rawValue))
  }
}
Or if you don't want to do it there for some reason, you can use copy to create a new copy of MyModel with a new value:
val model = MyModel("value", ..)
val modelWithTransform = model.copy(transformedValue = transformation(model.rawValue))
You could also overload apply to automatically apply the transformation within the companion object:
case class MyModel(rawValue: String, transformedValue: String)

object MyModel {
  def apply(value: String): MyModel = MyModel(value, transformation(value))

  val parser = {
    get[String]("rawValue") map {
      case rawValue => MyModel(rawValue)
    }
  }
}
MyModel may be immutable, but you can always create another copy of it with some values changed.
Turns out it was easier than I thought, and the solution looks like what I said I would have done in Java:
I just add a function to MyModel that does this transformation:
case class MyModel(rawValue: String) {
  def transformedValue = {
    // do the transformation here and return it
  }
}

Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

Getting strange behavior when calling a function outside of a closure:
when the function is in an object, everything works
when the function is in a class, I get:
Task not serializable: java.io.NotSerializableException: testing
The problem is that I need my code in a class, not an object. Any idea why this is happening? Are Scala objects serialized by default?
This is a working code example:
object working extends App {
  val list = List(1,2,3)

  val rddList = Spark.ctx.parallelize(list)

  // calling function outside closure
  val after = rddList.map(someFunc(_))

  def someFunc(a: Int) = a + 1

  after.collect().map(println(_))
}
This is the non-working example:
object NOTworking extends App {
  new testing().doIT
}

// adding extends Serializable won't help
class testing {
  val list = List(1,2,3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash (Spark is lazy, so the failure surfaces here)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
RDDs extend the Serializable interface, so this is not what's causing your task to fail. That doesn't mean you can serialize an RDD with Spark and avoid NotSerializableException, though.
Spark is a distributed computing engine and its main abstraction is a resilient distributed dataset (RDD), which can be viewed as a distributed collection. Basically, RDD's elements are partitioned across the nodes of the cluster, but Spark abstracts this away from the user, letting the user interact with the RDD (collection) as if it were a local one.
Not to get into too many details, but when you run different transformations on an RDD (map, flatMap, filter and others), your transformation code (closure) is:
serialized on the driver node,
shipped to the appropriate nodes in the cluster,
deserialized,
and finally executed on the nodes
You can of course run this locally (as in your example), but all those phases (apart from shipping over the network) still occur. This lets you catch bugs even before deploying to production.
What happens in your second case is that you are calling a method defined in class testing from inside the map function. Spark sees that, and since methods cannot be serialized on their own, Spark tries to serialize the whole testing class, so that the code will still work when executed in another JVM. You have two possibilities:
Either you make class testing serializable, so the whole class can be serialized by Spark:
import org.apache.spark.{SparkContext, SparkConf}

object Spark {
  val ctx = new SparkContext(new SparkConf().setAppName("test").setMaster("local[*]"))
}

object NOTworking extends App {
  new Test().doIT
}

class Test extends java.io.Serializable {
  val rddList = Spark.ctx.parallelize(List(1,2,3))

  def doIT() = {
    val after = rddList.map(someFunc)
    after.collect().foreach(println)
  }

  def someFunc(a: Int) = a + 1
}
or you make someFunc a function instead of a method (functions are objects in Scala), so that Spark will be able to serialize it:
import org.apache.spark.{SparkContext, SparkConf}

object Spark {
  val ctx = new SparkContext(new SparkConf().setAppName("test").setMaster("local[*]"))
}

object NOTworking extends App {
  new Test().doIT
}

class Test {
  val rddList = Spark.ctx.parallelize(List(1,2,3))

  def doIT() = {
    val after = rddList.map(someFunc)
    after.collect().foreach(println)
  }

  val someFunc = (a: Int) => a + 1
}
A similar, but not identical, problem with class serialization may also be of interest to you; you can read about it in this Spark Summit 2013 presentation.
As a side note, you can rewrite rddList.map(someFunc(_)) to rddList.map(someFunc), they are exactly the same. Usually, the second is preferred as it's less verbose and cleaner to read.
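Both spellings compile to the same function value; the bare method name is eta-expanded by the compiler into a => someFunc(a):

val after1 = rddList.map(someFunc(_)) // explicit placeholder syntax
val after2 = rddList.map(someFunc)    // eta-expansion; same result, less noise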
EDIT (2015-03-15): SPARK-5307 introduced SerializationDebugger, and Spark 1.3.0 is the first version to use it. It adds a serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help the user find the object.
In OP's case, this is what gets printed to stdout:
Serialization stack:
    - object not serializable (class: testing, value: testing@2dfe2f00)
    - field (class: testing$$anonfun$1, name: $outer, type: class testing)
    - object (class testing$$anonfun$1, <function1>)
Grega's answer is great in explaining why the original code does not work and two ways to fix the issue. However, this solution is not very flexible; consider the case where your closure includes a method call on a non-Serializable class that you have no control over. You can neither add the Serializable tag to this class nor change the underlying implementation to change the method into a function.
Nilesh presents a great workaround for this, but the solution can be made both more concise and general:
def genMapper[A, B](f: A => B): A => B = {
  val locker = com.twitter.chill.MeatLocker(f)
  x => locker.get.apply(x)
}
This function-serializer can then be used to automatically wrap closures and method calls:
rdd map genMapper(someFunc)
This technique also has the benefit of not requiring the additional Shark dependencies in order to access KryoSerializationWrapper, since Twitter's Chill is already pulled in by core Spark.
Complete talk fully explaining the problem, which proposes a great paradigm shifting way to avoid these serialization problems: https://github.com/samthebest/dump/blob/master/sams-scala-tutorial/serialization-exceptions-and-memory-leaks-no-ws.md
The top-voted answer basically suggests throwing away an entire language feature - no longer using methods and only using functions. Indeed, in functional programming, methods in classes should be avoided, but turning them into functions doesn't solve the design issue here (see the above link).
As a quick fix in this particular situation, you could just use the @transient annotation to tell Spark not to try to serialize the offending value (here, Spark.ctx is a custom class, not Spark's, following the OP's naming):
@transient
val rddList = Spark.ctx.parallelize(list)
You can also restructure code so that rddList lives somewhere else, but that is also nasty.
The Future is Probably Spores
In the future, Scala will include these things called "spores" that should allow us to finely control what does and does not get pulled in by a closure. Furthermore, this should turn all mistakes of accidentally pulling in non-serializable types (or any unwanted values) into compile errors, rather than the current situation of horrible runtime exceptions / memory leaks.
http://docs.scala-lang.org/sips/pending/spores.html
A tip on Kryo serialization
When using Kryo, make registration mandatory; this will mean you get errors instead of memory leaks:
"Finally, I know that kryo has kryo.setRegistrationOptional(true) but I am having a very difficult time trying to figure out how to use it. When this option is turned on, kryo still seems to throw exceptions if I haven't registered classes."
Strategy for registering classes with kryo
Of course this only gives you type-level control, not value-level control; a sketch follows below.
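A hedged sketch of such a configuration: spark.kryo.registrationRequired makes Kryo error out on any unregistered class instead of silently falling back (MyCaseClass is a stand-in for your own types):

import org.apache.spark.SparkConf

case class MyCaseClass(id: Int, name: String) // stand-in for your own types

val conf = new SparkConf()
  .setAppName("kryo-strict")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true") // fail fast on unregistered classes
  .registerKryoClasses(Array(classOf[MyCaseClass]))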
... more ideas to come.
I faced a similar issue, and what I understand from Grega's answer is:
object NOTworking extends App {
  new testing().doIT
}

// adding extends Serializable won't help
class testing {
  val list = List(1,2,3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash (Spark is lazy)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
Your doIT method is trying to serialize the someFunc(_) method, but since methods are not serializable, Spark tries to serialize the class testing, which is again not serializable.
So to make your code work, you should define someFunc inside the doIT method. For example:
def doIT = {
  // function defined inside the method, so only the function value is shipped
  def someFunc(a: Int) = a + 1

  val after = rddList.map(someFunc(_))
  after.collect().map(println(_))
}
And if there are multiple functions coming into the picture, then all those functions should be available to the parent context, as in the sketch below.
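A minimal sketch of that (reusing rddList from above): every helper lives in the enclosing method, so none of them capture this:

def doIT = {
  def inc(a: Int) = a + 1    // helpers defined in the parent context
  def double(a: Int) = a * 2

  val after = rddList.map(x => double(inc(x)))
  after.collect().foreach(println)
}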
I solved this problem using a different approach. You simply need to serialize the objects before passing through the closure, and de-serialize afterwards. This approach just works, even if your classes aren't Serializable, because it uses Kryo behind the scenes. All you need is some curry. ;)
Here's an example of how I did it:
def genMapper(kryoWrapper: KryoSerializationWrapper[(Foo => Bar)])
             (foo: Foo): Bar = {
  kryoWrapper.value.apply(foo)
}

val mapper = genMapper(KryoSerializationWrapper(new Blah(abc))) _
rdd.flatMap(mapper).collectAsMap()

// a class, not an object: objects cannot take constructor parameters
class Blah(abc: ABC) extends (Foo => Bar) {
  def apply(foo: Foo): Bar = { /* this is the real function */ }
}
Feel free to make Blah as complicated as you want: a class, a companion object, nested classes, references to multiple 3rd-party libs.
KryoSerializationWrapper refers to: https://github.com/amplab/shark/blob/master/src/main/scala/shark/execution/serialization/KryoSerializationWrapper.scala
I'm not entirely certain that this applies to Scala but, in Java, I solved the NotSerializableException by refactoring my code so that the closure did not access a non-serializable final field.
Scala methods defined in a class are not serializable; methods can be converted into functions to resolve the serialization issue.
Method syntax
def func_name(x: String): String = {
  ...
  return x
}
Function syntax
val func_name = { (x: String) =>
  ...
  x
}
FYI, in Spark 2.4 a lot of you will probably encounter this issue. Kryo serialization has gotten better, but in many cases you cannot use spark.kryo.unsafe=true or the naive Kryo serializer.
For a quick fix, try changing the following in your Spark configuration:
spark.kryo.unsafe="false"
OR
spark.serializer="org.apache.spark.serializer.JavaSerializer"
I modify the custom RDD transformations that I encounter or personally write by using explicit broadcast variables and the new built-in twitter-chill API, converting them from rdd.map(row => ...) to rdd.mapPartitions(partition => { ... }) functions.
Example
Old (not-great) Way
val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
val outputRDD = rdd.map(row => {
  val value = sampleMap.get(row._1)
  value
})
Alternative (better) Way
import com.twitter.chill.MeatLocker

val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
val brdSerSampleMap = spark.sparkContext.broadcast(MeatLocker(sampleMap))

rdd.mapPartitions(partition => {
  val deSerSampleMap = brdSerSampleMap.value.get // unwrap the MeatLocker once per partition

  partition.map(row => {
    val value = deSerSampleMap.get(row._1) // use the deserialized copy, not the driver-side map
    value
  }).toIterator
})
This new way will only unwrap the broadcast variable once per partition, which is better. You will still need to use Java serialization if you do not register classes.
I had a similar experience.
The error was triggered when I initialized a variable on the driver (master), but then tried to use it on one of the workers.
When that happens, Spark Streaming will try to serialize the object to send it over to the worker, and fail if the object is not serializable.
I solved the error by making the variable static.
Previous non-working code
private final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Working code
private static final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Credits:
https://learn.microsoft.com/en-us/answers/questions/35812/sparkexception-job-aborted-due-to-stage-failure-ta.html ( The answer of pradeepcheekatla-msft)
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
For example, a UDF that calls a method defined outside any serializable object triggers the error:

def upper(name: String): String = {
  name.toUpperCase()
}

val toUpperName = udf { (EmpName: String) => upper(EmpName) }
val emp_details = """[{"id": "1","name": "James Butt","country": "USA"},
{"id": "2", "name": "Josephine Darakjy","country": "USA"},
{"id": "3", "name": "Art Venere","country": "USA"},
{"id": "4", "name": "Lenna Paprocki","country": "USA"},
{"id": "5", "name": "Donette Foller","country": "USA"},
{"id": "6", "name": "Leota Dilliard","country": "USA"}]"""
val df_emp = spark.read.json(Seq(emp_details).toDS())
val df_name=df_emp.select($"id",$"name")
val df_upperName= df_name.withColumn("name",toUpperName($"name")).filter("id='5'")
display(df_upperName)
This will give the error:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
Solution:

object obj_upper extends Serializable {
  def upper(name: String): String = {
    name.toUpperCase()
  }

  val toUpperName = udf { (EmpName: String) => upper(EmpName) }
}
val df_upperName = df_name.withColumn("name", obj_upper.toUpperName($"name")).filter("id='5'")
display(df_upperName)
My solution was to add a companion object that holds all the methods that are not serializable within the class.

Define custom serialization with Casbah / Salat - or delegate serialization to member?

I'm in the process of learning Scala for a new project, having come from Rails. I've defined a type that is going to be used in a number of my models, which can basically be thought of as a collection of 'attributes'. It's just a wrapper for a hashmap that delegates most of its responsibilities to it:
case class Description(attributes: Map[String, String]) {
  override def hashCode: Int = attributes.hashCode

  override def equals(other: Any) = other match {
    case that: Description => this.attributes == that.attributes
    case _ => false
  }
}
So I would then define a model class with a Description, something like:
case class Person(val name: String, val description: Description)
However, when I persist a Person with a SalatDAO I end up with a document that looks like this:
{
  name: "Russell",
  description: {
    attributes: {
      hair: "brown",
      favourite_color: "blue"
    }
  }
}
When in actual fact I don't need the nesting of the attributes tag in the description tag - what I actually want is this:
{
  name: "Russell",
  description: {
    hair: "brown",
    favourite_color: "blue"
  }
}
I haven't tried, but I reckon I could get that to work if I made Description extend a Map rather than contain one, but I'd rather not, because a Description isn't a type of Map, it's something which has some of the behaviour of a Map as well as other behaviour of its own I'm going to add later. Composition over inheritance and so on.
So my question is, how can I tell Salat (or Casbah, I'm actually a bit unclear as to which is doing the conversion as I've only just started using them) how to serialize and deserialize the Description class? In the casbah tutorial here it says:
It is also possible to create your own custom type serializers and
deserializers. See Custom Serializers and Deserializers.
But this page doesn't seem to exist. Or am I going about it the wrong way? Is there actually a really simple way to indicate this is what I want to happen, an annotation or something? Or can I simply delegate the serialization to the attributes map in some way?
EDIT: After having a look at the source for the JodaTime conversion helper I've tried the following but have had no luck getting it to work yet:
import org.bson.{ BSON, Transformer }
import com.mongodb.casbah.commons.conversions.MongoConversionHelper

object RegisterCustomConversionHelpers extends Serializers
  with Deserializers {
  def apply() = {
    super.register()
  }
}

trait Serializers extends MongoConversionHelper
  with DescriptionSerializer {
  override def register() = {
    super.register()
  }
  override def unregister() = {
    super.unregister()
  }
}

trait Deserializers extends MongoConversionHelper {
  override def register() = {
    super.register()
  }
  override def unregister() = {
    super.unregister()
  }
}

trait DescriptionSerializer extends MongoConversionHelper {
  private val transformer = new Transformer {
    def transform(o: AnyRef): AnyRef = o match {
      case d: Description => d.attributes.asInstanceOf[AnyRef]
      case _ => o
    }
  }

  override def register() = {
    BSON.addEncodingHook(classOf[Description], transformer)
    super.register()
  }
}
When I call RegisterCustomConversionHelpers() then save a Person I don't get any errors, it just has no effect, saving the document the same way as ever. This also seems like quite a lot to have to do for what I want.
Salat maintainer here.
I don't understand the value of Description as a wrapper here. It wraps a map of attributes, overrides the default equals and hashCode implementations of a case class - which seems unnecessary, since the implementation is delegated to the map anyhow and that is exactly what the generated case class methods do anyway - and introduces an additional layer of indirection to the serialized object.
Have you considered just:
case class Person(val name: String, val description: Map[String, String])
This will do exactly what you want out of the box.
In another situation I would recommend a simple type alias but unfortunately Salat can't support type aliases right now due to some issues with how they are depicted in pickled Scala signatures.
(You probably omitted this from your example for brevity, but it is best practice for your Mongo model to have an _id field - if you don't, the Mongo Java driver will supply one for you.)
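For instance, a hedged sketch of the flattened model with an _id field; Salat maps a field named _id to Mongo's _id by convention, and ObjectId comes from org.bson.types:

import org.bson.types.ObjectId

case class Person(_id: ObjectId = new ObjectId,
                  name: String,
                  description: Map[String, String])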
There is a working example of a custom BSON hook in the salat-core test package (it handles java.net.URL). It could be that your hook is not working simply because you are not registering it in the right place? But still, I would recommend getting rid of Description unless it is adding some value that is not evident from your example above.
Based on @prasinous' answer I decided this wasn't going to be that easy, so I changed my design a bit to the following, which pretty much gets me what I want. Rather than persisting the Description as a field, I persist a vanilla map, then mix a Described trait into the model classes I want to have a description, which automatically converts the map to a Description when the object is created. I'd appreciate it if anyone can point out any obvious problems with this approach or any suggestions for improvement.
class Description(val attributes: Map[String, String]) {
  // rest of class omitted
}

trait Described {
  val attributes: Map[String, String]
  val description = new Description(attributes)
}

case class Person(name: String, attributes: Map[String, String]) extends Described