Pre-process parameters of a case class constructor without repeating the argument list - scala

I have this case class with a lot of parameters:
case class Document(id:String, title:String, ...12 more params.. , keywords: Seq[String])
For certain parameters, I need to do some string cleanup (trim, etc) before creating the object.
I know I could add a companion object with an apply function, but the LAST thing I want is to write the list of parameters TWICE in my code (case class constructor and companion object's apply).
Does Scala provide anything to help me on this?

My general recommendations would be:
Your goal (data preprocessing) is the perfect use case of a companion object -- so it is maybe the most idiomatic solution despite the boilerplate.
If the number of case class parameters is high the builder pattern definitely helps, since you do not have to remember the order of the parameters and your IDE can help you with calling the builder member functions. Using named arguments for the case class constructor allows you to use a random argument order as well but, to my knowledge, there is not IDE autocompletion for named arguments => makes a builder class slightly more convenient. However using a builder class raises the question of how to deal with enforcing the specification of certain arguments -- the simple solution may cause runtime errors; the type-safe solution is a bit more verbose. In this regard a case class with default arguments is more elegant.
There is also this solution: Introduce an additional flag preprocessed with a default argument of false. Whenever you want to use an instance val d: Document, you call d.preprocess() implemented via the case class copy method (to avoid ever typing all your arguments again):
case class Document(id: String, title: String, keywords: Seq[String], preprocessed: Boolean = false) {
def preprocess() = if (preprocessed) this else {
this.copy(title = title.trim, preprocessed = true) // or whatever you want to do
}
}
But: You cannot prevent a client to initialize preprocessed set to true.
Another option would be to make some of your parameters a private val and expose the corresponding getter for the preprocessed data:
case class Document(id: String, title: String, private val _keywords: Seq[String]) {
val keywords = _keywords.map(kw => kw.trim)
}
But: Pattern matching and the default toString implementation will not give you quite what you want...

After changing context for half an hour, I looked at this problem with fresh eyes and came up with this:
case class Document(id: String, title: String, var keywords: Seq[String]) {
keywords = keywords.map(kw => kw.trim)
}
I simply make the argument mutable adding var and cleanup data in the class body.
Ok I know, my data is not immutable anymore and Martin Odersky will probably kill a kitten after seeing this, but hey.. I managed to do what I want adding 3 characters. I call this a win :)

Related

Understanding Some and Option in Scala

I am reading this example from their docs:
class Email(val username: String, val domainName: String)
object Email {
def fromString(emailString: String): Option[Email] = {
emailString.split('#') match {
case Array(a, b) => Some(new Email(a, b))
case _ => None
}
}
}
println(Email.fromString("scala.center#epfl.ch"))
val scalaCenterEmail = Email.fromString("scala.center#epfl.ch")
scalaCenterEmail match {
case Some(email) => println(
s"""Registered an email
|Username: ${email.username}
|Domain name: ${email.domainName}
""")
case None => println("Error: could not parse email")
}
My questions:
What is Some and Option?
What is a factory method (just some function that creates a new object and returns it?)
What is the point of companion objects? Is it just to contain functions that are available to all instances of class? Are they like class methods in Ruby?
What is Some and Option?
Option is a data structure that represents optionality, as the name suggests. Whenever a computation may not return a value, you can return an Option. Option has two cases (represented as two subclasses): Some or None.
In the example above, the method Email.fromString can fail and not return a value. This is represented with Option. In order to know whether the computation yielded a value or not, you can use match and check whether it was a Some or a None:
Email.fromString("scala.center#epfl.ch") match {
case Some(email) => // do something if it's a Some
case None => // do something it it's a None
}
This is much better than returning null because now whoever calls the method can't possibly forget to check the return value.
For example compare this:
def willReturnNull(s: String): String = null
willReturnNull("foo").length() // NullPointerException!
with this
def willReturnNone(s: String): Option[String] = None
willReturnNone("foo").length() // doesn't compile, because 'length' is not a member of `Option`
Also, note that using match is just a way of working with Option. Further discussion would involve using map, flatMap, getOrElse or similar methods defined on Option, but I feel it would be off-topic here.
What is a factory method (just some function that creates a new object and returns it?)
This is nothing specific to Scala. A "factory method" is usually a static method that constructs the value of some type, possibly hiding the details of the type itself. In this case fromString is a factory method because it allows you create an Email without calling the Email constructor with new Email(...)
What is the point of companion objects? Is it just to contain functions that are available to all instances of class? Are they like class methods in Ruby?
As a first approximation, yes. Scala doesn't have static members of a class. Instead, you can have an object associated with that class where you define everything that is static.
E.g. in Java you would have:
public class Email {
public String username;
public String domain;
public static Optional<Email> fromString(String: s) {
// ...
}
}
Where as in Scala you would define the same class as roughly:
class Email(val username: String, val domain: String)
object Email {
def fromString(s: String): Option[Email] = {
// ...
}
}
I would like to add some examples/information to the third question.
If you use akka in companion object you can put every message that you use in case method (it should proceed and use by actor). Moreover, you can add some val for a name of actors or other constant values.
If you work with JSON you should create a format for it (sometimes custom reads and writes). This format you should put inside companion object. Methods to create instances too.
If you go deeper to Scala you can find case classes. So a possibility to create an object of this class without new is because there is a method apply in "default" companion object.
But in general, it's a place where you can put every "static" method etc.
About Option, it provides you a possibility to avoid some exception and make something when you don't have any values.
Gabriele put an example with email, so I'll add another one.
You have a method that sends email, but you take email from User class. The user can have this field empty, so if we have something like it
val maybeEmail: Option[String] = user.email you can use for example map to send an email
maybeEmail.map(email => sendEmail(email))
So if you use it, during writing methods like above you don't need to think that user specify his email or not :)

Using overloaded constructors from the superclass

I'm writing a message parser. Suppose I have a superclass Message with two auxiliary constructors, one that accepts String raw messages and one that accepts a Map with datafields mapped out in key-value pairs.
class Message {
def this(s: String)
def this(m: Map[String, String])
def toRaw = { ... } # call third party lib to return the generated msg
def map # call third party lib to return the parsed message
def something1 # something common for all messages which would be overriden in child classes
def something2 # something common for all messages which would be overriden in child classes
...
}
There's good reason to do this as the library that does parsing/generating is kind of awkward and removing the complexity of interfacing with it into a separate class makes sense, the child class would look something like this:
class SomeMessage extends Message {
def something1 # ...
def something2 # ...
}
and the idea is to use the overloaded constructors in the child class, for example:
val msg = new SomeMessage(rawMessage) # or
val msg = new SomeMessage("fld1" -> ".....", "fld2" -> "....")
# and then be able to call
msg.something1
msg.something2 # ...
However, the way auxiliary constructors and inheritance seem to behave in Scala this pattern has proven to be pretty challenging, and the simplest solution I found so far is to create a method called constructMe, which does the work of the constructors in the above case:
val msg = new SomeMessage
msg.constructMe(rawMessage) # or
msg.constructMe("fld1" -> ".....", "fld2" -> "....")
which seems crazy to need a method called constructMe.
So, the question:
is there a way to structure the code so to simply use the overloaded constructors from the superclass? For example:
val msg = new SomeMessage(rawMessage) # or
val msg = new SomeMessage("fld1" -> ".....", "fld2" -> "....")
or am I simply approaching the problem the wrong way?
Unless I'm missing something, you are calling the constructor like this:
val msg = new SomeMessage(rawMessage)
But the Message class doesn't not take a parameter, your class should be defined so:
class Message(val message: String) {
def this(m: Map[String, String]) = this("some value from mapping")
}
Also note that the constructor in scala must call the primary constructor as first action, see this question for more info.
And then the class extending the Message class should be like this:
class SomeMessage(val someString: String) extends Message(someString) {
def this(m: Map[String, String]) = this("this is a SomeMessage")
}
Note that the constructor needs a code block otherwise your code won't compile, you can't have a definition like def this(someString: String) without providing the implementation.
Edit:
To be honest I don't quite get why you want to use Maps in your architecture, your class main point it to contain a String, having to do with complex types in constructors can lead to problems. Let's say you have some class which can take a Map[String, String] as a constructor parameter, what will you do with it? As I said a constructor must call himself as first instruction, what you could is something like this:
class A(someString: String) = {
def this(map: Map[String, String]) = this(map.toString)
}
And that's it, the restrictions in scala don't allow you to do anything more, you would want to do some validation, for example let's say you want to take always the second element in the map, this could throw exceptions since the user is not forced to provide a map with more than one value, he's not even forced to provide a filled map unless you start filling your class with requires.
In your case I probably would leave String as class parameter or maybe a List[String] where you can call mkString or toString.
Anyway if you are satisfied calling map.toString you have to give both constructor implementation to parent and child class, this is one of scala constructor restrictions (in Java you could approach the problem in a different way), I hope somebody will prove me wrong, but as far as I know there's no other way to do it.
As a side note, I personally find this kind of restriction to be correct (most of the time) since the force you to structure your code to be more rigorous and have a better architecture, think about the fact that allowing people to do whatever they want in a constructor (like in java) obfuscate their true purpose, that is return a new instance of a class.

How to pass around string values type-safely?

E.g.:
def updateAsinRecords(asins:Seq[String], recordType:String)
Above method takes a Seq of ASINs and a record type. Both have type of String. There are also other values that are passed around with type String in the application. Needless to say, this being Scala, I'd like to use the type system to my advantage. How to pass around string values in a type safe manner (like below)?
def updateAsinRecords(asins:Seq[ASIN], recordType:RecordType)
^ ^
I can imagine, having something like this:
trait ASIN { val value:String }
but I'm wondering if there's a better approach...
There is an excellent bit of new Scala functionality know as Value Classes and Universal Traits. They impose no runtime overhead but you can use them to work in a type safe manner:
class AnsiString(val inner: String) extends AnyVal
class Record(val inner: String) extends AnyVal
def updateAnsiRecords(ansi: Seq[AnsiString], record: Record)
They were created specifically for this purpose.
You could add thin wrappers with case classes:
case class ASIN(asin: String)
case class RecordType(recordType: String)
def updateAsinRecords(asins: Seq[ASIN], recordType: RecordType) = ???
updateAsinRecords(Vector(ASIN("a"), ASIN("b")), RecordType("c"))
This will not only make your code safer, but it will also make it much easier to read! The other big advantage of this approach is that refactoring later will be much easier. For example, if you decide later that an ASIN should have two fields instead of just one, then you just update the ASIN class definition instead of every place it's used. Likewise, you can do things like add methods to these types whenever you decide you need them.
In addition to the suggestions about using a Value Class / extends AnyVal, you should probably control the construction to allow only valid instances, since presumably not any old string is a valid ASIN. (And... is that an Amazon thing? It rings a bell somehow.)
The best way to do this is to make the constructor private and put a validating factory method in a companion object. The reason for this is that throwing exceptions in constructors (when an attempt is made to instantiate with an invalid argument) can lead to puzzling failure modes (I often see it manifest as a NoClassDefFoundError error when trying to load a different class).
So, in addition to:
case class ASIN private (asin: String) extends AnyVal { /* other stuff */ }
You should include something like this:
object A {
import scala.util.{Try, Success, Failure}
def fromString(str: String): Try[ASIN] =
if (validASIN(str))
Success(new ASIN(str))
else
Failure(new InvalidArgumentException(s"Invalid ASIN string: $str")
}
How about a type alias?
type ASIN = String
def update(asins: Seq[ASIN])

Scala class that handles enumerations generically

I want to create a generic class that holds the value of an enumeration, and also allows access to the possible values of the enumeration. Think of a property editor for example - you need to know the current value of the property, and you also need to be able to know what other values are legal for the property. And the type of enumeration should not be known in advance, you should be able to work with any kind of enumeration.
My first thought was something like this:
class EnumerationProperty[T <: Enumeration](value:T)
However, that doesn't work because for enumerations T isn't a type, it's an object. Other variations I have tried are:
class EnumerationProperty[T <: Enumeration](value:T.Value)
class EnumerationProperty[T <: Enumeration.Value](value:T)
(I won't go into the details of why these don't work because I suspect the reasons aren't interesting.)
For first part of your question. You can define holder for generic enum value like this:
case class EnumerationProperty[T <: Enumeration#Value](value:T)
But I don't know how to get all enum values without explicitly pass Enumeration object. Enumeration has values() method to get all values. And Enumeration#Value has link to Enumeration, but with private access
In your use case (a property editor) I think the best solution is to rely on scala reflection available in scala 2.10. Given a class's TypeTag, you can get the full type (without erasure) of all of its members plus their values, and from that populate your property editor. For enumerations, use the TypeTag to get their values using the following method:
Using Scala 2.10 reflection how can I list the values of Enumeration?
Now, maybe you don't want or can't use scala reflection, and from now on I'll suppose this is the case. If that is true then you are opening a whole can of worm :)
TL;DR: This is not possible using a standard Enumeration, so you'll probably have to explcitly wrap the enumeration values as shown in Ptharien's Flame's answer (or roll your own fork of Enumeration). Below I detail all my attempts, please bear with me.
Unfortunately, and for some unknown reason to me (though I suspect it has to do with serialization issues), Enumeration.Value has no field pointing to its Enumeration instance. Given how Enumeration is implemented, this would be trivial to implement, but of course wehave no say, short of forking Enumeration and modifying our version (which is actually what I did for this very purpose, plus to add proper support for serialization and reflection - but I diggress).
If we can't modify Enumeration, maybe we can just extend it? Looking at the implementation again, something like this would seem to work:
class EnumerationEx extends Enumeration {
protected class ValEx(i: Int, name: String) extends Val(i, name) {
#transient val enum: Enumeration = EnumerationEx.this
}
override protected def Value(i: Int, name: String): Value = new ValEx(i, name)
}
object Colors extends EnumerationEx {
val Red, Green, Blue = Value
}
The downside would be that it only works for enumerations that explicitly extend EnumerationEx instead of Enumeration, but it would be better than nothing.
Unfortunately, this does not compile simply because def Value ... is declared final in Enumeration so there is no way to override it. (Note again that forking Enumeration would allow to circunvent this. Actually, why not do it, as we are already down the path of using a custom Enumeration anwyay. I'll let you judge).
So here is another take on it:
class EnumerationEx extends Enumeration {
class ValueWithEnum( inner: Value ) {
#transient val enum: Enumeration = EnumerationEx.this
}
implicit def valueToValue( value: Value ): ValueWithEnum = new ValueWithEnum( value )
}
And indeed it works as expected. Or so it seems.
scala> object Colors extends EnumerationEx {
| val Red, Green, Blue = Value
| }
defined module Colors
scala> val red = Colors.Red
red: Colors.Value = Red
scala> red.enum.values
res58: Enumeration#ValueSet = Colors.ValueSet(Red, Green, Blue)
Hooray? Well no, because the conversion from Value to ValueWithEnum is done only when accessing red.enum, not at the time of instantiation of the Colors enumeration. In other words, when calling enum the compiler needs to know the exact static type of the enumeration (the compiler must statically know that red's type is Colors.Value, and not just Enumeration# Value). And in the use case you mention (a property editor) you can only rely to java reflection (I already assumed that you won't use scala reflection) to get the type of an enumeration value, and java reflection will only give you Enumeration#Val (which extends Enumeration#Value) as the type of Colors.Red. So basically you are stuck here.
Your best bet is definitly to use scala reflection in the first place.
class EnumProps[E <: Enumeration](val e: E)(init: e.Value) {...}
Then you can use e and e.Value to implement the class.

scala programming without vars

val and var in scala, the concept is understandable enough, I think.
I wanted to do something like this (java like):
trait PersonInfo {
var name: Option[String] = None
var address: Option[String] = None
// plus another 30 var, for example
}
case class Person() extends PersonInfo
object TestObject {
def main(args: Array[String]): Unit = {
val p = new Person()
p.name = Some("someName")
p.address = Some("someAddress")
}
}
so I can change the name, address, etc...
This works well enough, but the thing is, in my program I end up with everything as vars.
As I understand val are "preferred" in scala. How can val work in this
type of example without having to rewrite all 30+ arguments every time one of them is changed?
That is, I could have
trait PersonInfo {
val name: Option[String]
val address: Option[String]
// plus another 30 val, for example
}
case class Person(name: Option[String]=None, address: Option[String]=None, ...plus another 30.. ) extends PersonInfo
object TestObject {
def main(args: Array[String]): Unit = {
val p = new Person("someName", "someAddress", .....)
// and if I want to change one thing, the address for example
val p2 = new Person("someName", "someOtherAddress", .....)
}
}
Is this the "normal" scala way of doing thing (not withstanding the 22 parameters limit)?
As can be seen, I'm very new to all this.
At first the basic option of Tony K.:
def withName(n : String) = Person(n, address)
looked promising, but I have quite a few classes that extends PersonInfo.
That means in each one I would have to re-implement the defs, lots of typing and cutting and pasting,
just to do something simple.
If I convert the trait PersonInfo to a normal class and put all the defs in it, then
I have the problem of how can I return a Person, not a PersonInfo?
Is there a clever scala thing to somehow implement in the trait or super class and have
all subclasses really extend?
As far as I can see all works very well in scala when the examples are very simple,
2 or 3 parameters, but when you have dozens it becomes very tedious and unworkable.
PersonContext of weirdcanada is I think similar, still thinking about this one. I guess if
I have 43 parameters I would need to breakup into multiple temp classes just to pump
the parameters into Person.
The copy option is also interesting, cryptic but a lot less typing.
Coming from java I was hoping for some clever tricks from scala.
Case classes have a pre-defined copy method which you should use for this.
case class Person(name: String, age: Int)
val mike = Person("Mike", 42)
val newMike = mike.copy(age = 43)
How does this work? copy is just one of the methods (besides equals, hashCode etc) that the compiler writes for you. In this example it is:
def copy(name: String = name, age: Int = age): Person = new Person(name, age)
The values name and age in this method shadow the values in the outer scope. As you can see, default values are provided, so you only need to specify the ones that you want to change. The others default to what there are in the current instance.
The reason for the existence of var in scala is to support mutable state. In some cases, mutable state is truly what you want (e.g. for performance or clarity reasons).
You are correct, though, that there is much evidence and experience behind the encouragement to use immutable state. Things work better on many fronts (concurrency, clarity of reason, etc).
One answer to your question is to provide mutator methods to the class in question that don't actually mutate the state, but instead return a new object with a modified entry:
case class Person(val name : String, val address : String) {
def withName(n : String) = Person(n, address)
...
}
This particular solution does involve coding potentially long parameter lists, but only within the class itself. Users of it get off easy:
val p = Person("Joe", "N St")
val p2 = p.withName("Sam")
...
If you consider the reasons you'd want to mutate state, then thing become clearer. If you are reading data from a database, you could have many reasons for mutating an object:
The database itself changed, and you want to auto-refresh the state of the object in memory
You want to make an update to the database itself
You want to pass an object around and have it mutated by methods all over the place
In the first case, immutable state is easy:
val updatedObj = oldObj.refresh
The second is much more complex, and there are many ways to handle it (including mutable state with dirty field tracking). It pays to look at libraries like Squery, where you can write things in a nice DSL (see http://squeryl.org/inserts-updates-delete.html) and avoid using the direct object mutation altogether.
The final one is the one you generally want to avoid for reasons of complexity. Such things are hard to parallelize, hard to reason about, and lead to all sorts of bugs where one class has a reference to another, but no guarantees about the stability of it. This kind of usage is the one that screams for immutable state of the form we are talking about.
Scala has adopted many paradigms from Functional Programming, one of them being a focus on using objects with immutable state. This means moving away from getters and setters within your classes and instead opting to to do what #Tony K. above has suggested: when you need to change the "state" of an inner object, define a function that will return a new Person object.
Trying to use immutable objects is likely the preferred Scala way.
In regards to the 22 parameter issue, you could create a context class that is passed to the constructor of Person:
case class PersonContext(all: String, of: String, your: String, parameters: Int)
class Person(context: PersonContext) extends PersonInfo { ... }
If you find yourself changing an address often and don't want to have to go through the PersonContext rigamarole, you can define a method:
def addressChanger(person: Person, address: String): Person = {
val contextWithNewAddress = ...
Person(contextWithNewAddress)
}
You could take this even further, and define a method on Person:
class Person(context: PersonContext) extends PersonInfo {
...
def newAddress(address: String): Person = {
addressChanger(this, address)
}
}
In your code, you just need to make remember that when you are updating your objects that you're often getting new objects in return. Once you get used to that concept, it becomes very natural.