How to enforce encapsulation of immutable class in Scala?

I am trying to write immutable code in Dart. Dart wasn't really built with immutability in mind, that's why I need to write a lot of boilerplate in order to achieve immutability. Because of this, I got interested in how a language, like Scala, which was built around the concept of immutability, would solve this.
I am currently using the following class in Dart:
class Profile{
List<String> _inSyncBikeIds = []; // private field
String profileName; // public field
Profile(this.profileName); // You should not be able to pass a value to _inSyncBikeIds
void synchronize(String bikeId){
_inSyncBikeIds.add(bikeId);
}
bool isInSync(String bikeId){
return _inSyncBikeIds.contains(bikeId);
}
void reset(){
_inSyncBikeIds = [];
}
}
The same class, made immutable:
class Profile{
final List<String> _inSyncBikeIds; // private final field, set only by the private constructor
final String profileName; // public final field
factory Profile(String profileName) => Profile._([], profileName); // You should not be able to pass a value to _inSyncBikeIds
Profile._(this._inSyncBikeIds, this.profileName); // private constructor
Profile synchronize(String bikeId){
return _copyWith(inSyncBikeIds: [..._inSyncBikeIds, bikeId]); // add() returns void, so build a new list instead
}
bool isInSync(String bikeId) {
return _inSyncBikeIds.contains(bikeId);
}
Profile reset(){
return _copyWith(inSyncBikeIds: []);
}
Profile copyWith({
String? profileName,
}) {
return _copyWith(profileName: profileName);
}
Profile _copyWith({
String? profileName,
List<String>? inSyncBikeIds,
}) {
// The private constructor takes positional arguments: list first, then name
return Profile._(
inSyncBikeIds ?? _inSyncBikeIds,
profileName ?? this.profileName);
}
}
What I understand from Scala so far, is that for every class a copy method is automatically created. In order to be able to change a field using the copy method, it needs to be part of the constructor.
I want the field _inSyncBikeIds to be final (val in Scala). In order to change the value of the field _inSyncBikeIds I need to create a copy of the object. But in order to use the copy method, to change the field, it needs to be part of the constructor of the class, like class Profile(private val _inSyncBikeIds, val profileName). But this would then break encapsulation, because everyone can create an object and initialize _inSyncBikeIds. In my case, _inSyncBikeIds should always be an empty list after initialization.
Three questions:
How do I solve this in Scala?
When I use the copy method inside the class, can I change private fields using the copy method?
Does the copy method in Scala copy private fields as well (even when they are not part of the constructor, you can't mutate that private field then of course)?

Scala comes from a tradition that tends to view immutable data as a license for free sharing (hence public by default, and so on). Encapsulation is interpreted more as "code outside an object must not be able to directly mutate its data", and immutable data satisfies that regardless of visibility.
It's possible to suppress the auto-generated copy method for a case class by making it abstract (nearly always sealed abstract with a private constructor). This is commonly used to make the apply/copy methods return a different type (e.g. something which encodes a validation failure as a value rather than throwing an exception, as require would), but it can be used for your purpose:
sealed abstract case class Profile private(private val _inSyncBikeIds: List[String], profileName: String) {
def addBike(bikeId: String): Profile = Profile.unsafeApply(bikeId :: _inSyncBikeIds, profileName)
// Might consider using a Set...
def isInSync(bikeId: String): Boolean = _inSyncBikeIds.contains(bikeId)
def copy(profileName: String = profileName): Profile = Profile.unsafeApply(_inSyncBikeIds, profileName)
}
object Profile {
def apply(profileName: String): Profile = unsafeApply(Nil, profileName)
private[Profile] def unsafeApply(_inSyncBikeIds: List[String], profileName: String): Profile = new Profile(_inSyncBikeIds, profileName) {}
}
unsafeApply is more common for the validation as value use-case, but the main purpose it serves is to limit the concrete implementations of the abstract Profile to only that anonymous implementation; this monomorphism has beneficial implications for runtime performance.
Notes: case classes are Serializable, so there is a Java serialization hole: in application code this is solvable by never ever using Java serialization because it's broken, but it makes up for being broken by being completely evil (i.e. if you have a Scala application that uses Java serialization, you should probably re-evaluate the choices that led you there).
There's no way to encode sealedness in JVM bytecode AFAIK (Scala uses an annotation, IIRC, so Scala will limit extension of Profile to that compilation unit but, e.g, Kotlin won't), nor is the private[Profile] access control encoded in a way that JVM languages which aren't Scala will enforce (the unsafeApply method is actually public in the bytecode). Again, in application code, the obvious question is "why are you trying to use this from Java/Kotlin/Clojure/...?". In a library, you might have to do something hacky like throw an exception, catch it and inspect the top frames of the stack, throwing again if it's not hunky-dory.
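The "validation failure as a value" use of this pattern mentioned above can be sketched as follows. This is only an illustration under assumptions: ProfileName, its single field and the error message are hypothetical names that do not appear in the question or the answer.

```scala
// Hypothetical example type; not part of the original answer.
sealed abstract case class ProfileName private (value: String)

object ProfileName {
  // No apply is synthesized for an abstract case class,
  // so this hand-written factory is the only way to get an instance.
  def apply(raw: String): Either[String, ProfileName] = {
    val trimmed = raw.trim
    if (trimmed.isEmpty) Left("profile name must not be empty")
    else Right(new ProfileName(trimmed) {}) // anonymous subclass, possible only in this file
  }
}

object ProfileNameDemo {
  def main(args: Array[String]): Unit = {
    println(ProfileName("  commuter ")) // a Right holding the trimmed name
    println(ProfileName("   "))         // a Left holding the error message
  }
}
```

Because there is no generated copy either, a caller cannot bypass the check by copying a valid instance with an invalid value.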

I have no idea if it is possible in Dart, but in Scala this would be done with a private constructor:
class Profile private (val _foo: Seq[String], val bar: String) {
def this(bar: String) = this(Nil, bar)
}
This lets you define
private def copy(foo: Seq[String], bar: String) = new Profile(foo, bar)
This is fine as long as the class is final. If you subclass it, badness ensues: Child.copy() returns an instance of Parent unless you override copy in every subclass, and there is no good way to enforce that (Scala 3 admittedly improves on this somewhat).
The generated copy method you mentioned only works for case classes. But subclassing a case class would lead to some even more interesting results.
This is really rarely useful though. Looking at your code, for instance, if I read the ask correctly, you want the user to not be able to do Profile(List("foo"), "bar"), but Profile("bar").synchronize("foo") is still possible even though it produces exactly the same result. This hardly seems useful.
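Put together, the private-constructor approach might look like the sketch below for the Profile class from the question. It assumes a final, non-case class; the rename method is a hypothetical stand-in for the public copyWith of the Dart version.

```scala
final class Profile private (private val inSyncBikeIds: List[String], val profileName: String) {
  // Public entry point: a new Profile always starts with an empty list.
  def this(profileName: String) = this(Nil, profileName)

  // Hand-written, private counterpart of a case class copy method.
  private def copy(
      inSyncBikeIds: List[String] = this.inSyncBikeIds,
      profileName: String = this.profileName
  ): Profile = new Profile(inSyncBikeIds, profileName)

  def synchronize(bikeId: String): Profile = copy(inSyncBikeIds = bikeId :: inSyncBikeIds)
  def isInSync(bikeId: String): Boolean = inSyncBikeIds.contains(bikeId)
  def reset(): Profile = copy(inSyncBikeIds = Nil)

  // Hypothetical public method: changes the name, keeps the private list.
  def rename(newName: String): Profile = copy(profileName = newName)
}
```

Outside the class, new Profile(List("foo"), "bar") does not compile, while new Profile("bar").synchronize("foo") does.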

Related

Scala Class that containing List

I have a very basic and simple Scala question. For example, I have a java class like that
class Dataset{
private List<Record> records;
Dataset(){
records = new ArrayList<Record>();
}
public void addItem(Record r){
records.add(r);
}
}
When I try to write the same class in Scala, I encounter some errors:
class RecordSet() {
private var dataset:List[Record]
def this(){
dataset = new List[Record]
}
def addRecord(rd: Record)={
dataset :+ rd
}
}
I cannot declare a List variable like ( private var dataset:List[Record])
and cannot write a default constructor.

Here is how you will replicate the Java code you mentioned in your question:
// defining Record so the code below compiles
case class Record()
// Here is the Scala implementation
class RecordSet(private var dataset:List[Record]) {
def addRecord(rd: Record): Unit = {
dataset = dataset :+ rd // reassign: :+ returns a new list and leaves the old one untouched
}
}
Some explanation:
In Scala, a class definition can take parameters directly, e.g. class Foo(num:Int, descr:String). Scala uses those parameters to create the primary constructor for you, so you can instantiate Foo with new Foo(1, "One"). This differs from Java, where you have to define parameter-accepting constructors explicitly.
Be aware that these parameters do not automatically become accessible members of the class. If you want that, prefix each parameter with val or var, for example class Foo(val num:Int, val descr:String) or class Foo(var num:Int, var descr:String). With val the member cannot be reassigned; with var it can.
By default, the members Scala generates this way are public, so they can be accessed directly on an instance. For example:
val foo = new Foo(1, "One")
println(foo.num) // prints 1.
If you want them to be private, you add private keyword to the definition. So that would become:
class Foo(private var num:Int, private var desc:String)
Your code fails to compile because def this() defines an auxiliary constructor, which is how Scala provides multiple constructors; it is not a way to initialise a private field, which is what the Java code you shared does. You can search for "auxiliary constructors" to learn more about this.
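For a closer mirror of the Java class, where callers pass nothing and the list starts out empty, the field can be initialised in the class body, which acts as the body of the primary constructor. A minimal sketch; the Record fields and the size accessor are assumptions added only so that the example compiles and its effect can be observed:

```scala
// Placeholder record type; the real Record is not shown in the question.
final case class Record(id: Int)

class RecordSet {
  // Runs as part of the primary constructor, like the Java constructor body.
  private var dataset: List[Record] = Nil

  def addRecord(rd: Record): Unit = {
    dataset = dataset :+ rd // reassign: :+ returns a new list
  }

  // Hypothetical accessor, added only to make the effect visible.
  def size: Int = dataset.size
}
```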

As dade said, the issue in your code is that def this() creates an auxiliary constructor, which comes with restrictions: its first statement must be a call to another constructor (auxiliary or primary). Hence you cannot define the class that way.
Also, you cannot write a line such as private var dataset:List[Record] in a concrete Scala class: without an initial value it is treated as an abstract declaration.
Now to the code. In Scala we usually avoid mutability because it introduces side effects into our functions (which is not the functional way, but since Scala is not purely functional you can use mutability too).
The Scala way would be something like this:
class RecordSet(private val dataset:List[Record]) {
def addRecord(rd: Record): RecordSet ={
new RecordSet(dataset :+ rd)
}
}
Now with the above class there is no mutability. Whenever you are adding on an element to the dataset a new instance of RecordSet is being created. Hence no mutability.
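A short usage sketch of the immutable version, showing that the original instance is left untouched. The Record fields and the size accessor are hypothetical additions for the demo:

```scala
// Placeholder record type so the sketch compiles.
final case class Record(id: Int)

class RecordSet(private val dataset: List[Record]) {
  def addRecord(rd: Record): RecordSet = new RecordSet(dataset :+ rd)

  // Hypothetical accessor, added only for the demo.
  def size: Int = dataset.size
}

object ImmutableRecordSetDemo {
  def main(args: Array[String]): Unit = {
    val empty = new RecordSet(Nil)
    val one = empty.addRecord(Record(1)) // returns a new RecordSet
    println(empty.size) // 0: the original is unchanged
    println(one.size)   // 1
  }
}
```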
However, if you have to keep the same object reference throughout your application, use a mutable collection for your dataset, like below:
import scala.collection.mutable.ListBuffer
class RecordSet(private val dataset:ListBuffer[Record]) {
def addRecord(rd: Record): ListBuffer[Record] ={
dataset += rd
}
}
The above code appends the new record to the existing dataset, keeping the same RecordSet instance.

How to partially abstract a method in Scala

This question is related to this one.
I have a family of classes of type Config and all of them are built using a builder pattern. Therefore, I have also ConfigBuilder classes that form a hierarchy as well, since many implementation share the same behaviour.
What I want to achieve is that ConfigBuilder expose a method build which always perform these steps: validate the parameters (throwing an exception if not valid) and build the Config. Of course I would like to do this with the least possible duplication of code. In fact, the build method can be ideally split in two parts: a common build of the parameters shared by all implementation of Config and a implementation-specific build for each Config.
This is an example of the superclasses
abstract class Config {
def name: String
def query: String
def sourceTable: String
}
abstract class ConfigBuilder {
// common variables are set through setters by the user, which finally calls build
def build = {
validate
val sourceTable = extractFrom(query) // private local method
// it will contain more fields, extracted from the ones set by the user
buildInternal(sourceTable)
}
def validate = {
if(name == null) throw new Exception() // and all common checks
}
def buildInternal(s:String): Config // abstract: a def without a body needs no abstract modifier
}
And this is an implementation
case class Factlog private (
name:String, query:String, sourceTable:String, description: String) extends Config
class FactlogBuilder extends ConfigBuilder {
// description is set through setters by the user
override def validate = {
super.validate
if(description == null) throw new Exception()
}
def buildInternal(s:String) =
Factlog(name,query,s,description)
}
}
This snippet of code works, but I would like to understand whether this is the best way to implement the build and buildInternal methods.
1. With this approach, the buildInternal signature will change with any new Factlog-specific parameter, so the solution would be to place the computation of sourceTable in ConfigBuilder, outside the build method.
2. If I do this, I am forced to generate sourceTable before the call to validate.
3. As a last approach, I could declare the variable outside as var sourceTable = _ and then, after the validate call, assign it the value returned by extractFrom.
I am tempted to use approach 3, but I assume this is not really how Scala should be used. I am sure there are better ways to compose these hierarchies.
P.S. the list of parameters will surely grow over time, so this is something that I have to consider. Moreover, the builder pattern usage is facilitated by a Spark feature that at the moment I cannot avoid to use.

Scala mutator naming conventions and requirements

I am having some trouble understanding the naming in Scala with respect to mutators. Here is the part that I am having trouble understanding:
class Company {
private var _name: String = _
def name = _name
def name_=(name: String) {
_name = name
}
}
So I understand that the _name is the private String, and the first def name is the getter/accessor while the second is the setter/mutator. Essentially, I understand what the code means and does, but I am not sure what is personal preference vs code standards/the required way to do it. Will all mutators have the _ suffix and is it standard to prefix private attributes with an underscore or is that personal preference?
Or can I just define the mutator as the following?
def name=(name: String) {
_name = name
}
Similarly, do I have to prefix the private field with the underscore, or could I just change it to:
def name=(name: String) {
name = name
}
I got the above code from Scala Naming Conventions and Daniel Spiewak's Accessors/Mutators

These are all good questions. Some of them are covered by the article from the Scala documentation you've linked, specifically in the part about Accessors/Mutators.
Naming of mutator methods
In a nutshell, the name_= form is not special syntax but rather a naming convention enforced by the specification.
Let's look at what scalac produces at the JVM bytecode level when you declare a plain old var. Understanding the bytecode produced by scalac is by no means necessary to understand the higher-level workings of the language, but many people learning Scala have some experience with Java and, through that, some intuition of what is possible in the JVM and what isn't. I think this is the closest we can get to the why behind some of the decisions that were made in the specification of Scala.
Here I have a source file called Var.scala:
trait Var {
var name: String
}
I compile the source to a class file and decompile it using javap:
$ scalac Var.scala
$ javap Var.class
Compiled from "Var.scala"
public interface Var {
public abstract java.lang.String name();
public abstract void name_$eq(java.lang.String);
}
As we can see, a var in a trait declares two JVM methods, a getter, and a setter. The getter takes the name of the var, while the setter has a bit of a weird name. Here, $eq is just how an = in a Scala identifier is encoded by scalac in the class file it produces. This is part of the Scala specification and is required for binary compatibility between different compilation units.
So the name of the setter as seen from Scala is simply name_=. This is also part of the Scala specification. When we write a statement that sets the value of a var, a call to a method with a name of that form is generated. When we write a statement that just reads the var, a call to the first method is generated.
Instead of declaring a var, we could just as well declare those two methods directly:
trait ValAndDef {
val name: String
def name_=(newName: String): Unit
}
Compiling and decompiling this will show the exact same methods as before. There is also nothing preventing you from declaring only one of those methods, which would create a member which can only be read but not written, or vice versa.
Naming of the private backing field
Until now, I've only talked about declaring a field. Implementing the field means also adding storage to the var or implementing the methods when declared directly. Declaring a var in a class instead of a trait will automatically add a field for storing the value of the var:
class VarClass {
var name: String = _
}
$ javap -private VarClass.class
Compiled from "VarClass.scala"
public class VarClass {
private java.lang.String name;
public java.lang.String name();
public void name_$eq(java.lang.String);
public VarClass();
}
If you decide to implement the field using a pair of methods, you will have to declare such a private field yourself. The Scala specification does not say anything about how that field should be named in that case (it is private and thus not part of any interoperability concern).
The only thing "official" I can find is a paragraph in the article I linked at the top, which advocates the _name naming pattern for the backing field, while also stating:
While Hungarian notation is terribly ugly, it does have the advantage of disambiguating the _name variable without cluttering the identifier.
So it is up to you whether you want to follow that guidance or not. ¯\_(ツ)_/¯
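To tie this back to the Company class from the question, here is a minimal sketch of a hand-written getter/setter pair with its own backing field. The default value "unnamed" and the trimming in the setter are assumptions added for illustration:

```scala
class Company {
  // Backing field; the underscore prefix is only a convention.
  private var _name: String = "unnamed"

  // Getter: takes the plain name.
  def name: String = _name

  // Setter: the _= suffix is what makes `company.name = ...` compile.
  def name_=(newName: String): Unit = {
    _name = newName.trim
  }
}

object CompanyDemo {
  def main(args: Array[String]): Unit = {
    val c = new Company
    c.name = "  Acme  " // the compiler rewrites this to c.name_=("  Acme  ")
    println(c.name)
  }
}
```

A method named name= (without the underscore) is not a legal identifier, so the second variant in the question would not compile; and if the backing field were also called name, it would clash with the getter of the same name.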

Is there any advantage to defining a val over a def in a trait?

In Scala, a val can override a def, but a def cannot override a val.
So, is there an advantage to declaring a trait e.g. like this:
trait Resource {
val id: String
}
rather than this?
trait Resource {
def id: String
}
The follow-up question is: how does the compiler treat calling vals and defs differently in practice, and what kind of optimizations does it actually do with vals? The compiler insists on the fact that vals are stable; what does it mean in practice for the compiler? Suppose the subclass is actually implementing id with a val. Is there a penalty for having it specified as a def in the trait?
If my code itself does not require stability of the id member, can it be considered good practice to always use defs in these cases and to switch to vals only when a performance bottleneck has been identified here — however unlikely this may be?

Short answer:
As far as I can tell, the values are always accessed through the accessor method. Using def defines a simple method, which returns the value. Using val defines a private [*] final field, with an accessor method. So in terms of access, there is very little difference between the two. The difference is conceptual, def gets reevaluated each time, and val is only evaluated once. This can obviously have an impact on performance.
[*] Java private
Long answer:
Let's take the following example:
trait ResourceDef {
def id: String = "5"
}
trait ResourceVal {
val id: String = "5"
}
The ResourceDef & ResourceVal produce the same code, ignoring initializers:
public interface ResourceVal extends ScalaObject {
volatile void foo$ResourceVal$_setter_$id_$eq(String s);
String id();
}
public interface ResourceDef extends ScalaObject {
String id();
}
For the subsidiary classes produced (which contain the implementation of the methods), what ResourceDef produces is as you would expect, noting that the method is static:
public abstract class ResourceDef$class {
public static String id(ResourceDef $this) {
return "5";
}
public static void $init$(ResourceDef resourcedef) {}
}
and for the val, we simply call the initialiser in the containing class
public abstract class ResourceVal$class {
public static void $init$(ResourceVal $this) {
$this.foo$ResourceVal$_setter_$id_$eq("5");
}
}
When we start extending:
class ResourceDefClass extends ResourceDef {
override def id: String = "6"
}
class ResourceValClass extends ResourceVal {
override val id: String = "6"
def foobar() = id
}
class ResourceNoneClass extends ResourceDef
Where we override, we get a method in the class which just does what you expect. The def is a simple method:
public class ResourceDefClass implements ResourceDef, ScalaObject {
public String id() {
return "6";
}
}
and the val defines a private field and accessor method:
public class ResourceValClass implements ResourceVal, ScalaObject {
public String id() {
return id;
}
private final String id = "6";
public String foobar() {
return id();
}
}
Note that even foobar() doesn't use the field id, but uses the accessor method.
And finally, if we don't override, then we get a method which calls the static method in the trait auxiliary class:
public class ResourceNoneClass implements ResourceDef, ScalaObject {
public volatile String id() {
return ResourceDef$class.id(this);
}
}
I've cut out the constructors in these examples.
So, the accessor method is always used. I assume this is to avoid complications when extending multiple traits which could implement the same methods. It gets complicated really quickly.
Even longer answer:
Josh Suereth did a very interesting talk on Binary Resilience at Scala Days 2012, which covers the background to this question. The abstract for this is:
This talk focuses on binary compatibility on the JVM and what it means
to be binary compatible. An outline of the machinations of binary
incompatibility in Scala are described in depth, followed by a set of rules and guidelines that will help developers ensure their own
library releases are both binary compatible and binary resilient.
In particular, this talk looks at:
Traits and binary compatibility
Java Serialization and anonymous classes
The hidden creations of lazy vals
Developing code that is binary resilient

The difference is mainly that you can implement/override a def with a val but not the other way around. Moreover, a val is evaluated only once while a def is evaluated every time it is used, so using def in the abstract definition gives the code that mixes in the trait more freedom in how to handle and/or optimize the implementation. So my point is: use defs whenever there isn't a clear, good reason to force a val.
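A small sketch of that freedom, reusing the Resource trait from the question; the two implementing classes and their id values are hypothetical:

```scala
trait Resource {
  def id: String // abstract def: implementers may choose val or def
}

// Implements the def with a val: the right-hand side runs once, at construction.
class FixedResource extends Resource {
  val id: String = "resource-fixed"
}

// Implements the def with a def: the body runs on every call.
class CountingResource extends Resource {
  private var calls = 0
  def id: String = { calls += 1; s"resource-$calls" }
}

// The reverse is rejected by the compiler: if the trait declared `val id: String`,
// CountingResource could not implement it with a def.
```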

A val expression is evaluated once on variable declaration, it is strict and immutable.
A def is re-evaluated each time you call it

def is evaluated by name and val by value. This means, more or less, that a val must always hold an actual value, while a def is more like a promise that you can get a value when you evaluate it. For example, if you have a function
def trace(s: => String): Unit = { if (level == "trace") println(s) } // note the => in the parameter definition
that logs an event only if the log level is set to trace, and you want to log an object's toString. If you have overridden toString with a value, then you need to pass that value to the trace function. If toString is a def, however, it will only be evaluated once it's certain that the log level is trace, which could save you some overhead.
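The deferred evaluation can be made visible with a counter. In this sketch, level, evaluations and expensiveMessage are hypothetical names introduced only for the demonstration:

```scala
object TraceDemo {
  var level: String = "info" // hypothetical global log level
  var evaluations: Int = 0   // counts how often the message was actually built

  // `s` is a by-name parameter: the argument expression runs only where `s` is used.
  def trace(s: => String): Unit = if (level == "trace") println(s)

  def expensiveMessage(): String = {
    evaluations += 1
    "state dump"
  }

  def main(args: Array[String]): Unit = {
    trace(expensiveMessage()) // level is "info": the argument is never evaluated
    println(evaluations)      // 0
    level = "trace"
    trace(expensiveMessage()) // now evaluated once and printed
    println(evaluations)      // 1
  }
}
```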
def gives you more flexibility, while val is potentially faster
Compiler-wise, traits are compiled to Java interfaces, so when defining a member on a trait it makes no difference whether it's a val or a def. The difference in performance depends on how you choose to implement it.

Can I define a class with no public constructor and place a factory method for this class objects in a different class in Scala?

For example (maybe a bit clumsy from a real life view, but just to illustrate):
"User" is a case class containing a user name and id. The id can never be set manually, and a User instance with no id set makes no sense.
A UserBase class maintains users base and has a "getUser (name : String) : User" method returning a consistent User instance.
No one other than a UserBase object can know (well, someone can, but really shouldn't rely on this knowledge) a user's id, so constructing a User instance manually makes no sense (and can cause errors in future if someone accidentally hardcodes this and forgets). Moreover, having an orphan User instance not tracked by a UserBase is also undesired.
So the task is to make calling UserBase.getUser the only way to get a User instance.
Can this be implemented in Scala?

You have to put the classes in the same package or make them part of the same class or object. Then:
object O {
class C private[O] (val x: Int) { }
object D { def apply(i:Int) = new C(i) }
def getC(i:Int) = new C(i)
}
scala> O.D(5)
res0: O.C = O$C@5fa6fb3e
scala> new O.C(5)
<console>:10: error: constructor C cannot be accessed in object $iw
new O.C(5)
scala> O.getC(5)
res1: O.C = O$C@127208e4
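Applied to the User/UserBase scenario from the question, the same qualified-private constructor might look like the sketch below. The enclosing object Users, the id counter and the name-keyed map are assumptions for illustration, and the code is not thread-safe:

```scala
import scala.collection.mutable

object Users {
  // Only code inside Users (including UserBase) may call this constructor.
  final class User private[Users] (val id: Int, val name: String)

  object UserBase {
    private var lastId = 0
    private val byName = mutable.Map.empty[String, User]

    // Returns the tracked User for this name, creating it on first use.
    def getUser(name: String): User =
      byName.getOrElseUpdate(name, { lastId += 1; new User(lastId, name) })
  }
}
```

Outside Users, new Users.User(1, "Ivan") is rejected by the compiler, so Users.UserBase.getUser is the only way to obtain an instance.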

A case class automatically gets several features, including a companion object with an apply() method for constructing instances of the class. This is why you don't need "new" with case classes. If you try to define an explicit companion apply() with the same signature as the generated one, you will (at least on older compiler versions) get error: method apply is defined twice
If you want to make your own factory method then you should not use case classes. You can still have all of the same features (toString, apply, unapply, etc) but you will have to implement them manually and to your own specification.

You don't actually clarify what a 'base' is in this context, but given your description it sounds like it's really nothing more than a factory for users.
The usual place to put a factory for a class is in the companion object (This is how case classes do it, but the technique isn't restricted to just case classes)
class User private(val id: Int, val name: String) {
...
}
object User {
private def nextId() : Int = ...
def apply(name: String) = new User(nextId(), name)
}
//now create one:
val u = User("Ivan")
Of course, if the User object is immutable (highly recommended), then there's very little reason to hide the id member. You're probably also going to want a (restricted) method to construct a User with a specified ID, mostly for reasons of unit testing.
Working with companions like this, it's also unlikely that you'll still need a distinct UserBase factory. Having your factory named the same as the instances it produces will result in cleaner code.