Scala immutability in persistent storage with Squeryl

While reading the Play for Scala book, I came across something odd that the book explains. This is the relevant snippet:
There's something strange going on, though. If you're using immutable
classes—which vanilla case classes are—you might be worried when you
discover that Squeryl updates your object's supposedly immutable id
field when you insert the object. That means that if you execute the
following code,
val myImmutableObject = Product(0, 5010255079763L,
"plastic coated blue", "standard paperclip, coated with blue plastic")
Database.productsTable.insert(myImmutableObject)
println(myImmutableObject)
the output will unexpectedly be something like: Product(13,
5010255079763, "plastic coated blue", "standard paperclip, coated with
blue plastic"). This can lead to bad situations if the rest of your
code expects an instance of one of your model classes to never change.
In order to protect yourself from this sort of stuff, we recommend you
change the insert methods we showed you earlier into this:
def insert(product: Product): Product = inTransaction {
val defensiveCopy = product.copy()
productsTable.insert(defensiveCopy)
}
My question is, given that the product class is defined like this:
import org.squeryl.KeyedEntity
case class Product(
id: Long,
ean: Long,
name: String,
description: String) extends KeyedEntity[Long]
and the Database object is defined like this:
import org.squeryl.Schema
import org.squeryl.PrimitiveTypeMode._
object Database extends Schema {
val productsTable = table[Product]("products")
...
on(productsTable) { p => declare {
p.id is(autoIncremented)
}}
}
How then is it possible that an instance of a case class, held in a val, can have one of its fields changed? Is Squeryl using reflection of some sort to change the field, or is the book mistaken somehow?
I am not able to run the examples to verify what is actually happening, but perhaps someone who has used Squeryl can give an answer?

You can check the definition of the table method for yourself:
https://github.com/squeryl/squeryl/blob/master/src/main/scala/org/squeryl/Schema.scala#L345
It's a generic method which does use reflection to instantiate the Table object bound to the given case class. Methods and functions are first-class citizens in Scala, so the result can be assigned to a val just like anything else.
The last fragment is an anonymous function, which maps the given table to the column declarations made for it. So yes, the book is right: Squeryl sets the id through reflection, which the JVM permits even for the final fields backing a case class val.
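To see how a supposedly immutable field can change, here is a minimal, self-contained sketch (not Squeryl's actual code) of writing to a case class val through Java reflection:

case class Product(id: Long, ean: Long)

object ReflectionDemo extends App {
  val p = Product(0L, 5010255079763L)
  val idField = classOf[Product].getDeclaredField("id")
  idField.setAccessible(true) // bypass access checks on the private final backing field
  idField.setLong(p, 13L)     // write straight into the "immutable" field
  println(p)                  // prints Product(13,5010255079763)
}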

Related

Creating Spark Dataframes from regular classes

I have always seen that, when we are using a map function, we can create a DataFrame from an RDD using a case class like below:
case class filematches(
  row_num: Long,
  matches: Long,
  non_matches: Long,
  non_match_column_desc: Array[String]
)
newrdd1.map(x => filematches(x._1, x._2, x._3, x._4)).toDF()
This works great, as we all know!
I was wondering: why do we specifically need case classes here?
We should be able to achieve the same effect using normal classes with parameterized constructors (since the fields will be vals and not private):
class filematches1(
  val row_num: Long,
  val matches: Long,
  val non_matches: Long,
  val non_match_column_desc: Array[String]
)
newrdd1.map(x => new filematches1(x._1, x._2, x._3, x._4)).toDF
Here, I am using the new keyword to instantiate the class.
Running the above gives me the error:
error: value toDF is not a member of org.apache.spark.rdd.RDD[filematches1]
I am sure I am missing some key concept about case classes vs regular classes here, but I have not been able to find it yet.
To resolve the error
value toDF is not a member of org.apache.spark.rdd.RDD[...]
you should move your case class definition out of the function where you are using it. You can refer to http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/td-p/29878 for more detail.
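For illustration, a minimal sketch of the working shape (assuming Spark 2.x, a local session, and hypothetical names):

import org.apache.spark.sql.SparkSession

// Defined at the top level, not inside a method, so the implicit
// Encoder needed by toDF can be derived for it.
case class FileMatches(rowNum: Long, matches: Long, nonMatches: Long, desc: Array[String])

object ToDfDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._
    val df = spark.sparkContext
      .parallelize(Seq((1L, 2L, 0L, Array("a")), (2L, 1L, 1L, Array("b"))))
      .map { case (r, m, n, d) => FileMatches(r, m, n, d) }
      .toDF()
    df.show()
    spark.stop()
  }
}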
On your other query: case classes are syntactic sugar, and they provide the following additional things.
Case classes are different from general classes. They are specially used when creating immutable objects.
They have a default apply function which is used as a constructor to create objects (so less code).
All the variables in a case class are vals by default, and hence immutable, which is a good thing in the Spark world, as all RDDs are immutable.
An example of a case class is
case class Book(name: String)
val book1 = Book("test")
You cannot change the value of book1.name, as it is immutable, and you do not need to say new Book() to create an object here.
The class variables are public by default, so you don't need setters and getters.
Moreover, when comparing two objects of case classes, their structure is compared instead of their references.
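A minimal, Spark-free sketch of those points:

class PlainBook(val name: String) // regular class: needs new, reference equality
case class Book(name: String)     // case class: apply, vals, structural equality

object CaseClassDemo extends App {
  val b1 = Book("test") // no new needed: the companion's apply is generated
  val b2 = Book("test")
  println(b1 == b2)                                       // true: compared by structure
  println(new PlainBook("test") == new PlainBook("test")) // false: compared by reference
  // b1.name = "other" // does not compile: name is a val
}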
Edit: Spark uses the following class to infer the schema.
Code link:
https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
If you check the schemaFor function (Line 719 to 791), you'll see it converts Scala types to Catalyst types. I think the case to handle non-case classes for schema inference has not been added yet, so every time you try to use a non-case class with schema inference, it falls through to the default case and hence gives the error Schema for type $other is not supported.
Hope this helps

Case class Members not accessible via Companion Object

Let's start off with a story.
I am in contact with a fictitious club named Foo where people have registered for a party. The manager has asked me to maintain separate lists of member names and member ids. One day the manager came to me and told me about the problem. As a developer playing around with Scala, I gave him a solution like the one below.
I wanted to maintain a class which would contain the names of the members and their ids. The code was very crude in nature and performed absolutely no validation.
STEP 1: I created a class called NonEmptyFoo
case class NonEmptyFoo(bar1:String,bar2:String)
While drinking beer in the club, I found a few people registering.
STEP 2: Adding users (username,userid) both are String objects
val member1=NonEmptyFoo("member1","Foo_member1")
val member2=NonEmptyFoo("member2","Foo_member2")
val member3=NonEmptyFoo("member3","Foo_member3")
Many people registered later, which was really good for the manager.
STEP 3: Creating separate lists for username and userid
val (memName,memId)=clubMembers map(x=>(x.bar1,x.bar2)) unzip
Whoa! My solution seemed to work, until one day the manager came and told me:
"Man! People are so excited that at times they are registering without even entering the data; I am not able to get the correct lists of member names and ids, as the lists have many empty fields." This got on my nerves.
I assured the manager of the club that I would provide him with a robust solution the next day.
Two things came to my mind:
1. If I write the code from scratch, I need to remove the old code, and the new code may not be readable.
2. I need to keep the code readable, and write it in a way such that a new developer finds it easy to maintain and to add/remove features.
At night I thought of adding abstraction to it. I added the following code.
STEP 1: Created a trait Foo, with NonEmptyFoo for users with valid entries and EmptyFoo for people with invalid entries
trait Foo
case class EmptyFoo() extends Foo
case class NonEmptyFoo(bar1:String,bar2:String) extends Foo
object Foo {
  def apply(bar1: String, bar2: String) = (bar1, bar2) match {
    case (x, y) => if (x.isEmpty || y.isEmpty) EmptyFoo() else NonEmptyFoo(x, y)
    case _ => EmptyFoo()
  }
}
The above code has benefits. First off, I am able to distinguish users with valid entries from users with invalid entries. Secondly, I am abstracting the empty and non-empty Foo behind the single type Foo, so that new methods can easily be added to Foo and implemented in the child classes, which hides the inner implementation.
When the user enters valid data, as below,
val mamber1=Foo("member1","Foo_member1")
The output on the worksheet is shown as
member1: Product with Serializable with Foo = NonEmptyFoo(member1,Foo_member1)
and when someone misses one of the fields, as given below,
val member2=Foo("member2","")
The output on the worksheet is shown as
member2: Product with Serializable with Foo = EmptyFoo()
Interesting, it works!
Please be a little patient.
I was able to introduce an abstraction and come up with a minimal yet good solution. But I faced a real problem when I wrote the code as follows:
val member1=Foo("member1","Foo_member1")
val member2=Foo("member2","Foo_member2")
val member3=Foo("member3","Foo_member3")
val clubMembers=List(member1,member2,member3)
// here's where the problem occurs
val (memName,memId)=clubMembers map(x=>(x.bar1,x.bar2)) unzip
Whoa! I got stuck here, because in the lambda expression given below
x => (x.bar1, x.bar2)
the compiler was not able to recognize bar1 and bar2.
But this solution worked well:
val member1=NonEmptyFoo("member1","Foo_member1")
val member2=NonEmptyFoo("member2","Foo_member2")
val member3=NonEmptyFoo("member3","Foo_member3")
val clubMembers=List(member1,member2,member3)
// this works fine
val (memName,memId)=clubMembers map(x=>(x.bar1,x.bar2)) unzip
Apparently, that solution violates the abstraction, since I am explicitly using the inner class names, whereas I should have used only the name of the trait and let the compiler, based upon the validation, decide which of the inner classes to instantiate.
Could you please suggest what could be wrong with the new solution?
An explanation of why the compiler did not recognize bar1 and bar2 would be highly appreciated. Alternate solutions do exist, but I would first love to understand the problem with the existing solution; an alternate solution may then follow, entirely at your discretion.
Thanks in advance for the help!
The inferred type of clubMembers is List[Foo] (even though the concrete elements are instances of NonEmptyFoo), and since Foo doesn't have bar1 and bar2 fields, you can't access them in your map invocation. One possible solution would be to add bar1 and bar2 to Foo:
sealed abstract class Foo(val bar1: String, val bar2: String)
case object EmptyFoo extends Foo("", "")
case class NonEmptyFoo(override val bar1: String, override val bar2: String) extends Foo(bar1, bar2)
object Foo {
def apply(bar1: String, bar2: String): Foo = {
if (bar1.isEmpty || bar2.isEmpty) EmptyFoo else NonEmptyFoo(bar1, bar2)
}
}
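With bar1 and bar2 declared on Foo itself, the original code compiles; a quick illustrative check:

val clubMembers = List(
  Foo("member1", "Foo_member1"),
  Foo("member2", ""), // invalid entry, becomes EmptyFoo
  Foo("member3", "Foo_member3"))
val (memName, memId) = clubMembers.map(x => (x.bar1, x.bar2)).unzip
// memName == List("member1", "", "member3"): EmptyFoo contributes empty strings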

Copying almost identical objects in Scala (aka. How to package models for REST)?

I have several objects that closely (but not perfectly) mirror other objects in Scala. For example, I have a PackagedPerson that has all of the same fields as the PersonModel object, plus some. (The PackagedPerson adds in several fields from other entities, things that are not on the PersonModel object).
Generally, the PackagedPerson is used for transmitting a "package" of person-related things over REST, or receiving changes back (again over REST).
When preparing these transactions, I have a pack method, such as:
def pack(p: PersonModel): PackagedPerson
After all the preamble is out of the way (for instance, loading optional, extra objects that will be included in the package), I create a PackagedPerson from the PersonModel and "everything else:"
new PackagedPerson(p.id, p.name, // these (and more) from the model object
x.profilePicture, y.etc // these from elsewhere
)
In many cases, the model object has quite a few fields. My question is: how can I minimize the repetitive code?
In a way it's like unapply and apply except that there are "extra" parameters, so what I really want is something like this:
new PackagedPerson(p.unapply(), x.profilePicture, y.etc)
But obviously that won't work. Any ideas? What other approaches have you taken for this? I very much want to keep my REST-compatible "transport objects" separate from the model objects. Sometimes this "packaging" is not necessary, but sometimes there is too much delta between what goes over the wire, and what gets stored in the database. Trying to use a single object for both gets messy fast.
You could use LabelledGeneric from shapeless.
You can convert between a case class and its generic representation.
import shapeless.LabelledGeneric
import shapeless.record._           // record ops such as +
import shapeless.syntax.singleton._ // the ->> field syntax

case class Person(id: Int, name: String)
case class PackagedPerson(id: Int, name: String, age: Int)

def packagePerson(person: Person, age: Int): PackagedPerson = {
  val personGen = LabelledGeneric[Person]
  val packPersonGen = LabelledGeneric[PackagedPerson]
  // turn a Person into its generic (record) representation
  val rec = personGen.to(person)
  // add the age field to the record
  // and turn the updated record into a PackagedPerson
  packPersonGen.from(rec + ('age ->> age))
}
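For illustration, a hypothetical call (the record's field order matches PackagedPerson, with age appended last):

val packed = packagePerson(Person(1, "Alice"), age = 30)
// packed == PackagedPerson(1, "Alice", 30)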
Probably the order of the fields of your two case classes won't correspond as nicely as in my simple example. If that is the case, shapeless can reorder your fields using Align. Look at this brilliant answer on another question.
You can try Java/Scala reflection. Create a method that accepts a person model, all other models and model-free parameters:
def pack(p: PersonModel, others: Seq[Model], freeParams: (String, Any)*): PackedPerson
In the method, you reflectively obtain PackedPerson's constructor and see what arguments go there. Then you (reflectively) iterate over the fields of PersonModel, the other models, and the free args: if there's a field whose name and type match one of the constructor params, you save it. Finally, you invoke the PackedPerson constructor reflectively using the saved args.
Keep in mind, though, that before Scala 2.11 a case class could contain only up to 22 constructor params.
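A rough sketch of the name-matching idea in plain Scala 2.13; to stay concrete it spells out the target fields instead of invoking the constructor reflectively, and all names here are hypothetical:

case class PersonModel(id: Long, name: String)
case class PackedPerson(id: Long, name: String, profilePicture: String)

// Build a name -> value map from any case class via the Product interface.
def fieldMap(p: Product): Map[String, Any] =
  p.productElementNames.zip(p.productIterator).toMap

// Later (model-free) params win over earlier ones on name clashes.
def pack(models: Seq[Product], freeParams: (String, Any)*): PackedPerson = {
  val all = models.map(fieldMap).reduce(_ ++ _) ++ freeParams.toMap
  PackedPerson(
    id = all("id").asInstanceOf[Long],
    name = all("name").asInstanceOf[String],
    profilePicture = all("profilePicture").asInstanceOf[String])
}

// pack(Seq(PersonModel(1L, "Alice")), "profilePicture" -> "pic.png")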

Scala code generation with annotations + macros or external script?

I want to know:
Can Scala annotations/transforms implement the code generation below? (objective)
What are the trade-offs as opposed to source code generation with an external tool? (objective)
Is there a better way / how would you do it? (subjective)
Background:
I'm designing an embedded database for Scala + Java as a side project. Although I want it to be usable from Java, I'm writing it in Scala for the extra flexibility/power and because writing Java code kills my soul.
I'm working out what I want model definitions to look like. My initial idea was to parse some model definition files and generate Scala classes. But I think now I'd like them to be definable in Scala code so that no parsing is required and people can bring the full power of Scala to bear in defining and customizing the models (e.g. custom column types.) That's a valuable feature from my Python/Django experience.
So far I have something like:
@model
class Person {
  val Name = StringColumn(length = 32)
  val BirthDate = DateColumn(optional = true)
}
@model
class Student extends Person {
  val GPA = FloatColumn(propertyName = "gpa")
}
@model
class Teacher extends Person {
  val salary = NumericColumn()
}
Which would generate:
class Person {
  val Name = StringColumn(name = "name", length = 32)
  val BirthDate = DateColumn(name = "birthDate", optional = true)
  // generated accessor methods
  def name = Person.Name.get(...)
  def name_=(name: String): Unit = Person.Name.set(..., name)
  // etc ...
}
// static access to model metadata, e.g. Person.Name is an immutable StringColumn instance
object Person extends Person
class Student extends Person {
  val GPA = DoubleColumn(name = "GPA")
  def gpa = ...
  def gpa_=(value: Float) = ...
}
object Student extends Student
class Teacher extends Person {
  // You get the idea
}
object Teacher extends Teacher
Looking at some examples online and doing some research, it seems like AST transforms using a special @model annotation could actually generate the needed code, maybe with a little bit of help, e.g. having the user define the companion object as well as the model definition. Am I right that this can be done?
Some problems that occur to me with this idea:
The object will be cluttered with properties that are not useful; all it needs is the Column objects. This could be fixed by splitting the class in two: PersonMeta and Person extends PersonMeta, with the Person object extending only PersonMeta.
IDEs will probably not pick up on the generated properties, causing them to underline them with wavy lines (eww...) and making it so auto-complete for property names won't work. The code would still be compile-time checked, so it's really just an IDE gotcha (Dynamic, no doubt, has the same problem.)
Code generation using a script is more IDE friendly, but it's hacky, probably more work, especially since you have to leave custom methods and things intact. It also requires a custom build step that you have to run whenever you change a model (which means you can forget to do it.) While the IDE might not help you with macro code generation (yet, anyway) the compiler will shout at you if you get things wrong. That makes me lean towards doing it with macros + annotation.
What do you think? I'm new to Scala, I kind of doubt I've hit on the best way to define models and generate implementations for them. How would you do it better?
It's possible yeah. Macros can be unpleasant to write and debug, but they do work.
Seems like you already have your solution there
Scala IDEs tend to handle macros correctly-ish (I mean, they have to; they're part of the language and used in some pretty fundamental libraries), so I wouldn't worry about that; if anything, a macro is more IDE-friendly than an external codegen step, because a macro will stay in sync with a user's changes.
I'd see whether you can achieve what you want in vanilla scala before resorting to a macro. Remember that your class-like things don't necessarily have to be actual case classes; take a look at Shapeless generic records for an idea of how you can represent a well-typed "row" of named values from somewhere external. I think a system that could map a structure like those records to and from SQL might end up being more principled (that is, easier to reason about) than one based on "special" case classes.
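For instance, a macro-free sketch of the model idea (all names hypothetical, only to show the shape of the trade-off): column metadata lives on the companion object, and accessors delegate to it explicitly, at the cost of writing that boilerplate yourself:

case class StringColumn(name: String, length: Int = 255)
case class DateColumn(name: String, optional: Boolean = false)

class Person(private var fields: Map[String, Any]) {
  def name: String = fields(Person.Name.name).asInstanceOf[String]
  def name_=(v: String): Unit = fields += Person.Name.name -> v
}

object Person {
  // static, immutable column metadata
  val Name = StringColumn("name", length = 32)
  val BirthDate = DateColumn("birthDate", optional = true)
}

// val p = new Person(Map("name" -> "Alice")); p.name = "Bob"; println(p.name)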

Why do people define classes, traits, and objects inside another object in Scala?

OK, I'll explain why I ask this question. I have begun reading the Lift 2.2 source code these days.
It helps if you happen to have read the Lift source code before.
In Lift, I found that inner classes and inner traits are used very heavily.
object Menu has 2 inner traits and 4 inner classes; object Loc has 18 inner classes, 5 inner traits, and 7 inner objects.
There is a ton of code written like this. I want to know why the author writes it like this.
Is it because it's the author's personal taste, or a powerful use of a language feature?
Is there any trade-off for this kind of usage?
Before 2.8, you had to choose between packages and objects. The problem with packages is that they cannot contain methods or vals on their own. So you have to put all those inside another object, which can get awkward. Observe:
object Encrypt {
  private val magicConstant = 0x12345678
  def encryptInt(i: Int) = i ^ magicConstant
  class EncryptIterator(ii: Iterator[Int]) extends Iterator[Int] {
    def hasNext = ii.hasNext
    def next = encryptInt(ii.next)
  }
}
Now you can import Encrypt._ and gain access to the method encryptInt as well as the class EncryptIterator. Handy!
In contrast,
package encrypt {
  object Encrypt {
    private[encrypt] val magicConstant = 0x12345678
    def encryptInt(i: Int) = i ^ magicConstant
  }
  class EncryptIterator(ii: Iterator[Int]) extends Iterator[Int] {
    def hasNext = ii.hasNext
    def next = Encrypt.encryptInt(ii.next)
  }
}
It's not a huge difference, but it makes the user import both encrypt._ and encrypt.Encrypt._ or have to keep writing Encrypt.encryptInt over and over. Why not just use an object instead, as in the first pattern? (There's really no performance penalty, since nested classes aren't actually Java inner classes under the hood; they're just regular classes as far as the JVM knows, but with fancy names that tell you that they're nested.)
In 2.8, you can have your cake and eat it too: call the thing a package object, and the compiler will rewrite the code for you so it actually looks like the second example under the hood (except the object Encrypt is actually called package internally), but behaves like the first example in terms of namespace--the vals and defs are right there without needing an extra import.
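A sketch of what the 2.8 version looks like in source, using the same toy API as above:

package object encrypt {
  private[encrypt] val magicConstant = 0x12345678
  def encryptInt(i: Int): Int = i ^ magicConstant
}

package encrypt {
  class EncryptIterator(ii: Iterator[Int]) extends Iterator[Int] {
    def hasNext = ii.hasNext
    def next = encryptInt(ii.next) // in scope via the package object, no extra import
  }
}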
Thus, projects that were started pre-2.8 often use objects to enclose lots of stuff as if they were a package. Post-2.8, one of the main motivations has been removed. (But just to be clear, using an object still doesn't hurt; it's more that it's conceptually misleading than that it has a negative impact on performance or whatnot.)
(P.S. Please, please don't try to actually encrypt anything that way except as an example or a joke!)
Putting classes, traits and objects in an object is sometimes required when you want to use abstract type variables, see e.g. http://programming-scala.labs.oreilly.com/ch12.html#_parameterized_types_vs_abstract_types
It can be both. Among other things, an instance of an inner class/trait has access to the variables of its parent. Inner classes have to be created with a parent instance, which is an instance of the outer type.
In other cases, it's probably just a way of grouping closely related things, as in your object example. Note that the trait LocParam is sealed, which means that all subclasses have to be in the same compilation unit/file; see the toy example below.
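To illustrate the sealed point with a toy example (not Lift's actual code):

// File LocParam.scala
sealed trait LocParam
case class Hidden() extends LocParam // fine: same file as the sealed trait
// In any other file, `class Custom extends LocParam` would not compile.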
sblundy has a decent answer. One thing to add is that only with Scala 2.8 do you have package objects which let you group similar things in a package namespace without making a completely separate object. For that reason I will be updating my Lift Modules proposal to use a package object instead of a simple object.