I am trying to create a case class given the schema of another case class, so that given:
case class givenClass(given1: Type1, given2: Type2)
I could create a new case class:
case class newClass(given1: Type1, given2: Type2, new3: Type3, ...)
One direction I have tried to follow is inferring the schema of givenClass using Scala Reflection and then creating a new case class given that schema, but I am having trouble finding a good way to do it.
I have often seen that, when using a map function, we can create a DataFrame from an RDD using a case class, like below:
case class filematches(
  row_num: Long,
  matches: Long,
  non_matches: Long,
  non_match_column_desc: Array[String]
)
newrdd1.map(x => filematches(x._1, x._2, x._3, x._4)).toDF()
This works great, as we all know!
I was wondering: why do we specifically need case classes here?
We should be able to achieve the same effect using normal classes with parameterized constructors (as the parameters will be vals and not private):
class filematches1(
  val row_num: Long,
  val matches: Long,
  val non_matches: Long,
  val non_match_column_desc: Array[String]
)
newrdd1.map(x => new filematches1(x._1, x._2, x._3, x._4)).toDF
Here, I am using the new keyword to instantiate the class.
Running the above gives me the error:
error: value toDF is not a member of org.apache.spark.rdd.RDD[filematches1]
I am sure I am missing some key concept about case classes vs. regular classes here, but I have not been able to find it yet.
To resolve the error
value toDF is not a member of org.apache.spark.rdd.RDD[...]
you should move your case class definition out of the function where you are using it. You can refer to http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/td-p/29878 for more detail.
On your other query: case classes are syntactic sugar, and they provide the following additional things.
Case classes are different from regular classes. They are specifically used for creating immutable objects.
They have a default apply function, which is used as a constructor to create objects (so less code).
All the constructor parameters of a case class are vals by default, and hence immutable, which is a good thing in the Spark world, as all RDDs are immutable too.
An example of a case class:
case class Book(name: String)
val book1 = Book("test")
You cannot change the value of book1.name, as it is immutable, and you do not need to write new Book() to create the object here.
The class's fields are public by default, so you don't need setters and getters.
Moreover, when comparing two objects of a case class, their structure is compared instead of their references.
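These points can be checked in a few lines of plain Scala (a standalone sketch, independent of Spark; the Book class here is just an illustration):

```scala
// A plain-Scala illustration of case class features (nothing Spark-specific).
case class Book(name: String, author: String)

// apply acts as the constructor: no `new` needed.
val b1 = Book("TAOCP", "Knuth")
val b2 = Book("TAOCP", "Knuth")

// Fields are public vals: readable directly, but immutable.
val title = b1.name

// == compares structure, not references.
val structurallyEqual = b1 == b2   // true: same field values
val sameReference     = b1 eq b2   // false: distinct instances

// copy produces a modified immutable copy instead of mutating in place.
val revised = b1.copy(author = "D. E. Knuth")
```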
Edit: Spark uses the following class to infer the schema.
Code link:
https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
If you check the schemaFor function (lines 719 to 791), you will see that it converts Scala types to Catalyst types. I think the case to handle non-case classes for schema inference has simply not been added yet, so every time you try to use a non-case class with schema inference, it falls through to the default case and hence gives the error Schema for type $other is not supported.
Hope this helps.
In the Activator template for a simple REST API project in Scala, the Book.scala file looks like the following.
package models
import play.api.libs.json.Json
object Book {
  case class Book(name: String, author: String)
  implicit val bookWrites = Json.writes[Book]
  implicit val bookReads = Json.reads[Book]
  var books = List(Book("TAOCP", "Knuth"), Book("SICP", "Sussman, Abelson"))
  def addBook(b: Book) = books = books ::: List(b)
}
Why is there a Book object and a Book case class inside it? Why not just a Book case class (or just a Book class and not a case class)? What advantages/disadvantages are there in the above structure?
I'm sure this is just a small example that somebody put together, and so you shouldn't read too much into it. But it exhibits what some consider an anti-pattern: nesting case classes in other classes. Some best-practices guides, such as this one, suggest avoiding nesting case classes in other classes, and for good reason:
It is tempting, but you should almost never define nested case classes
inside another object/class because it messes with Java's
serialization. The reason is that when you serialize a case class it
closes over the "this" pointer and serializes the whole object, which
if you are putting in your App object means for every instance of a
case class you serialize the whole world.
And the thing with case classes specifically is that:
one expects a case class to be immutable (a value, a fact) and hence
one expects a case class to be easily serializable
Prefer flat hierarchies.
For example, this small program throws an exception, somewhat unexpectedly:
import java.io._
class Outer {
  case class Inner(a: Int)
}

object Test extends App {
  val inner = (new Outer).Inner(1)
  val oos = new ObjectOutputStream(new FileOutputStream("/tmp/test"))
  oos.writeObject(inner)
  oos.close()
}
If the only purpose of this outer Book object is to group together common functionality, a package would be the preferred structure.
Furthermore, even if an object were desired for some other reason, naming that object the same as the inner case class is confusing, especially since case classes automatically generate companion objects. So in this example there is a Book object, a Book.Book case class, and therefore also a Book.Book companion object.
The role of the Book object in this code is more like that of a static utils/manager class which holds a list of books. You can imagine it as a Library class, which allows adding books.
The Book case class is just a data holder for Book instances. As m-z said, it is just an example; for a more complicated class, you could move it to a standalone Book class.
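For contrast, a flatter layout might look like the following sketch. It drops the Play JSON pieces and renames the container to a hypothetical Library, so only the case class keeps the Book name:

```scala
// The case class lives at the top level, so it serializes cleanly and
// gets its ordinary auto-generated companion object.
case class Book(name: String, author: String)

// A separately named object plays the "manager" role that the original
// Book object had; Library is an illustrative name, not from the template.
object Library {
  private var books: List[Book] =
    List(Book("TAOCP", "Knuth"), Book("SICP", "Sussman, Abelson"))

  def addBook(b: Book): Unit = books = books :+ b
  def all: List[Book] = books
}
```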
I am trying to model a DSL in Scala. (I am very new to Scala so I might be missing something trivial, in which case apologies).
The DSL supports a very simple type system, where entities called 'Terms' can have a type, which either extends Object by default, or can extend other types, which in their own right eventually extend another type or Object.
I am trying to model this type hierarchy in Scala using a case class:
case class TermType(name: String, superType: TermType)
However, I want to be able to support a 'default' case (the one where the type just extends 'Object'), without having to specify the super type, so something of the sort:
//the following does not work, just illustrating what I want to achieve
case class TermType(name: String, superType: TermType = new TermType("Object", ???))
Not sure if this is the right approach. I wish to avoid nulls or anything like that. I don't know whether going the Option route is better in some way (if it works at all).
How is it best to go about it?
For example:
sealed abstract class TermType
case class TermTypeSimple(name: String) extends TermType
case class TermTypeWithParent(name: String, parent: TermType) extends TermType
Another way:
case class TermType(name: String, superType: Option[TermType] = None)
With usage:
TermType("Hi")
TermType("Buy", Some(TermType("Beer")))
I have to work with Lift's Mapper (I know there might be better ORMs for Scala, but this is not something I have the power to change right now). Typically, with Mapper, a table is defined this way:
package com.sample.model
import net.liftweb.mapper._
class Table extends LongKeyedMapper[Table] with IdPK {
  def getSingleton = Table
  object other_table_id extends MappedLongForeignKey(this, OtherTable)
  object date_field extends MappedDate(this)
  object string_field extends MappedString(this, 255)
  def toCaseClass = ...
}
object Table extends Table with LongKeyedMetaMapper[Table]
Now I'd like to define a case class for Table to manipulate the records more easily, as Mapper is not very "Scala-idiomatic", not very type-safe, and definitely not immutable. My case class would look like this:
case class TableCC(id: Long, otherTableId: Long, dateField: Option[Date], ...) {
  def toMapper = ...
}
How should I name the case class and where should I put it?
In com.sample.model with a different name (TableCC or TableCaseClass)?
In a different package (e.g. com.sample.model.caseclass) with the same name (Table)?
In the Table object?
...?
First off, I know you said that you cannot use another ORM, but there are case-class-based ORMs for Scala which work very well. (Slick works basically like this, and so do a few others.)
Personally, if you are going to use this case class as the main way to manipulate the domain, I would put them all in their own package. That way, when you are manipulating the domain, you can do a
import com.sample.model.caseclass._
A few points to note:
Do not give the case class the same name as the Mapper class; this becomes confusing, and I guarantee that at some point you will want to use both classes in the same file, in which case you will have to alias one of them.
I would use naming like Table and TableCC. This way, in your IDE, you can easily find one or the other.
If you have a Scala case class that looks like:
case class Fnord(x: Parcelable, y: Parcelable)
Where Parcelable doesn't extend Serializable, but does have a way to return a ByteArray of itself (something that does implement Serializable).
How do you make the case class Serializable? Is there a way to do custom (de)serialization of particular fields?
(I'm solving it right now by creating a simple wrapper class that is serializable and using that in the case classes, but it'd be nice to skip that step.)
Take a look at java.io.Externalizable. It allows a class to manually serialize and restore its fields.
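A sketch of what that could look like here, with some caveats: Externalizable requires a no-arg constructor and mutable fields, so the case class parameters become vars, and the Parcelable trait below is a stand-in invented for this example (the real one only needs a way to go to and from bytes):

```scala
import java.io._

// Hypothetical stand-in for the Parcelable from the question: it is not
// Serializable itself, but can dump itself to bytes and be rebuilt from them.
trait Parcelable {
  def toByteArray: Array[Byte]
}

final class IntParcel(val value: Int) extends Parcelable {
  def toByteArray: Array[Byte] = BigInt(value).toByteArray
}

object IntParcel {
  def fromByteArray(bytes: Array[Byte]): IntParcel =
    new IntParcel(BigInt(bytes).toInt)
}

// Externalizable needs a public no-arg constructor and mutable state, so
// the parameters are vars -- a real trade-off of this approach.
case class Fnord(var x: Parcelable, var y: Parcelable) extends Externalizable {
  def this() = this(null, null) // required for deserialization

  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeObject(x.toByteArray)
    out.writeObject(y.toByteArray)
  }

  override def readExternal(in: ObjectInput): Unit = {
    x = IntParcel.fromByteArray(in.readObject().asInstanceOf[Array[Byte]])
    y = IntParcel.fromByteArray(in.readObject().asInstanceOf[Array[Byte]])
  }
}
```

With this, an ObjectOutputStream round-trips a Fnord by serializing only the byte arrays, never the Parcelable instances themselves.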