Constructor with non-instance variable assignment? - scala

I have a number of classes that look like this:
class Foo(v: BasicData) extends Bar(v) {
  val helper = new Helper(v)
  val derived1 = helper.getDerived1Value()
  val derived2 = helper.getDerived2Value()
}
...except that I don't want to hold onto an instance of "helper" beyond the end of the constructor. In Java, I'd do something like this:
public class Foo extends Bar {
  final Derived derived1, derived2;
  public Foo(BasicData val) {
    super(val);
    Helper helper = new Helper(val);
    derived1 = helper.getDerived1Value();
    derived2 = helper.getDerived2Value();
  }
}
So how do I do something like that in Scala? I'm aware of creating a companion object with the same name as the class and an apply method: I was hoping for something slightly more succinct.

You could use a block to create a temporary helper val and return a tuple, like this:
class Foo(v: BasicData) extends Bar(v) {
  val (derived1, derived2) = {
    val helper = new Helper(v)
    (helper.getDerived1Value(), helper.getDerived2Value())
  }
}

Take a look at the javap output (including private members) before you conclude this has side-stepped a field for the Tuple2 used in the intermediate pattern match: it hasn't.
As of Scala 2.8.0.RC2, this Scala code (fleshed out to compile):
class BasicData {
  def basic1: Int = 23
  def basic2: String = "boo!"
}

class Helper(v: BasicData) {
  def derived1: Int = v.basic1 + 19
  def derived2: String = v.basic2 * 2
}

class Bar(val v: BasicData)

class Foo(v: BasicData) extends Bar(v) {
  val (derived1, derived2) = {
    val helper = new Helper(v)
    (helper.derived1, helper.derived2)
  }
}
Produces this Foo class:
% javap -private Foo
public class Foo extends Bar implements scala.ScalaObject{
private final scala.Tuple2 x$1;
private final int derived1;
private final java.lang.String derived2;
public int derived1();
public java.lang.String derived2();
public Foo(BasicData);
}
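For comparison, the companion-object apply that the question alludes to does avoid the tuple field, at the cost of exactly the boilerplate the asker hoped to skip. A minimal sketch against the same classes as above (untested):

object Foo {
  def apply(v: BasicData): Foo = {
    val helper = new Helper(v) // helper lives only for the duration of this call
    new Foo(v, helper.derived1, helper.derived2)
  }
}

class Foo private (v: BasicData, val derived1: Int, val derived2: String) extends Bar(v)

Callers write Foo(new BasicData) instead of new Foo(...), and javap should show only the derived1 and derived2 fields, with no Tuple2.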

Related

Is there any way to rewrite the below code using Scala value class or other concept?

I need to write two functions to get the output format and the output index for file conversion. As part of this, I wrote a TransformSettings class for these methods and set the default values. In the Transformer class, I create a new TransformSettings object to get the default values for each job run. I also have another class, ParquetTransformer, that extends Transformer, where I want to change these default values. So I implemented it like below.
class TransformSettings {
  def getOuputFormat: String = {
    "orc"
  }
  def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("orc.column.index.access")
  }
}
class Transformer {
  def getTransformSettings: TransformSettings = {
    new TransformSettings
  }
  def posttransform(table: AWSGlueDDL.Table): DataFrame = {
    val indexAccess = getTransformSettings.getOuputIndex(table)
    ........
  }
}
class ParquetTransformer extends Transformer {
  override def getTransformSettings: TransformSettings = {
    new TransformSettings {
      override def getOuputFormat: String = {
        "parquet"
      }
      override def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
        table.StorageDescriptor.SerdeInfo.Parameters.get("parquet.column.index.access")
      }
    }
  }
}
Is there a way to avoid creating a brand new TransformSettings object in the Transformer class every time this is called?
Also is there a way to rewrite the code using Scala value class?
As @Dima proposed in the comments, try to make TransformSettings a field / constructor parameter (a val) of the Transformer class and instantiate it outside:
class TransformSettings {
  def getOuputFormat: String = {
    "orc"
  }
  def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("orc.column.index.access")
  }
}

class Transformer(val transformSettings: TransformSettings) {
  def posttransform(table: AWSGlueDDL.Table): DataFrame = {
    val indexAccess = transformSettings.getOuputIndex(table)
    ???
  }
}

val parquetTransformSettings = new TransformSettings {
  override def getOuputFormat: String = {
    "parquet"
  }
  override def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("parquet.column.index.access")
  }
}
class ParquetTransformer extends Transformer(parquetTransformSettings)
You don't seem to need value classes (... extends AnyVal) here. They are about avoiding boxing, not about life-cycle management. TransformSettings and Transformer can't be value classes anyway, because they are not final (you're extending them in class ParquetTransformer extends Transformer... and in new TransformSettings { ... }). By the way, value classes have many limitations:
https://failex.blogspot.com/2017/04/the-high-cost-of-anyval-subclasses.html
https://github.com/scala/bug/issues/12271
Besides value classes, there is the scala-newtype library in Scala 2, and there are opaque types in Scala 3.
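For illustration, here is what each looks like; the OutputFormat name is hypothetical, not taken from the question. A Scala 2 value class (only possible for a final, single-field wrapper):

final case class OutputFormat(value: String) extends AnyVal

And a minimal Scala 3 opaque type sketch:

object Formats:
  opaque type OutputFormat = String
  object OutputFormat:
    def apply(s: String): OutputFormat = s
  extension (f: OutputFormat) def value: String = f

Both give you a distinct compile-time type for the "orc" / "parquet" strings without any of the life-cycle machinery discussed above.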

No implicit Ordering defined for my case class

I have the following case classes:
case class AGG_RECON_4( var VOL_PROBE_DL_VOL:Int, var VOL_PROBE_FREE_VOL:Int, var VOL_PROBE_TOT_VOL:Int, VOL_NW_UL_VOL:Int,VOL_NW_DL_VOL:Int, VOL_NW_FREE_VOL:Int, VOL_NW_TOT_VOL:Int, VOL_CHG_UL_VOL:Int,
VOL_CHG_DL_VOL:Int, VOL_CHG_FREE_VOL:Int, VOL_CHG_TOT_VOL:Int, VOL_DXE_Session_End_Time:String, VOL_NW_Session_End_Time:String,
VOL_CHG_Session_End_Time:String, VOL_Session_Closed_Time:String, VOL_DXE_Is_Completed:Boolean, VOL_NW_Is_Completed:Boolean, VOL_CHG_Is_Completed:Boolean, VOL_Is_Closed:Boolean, VOL_Session_Category:String) extends Serializable
case class AGG_RECON_3( CHG_ROAM_TYPE:String, CHG_APN:String,
CHG_APN_Category:String, CHG_Charging_Characteristics:String, CHG_Rate_Plan:String, CHG_Rating_Group:String, var CHG_CDR_Count:Int, var VOL_PROBE_UL_VOL:Int) extends Serializable
case class AGG_RECON_2(NW_First_Report_Time:String, NW_Last_Report_Time:String, NW_Session_Start_Time:String, NW_IMSI:String, NW_MSISDN:String, NW_RAT_Type:String, NW_ROAM_TYPE:String, NW_APN:String, NW_APN_Category:String, NW_Charging_Characteristics:String, var NW_CDR_Count:Int,
CHG_First_Report_Time:String, CHG_Last_Report_Time:String, CHG_Session_Start_Time:String, CHG_IMSI:String, CHG_MSISDN:String) extends Serializable
case class AGG_RECON(SUBSCRIBER_ID:String, ChargingID:String ,NodeID:String, START_TIME:String, DXE_First_Report_Time:String, DXE_Last_Report_Time:String, DXE_Session_Start_Time:String, DXE_Bearer_Creation_Time:String, DXE_IMSI:String, DXE_MSISDN:String, DXE_RAT_Type:String,
DXE_Subscriber_Type:String, DXE_VPMN:String, DXE_ROAM_TYPE:String, DXE_APN:String, DXE_APN_Category:String, DXE_Charging_Characteristics:String,var DXE_CDR_Count:Int,agg_recon_2:AGG_RECON_2,agg_recon_3:AGG_RECON_3,agg_recon_4:AGG_RECON_4) extends Ordered[AGG_RECON] with Serializable
{
  def compare(that: AGG_RECON): Int = {
    var formatter: DateTimeFormatter = null
    var d1: DateTime = //..
    var d2: DateTime = //..
    d1.compareTo(d2)
  }
}
I then got a list of my case class instances; however, when I try to sort it:
val elements = //Array[AGG_RECON]
val sorted_cdrs = elements.sorted[AGG_RECON]
I got: No implicit Ordering defined for AGG_RECON.
Do not implement Ordered in AGG_RECON. Instead, define an implicit Ordering[AGG_RECON] like this:
object AGG_RECON {
  implicit object AGG_RECON_Ordering extends Ordering[AGG_RECON] {
    override def compare(x: AGG_RECON, y: AGG_RECON) = ??? // Compare x and y
  }
}
(Side note: Is it really necessary to use UPPER_CASE names? It really clashes with convention.)
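To flesh out the ??? above: since the question's compare only parses two session timestamps and compares them, the Ordering can be built with Ordering.by. A minimal sketch, assuming Joda-Time (which the question's DateTime/DateTimeFormatter suggest) and assuming DXE_Session_Start_Time with a "yyyy-MM-dd HH:mm:ss" pattern is the field to sort on; substitute whichever field and pattern the real compare used:

import org.joda.time.format.DateTimeFormat

object AGG_RECON {
  // Field and pattern are assumptions, not taken from the question.
  private val fmt = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
  implicit val byStartTime: Ordering[AGG_RECON] =
    Ordering.by(r => fmt.parseDateTime(r.DXE_Session_Start_Time).getMillis)
}

Because the implicit lives in the companion object, elements.sorted finds it without any import.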

Scala compilation: anonymous function

Is there anything in the Scala compiler's specification that explains this behaviour?
Scala version: 2.10.6
Code example:
trait Service {
  def process(s: String)
}

object ServiceImpl extends Service {
  override def process(s: String): Unit = {
    println(s)
  }
}

object Register {
  var serviceInst: Service = ServiceImpl
}

object Client1 {
  def process1(l: List[String]): Unit = {
    l.foreach(x => Register.serviceInst.process(x))
  }
}

object Client2 {
  def process1(l: List[String]): Unit = {
    l.foreach(Register.serviceInst.process)
  }
}
I assume that Client1.process1 and Client2.process1 should have similar behaviour. However, after compilation/decompilation I see:
public final class Client1$$anonfun$process$1$$anonfun$apply$1 extends AbstractFunction1<String, BoxedUnit> implements Serializable {
  public static final long serialVersionUID = 0L;
  public final void apply(final String x$1) {
    Register$.MODULE$.serviceInst().process(x$1);
  }
}

public static final class Client2$$anonfun$process$1 extends AbstractFunction1<String, BoxedUnit> implements Serializable {
  public static final long serialVersionUID = 0L;
  private final Service eta$0$1$1;
  public final void apply(final String s) {
    this.eta$0$1$1.process(s);
  }
}
It's because the Scala compiler performs eta-expansion on the method reference in Client2, which works by generating a Function that calls process directly on a concrete Service instance.
Here is an example of how these functions look before they are turned into bytecode:
object Client1 {
  def process1(l: List[String]): Unit = {
    l.foreach(new Function1[String, Unit] {
      def apply(x: String) = Register.serviceInst.process(x)
    })
  }
}

object Client2 {
  def process1(l: List[String]): Unit = {
    l.foreach(new Function1[String, Unit] {
      val eta = Register.serviceInst
      def apply(x: String) = eta.process(x)
    })
  }
}
It becomes more interesting if we rewrite serviceInst a bit:
object Register {
  def serviceInst: Service = {
    println("get service instance!!!")
    ServiceImpl
  }
}
And then execute:
Client1.process1(List("a","b"))
Client2.process1(List("a","b"))
Obviously results are different:
1.
get service instance!!!
a
get service instance!!!
b
res0: Unit = ()
2.
get service instance!!!
a
b
res1: Unit = ()
The explanation lies in the function passed to foreach:
Client1 contains the function below, which re-evaluates Register.serviceInst on each invocation: x => Register.serviceInst.process(x)
Client2 has the method process ready to be executed, but serviceInst is evaluated first, once, when the method reference is eta-expanded.
The line below
l.foreach(x => Register.serviceInst.process(x))
is operationally equivalent (for a stable serviceInst) to
l.foreach(Register.serviceInst.process)
The first is called "point-ful style" while the second is "point-free style", or more specifically "eta-conversion", with the term "point" referring to the named argument, which doesn't exist in the second case. They are two different concepts and thus compile differently. You can write code in point-free style and the Scala compiler eta-expands it internally, which is what you're seeing in the decompiled code for Client2.
Eta conversion is a term from lambda calculus. If the sole purpose of a lambda abstraction is to pass its argument to another function, then the lambda is redundant and can be stripped via eta conversion/reduction. Java's lambda expressions vs. method references are another example.
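The distinction is easy to reproduce in isolation, using the println-ing Register from the first answer:

// Eta-expansion: Register.serviceInst is evaluated once, when f is built
val f: String => Unit = Register.serviceInst.process _

// Lambda: Register.serviceInst is re-evaluated on every call to g
val g: String => Unit = x => Register.serviceInst.process(x)

Defining f prints "get service instance!!!" immediately and never again; g prints it on every invocation.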

scala: how to use a class like a variable

Is it possible to refer to different classes on each pass of an iteration?
I have a substantial number of Hadoop Hive tables, and will be processing them with Spark. Each of the tables has an auto-generated class, and I would like to loop through the tables instead of the tedious, non-code-reuse copy/paste/handCodeIndividualTableClassNames technique resorted to first.
import myJavaProject.myTable0Class
import myJavaProject.myTable1Class

object rawMaxValueSniffer extends Logging {
  /* tedious sequential: it works, and sometimes a programmer's gotta do... */
  def tedious(args: Array[String]): Unit = {
    val tablePaths = List("path0_string_here", "path1_string")
    var maxIds = ArrayBuffer[Long]()

    FileInputFormat.setInputPaths(conf, tablePaths(0))
    AvroReadSupport.setAvroReadSchema(conf.getConfiguration, myTable0Class.getClassSchema)
    ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[myTable0Class]])
    val records0 = sc.newAPIHadoopRDD(conf.getConfiguration,
      classOf[ParquetInputFormat[myTable0Class]],
      classOf[Void],
      classOf[myTable0Class]).map(x => x._2)
    maxIds += records0.map(_.getId).collect().max

    FileInputFormat.setInputPaths(conf, tablePaths(1))
    AvroReadSupport.setAvroReadSchema(conf.getConfiguration, myTable1Class.getClassSchema)
    ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[myTable1Class]])
    val records1 = sc.newAPIHadoopRDD(conf.getConfiguration,
      classOf[ParquetInputFormat[myTable1Class]],
      classOf[Void],
      classOf[myTable1Class]).map(x => x._2)
    maxIds += records1.map(_.getId).collect().max
  }

  /* class as variable, used in a loop. I have seen the mountain... */
  def hopedFor(args: Array[String]): Unit = {
    val tablePaths = List("path0_string_here", "path1_string")
    var maxIds = ArrayBuffer[Long]()
    val tableClasses = List(classOf[myTable0Class], classOf[myTable1Class]) /* error free, but does not get me where I'm trying to go */
    var counter = 0
    tableClasses.foreach { tc =>
      FileInputFormat.setInputPaths(conf, tablePaths(counter))
      AvroReadSupport.setAvroReadSchema(conf.getConfiguration, tc.getClassSchema)
      ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[tc]])
      val records = sc.newAPIHadoopRDD(conf.getConfiguration,
        classOf[ParquetInputFormat[tc]],
        classOf[Void],
        classOf[tc]).map(x => x._2)
      maxIds += records.map(_.getId).collect().max /* all the myTableXXX classes have getId() */
      counter += 1
    }
  }
}
/* the classes being referenced... */
@org.apache.avro.specific.AvroGenerated
public class myTable0Class extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"rsivr_surveyquestiontypes\",\"namespace\":\"myJavaProject\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"description\",\"type\":\"string\"},{\"name\":\"scale_range\",\"type\":\"int\"}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public int id;
  yada.yada.yada0
}

@org.apache.avro.specific.AvroGenerated
public class myTable1Class extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"rsivr_surveyresultdetails\",\"namespace\":\"myJavaProject\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"survey_dts\",\"type\":\"string\"},{\"name\":\"survey_id\",\"type\":\"int\"},{\"name\":\"question\",\"type\":\"int\"},{\"name\":\"caller_id\",\"type\":\"string\"},{\"name\":\"rec_msg\",\"type\":\"string\"},{\"name\":\"note\",\"type\":\"string\"},{\"name\":\"lang\",\"type\":\"string\"},{\"name\":\"result\",\"type\":\"string\"}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public int id;
  yada.yada.yada1
}
Something like this, perhaps:
def doStuff[T <: SpecificRecordBase](index: Int, schema: => Schema, clazz: Class[T]) = {
  FileInputFormat.setInputPaths(conf, tablePaths(index))
  AvroReadSupport.setAvroReadSchema(conf.getConfiguration, schema)
  ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[T]])
  val records = sc.newAPIHadoopRDD(conf.getConfiguration,
    classOf[ParquetInputFormat[T]],
    classOf[Void],
    clazz).map(x => x._2)
  // getId exists on each generated class (see the question's comment); a common
  // trait or structural bound would be needed to make this typecheck precisely
  maxIds += records.map(_.getId).collect().max
}

Seq(
  (classOf[myTable0Class], () => myTable0Class.getClassSchema),
  (classOf[myTable1Class], () => myTable1Class.getClassSchema)
).zipWithIndex
 .foreach { case ((clazz, schema), index) => doStuff(index, schema(), clazz) }
You could use reflection to invoke getClassSchema instead (clazz.getMethod("getClassSchema").invoke(null).asInstanceOf[Schema]); then you would not need to pass it in as a parameter, just clazz would be enough, but that's kinda cheating ... I like this approach a bit better.
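For reference, here is a sketch of that reflection variant, reusing doStuff from above (untested; it assumes every generated class exposes the static getClassSchema() shown in the question):

def doStuffViaReflection[T <: SpecificRecordBase](index: Int, clazz: Class[T]): Unit = {
  // Pull the static getClassSchema() off the generated class instead of passing it in
  val schema = clazz.getMethod("getClassSchema").invoke(null).asInstanceOf[Schema]
  doStuff(index, schema, clazz)
}

doStuffViaReflection(0, classOf[myTable0Class])
doStuffViaReflection(1, classOf[myTable1Class])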

Scala: Can I reproduce anonymous class creation with a factory method?

As far as I understand it, Scala creates an anonymous class if I create an instance using the new keyword and follow the class name with a block (which acts as the anonymous subclass's constructor body):
class MyClass {
  def doStuff() {
    // ...
  }
}

val mc = new MyClass {
  doStuff()
}
The nice thing being that all the code in the constructor is in the scope of the new object.
Is there a way I can reproduce this syntax where the class is created by a factory method rather than the new keyword? i.e. make the following code work:
val mf = new MyFactory
val mc = mf.MyClass {
  doStuff()
}
I can't find a way to do it but Scala has so much to it that this might be pretty easy!
Using an import as suggested by @Ricky below, I can get:
val mf = MyFactory
val mc = mf.MyClass

{
  import mc._
  doStuff()
}
(Where the blank line before the block is needed) but that code block is not a constructor.
You can do this, but you still have to keep the new keyword, and create the nested class as a path-dependent type:

class Bippy(x: Int) {
  class Bop {
    def getIt = x
  }
}

val bip = new Bippy(7)
val bop = new bip.Bop
bop.getIt // yields 7

val bop2 = new bip.Bop { override def getIt = 42 }
bop2.getIt // yields 42
I don't think it's possible. However, a common pattern is to add a parameter to factory methods which takes a function modifying the created object:
trait MyClass {
  var name = ""
  def doStuff(): Unit
}

class Foo extends MyClass {
  def doStuff() { println("FOO: " + name) }
}

trait MyClassFactory {
  def make: MyClass
  def apply(body: MyClass => Unit) = {
    val mc = make
    body(mc)
    mc
  }
}

object FooFactory extends MyClassFactory {
  def make = new Foo
}
You can then create and modify an instance with a syntax close to your example:

val foo = FooFactory { f =>
  f.name = "Joe"
  f.doStuff()
}
It sounds like you're just looking to mix in a trait. Instead of calling myFactoryMethod(classOf[Foo]), which would ideally do this (if Scala permitted it):
new T {
  override def toString = "My implementation here."
}

you can instead write

trait MyImplementation {
  override def toString = "My implementation here."
}

new Foo with MyImplementation
However, if you are just looking to get the members of the new object accessible without qualification, remember you can import from any stable identifier:
val foo = new Bar
import foo._
println(baz) //where baz is a member of foo.