List[String] Object in scala case class - scala

I am using DSE 5.1.0 (packaged with Spark 2.0.2.6 and Scala 2.11.8) and reading a Cassandra table as below.
val sparkSession = ...
val rdd1 = sparkSession.table("keyspace.table")
This table contains a List[String] column, say list1, which I read into rdd1. But when I try to apply an encoder, it throws an error.
val myVoEncoder = Encoders.bean(classOf[MyVo])
val dataSet = rdd1.as(myVoEncoder)
I have tried with
scala.collection.mutable.list,
scala.collection.immutable.list,
scala.collection.list,
Seq,
WrappedArray. All gave the same error as below.
java.lang.UnsupportedOperationException: Cannot infer type for class scala.collection.immutable.List because it is not bean-compliant
MyVo.scala
case class MyVo(
@BeanProperty var id: String,
@BeanProperty var duration: Int,
@BeanProperty var list1: List[String]
) {
def this() = this("", 0, null)
}
Any help will be appreciated.

You should use Array[String]:
case class MyVo(
@BeanProperty var id: String,
@BeanProperty var duration: Int,
@BeanProperty var list1: Array[String]
) {
def this() = this("", 0, null)
}
However, a more idiomatic approach is to drop the bean encoder and let the implicit encoders handle a plain case class:
import sparkSession.implicits._
case class MyVo(
id: String,
duration: Int,
list1: Seq[String]
)
rdd1.as[MyVo]
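The bean variant above depends on the JavaBean-style getters and setters that @BeanProperty generates. As a minimal sketch, the following uses only java.beans.Introspector (no Spark) to list the bean properties and their Java types. MyVoBean, BeanShape and propertyTypes are hypothetical names introduced for illustration, and the assumption that the bean encoder looks at the class in a similar way is mine, not something stated in the thread.

```scala
import java.beans.Introspector
import scala.beans.BeanProperty

// Hypothetical stand-in for MyVo, written as a plain class for this sketch.
class MyVoBean(
  @BeanProperty var id: String,
  @BeanProperty var duration: Int,
  @BeanProperty var list1: Array[String]
) {
  def this() = this("", 0, null)
}

object BeanShape {
  // Map each bean property name to the simple name of its Java type.
  // Passing classOf[Object] as the stop class leaves out the inherited "class" property.
  def propertyTypes(cls: Class[_]): Map[String, String] =
    Introspector.getBeanInfo(cls, classOf[Object]).getPropertyDescriptors
      .map(p => p.getName -> p.getPropertyType.getSimpleName)
      .toMap
}
```

Calling BeanShape.propertyTypes(classOf[MyVoBean]) shows list1 as a plain Java array (String[]), a type that JavaBean tooling understands, whereas a Scala List has no JavaBean equivalent.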

Related

Scala Akka (un)marshalling nested Seq collections with Spray JSON

I'm trying to use Spray JSON to marshall the 'Seq' collection below into a 'BidRequest' entity with the parameters as defined.
The Seq collection is mostly nested, therefore some 'Seq' parameter fields also have variable collection types that need to be marshalled.
Then after a computation, the aim is to unmarshall the results as an entity of 'BidResponse'.
What's the best approach to do this?
I'm using Akka-HTTP, Akka-Streams, Akka-Actor.
Seq collection:
val activeUsers = Seq(
Campaign(
id = 1,
country = "UK",
targeting = Targeting(
targetedSiteIds = Seq("0006a522ce0f4bbbbaa6b3c38cafaa0f")
),
banners = List(
Banner(
id = 1,
src ="https://business.URLTV.com/wp-content/uploads/2020/06/openGraph.jpeg",
width = 300,
height = 250
)
),
bid = 5d
)
)
BidRequest case class:
case class BidRequest(id: String, imp: Option[List[Impression]], site:Site, user: Option[User], device: Option[Device])
BidResponse case class:
case class BidResponse(id: String, bidRequestId: String, price: Double, adid:Option[String], banner: Option[Banner])
The other case classes:
case class Campaign(id: Int, country: String, targeting: Targeting, banners: List[Banner], bid: Double)
case class Targeting(targetedSiteIds: Seq[String])
case class Banner(id: Int, src: String, width: Int, height: Int)
case class Impression(id: String, wmin: Option[Int], wmax: Option[Int], w: Option[Int], hmin: Option[Int], hmax: Option[Int], h: Option[Int], bidFloor: Option[Double])
case class Site(id: Int, domain: String)
case class User(id: String, geo: Option[Geo])
case class Device(id: String, geo: Option[Geo])
case class Geo(country: Option[String])
I've so far tried using the code below but keep getting type mismatch errors:
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json.DefaultJsonProtocol._
implicit val resFormat = jsonFormat2(BidResponse)
implicit val bidFormat = jsonFormat1(BidRequest)
implicit val cFormat = jsonFormat1(Campaign)
implicit val tFormat = jsonFormat1(Targeting)
implicit val bFormat = jsonFormat1(Banner)
implicit val iFormat = jsonFormat1(Impression)
implicit val sFormat = jsonFormat1(Site)
implicit val uFormat = jsonFormat1(User)
implicit val dFormat = jsonFormat1(Device)
implicit val gFormat = jsonFormat1(Geo)
You are getting type errors because Spray JSON requires the jsonFormatN method whose N matches the number of parameters in the case class.
In your case:
// Formats for nested types must be defined before the formats that use them.
implicit val bFormat = jsonFormat4(Banner)
implicit val tFormat = jsonFormat1(Targeting)
implicit val cFormat = jsonFormat5(Campaign)
implicit val resFormat = jsonFormat5(BidResponse)
...
implicit val bidFormat = jsonFormat5(BidRequest)
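A quick way to double-check which N applies is to ask a case class instance for its arity. The sketch below uses only the standard library and does not call Spray JSON. Arity and formatMethod are hypothetical names introduced for illustration.

```scala
case class Banner(id: Int, src: String, width: Int, height: Int)
case class Targeting(targetedSiteIds: Seq[String])

object Arity {
  // Hypothetical helper: name of the jsonFormatN method that matches
  // the number of constructor parameters of a case class instance.
  def formatMethod(p: Product): String = s"jsonFormat${p.productArity}"
}
```

For example, Arity.formatMethod(Banner(1, "x", 300, 250)) yields "jsonFormat4", matching the Banner format in the answer.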

How to create generated objects in shapeless

Suppose I have a normalized database model for a generic type that comes in like this:
case class BaseModel(id: String,
createdBy: String,
attr1: Option[String] = None,
attr2: Option[Int] = None,
attr3: Option[LocalDate] = None)
Given a sequence of BaseModel, if all the fields of a certain Option attribute are not populated, can shapeless create a reduced model for me?
For example, suppose that all the attr1 fields are empty. Without me having to specify the object beforehand, can shapeless create a generic object that looks like this?
case class BaseModel(id: String,
createdBy: String,
attr2: Option[Int] = None,
attr3: Option[LocalDate] = None)
What Shapeless can do is, given two case classes, create an object of one of them from an object of another.
import java.time.LocalDate
import shapeless.LabelledGeneric
import shapeless.record._
case class BaseModel(id: String,
createdBy: String,
attr1: Option[String] = None,
attr2: Option[Int] = None,
attr3: Option[LocalDate] = None)
case class BaseModel1(id: String,
createdBy: String,
attr2: Option[Int] = None,
attr3: Option[LocalDate] = None)
val bm = BaseModel(
id = "cff4545gvgf",
createdBy = "John Doe",
attr2 = Some(42),
attr3 = Some(LocalDate.parse("2018-11-03"))
) // BaseModel(cff4545gvgf,John Doe,None,Some(42),Some(2018-11-03))
val hlist = LabelledGeneric[BaseModel].to(bm)
val hlist1 = hlist - 'attr1
val bm1 = LabelledGeneric[BaseModel1].from(hlist1)
// BaseModel1(cff4545gvgf,John Doe,Some(42),Some(2018-11-03))
But Shapeless can't create a new case class. If you need a new case class to be created automatically you can write a macro.
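Shapeless works on types at compile time, so finding out which attribute is empty across a whole sequence is a separate runtime step. A minimal sketch of that check follows, assuming Scala 2.13 or later (for productElementNames). EmptyAttrs and allEmpty are hypothetical names, and the sketch only reports field names; it does not build a reduced class.

```scala
import java.time.LocalDate

case class BaseModel(id: String,
                     createdBy: String,
                     attr1: Option[String] = None,
                     attr2: Option[Int] = None,
                     attr3: Option[LocalDate] = None)

object EmptyAttrs {
  // Names of the fields whose value is None in every row.
  // An empty input yields an empty set rather than "all fields".
  def allEmpty(rows: Seq[BaseModel]): Set[String] =
    if (rows.isEmpty) Set.empty
    else rows
      .map { r =>
        // Pair each field name with its value and keep the names holding None.
        r.productElementNames.zip(r.productIterator)
          .collect { case (name, None) => name }
          .toSet
      }
      .reduce(_ intersect _)
}
```

The resulting set can then drive the choice between hand-written target classes such as BaseModel1.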

Why does the Scala Macro for case class copy fail?

I have about 24 case classes that I need to programmatically enhance by changing several common elements prior to serialization in a datastore that doesn't support joins. Since case classes don't share a trait that defines copy(...), I have been attempting to use macros. Using this post documenting a macro as a base, I came up with the following:
import java.util.UUID
import org.joda.time.DateTime
import scala.language.experimental.macros
trait RecordIdentification {
val receiverId: Option[String]
val transmitterId: Option[String]
val patientId: Option[UUID]
val streamType: Option[String]
val sequenceNumber: Option[Long]
val postId: Option[UUID]
val postedDateTime: Option[DateTime]
}
object WithRecordIdentification {
import scala.reflect.macros.Context
def withId[T, I](entity: T, id: I): T = macro withIdImpl[T, I]
def withIdImpl[T: c.WeakTypeTag, I: c.WeakTypeTag](c: Context)(
entity: c.Expr[T], id: c.Expr[I]
): c.Expr[T] = {
import c.universe._
val tree = reify(entity.splice).tree
val copy = entity.actualType.member(newTermName("copy"))
val params = copy match {
case s: MethodSymbol if (s.paramss.nonEmpty) => s.paramss.head
case _ => c.abort(c.enclosingPosition, "No eligible copy method!")
}
c.Expr[T](Apply(
Select(tree, copy),
AssignOrNamedArg(Ident("postId"), reify(id.splice).tree) ::
AssignOrNamedArg(Ident("patientId"), reify(id.splice).tree) ::
AssignOrNamedArg(Ident("receiverId"), reify(id.splice).tree) ::
AssignOrNamedArg(Ident("transmitterId"), reify(id.splice).tree) ::
AssignOrNamedArg(Ident("sequenceNumber"), reify(id.splice).tree) :: Nil
))
}
}
And I invoke it with something like:
class GenericAnonymizer[A <: RecordIdentification]() extends Schema {
def anonymize(dataPost: A, header: DaoDataPostHeader): A = WithRecordIdentification.withId(dataPost, header)
}
But I get a compile error:
Error:(44, 71) type mismatch;
found : com.dexcom.rt.model.DaoDataPostHeader
required: Option[String]
val copied = WithRecordIdentification.withId(sampleGlucoseRecord, header)
Error:(44, 71) type mismatch;
found : com.dexcom.rt.model.DaoDataPostHeader
required: Option[java.util.UUID]
val copied = WithRecordIdentification.withId(sampleGlucoseRecord, header)
Error:(44, 71) type mismatch;
found : com.dexcom.rt.model.DaoDataPostHeader
required: Option[Long]
val copied = WithRecordIdentification.withId(sampleGlucoseRecord, header)
I'm not quite sure how to change the macro to support multiple parameters... any sage advice?
Assume you have the following set of case classes, which you wish to anonymize on certain attributes prior to serialization.
case class MyRecordA(var receiverId: String, var y: Int)
case class MyRecordB(var transmitterId: Int, var y: Int)
case class MyRecordC(var patientId: UUID, var y: Int)
case class MyRecordD(var streamType: String, var y: Int)
case class MyRecordE(var sequenceNumber: String, var streamType: String, var y: Int)
You can use the Scala reflection library to mutate an instance's attributes at runtime. Put your custom anonymizing or enhancing logic in the implicit anonymize method; the Mutator calls it for each field and overwrites the field only when it returns a value.
import java.util.UUID
import scala.reflect.runtime.{universe => ru}
implicit def anonymize(field: String /* field name */, value: Any /* use current field value if reqd */): Option[Any] = field match {
case "receiverId" => Option(value.toString.hashCode)
case "transmitterId" => Option(22)
case "patientId" => Option(UUID.randomUUID())
case _ => None
}
implicit class Mutator[T: ru.TypeTag](i: T)(implicit c: scala.reflect.ClassTag[T], anonymize: (String, Any) => Option[Any]) {
def mask = {
val m = ru.runtimeMirror(i.getClass.getClassLoader)
ru.typeOf[T].members.filter(!_.isMethod).foreach(s => {
val fVal = m.reflect(i).reflectField(s.asTerm)
anonymize(s.name.decoded.trim, fVal.get).foreach(fVal.set)
})
i
}
}
Now you can invoke masking on any instance as:
val maskedRecord = MyRecordC(UUID.randomUUID(), 2).mask

Scala: case class runtime error

This demo ran OK. But when I move it into a function of another class (in my existing project) and call that function, it fails to compile.
object DFMain {
case class Person(name: String, age: Double, t:String)
def main (args: Array[String]): Unit = {
val sc = new SparkContext("local", "Scala Word Count")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val bsonRDD = sc.parallelize(("foo",1,"female")::
("bar",2,"male")::
("baz",-1,"female")::Nil)
.map(tuple=>{
var bson = new BasicBSONObject()
bson.put("name","bfoo")
bson.put("value",0.1)
bson.put("t","female")
(null,bson)
})
val tDf = bsonRDD.map(_._2)
.map(f=>Person(f.get("name").toString,
f.get("value").toString.toDouble,
f.get("t").toString)).toDF()
tDf.limit(1).show()
}
}
The version that fails to compile, 'MySQLDao.insertIntoMySQL()':
object MySQLDao {
private val sc= new SparkContext("local", "Scala Word Count")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Person(name: String, age: Double, t:String)
def insertIntoMySQL(): Unit ={
val bsonRDD = sc.parallelize(("foo",1,"female")::
("bar",2,"male")::
("baz",-1,"female")::Nil)
.map(tuple=>{
val bson = new BasicBSONObject()
bson.put("name","bfoo")
bson.put("value",0.1)
bson.put("t","female")
(null,bson)
})
val tDf = bsonRDD.map(_._2).map( f=> Person(f.get("name").toString,
f.get("value").toString.toDouble,
f.get("t").toString)).toDF()
tDf.limit(1).show()
}
}
Well, when I call 'MySQLDao.insertIntoMySQL()' I get the error:
value typedProductIterator is not a member of object scala.runtime.ScalaRunTime
case class Person(name: String, age: Double, t:String)
I suppose that the case class isn't visible in the closure inside the map function. Move it to the package level:
case class Person(name: String, age: Double, t:String)
object MySQLDao {
...
}

Scala: How to access a class property dynamically by name?

How can I look up the value of an object's property dynamically by name in Scala 2.10.x?
E.g. Given the class (it can't be a case class):
class Row(val click: Boolean,
val date: String,
val time: String)
I want to do something like:
val fields = List("click", "date", "time")
val row = new Row(click=true, date="2015-01-01", time="12:00:00")
fields.foreach(f => println(row.getProperty(f))) // how to do this?
class Row(val click: Boolean,
val date: String,
val time: String)
val row = new Row(click=true, date="2015-01-01", time="12:00:00")
row.getClass.getDeclaredFields foreach { f =>
f.setAccessible(true)
println(f.getName)
println(f.get(row))
}
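The loop above can be wrapped into a lookup by name, which is closer to the row.getProperty(f) call in the question. This is a minimal sketch using plain Java reflection. PropertyLookup and getProperty are hypothetical names, and the sketch only sees fields declared directly on the class, not inherited ones.

```scala
class Row(val click: Boolean,
          val date: String,
          val time: String)

object PropertyLookup {
  // Hypothetical helper: read a declared field of `target` by name.
  // Returns None when the class declares no field with that name.
  def getProperty(target: AnyRef, name: String): Option[Any] =
    target.getClass.getDeclaredFields.find(_.getName == name).map { f =>
      f.setAccessible(true) // the backing fields of vals are private
      f.get(target)
    }
}
```

With this in place, fields.foreach(f => println(PropertyLookup.getProperty(row, f))) prints one Option per field name, and an unknown name yields None instead of throwing.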
You could also use the bean functionality from java/scala:
import scala.beans.BeanProperty
import java.beans.Introspector
object BeanEx extends App {
case class Stuff(@BeanProperty val i: Int, @BeanProperty val j: String)
val info = Introspector.getBeanInfo(classOf[Stuff])
val instance = Stuff(10, "Hello")
info.getPropertyDescriptors.foreach { p =>
println(p.getReadMethod.invoke(instance))
}
}