How to create generated objects in shapeless - scala

Suppose I have a normalized database model for a generic type that comes in like this:
case class BaseModel(id: String,
                     createdBy: String,
                     attr1: Option[String] = None,
                     attr2: Option[Int] = None,
                     attr3: Option[LocalDate] = None)
Given a sequence of BaseModel, if a certain Option attribute is not populated in any of the elements, can shapeless create a reduced model for me?
For example, suppose that all the attr1 fields are empty. Without me having to specify the object beforehand, can shapeless create a generic object that looks like this?
case class BaseModel(id: String,
                     createdBy: String,
                     attr2: Option[Int] = None,
                     attr3: Option[LocalDate] = None)

What Shapeless can do is, given two case classes, create an object of one of them from an object of the other.
import java.time.LocalDate
import shapeless.LabelledGeneric
import shapeless.record._
case class BaseModel(id: String,
                     createdBy: String,
                     attr1: Option[String] = None,
                     attr2: Option[Int] = None,
                     attr3: Option[LocalDate] = None)

case class BaseModel1(id: String,
                      createdBy: String,
                      attr2: Option[Int] = None,
                      attr3: Option[LocalDate] = None)
val bm = BaseModel(
  id = "cff4545gvgf",
  createdBy = "John Doe",
  attr2 = Some(42),
  attr3 = Some(LocalDate.parse("2018-11-03"))
) // BaseModel(cff4545gvgf,John Doe,None,Some(42),Some(2018-11-03))
val hlist = LabelledGeneric[BaseModel].to(bm)
val hlist1 = hlist - 'attr1
val bm1 = LabelledGeneric[BaseModel1].from(hlist1)
// BaseModel1(cff4545gvgf,John Doe,Some(42),Some(2018-11-03))
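For the sequence-level condition in the question, you still have to check at runtime that every attr1 is empty before converting; a minimal sketch building on the classes and imports above:
val models: Seq[BaseModel] = Seq(bm)

// Convert the whole sequence only if attr1 is empty everywhere.
val reduced: Option[Seq[BaseModel1]] =
  if (models.forall(_.attr1.isEmpty))
    Some(models.map(m => LabelledGeneric[BaseModel1].from(LabelledGeneric[BaseModel].to(m) - 'attr1)))
  else
    None // attr1 is populated somewhere, keep the full model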
But Shapeless can't define a new case class for you. If you need a new case class to be created automatically, you can write a macro.

Related

Scala Akka (un)marshalling nested Seq collections with Spray JSON

I'm trying to use Spray JSON to marshal the Seq collection below into a BidRequest entity with the parameters as defined.
The Seq collection is mostly nested, so some Seq parameter fields themselves contain variable collection types that need to be marshalled.
Then, after a computation, the aim is to unmarshal the result as a BidResponse entity.
What's the best approach to do this?
I'm using Akka-HTTP, Akka-Streams, and Akka-Actor.
Seq collection:
val activeUsers = Seq(
  Campaign(
    id = 1,
    country = "UK",
    targeting = Targeting(
      targetedSiteIds = Seq("0006a522ce0f4bbbbaa6b3c38cafaa0f")
    ),
    banners = List(
      Banner(
        id = 1,
        src = "https://business.URLTV.com/wp-content/uploads/2020/06/openGraph.jpeg",
        width = 300,
        height = 250
      )
    ),
    bid = 5d
  )
)
BidRequest case class:
case class BidRequest(id: String, imp: Option[List[Impression]], site: Site, user: Option[User], device: Option[Device])
BidResponse case class:
case class BidResponse(id: String, bidRequestId: String, price: Double, adid: Option[String], banner: Option[Banner])
The other case classes:
case class Campaign(id: Int, country: String, targeting: Targeting, banners: List[Banner], bid: Double)
case class Targeting(targetedSiteIds: Seq[String])
case class Banner(id: Int, src: String, width: Int, height: Int)
case class Impression(id: String, wmin: Option[Int], wmax: Option[Int], w: Option[Int], hmin: Option[Int], hmax: Option[Int], h: Option[Int], bidFloor: Option[Double])
case class Site(id: Int, domain: String)
case class User(id: String, geo: Option[Geo])
case class Device(id: String, geo: Option[Geo])
case class Geo(country: Option[String])
I've so far tried using the code below but keep getting type mismatch errors:
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json.DefaultJsonProtocol._
implicit val resFormat = jsonFormat2(BidResponse)
implicit val bidFormat = jsonFormat1(BidRequest)
implicit val cFormat = jsonFormat1(Campaign)
implicit val tFormat = jsonFormat1(Targeting)
implicit val bFormat = jsonFormat1(Banner)
implicit val iFormat = jsonFormat1(Impression)
implicit val sFormat = jsonFormat1(Site)
implicit val uFormat = jsonFormat1(User)
implicit val dFormat = jsonFormat1(Device)
implicit val gFormat = jsonFormat1(Geo)
The reason you are getting type errors with Spray JSON is that you need to use the corresponding jsonFormatN method, where N matches the number of parameters of the case class.
In your case:
implicit val resFormat = jsonFormat5(BidResponse)
implicit val bidFormat = jsonFormat5(BidRequest)
implicit val cFormat = jsonFormat5(Campaign)
implicit val tFormat = jsonFormat1(Targeting)
implicit val bFormat = jsonFormat4(Banner)
...
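One more caveat: each jsonFormatN call resolves the formats of the nested case classes implicitly, so the formats have to be declared bottom-up, nested types first. A complete, dependency-ordered set for the case classes in the question might look like this (a sketch; the val names are arbitrary):
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json.DefaultJsonProtocol._

// Leaf types first, so each later jsonFormatN call can find them implicitly.
implicit val gFormat = jsonFormat1(Geo)
implicit val uFormat = jsonFormat2(User)
implicit val dFormat = jsonFormat2(Device)
implicit val sFormat = jsonFormat2(Site)
implicit val iFormat = jsonFormat8(Impression)
implicit val bFormat = jsonFormat4(Banner)
implicit val tFormat = jsonFormat1(Targeting)
implicit val cFormat = jsonFormat5(Campaign)
implicit val bidFormat = jsonFormat5(BidRequest)
implicit val resFormat = jsonFormat5(BidResponse)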

Too Many Parameters

I have an application that has a single entry point; it's a library to automate some data engineering tasks.
case class DeltaContextConfig(
  primaryKey: List[String],
  columnToOrder: String,
  filesCountFirstBatch: Int,
  destinationPath: String,
  sparkDf: DataFrame,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  tableName: String,
  databaseName: String,
  autoCompaction: Option[Boolean] = Option(true),
  idealFileSize: Option[Int] = Option(128),
  deduplicationColumn: Option[String] = None,
  compactionIntervalTime: Option[Int] = Option(180),
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)
This is my case class, my single entry point.
After that, all these parameters are passed to other objects: I have objects to write to the data lake, to compact files, and so on. These objects use some of these parameters. For example, I have a DeltaWriterConfig object:
DeltaWriterConfig(
  sparkDf = deltaContextConfig.sparkDf,
  columnToOrder = deltaContextConfig.columnToOrder,
  destinationPath = deltaContextConfig.destinationPath,
  primaryKey = deltaContextConfig.primaryKey,
  filesCountFirstBatch = deltaContextConfig.filesCountFirstBatch,
  sparkContext = deltaContextConfig.sparkContext,
  operationType = deltaContextConfig.operationType,
  partitionColumn = deltaContextConfig.partitionColumn,
  updateCondition = deltaContextConfig.updateCondition,
  setExpression = deltaContextConfig.setExpression
)
I use DeltaWriterConfig to pass these parameters to my class DeltaWriter. I was creating all these config objects in the application's main, but I don't think that is good: I have 3 config objects to populate, so I have 3 big constructor calls in main.
Is there any pattern to solve this?
I think that, at the very least, it would be better to move the creation of one config from another into the companion object of DeltaWriterConfig:
case class DeltaWriterConfig(
  sparkDf: DataFrame,
  columnToOrder: String,
  destinationPath: String,
  primaryKey: List[String],
  filesCountFirstBatch: Int,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)
object DeltaWriterConfig {
  def from(deltaContextConfig: DeltaContextConfig): DeltaWriterConfig =
    DeltaWriterConfig(
      sparkDf = deltaContextConfig.sparkDf,
      columnToOrder = deltaContextConfig.columnToOrder,
      destinationPath = deltaContextConfig.destinationPath,
      primaryKey = deltaContextConfig.primaryKey,
      filesCountFirstBatch = deltaContextConfig.filesCountFirstBatch,
      sparkContext = deltaContextConfig.sparkContext,
      operationType = deltaContextConfig.operationType,
      partitionColumn = deltaContextConfig.partitionColumn,
      updateCondition = deltaContextConfig.updateCondition,
      setExpression = deltaContextConfig.setExpression
    )
}
This gives us the opportunity to create the new config in just one line:
val deltaContextConfig: DeltaContextConfig = ???
val deltaWriterConfig = DeltaWriterConfig.from(deltaContextConfig)
But the better solution is to keep only the fields that are unique to each config. If DeltaContextConfig and DeltaWriterConfig have duplicated fields, why not use composition of configs instead of duplicating those fields:
// instead of this DeltaContextConfig declaration
case class DeltaContextConfig(
  tableName: String,
  databaseName: String,
  autoCompaction: Option[Boolean] = Option(true),
  idealFileSize: Option[Int] = Option(128),
  deduplicationColumn: Option[String] = None,
  compactionIntervalTime: Option[Int] = Option(180),
  sparkDf: DataFrame,
  columnToOrder: String,
  destinationPath: String,
  primaryKey: List[String],
  filesCountFirstBatch: Int,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)

case class DeltaWriterConfig(
  sparkDf: DataFrame,
  columnToOrder: String,
  destinationPath: String,
  primaryKey: List[String],
  filesCountFirstBatch: Int,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)
we can use a config structure like this:
case class DeltaContextConfig(
  tableName: String,
  databaseName: String,
  autoCompaction: Option[Boolean] = Option(true),
  idealFileSize: Option[Int] = Option(128),
  deduplicationColumn: Option[String] = None,
  compactionIntervalTime: Option[Int] = Option(180),
  deltaWriterConfig: DeltaWriterConfig
)

case class DeltaWriterConfig(
  sparkDf: DataFrame,
  columnToOrder: String,
  destinationPath: String,
  primaryKey: List[String],
  filesCountFirstBatch: Int,
  sparkContext: SparkSession,
  operationType: String,
  partitionColumn: Option[String] = None,
  updateCondition: Option[String] = None,
  setExpression: Option[String] = None
)
But remember that your configuration file should mirror the same nested config structure.
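For illustration, wiring the composed configs together might look like this (a hypothetical sketch: the field values are made up, and df and spark stand for an existing DataFrame and SparkSession):
val deltaWriterConfig = DeltaWriterConfig(
  sparkDf = df,                             // hypothetical DataFrame
  columnToOrder = "updated_at",             // hypothetical column
  destinationPath = "/mnt/datalake/events", // hypothetical path
  primaryKey = List("id"),
  filesCountFirstBatch = 10,
  sparkContext = spark,                     // hypothetical SparkSession
  operationType = "append"
)

// The context config now carries the writer config instead of duplicating its fields.
val deltaContextConfig = DeltaContextConfig(
  tableName = "events",
  databaseName = "analytics",
  deltaWriterConfig = deltaWriterConfig
)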

compare case class fields with sub fields of another case class in scala

I have the following 3 case classes:
case class Profile(name: String,
                   age: Int,
                   bankInfoData: BankInfoData,
                   userUpdatedFields: Option[UserUpdatedFields])

case class BankInfoData(accountNumber: Int,
                        bankAddress: String,
                        bankNumber: Int,
                        contactPerson: String,
                        phoneNumber: Int,
                        accountType: AccountType)

case class UserUpdatedFields(contactPerson: String,
                             phoneNumber: Int,
                             accountType: AccountType)
These are just enumeratum enums, but I added them anyway:
import enumeratum._

sealed trait AccountType extends EnumEntry
object AccountType extends Enum[AccountType] {
  val values: IndexedSeq[AccountType] = findValues

  case object Personal extends AccountType
  case object Business extends AccountType
}
My task is to write a func that receives a Profile and compares UserUpdatedFields (all of its fields) with SOME of the fields in BankInfoData... this func is to find which fields were updated.
So I wrote this func:
def findDiff(profile: Profile): Seq[String] = {
  var listOfFieldsThatChanged: List[String] = List.empty
  if (profile.bankInfoData.contactPerson != profile.userUpdatedFields.get.contactPerson) {
    listOfFieldsThatChanged = listOfFieldsThatChanged :+ "contactPerson"
  }
  if (profile.bankInfoData.phoneNumber != profile.userUpdatedFields.get.phoneNumber) {
    listOfFieldsThatChanged = listOfFieldsThatChanged :+ "phoneNumber"
  }
  if (profile.bankInfoData.accountType != profile.userUpdatedFields.get.accountType) {
    listOfFieldsThatChanged = listOfFieldsThatChanged :+ "accountType"
  }
  listOfFieldsThatChanged
}
val profile =
  Profile(
    "nir",
    34,
    BankInfoData(1, "somewhere", 2, "john", 123, AccountType.Personal),
    Some(UserUpdatedFields("lee", 321, AccountType.Personal))
  )

findDiff(profile)
It works, but I wanted something cleaner... any suggestions?
Each case class extends the Product trait, so we can use it to convert case classes into sets of (field, value) pairs. Then we can use set operations to find the difference. For example,
def findDiff(profile: Profile): Seq[String] = {
  val userUpdatedFields = profile.userUpdatedFields.get
  val bankInfoData = profile.bankInfoData
  val userUpdatedFieldsMap = userUpdatedFields.productElementNames.zip(userUpdatedFields.productIterator).toMap
  val bankInfoDataMap = bankInfoData.productElementNames.zip(bankInfoData.productIterator).toMap
  val bankInfoDataSubsetMap = bankInfoDataMap.view.filterKeys(userUpdatedFieldsMap.keys.toList.contains)
  (bankInfoDataSubsetMap.toSet diff userUpdatedFieldsMap.toSet).toList.map { case (field, _) => field }
}
Now findDiff(profile) should output List(phoneNumber, contactPerson). Note that we are using productElementNames from Scala 2.13 to get the field names, which we then zip with the corresponding values:
userUpdatedFields.productElementNames.zip(userUpdatedFields.productIterator)
We also rely on filterKeys and diff.
A simple improvement would be to introduce a trait
trait Fields {
  val contactPerson: String
  val phoneNumber: Int
  val accountType: AccountType

  def findDiff(that: Fields): Seq[String] = Seq(
    Some(contactPerson).filter(_ != that.contactPerson).map(_ => "contactPerson"),
    Some(phoneNumber).filter(_ != that.phoneNumber).map(_ => "phoneNumber"),
    Some(accountType).filter(_ != that.accountType).map(_ => "accountType")
  ).flatten
}
case class BankInfoData(accountNumber: Int,
                        bankAddress: String,
                        bankNumber: Int,
                        contactPerson: String,
                        phoneNumber: Int,
                        accountType: AccountType) extends Fields

case class UserUpdatedFields(contactPerson: String,
                             phoneNumber: Int,
                             accountType: AccountType) extends Fields
so it is possible to call
BankInfoData(...).findDiff(UserUpdatedFields(...))
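For example, reusing the sample values from the question:
BankInfoData(1, "somewhere", 2, "john", 123, AccountType.Personal)
  .findDiff(UserUpdatedFields("lee", 321, AccountType.Personal))
// List(contactPerson, phoneNumber)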
If you want to improve this further and avoid naming all the fields multiple times, shapeless, for example, could be used to do it at compile time. Not exactly the same, but something like this to get started. Or use reflection to do it at runtime, like this answer.
That would be a very easy task to achieve if there were an easy way to convert a case class to a map. Unfortunately, case classes don't offer that functionality out of the box in Scala 2.12 (as Mario mentioned, it will be easy to achieve in Scala 2.13).
There's a library called shapeless that offers some generic programming utilities. For example, we could write an extension method toMap using Record and ToMap from shapeless:
object Mappable {
  implicit class RichCaseClass[X](val x: X) extends AnyVal {
    import shapeless._
    import ops.record._

    def toMap[L <: HList](
      implicit gen: LabelledGeneric.Aux[X, L],
      toMap: ToMap[L]
    ): Map[String, Any] =
      toMap(gen.to(x)).map {
        case (k: Symbol, v) => k.name -> v
      }
  }
}
Then we could use it for findDiff:
def findDiff(profile: Profile): Seq[String] = {
  import Mappable._

  profile match {
    case Profile(_, _, bankInfo, Some(userUpdatedFields)) =>
      val bankInfoMap = bankInfo.toMap
      userUpdatedFields.toMap.toList.flatMap {
        case (k, v) if bankInfoMap.get(k).exists(_ != v) => Some(k)
        case _ => None
      }
    case _ => Seq()
  }
}
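As a quick sanity check with the profile value defined in the question:
findDiff(profile) // List(contactPerson, phoneNumber) -- map ordering may vary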

List[String] Object in scala case class

I am using DSE 5.1.0 (packaged with Spark 2.0.2.6 and Scala 2.11.8).
I am reading a Cassandra table as below:
val sparkSession = ...
val rdd1 = sparkSession.table("keyspace.table")
This table contains a List[String] column, say list1, which I read into a Scala RDD, say rdd1. But when I try to use an encoder, it throws an error:
val myVoEncoder = Encoders.bean(classOf[MyVo])
val dataSet = rdd1.as(myVoEncoder)
I have tried with
scala.collection.mutable.List,
scala.collection.immutable.List,
scala.collection.List,
Seq, and
WrappedArray. All gave the same error as below:
java.lang.UnsupportedOperationException: Cannot infer type for class scala.collection.immutable.List because it is not bean-compliant
MyVo.scala
import scala.beans.BeanProperty

case class MyVo(
  @BeanProperty var id: String,
  @BeanProperty var duration: Int,
  @BeanProperty var list1: List[String]
) {
  def this() = this("", 0, null)
}
Any help will be appreciated.
You should use Array[String]:
import scala.beans.BeanProperty

case class MyVo(
  @BeanProperty var id: String,
  @BeanProperty var duration: Int,
  @BeanProperty var list1: Array[String]
) {
  def this() = this("", 0, null)
}
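With Array[String] in place, the encoder call from the question should now derive (a sketch reusing rdd1 from above):
import org.apache.spark.sql.Encoders

val myVoEncoder = Encoders.bean(classOf[MyVo]) // bean-compliant now that list1 is an Array
val dataSet = rdd1.as(myVoEncoder)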
although it is important to stress that a more idiomatic approach would be:
import sparkSession.implicits._

case class MyVo(
  id: String,
  duration: Int,
  list1: Seq[String]
)

rdd1.as[MyVo]

Scala Implementation

I have a case class:
case class EvaluateAddress(addressFormat: String,
                           screeningAddressType: String,
                           value: Option[String])
This was working fine until I got a new use case where the "value" parameter can be a class object instead of a String.
My initial implementation to handle this use case:
case class EvaluateAddress(addressFormat: String,
                           screeningAddressType: String,
                           addressId: Option[String],
                           addressValue: Option[MailingAddress]) {
  def this(addressFormat: String, screeningAddressType: String, addressId: String) = {
    this(addressFormat, screeningAddressType, Option(addressId), None)
  }

  def this(addressFormat: String, screeningAddressType: String, address: MailingAddress) = {
    this(addressFormat, screeningAddressType, None, Option(address))
  }
}
But because of some problem, I can not have four parameters in any constructor.
Is there a way I can create a class containing three parameters (addressFormat, screeningAddressType, value) and handle both use cases?
Your code works fine; to use the other constructors you just need to use the new keyword:
case class MailingAddress(i: Int)

case class EvaluateAddress(addressFormat: String,
                           screeningAddressType: String,
                           addressId: Option[String],
                           addressValue: Option[MailingAddress]) {
  def this(addressFormat: String, screeningAddressType: String, addressId: String) = {
    this(addressFormat, screeningAddressType, Option(addressId), None)
  }

  def this(addressFormat: String, screeningAddressType: String, address: MailingAddress) = {
    this(addressFormat, screeningAddressType, None, Option(address))
  }
}
val e1 = EvaluateAddress("a", "b", None, None)
val e2 = new EvaluateAddress("a", "b", "c")
val e3 = new EvaluateAddress("a", "b", MailingAddress(0))
You can create an auxiliary ADT to wrap the different types of values. Inside EvaluateAddress you can check which alternative was provided with a match:
case class EvaluateAddress(addressFormat: String,
                           screeningAddressType: String,
                           value: Option[EvaluateAddress.Value]) {
  import EvaluateAddress._

  def doEvaluation() = value match {
    case Some(Value.AsId(id)) =>               // handle an id
    case Some(Value.AsAddress(mailingAddress)) => // handle a full address
    case None =>                               // handle a missing value
  }
}
object EvaluateAddress {
  sealed trait Value

  object Value {
    case class AsId(id: String) extends Value
    case class AsAddress(address: MailingAddress) extends Value
  }
}
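For example, constructing instances with the ADT directly (assuming the MailingAddress case class from the first answer):
val byId = EvaluateAddress("a", "b", Some(EvaluateAddress.Value.AsId("some-id")))
val byAddress = EvaluateAddress("a", "b", Some(EvaluateAddress.Value.AsAddress(MailingAddress(0))))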
It's then possible to also define some implicit conversions to automatically convert Strings and MailingAddresses into Values:
object EvaluateAddress {
  sealed trait Value

  object Value {
    case class AsId(id: String) extends Value
    case class AsAddress(address: MailingAddress) extends Value

    implicit def idAsValue(id: String): Value = AsId(id)
    implicit def addressAsValue(address: MailingAddress): Value = AsAddress(address)
  }

  def withRawValue[T](addressFormat: String,
                      screeningAddressType: String,
                      rawValue: Option[T])(implicit asValue: T => Value): EvaluateAddress = {
    EvaluateAddress(addressFormat, screeningAddressType, rawValue.map(asValue))
  }
}
Some examples of using those implicit conversions:
scala> EvaluateAddress("a", "b", Some("c"))
res1: EvaluateAddress = EvaluateAddress(a,b,Some(AsId(c)))
scala> EvaluateAddress("a", "b", Some(MailingAddress(0)))
res2: EvaluateAddress = EvaluateAddress(a,b,Some(AsAddress(MailingAddress(0))))
scala> val id: Option[String] = Some("id")
id: Option[String] = Some(id)
scala> EvaluateAddress.withRawValue("a", "b", id)
res3: EvaluateAddress = EvaluateAddress(a,b,Some(AsId(id)))