Read YAML of the same nested structure with different properties in Scala

I am trying to write a generic function in Scala that reads YAML files with the same nested structure but different properties, using SnakeYAML. For example, one YAML could be:
myMap:
-
  name: key1
  value: value1
-
  name: key2
  value: value2
Another YAML could be:
myMap:
-
  name: key1
  value: value1
  data: data1
-
  name: key2
  value: value2
  data: data2
I am able to read the first YAML using the code below, from here:
class configParamsKeyValue {
  @BeanProperty var name: String = null
  @BeanProperty var value: String = null
}

class myConfig {
  @BeanProperty var myMap = new java.util.ArrayList[configParamsKeyValue]()
}
def loadConfig(filename: String): myConfig = {
  val yaml = new Yaml(new Constructor(classOf[myConfig]))
  val stream = new FileInputStream(filename)
  try {
    val obj = yaml.load(stream)
    obj.asInstanceOf[myConfig]
  } finally {
    stream.close()
  }
}
I want to be able to pass the element type of the ArrayList (here configParamsKeyValue) as a parameter to the class myConfig, so that I can also read the second YAML file by defining another class like:
class configParamsKeyValueData {
  @BeanProperty var name: String = null
  @BeanProperty var value: String = null
  @BeanProperty var data: String = null
}
Can somebody please help?
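For what it's worth, one possible direction (a sketch only, not from the original post; the names myConfigWithData and loadConfig[T] are made up here): because of type erasure, SnakeYAML still needs a concrete root class per file layout, but the loader itself can be made generic by passing the root class in through a ClassTag:
import java.io.FileInputStream
import scala.beans.BeanProperty
import scala.reflect.ClassTag
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor

// Hypothetical second root class whose list element type carries the extra field
class myConfigWithData {
  @BeanProperty var myMap = new java.util.ArrayList[configParamsKeyValueData]()
}

// Generic loader: the root class is supplied via a ClassTag instead of being hard-coded
def loadConfig[T](filename: String)(implicit ct: ClassTag[T]): T = {
  val yaml = new Yaml(new Constructor(ct.runtimeClass))
  val stream = new FileInputStream(filename)
  try {
    val obj: Any = yaml.load(stream)
    obj.asInstanceOf[T]
  } finally {
    stream.close()
  }
}

// Usage (file names are placeholders):
// val simple = loadConfig[myConfig]("simple.yaml")
// val withData = loadConfig[myConfigWithData]("with-data.yaml")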

Related

Converting a data class to a @MappedEntity data class in Micronaut using Kotlin

Firstly, I'm new to micronaut and kotlin.
I'm trying to convert a data class that I receive as the request body of my API. I need to persist data from this JSON into a Postgres table. I wasn't able to add a @field annotation to a @MappedEntity data class to validate empty fields with a message. Hence, I decided to use a plain data class to parse the request body and then create a new object of the @MappedEntity class to run the save query.
The issue:
I have a @GeneratedValue Id column in my mapped entity class. When I try to create a new object of this class, I'm not able to leave the Id column blank.
Controller:
#Post("/contribute")
#Status(HttpStatus.CREATED)
fun processIndividualContribution(
#Valid #Body individualContributionRequestTO: IndividualContributionRequestTO
): Mono<IndividualContributionDTO>
Request body data class:
@JsonIgnoreProperties(ignoreUnknown = true)
data class IndividualContributionRequestTO(
    @field:NotEmpty(message = "Transaction Id cannot be null") val transactionId: String,
    @field:NotEmpty(message = "First Name can't be null") val firstName: String,
    val lastName: String? = null,
    @field:NotEmpty(message = "Email address can't be null") val emailAddress: String,
    @field:NotEmpty(message = "Individual Contribution can't be null") val contribution: String,
) {
    fun toPurchasedDataEntity(individualContributionRequestTO: IndividualContributionRequestTO): PurchaseDataAll {
        return PurchaseDataAll(
            purchaserFirstName = individualContributionRequestTO.firstName,
            purchaserLastName = individualContributionRequestTO.lastName,
            purchaserEmail = individualContributionRequestTO.emailAddress,
            contribution = individualContributionRequestTO.contribution
        )
    }
}
Mapped Entity Class:
@MappedEntity
@Table(name = "purchased_all")
data class PurchaseDataAll(
    @Id
    @GeneratedValue
    @Column(name = "purchase_id")
    val purchaseId: Int,
    @Column(name = "transaction_id")
    val transactionId: String,
    @Column(name = "purchaser_first_name")
    val purchaserFirstName: String,
    @Column(name = "purchaser_last_name")
    val purchaserLastName: String? = null,
    @Column(name = "purchaser_email")
    val purchaserEmail: String,
    @Column(name = "contribution")
    val contribution: String,
    @DateCreated
    @Column(name = "created_ts")
    var createdTs: LocalDateTime? = null,
    @DateUpdated
    @Column(name = "updated_ts")
    var updatedTs: LocalDateTime? = null
)
The function toPurchasedDataEntity doesn't compile due to the missing purchaseId field in the returned object.
Is there a way I can parse the request body data class to the mapped entity class by ignoring the auto generated field?
In Kotlin, you'll have to prefix the annotations with field: as shown below. I also changed the definition of purchaseId so you don't have to specify it when mapping from the view class (DTO) to the entity class.
IMHO, it's a good approach to separate entity classes from view classes, as you've done in your question.
@MappedEntity
@Table(name = "purchased_all")
data class PurchaseDataAll(
    @field:Id
    @field:GeneratedValue
    @field:Column(name = "purchase_id")
    var purchaseId: Int? = null,
    @field:Column(name = "transaction_id")
    val transactionId: String,
    @field:Column(name = "purchaser_first_name")
    val purchaserFirstName: String,
    @field:Column(name = "purchaser_last_name")
    val purchaserLastName: String? = null,
    @field:Column(name = "purchaser_email")
    val purchaserEmail: String,
    @field:Column(name = "contribution")
    val contribution: String,
    @field:DateCreated
    @field:Column(name = "created_ts")
    var createdTs: LocalDateTime? = null,
    @field:DateUpdated
    @field:Column(name = "updated_ts")
    var updatedTs: LocalDateTime? = null
)

Scala Spark DataFrame: error parsing a string field into a Typesafe Config

I'm trying to parse a String field from a Spark DataFrame into a Config, reading the DataFrame as a specific case class. The objective is to get a new row in the DataFrame for each input and output that the config field contains. I'm far from having a solution, and I'm fighting an error about non-serializable objects.
The config field looks like this:
data {
  inputs = [
    {
      name = "table1"
      paths = ["/data/master/table1"]
    },
    {
      name = "table2"
      paths = ["/data/master/table2"]
    }
  ]
  outputs = [
    {
      name = "table1"
      type = "parquet"
      mode = "append"
      force = true
      path = "/data/master/table1"
    },
    {
      name = "table2"
      type = "parquet"
      mode = "append"
      force = true
      path = "/data/master/table2"
    }
  ]
}
Code I'm using:
// case class to read the config field
case class JobConf(name: String, job: String, version: Long, updatedAt: Timestamp, conf: String)

// case class for the output dataframe
case class InputOutputJob(name: String, job: String, version: String, path: String,
                          inputOuput: String)

val jobsDF = spark.read.parquet("path")

jobsDF.map {
  job => {
    val x = JobConf(job.getString(0), job.getString(1), job.getInt(2), job.getTimestamp(3), job.getString(4))
    val c = ConfigFactory.parseString(x.conf, ConfigParseOptions.defaults().setSyntax(ConfigSyntax.PROPERTIES))
    val p: Map[String, ConfigValue] = c.root().asScala
    // Pending
    InputOutputJob( )
  }
}.toDF
The error that comes when I try to compile is:
Cause: java.io.NotSerializableException:
Serialization stack:
- object not serializable
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
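No answer is included above, but here is a hedged sketch of one way to avoid the serialization problem (the name inputOutputDF and the exact column mapping are assumptions, not the poster's code): parse the config entirely inside the lambda and return only plain case class instances, so no Config or ConfigValue object is captured by or returned from the closure. Since the sample config is HOCON rather than Java properties, the default parse syntax can be used.
import scala.collection.JavaConverters._
import com.typesafe.config.ConfigFactory
import spark.implicits._ // assumes `spark` is the SparkSession in scope

// Sketch only: assumes the parquet column names line up with JobConf so .as[JobConf] works,
// and flattens each conf string into one row per input and one row per output.
val inputOutputDF = jobsDF.as[JobConf].flatMap { jobConf =>
  val c = ConfigFactory.parseString(jobConf.conf)
  val inputs = c.getConfigList("data.inputs").asScala.map { in =>
    InputOutputJob(in.getString("name"), jobConf.job, jobConf.version.toString,
      in.getStringList("paths").asScala.mkString(","), "input")
  }
  val outputs = c.getConfigList("data.outputs").asScala.map { out =>
    InputOutputJob(out.getString("name"), jobConf.job, jobConf.version.toString,
      out.getString("path"), "output")
  }
  inputs ++ outputs
}.toDF()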

Retrieving list of objects from application.conf

I have the following entry in Play for Scala application.conf:
jobs = [
  {number: 0, dir: "/dir1", name: "General"},
  {number: 1, dir: "/dir2", name: "Customers"}
]
I want to retrieve this list of objects in a Scala program:
val conf = ConfigFactory.load
val jobs = conf.getAnyRefList("jobs").asScala
println(jobs)
This prints:
Buffer({number=0, name=General, dir=/dir1}, {number=1, name=Customers, dir=/dir2})
But how to convert the result to actual Scala objects?
Try this one:
case class Job(number: Int, dir: String, name: String)

object Job {
  implicit val configLoader: ConfigLoader[List[Job]] = ConfigLoader(_.getConfigList).map(
    _.asScala.toList.map(config =>
      Job(
        config.getInt("number"),
        config.getString("dir"),
        config.getString("name")
      )
    )
  )
}
Then, with a Configuration instance from DI:
configuration.get[List[Job]]("jobs")
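For example, the loader above would typically be used through an injected Configuration; a minimal sketch (the class name JobsService is made up for illustration):
import javax.inject.Inject
import play.api.Configuration

class JobsService @Inject()(configuration: Configuration) {
  // the implicit ConfigLoader[List[Job]] defined on the Job companion is resolved here
  val jobs: List[Job] = configuration.get[List[Job]]("jobs")
}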
Here is a Config object which will extract data from a config file into a type that you specify.
Usage:
case class Job(number: Int, dir: String, name: String)
val jobs = Config[List[Job]]("jobs")
Code:
import com.typesafe.config._
import org.json4s._
import org.json4s.jackson.JsonMethods._

object Config {
  private val conf = ConfigFactory.load()
  private val jData = parse(conf.root.render(ConfigRenderOptions.concise))

  def apply[T](name: String)(implicit formats: Formats = DefaultFormats, mf: Manifest[T]): T =
    Extraction.extract(jData \\ name)(formats, mf)
}
This will throw an exception if the particular config object does not exist or does not match the format of T.

spark groupBy operation hangs at 199/200

I have a Spark standalone cluster with a master and two executors. I have an RDD[LevelOneOutput]; below is the LevelOneOutput class:
class LevelOneOutput extends Serializable {
  @BeanProperty
  var userId: String = _
  @BeanProperty
  var tenantId: String = _
  @BeanProperty
  var rowCreatedMonth: Int = _
  @BeanProperty
  var rowCreatedYear: Int = _
  @BeanProperty
  var listType1: ArrayBuffer[TypeOne] = _
  @BeanProperty
  var listType2: ArrayBuffer[TypeTwo] = _
  @BeanProperty
  var listType3: ArrayBuffer[TypeThree] = _
  ...
  ...
  @BeanProperty
  var listType18: ArrayBuffer[TypeEighteen] = _
  @BeanProperty
  var groupbyKey: String = _
}
Now I want to group this RDD based on userId, tenantId, rowCreatedMonth, rowCreatedYear. For that I did this
val levelOneRDD = inputRDD.map(row => {
  row.setGroupbyKey(s"${row.getTenantId}_${row.getRowCreatedYear}_${row.getRowCreatedMonth}_${row.getUserId}")
  row
})
val groupedRDD = levelOneRDD.groupBy(row => row.getGroupbyKey)
This gives me the data with the key as a String and the value as an Iterable[LevelOneOutput].
Now I want to generate a single LevelOneOutput object per group key. For that I was doing something like below:
val rdd = groupedRDD.map(row => {
  val levelOneOutput = new LevelOneOutput
  val groupKey = row._1.split("_")

  levelOneOutput.setTenantId(groupKey(0))
  levelOneOutput.setRowCreatedYear(groupKey(1).toInt)
  levelOneOutput.setRowCreatedMonth(groupKey(2).toInt)
  levelOneOutput.setUserId(groupKey(3))

  var listType1 = new ArrayBuffer[TypeOne]
  var listType2 = new ArrayBuffer[TypeTwo]
  var listType3 = new ArrayBuffer[TypeThree]
  ...
  ...
  var listType18 = new ArrayBuffer[TypeEighteen]

  row._2.foreach(data => {
    if (data.getListType1 != null) listType1 = listType1 ++ data.getListType1
    if (data.getListType2 != null) listType2 = listType2 ++ data.getListType2
    if (data.getListType3 != null) listType3 = listType3 ++ data.getListType3
    ...
    ...
    if (data.getListType18 != null) listType18 = listType18 ++ data.getListType18
  })

  if (listType1.isEmpty) levelOneOutput.setListType1(null) else levelOneOutput.setListType1(listType1)
  if (listType2.isEmpty) levelOneOutput.setListType2(null) else levelOneOutput.setListType2(listType2)
  if (listType3.isEmpty) levelOneOutput.setListType3(null) else levelOneOutput.setListType3(listType3)
  ...
  ...
  if (listType18.isEmpty) levelOneOutput.setListType18(null) else levelOneOutput.setListType18(listType18)

  levelOneOutput
})
This works as expected for small input sizes, but when I try to run it on a larger input data set, the groupBy operation hangs at 199/200 and I don't see any specific error or warning in stdout/stderr.
Can someone point out why the job is not proceeding further?
Instead of using the groupBy operation, I created a paired RDD like below
val levelOnePairedRDD = inputRDD.map(row => {
  row.setGroupbyKey(s"${row.getTenantId}_${row.getRowCreatedYear}_${row.getRowCreatedMonth}_${row.getUserId}")
  (row.getGroupbyKey, row)
})
and updated the processing logic, which solved my issue.
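The updated processing logic isn't shown; one way it could look (a hedged sketch only, the merge helper below is not the poster's code) is to use reduceByKey on the paired RDD, which combines values per key on the map side instead of collecting every row of a skewed key into a single Iterable:
// Merge two LevelOneOutput objects that share the same group key.
// Note: this mutates and returns `a`, mirroring the mutable style of the original code.
def merge(a: LevelOneOutput, b: LevelOneOutput): LevelOneOutput = {
  if (b.getListType1 != null) {
    if (a.getListType1 == null) a.setListType1(b.getListType1)
    else a.setListType1(a.getListType1 ++ b.getListType1)
  }
  // ...repeat for listType2 through listType18...
  a
}

val reducedRDD = levelOnePairedRDD.reduceByKey(merge)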

Salat and Embedded MongoDB Documents

I have a case class that is made up of 2 embedded documents, one of which is a list. I am having some problems retrieving the items in the list.
Please see my code below:
package models

import play.api.Play.current
import com.novus.salat._
import com.novus.salat.dao._
import com.mongodb.casbah.Imports._
import se.radley.plugin.salat._
import com.novus.salat.global._

case class Category(
  _id: ObjectId = new ObjectId,
  category: Categories,
  subcategory: List[SubCategories]
)

case class Categories(
  category_id: String,
  category_name: String
)

case class SubCategories(
  subcategory_id: String,
  subcategory_name: String
)

object Category extends ModelCompanion[Category, ObjectId] {
  val collection = mongoCollection("category")
  val dao = new SalatDAO[Category, ObjectId](collection = collection) {}
  val CategoryDAO = dao

  def options: Seq[(String, String)] = {
    find(MongoDBObject.empty).map(it => (it.category.category_id, it.category.category_name)).toSeq
  }

  def suboptions: Seq[(String, String, String)] = {
    find(MongoDBObject.empty).map(it => (it.category.category_id, it.subcategory.subcategory_id, it.subcategory.subcategory_name)).toSeq
  }
}
I get the error "value subcategory_id is not a member of List[models.SubCategories]", which doesn't make any sense to me.
You are doing this:
def suboptions: Seq[(String, String, String)] = {
  find(MongoDBObject.empty).map(category => {
    val categories: Categories = category.category
    val categoryId: String = categories.category_id
    val subcategory: List[SubCategories] = category.subcategory
    val subcategoryId: String = subcategory.subcategory_id // here you are trying to
    // get the id from the list of subcategories, not from one of them
    val subcategoryName: String = subcategory.subcategory_name // same here
    (categoryId, subcategoryId, subcategoryName)
  }).toSeq
}
BTW, using snake_case in Scala is quite uncommon; val/var names should be in camelCase, see this.
Edit: You can make it simple by doing this:
case class Category(
  _id: ObjectId = new ObjectId(),
  id: String,
  name: String,
  subcategories: List[Subcategory]
)

case class Subcategory(
  id: String,
  name: String
)
// not tested
def suboptions: Seq[(String, String, String)] = {
  val categories = find(MongoDBObject.empty).toList
  for {
    category <- categories
    subcategory <- category.subcategories
  } yield (category.id, subcategory.id, subcategory.name)
}