How to handle optional fields in case classes with the Cassandra Spark connector?
Example:
case class User(name : String, address : Option[Address])
case class Address(street : String, city : String)
When I try to save a user to Cassandra with rdd.saveToCassandra, it raises an error:
Failed to get converter for field "address" of type scala.Option[Address] in User mapped to column "address" of "testspark.logs_raw"
I have tried to implement a TypeConverter, but that has not worked.
However, nested case classes are correctly converted to Cassandra UDTs, and optional fields are accepted.
Is there a good way to deal with this without changing the data model?
Just for visibility: everything works fine in modern versions - there were a lot of changes in UDT handling around SCC 1.4.0-1.6.0, plus many performance optimizations in SCC 2.0.8. With SCC 2.5.1, the RDD API correctly maps everything. For example, if we have the following UDT & table:
cqlsh> create type test.address (street text, city text);
cqlsh> create table test.user(name text primary key, address test.address);
cqlsh> insert into test.user(name, address) values
('with address', {street: 'street 1', city: 'city1'});
cqlsh> insert into test.user(name) values ('without address');
cqlsh> select * from test.user;
name | address
-----------------+-------------------------------------
with address | {street: 'street 1', city: 'city1'}
without address | null
(2 rows)
Then the RDD API is able to correctly pull everything when reading the data:
scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._
scala> case class Address(street : String, city : String)
defined class Address
scala> case class User(name : String, address : Option[Address])
defined class User
scala> val data = sc.cassandraTable[User]("test", "user")
data: com.datastax.spark.connector.rdd.CassandraTableScanRDD[User] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18
scala> data.collect
res0: Array[User] = Array(User(without address,None),
User(with address,Some(Address(street 1,city1))))
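Writing works the same way in these versions. A minimal sketch (assuming the same test.user table and the User/Address case classes defined above; a None address is stored as an empty UDT column, mirroring the read above):
import com.datastax.spark.connector._

// Save one user with an address and one without; the Option[Address] field
// maps to the UDT column, with None stored as null.
val users = sc.parallelize(Seq(
  User("with address", Some(Address("street 1", "city1"))),
  User("without address", None)))
users.saveToCassandra("test", "user")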
Related
I've created two REST endpoints in Akka HTTP which take a string as input, parse it using json4s, and then do some processing on it. My case class is like -
final case class A(id: String, name: String, address: String)
The 1st endpoint receives only id while the other receives all three fields, and I want to use the same case class A for both. So I used default values for the name & address fields like -
final case class A(id: String, name: String = "", address: String = "")
This works well for me. But now, if I don't send the address or name (or both) fields to the second endpoint, it does not throw an exception stating that the name (or address) was not found.
So, my question is: can I create one endpoint in which id is mandatory while the other fields don't matter, and another endpoint where every field is mandatory, using the same case class?
The code to parse the string to a case class is -
parse(jsonStr).extract[A]
I hope you're getting my point. Any suggestions?
There are two ways you can achieve what you want to do.
Option + Validations
name and address are optional so you need to handle them.
case class A(id: String, name: Option[String], address: Option[String])
val json = """{ "id":"1" }"""
// 1st endpoint
val r = parse(json).extract[A]
r.name.getOrElse("foo")
r.address.getOrElse("bar")
// 2nd endpoint
val r2 = parse(json).extract[A]
r2.name.getOrElse(sys.error("name is missing")) // boom!
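To make the 2nd endpoint reject missing fields explicitly, a small sketch (the required helper is hypothetical; MappingException is json4s's own exception type):
import org.json4s.MappingException

// Hypothetical helper: unwrap an Option or fail the way a strict extract would.
def required[T](value: Option[T], field: String): T =
  value.getOrElse(throw new MappingException(s"No usable value for $field"))

// 2nd endpoint: all three fields are mandatory.
val strict  = parse(json).extract[A]
val name    = required(strict.name, "name")
val address = required(strict.address, "address")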
Default JObject + Merge
Or you can use an alternative JObject to provide default values for your input.
case class A(id: String, name: String, address: String)
val json = """{ "id":"1" }"""
val defaultValues = JObject(("name", JString("foo")), ("address", JString("bar")))
// 1st endpoint
val r = defaultValues.merge(parse(json)).extract[A]
// 2nd endpoint
val r2 = parse(json).extract[A] // boom! again
No, your case class formally defines what you expect as input; it doesn't represent ambiguity. You could use Option fields and add checks, but that just defeats the purpose of the extractor.
I have a table in Cassandra with a map column whose values are of a UDT (MyUDT).
I have defined MyUDT as a case class in Scala and am trying to read it. Saving it to Cassandra works, but when I later want to read it with the driver, I have some problems.
I'm trying to do this:
val result = session.execute(s"select ls_my_udt from ks.myTable WHERE id = 0")
val queryConContrato = result.one().getMap[String, MyUDT]("ls_my_udt", classOf[String], classOf[MyUDT])
When I execute this query I get an error:
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [frozen<ks.myudt> <-> com.MyUDT]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
My TYPE is:
CREATE TYPE MyUDT (
id text,
field text
);
My case class is:
@UDT(keyspace = "ks", name = "myUDT")
case class MyUDT(id: String,
                 field: String)
If I don't annotate the case class, I get this error instead:
java.lang.IllegalArgumentException: @UDT annotation was not found on class com.MyUDT
I have the following case class:
case class OrderDetails(OrderID : String, ProductID : String, UnitPrice : Double,
Qty : Int, Discount : Double)
I am trying read this csv: https://github.com/xsankar/fdps-v3/blob/master/data/NW-Order-Details.csv
This is my code:
val spark = SparkSession.builder.master(sparkMaster).appName(sparkAppName).getOrCreate()
import spark.implicits._
val orderDetails = spark.read.option("header","true").csv( inputFiles + "NW-Order-Details.csv").as[OrderDetails]
And the error is:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast `UnitPrice` from string to double as it may truncate
The type path of the target object is:
- field (class: "scala.Double", name: "UnitPrice")
- root class: "es.own3dh2so4.OrderDetails"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Why can't it be transformed if all the field values are doubles? What am I not understanding?
Spark version 2.1.0, Scala version 2.11.7
You just need to explicitly cast your field to a Double:
import org.apache.spark.sql.types.DoubleType

val orderDetails = spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("unitPrice", 'UnitPrice.cast(DoubleType))
  .as[OrderDetails]
On a side note, by Scala (and Java) convention, your case class constructor parameters should be lower camel case:
case class OrderDetails(orderID: String,
productID: String,
unitPrice: Double,
qty: Int,
discount: Double)
If we want to change the datatype of multiple columns, chaining withColumn calls gets ugly.
A better way is to apply a schema to the data:
Get the case class schema using Encoders, as shown below:
val caseClassSchema = Encoders.product[CaseClass].schema
Apply this schema while reading the data:
val data = spark.read.schema(caseClassSchema)
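Put together for the OrderDetails case class from this question (a sketch, assuming the same header option and file path as above, and that spark.implicits._ is already in scope for .as[OrderDetails]):
import org.apache.spark.sql.Encoders

// Derive the schema (with proper Double/Int column types) from the case class itself.
val orderSchema = Encoders.product[OrderDetails].schema

val orderDetails = spark.read
  .schema(orderSchema)
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .as[OrderDetails]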
I defined a case class to map rows of a Cassandra table:
case class Log(
val time: Long,
val date: String,
val appId: String,
val instanceId: String,
val appName: String,
val channel: String,
val originCode: String,
val message: String) {
}
I created an RDD to hold all my rows:
val logEntries = sc.cassandraTable[Log]("keyspace", "log")
To see if everything works, I printed this:
println(logEntries.count()) -> works, prints the number of tuples retrieved.
println(logEntries.first()) -> throws an exception on this line:
java.lang.AssertionError: assertion failed: Missing columns needed by
com.model.Log: app_name, app_id, origin_code, instance_id
My columns of the log table in Cassandra are:
time bigint, date text, appid text, instanceid text, appname text, channel text, origincode text, message text
What's wrong?
As mentioned in the cassandra-spark-connector docs, the column name mapper has its own logic for converting case class parameters to column names:
For multi-word column identifiers, separate each word by an underscore in Cassandra, and use the camel case convention on the Scala side.
So if you use case class Log(appId: String, instanceId: String) with camel-cased parameters, it will automatically be mapped to the underscore-separated notation: app_id text, instance_id text. It cannot be automatically mapped to appid text, instanceid text: the underscores are missing.
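If renaming either the columns or the case class fields is not an option, the connector also lets you override the mapping explicitly. A minimal sketch, assuming SCC's DefaultColumnMapper and its column-name override map (check the mapper documentation for your connector version):
import com.datastax.spark.connector._
import com.datastax.spark.connector.mapper.DefaultColumnMapper

// Override only the fields whose column names don't follow the camelCase <-> snake_case rule.
implicit val logColumnMapper = new DefaultColumnMapper[Log](Map(
  "appId"      -> "appid",
  "instanceId" -> "instanceid",
  "appName"    -> "appname",
  "originCode" -> "origincode"))

val logEntries = sc.cassandraTable[Log]("keyspace", "log")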
I'm trying to map my classes with Slick so I can persist them.
My business objects are defined this way:
case class Book(id : Option[Long], author : Author, title : String, readDate : Date, review : String){}
case class Author(id : Option[Long], name : String, surname : String) {}
Then I defined the "table" class for authors:
class Authors(tag : Tag) extends Table[Author](tag,"AUTHORS") {
def id = column[Option[Long]]("AUTHOR_ID", O.PrimaryKey, O.AutoInc)
def name = column[String]("NAME")
def surname = column[String]("SURNAME")
def * = (id, name, surname) <> ((Author.apply _).tupled , Author.unapply)
}
And for Books:
class Books (tag : Tag) extends Table[Book](tag, "BOOKS") {
implicit val authorMapper = MappedColumnType.base[Author, Long](_.id.get, AuthorDAO.DAO.findById(_))
def id = column[Option[Long]]("BOOK_ID", O.PrimaryKey, O.AutoInc)
def author = column[Author]("FK_AUTHOR")
def title = column[String]("TITLE")
def readDate = column[Date]("DATE")
def review = column[Option[String]]("REVIEW")
def * = (id, author, title, readDate, review) <> ((Book.apply _).tupled , Book.unapply)
}
But when I compile, I get this error:
Error:(24, 51) No matching Shape found.
Slick does not know how to map the given types.
Possible causes: T in Table[T] does not match your * projection. Or you use an unsupported type in a Query (e.g. scala List).
Required level: scala.slick.lifted.FlatShapeLevel
Source type: (scala.slick.lifted.Column[Option[Long]], scala.slick.lifted.Column[model.Author], scala.slick.lifted.Column[String], scala.slick.lifted.Column[java.sql.Date], scala.slick.lifted.Column[Option[String]])
Unpacked type: (Option[Long], model.Author, String, java.sql.Date, String)
Packed type: Any
def * = (id, author, title, readDate, review) <> ((Book.apply _).tupled , Book.unapply)
^
and also this one:
Error:(24, 51) not enough arguments for method <>: (implicit evidence$2: scala.reflect.ClassTag[model.Book], implicit shape: scala.slick.lifted.Shape[_ <: scala.slick.lifted.FlatShapeLevel, (scala.slick.lifted.Column[Option[Long]], scala.slick.lifted.Column[model.Author], scala.slick.lifted.Column[String], scala.slick.lifted.Column[java.sql.Date], scala.slick.lifted.Column[Option[String]]), (Option[Long], model.Author, String, java.sql.Date, String), _])scala.slick.lifted.MappedProjection[model.Book,(Option[Long], model.Author, String, java.sql.Date, String)].
Unspecified value parameter shape.
def * = (id, author, title, readDate, review) <> ((Book.apply _).tupled , Book.unapply)
^
What's the mistake here?
What am I not getting about slick?
Thank you in advance!
Slick is not an ORM, so there is no automatic mapping from a foreign key to an entity; this question has been asked many times on SO (here and here, just to name two).
Let's assume for a moment that what you are trying to do is possible:
implicit val authorMapper =
MappedColumnType.base[Author, Long](_.id.get, AuthorDAO.DAO.findById(_))
So you are telling the projection to use the row id and fetch the entity related to that id. There are three problems in your case: first, you don't handle failures (id.get); second, your primary key is optional (it shouldn't be).
The third problem is that Slick will fetch each related entity separately: you execute some query and get 100 books, and Slick then makes 100 additional queries just to fetch the related authors. Performance-wise this is suicide; you are completely bypassing the SQL layer (joins), which has the best performance, only to gain the possibility of shortening your DAOs.
Fortunately, this doesn't seem to be possible anyway. Mappers are meant for types Slick doesn't support out of the box (for example, different date formats, without having to explicitly use functions) or to inject format conversions when fetching/inserting rows. Have a look at the documentation on how to use joins (depending on your version).
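For completeness, a minimal sketch of the join-based alternative (written against the Slick 2.x lifted embedding that matches the scala.slick error messages above; the H2 driver import is just a placeholder for whatever driver is actually used): store the author's id as a plain foreign-key column and resolve the Author with a join.
import scala.slick.driver.H2Driver.simple._
import java.sql.Date

// Store the foreign key instead of embedding the Author object itself.
case class BookRow(id: Option[Long], authorId: Long, title: String,
                   readDate: Date, review: String)

class Books(tag: Tag) extends Table[BookRow](tag, "BOOKS") {
  def id       = column[Option[Long]]("BOOK_ID", O.PrimaryKey, O.AutoInc)
  def authorId = column[Long]("FK_AUTHOR")
  def title    = column[String]("TITLE")
  def readDate = column[Date]("DATE")
  def review   = column[String]("REVIEW")
  def * = (id, authorId, title, readDate, review) <> ((BookRow.apply _).tupled, BookRow.unapply)
}

val books   = TableQuery[Books]
val authors = TableQuery[Authors]

// Resolve books together with their authors in a single SQL join.
val booksWithAuthors = for {
  b <- books
  a <- authors if b.authorId === a.id
} yield (b, a)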
Ende Neu's answer is more knowledgeable and relevant to the use case described in the question, and probably a more proper and correct answer.
The following is merely an observation I made which may have helped tmnd91 by answering the question:
What's the mistake here?
I noticed that:
case class Book( ... review : String){}
does not match with:
def review = column[Option[String]]("REVIEW")
It should be:
def review = column[String]("REVIEW")