I have a table in Cassandra with a map<text, frozen<MyUDT>> column. I have defined MyUDT as a case class in Scala and I'm trying to read it back. Saving to Cassandra works, but when I later try to read the value with the driver I run into problems.
I'm trying to do this:
val result = session.execute(s"select ls_my_udt from ks.myTable WHERE id = 0")
val queryConContrato = result.one().getMap[String, MyUDT]("ls_my_udt", classOf[String], classOf[MyUDT])
When I execute this query I get an error:
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [frozen<ks.myudt> <-> com.MyUDT]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
My TYPE is:
CREATE TYPE MyUDT (
id text,
field text
);
My case class is:
@UDT(keyspace = "ks", name = "myUDT")
case class MyUDT(id: String,
                 field: String)
And this is the error I get if I don't annotate the case class:
java.lang.IllegalArgumentException: @UDT annotation was not found on class com.MyUDT
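One common way to resolve a CodecNotFoundException for a UDT is to register a codec for it with the driver. A minimal sketch, assuming the object-mapper module (cassandra-driver-mapping) is on the classpath and that the annotated class is mappable (the mapper expects a no-arg constructor and accessors, so a bare case class may need adjusting):

import com.datastax.driver.mapping.MappingManager

// Sketch: building the UDT codec through the MappingManager registers it with
// the cluster's CodecRegistry, so getMap can then materialize MyUDT values.
val manager = new MappingManager(session)
manager.udtCodec(classOf[MyUDT])

val result = session.execute("select ls_my_udt from ks.myTable WHERE id = 0")
val udts = result.one().getMap("ls_my_udt", classOf[String], classOf[MyUDT])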
I have a case class with simple data:
case class MyClass(
  details: Details,
  names: List[String],
  id: String
)
I have created a Couchbase query which should retrieve all documents from the database:
val query = "SELECT * from `docs`"
for {
  docs <- bucket
    .query(N1qlQuery.simple(query))
    .flatMap((rows: AsyncN1qlQueryResult) => rows.rows())
    .toList
    .parse[F]
    .map(_.asScala.toList)
} yield docs
parse[F] is a simple function that converts from an Observable. The problem here is that I get a type mismatch error: found List[AsyncN1qlQueryResult], required List[MyClass]. How should I convert the AsyncN1qlQueryResult rows into MyClass objects?
I'm using Circe to parse documents.
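For the code above, the missing piece is a conversion from each row's JSON to MyClass. A hedged sketch of that step with Circe, assuming each row's JSON can be obtained as a String (e.g. via row.value().toString) and that SELECT * nests each document under the bucket name, here "docs":

import io.circe.{Decoder, Error}
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.parse

implicit val detailsDecoder: Decoder[Details] = deriveDecoder[Details]
implicit val myClassDecoder: Decoder[MyClass] = deriveDecoder[MyClass]

// Decode one row's JSON, drilling into the bucket-name wrapper added by SELECT *.
def rowToMyClass(rowJson: String): Either[Error, MyClass] =
  parse(rowJson).flatMap(_.hcursor.downField("docs").as[MyClass])

The newer native Scala SDK described next can handle this conversion for you.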
I'm happy to report that there is now an early release of the native Couchbase Scala SDK available, which does include support for converting each row result of a N1QL query directly into your case class:
case class Address(line1: String)
case class User(name: String, age: Int, addresses: Seq[Address])
object User {
// Define a Codec so SDK knows how to convert User to/from JSON
implicit val codec: Codec[User] = Codecs.codec[User]
}
val statement = """select * from `users`;"""
val rows: Try[Seq[User]] = cluster.query(statement)
.map(result => result
.rows.flatMap(row =>
row.contentAs[User].toOption))
rows match {
case Success(rows: Seq[User]) =>
rows.foreach(row => println(row))
case Failure(err) =>
println(s"Error: $err")
}
This is the blocking API. There are also APIs for getting the results as Futures, or as Flux/Mono from reactive programming, so you have a lot of flexibility in how you get the data.
You can see how to get started here: https://docs.couchbase.com/scala-sdk/1.0alpha/hello-world/start-using-sdk.html
Please note that this is an alpha release to let the community get an idea of where we're heading with it and give them an opportunity to provide feedback. It shouldn't be used in production. The forums (https://forums.couchbase.com/) are the best place to raise any feedback you have.
I am using Slick to analyze a legacy MySQL database (with the MyISAM engine), and I'm using implicit classes to navigate through entities, e.g. user.logs, with this code:
implicit class UserNav(user: User) {
  def logs = Logs.filter(_.userId === user.id)
}
However, in this case the key is an INT but the foreign key is a BLOB. Using a MySQL client I can run select userId * 1 from logs to get an INT from the BLOB, and I can even join despite the different data types. But with Slick I get compile errors for the code above.
error: Cannot perform option-mapped operation
  with type: (Option[java.sql.Blob], Int) => R
  for base type: (java.sql.Blob, java.sql.Blob) => Boolean
error: ambiguous implicit values:
both value BooleanOptionColumnCanBeQueryCondition in object CanBeQueryCondition of type => slick.lifted.CanBeQueryCondition[slick.lifted.Rep[Option[Boolean]]]
and value BooleanCanBeQueryCondition in object CanBeQueryCondition of type => slick.lifted.CanBeQueryCondition[Boolean]
match expected type slick.lifted.CanBeQueryCondition[Nothing]
Any idea how to solve this?
As I found out by experimenting, simply pretending the column has a different data type works.
So, besides the original val in class Log (which extends Table[LogRow]):
val userId: Rep[Option[java.sql.Blob]] = column[Option[java.sql.Blob]]("userId", O.Default(None))
I added
val userId_asInt: Rep[Option[Int]] = column[Option[Int]]("userId", O.Default(None))
and changed the navigation def to
def logs = Logs.filter(_.userId_asInt === user.id)
This did the trick – at least with MySQL/MyISAM (no idea if this works with other engines or even other DBMS).
Alternatively, I could have replaced the data type in the original val userId, but then I would have had to change that data type all over the place, e.g. in Log.* and Log.? ...
I am writing a Scala/Spark program to find the maximum salary of an employee. The employee data is in a CSV file, and the salary column uses a comma as a thousands separator and is prefixed with a $, e.g. $74,628.00.
To handle the comma and dollar sign, I have written a parser function in Scala that splits each line on "," and then maps each column to an individual variable to be assigned to a case class.
My parser looks like the code below. To eliminate the comma and dollar sign I use the replace function to strip them, and then finally cast the string to Int.
def ParseEmployee(line: String): Classes.Employee = {
  val fields = line.split(",")
  val Name = fields(0)
  val JOBTITLE = fields(2)
  val DEPARTMENT = fields(3)
  // String.replace returns a new string, so the results must be kept
  // (chaining the calls strips both the thousands separator and the dollar sign).
  val cleaned = fields(4).replace(",", "").replace("$", "")
  val EMPLOYEEANNUALSALARY = cleaned.toDouble.toInt // e.g. "$74,628.00" -> 74628
  Classes.Employee(Name, JOBTITLE, DEPARTMENT, EMPLOYEEANNUALSALARY)
}
My case class looks like below:
case class Employee(Name: String,
                    JOBTITLE: String,
                    DEPARTMENT: String,
                    EMPLOYEEANNUALSALARY: Number)
My Spark DataFrame SQL query looks like below:
val empMaxSalaryValue = sc.sqlContext.sql("Select Max(EMPLOYEEANNUALSALARY) From EMP")
empMaxSalaryValue.show
When I run this program I get the exception below:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Number
- field (class: "java.lang.Number", name: "EMPLOYEEANNUALSALARY")
- root class: "Classes.Employee"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:282)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:272)
at CalculateMaximumSalary$.main(CalculateMaximumSalary.scala:27)
at CalculateMaximumSalary.main(CalculateMaximumSalary.scala)
Any idea why I am getting this error? What mistake am I making here, and why can't the value be cast to Number?
Is there a better approach to this problem of getting the maximum salary of an employee?
Spark SQL provides only a limited number of Encoders, which target concrete classes. Abstract classes like Number are not supported (they can only be used with the generic binary Encoders).
Since you convert to Int anyway, just redefine the class:
case class Employee (
Name: String,
JOBTITLE: String,
DEPARTMENT: String,
EMPLOYEEANNUALSALARY: Int
)
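With the salary declared as Int, the rest of the pipeline works unchanged. A minimal end-to-end sketch, under the assumption that the parsed records are turned into a Dataset and registered as the EMP view (the file path is a placeholder, and skipping the CSV header line is omitted for brevity):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

val spark = SparkSession.builder.appName("CalculateMaximumSalary").getOrCreate()
import spark.implicits._

val employees = spark.sparkContext
  .textFile("employees.csv")   // placeholder path
  .map(ParseEmployee)
  .toDS()

employees.createOrReplaceTempView("EMP")
spark.sql("SELECT MAX(EMPLOYEEANNUALSALARY) FROM EMP").show()

// Equivalent without SQL:
employees.agg(max($"EMPLOYEEANNUALSALARY")).show()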
I have the following case class:
case class OrderDetails(OrderID : String, ProductID : String, UnitPrice : Double,
Qty : Int, Discount : Double)
I am trying to read this CSV: https://github.com/xsankar/fdps-v3/blob/master/data/NW-Order-Details.csv
This is my code:
val spark = SparkSession.builder.master(sparkMaster).appName(sparkAppName).getOrCreate()
import spark.implicits._
val orderDetails = spark.read.option("header","true").csv( inputFiles + "NW-Order-Details.csv").as[OrderDetails]
And the error is:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast `UnitPrice` from string to double as it may truncate
The type path of the target object is:
- field (class: "scala.Double", name: "UnitPrice")
- root class: "es.own3dh2so4.OrderDetails"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Why can't it be converted if all the values in that field are doubles? What am I not understanding?
Spark version 2.1.0, Scala version 2.11.7
You just need to explicitly cast your field to a Double (DoubleType comes from org.apache.spark.sql.types):
import org.apache.spark.sql.types.DoubleType

val orderDetails = spark.read
  .option("header","true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("unitPrice", 'UnitPrice.cast(DoubleType))
  .as[OrderDetails]
On a side note, by Scala (and Java) convention, your case class constructor parameters should be lower camel case:
case class OrderDetails(orderID: String,
productID: String,
unitPrice: Double,
qty: Int,
discount: Double)
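As an alternative sketch (not part of the answer above), Spark's CSV reader can also infer numeric column types itself via the inferSchema option, at the cost of an extra pass over the data; note that other columns may then be inferred as numeric too, depending on the file:

val orderDetails = spark.read
  .option("header", "true")
  .option("inferSchema", "true")   // lets Spark infer UnitPrice as double, Qty as int
  .csv(inputFiles + "NW-Order-Details.csv")
  .as[OrderDetails]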
If we want to change the data type of multiple columns, chaining withColumn gets ugly. A better way is to apply a schema to the data while reading it:
Get the case class schema using Encoders (org.apache.spark.sql.Encoders), as shown below:
val caseClassSchema = Encoders.product[OrderDetails].schema
Apply this schema while reading the data:
val data = spark.read.schema(caseClassSchema)
  .option("header", "true").csv(inputFiles + "NW-Order-Details.csv").as[OrderDetails]
I'm trying to create a Spark UDF to extract a Map of (key, value) pairs from a user-defined case class.
The Scala function seems to work fine, but when I try to convert it to a UDF in Spark 2.0, I run into the "Schema for type Any is not supported" error.
case class myType(c1: String, c2: Int)

def getCaseClassParams(cc: Product): Map[String, Any] = {
  cc.getClass
    .getDeclaredFields              // all field names
    .map(_.getName)
    .zip(cc.productIterator.toSeq)  // zipped with all values
    .toMap
}
But when I try to wrap that function in a UDF, it results in the following error:
val ccUDF = udf{(cc: Product, i: String) => getCaseClassParams(cc).get(i)}
java.lang.UnsupportedOperationException: Schema for type Any is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:716)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:668)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:654)
at org.apache.spark.sql.functions$.udf(functions.scala:2841)
The error message says it all: you have an Any in the map. The Spark SQL and Dataset APIs do not support Any in a schema; it has to be one of the supported types (basic types such as String and Integer, a sequence of supported types, or a map of supported types).
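A hedged sketch of one way around this, assuming it is acceptable to stringify the values: let the UDF receive the struct column as a Row (which is how Spark passes a struct into a UDF) and return an encodable type such as Option[String]:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Look up one field of a struct column by name and return it as a String;
// Option[String] is encoded as a nullable string column.
val fieldAsString = udf { (struct: Row, key: String) =>
  struct.schema.fieldNames
    .zip(struct.toSeq.map(v => if (v == null) null else v.toString))
    .toMap
    .get(key)
}

// Hypothetical usage: df.withColumn("c2", fieldAsString($"myStruct", lit("c2")))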