UnsupportedOperationException when writing to a Cassandra table - scala

I have the following classes:
case class AucLog(timestamp: UUID, modelname: String, good: Int,
                  list: List[Double])

class AucDatabase(override val connector: CassandraConnection)
  extends Database[AucDatabase](connector) {
  object users extends CMetrics with Connector
}

object AucDatabase extends AucDatabase(AucConnector.connector)

abstract class AucMetrics extends Table[AucMetrics, AucLog] {
  object id extends UUIDColumn with PartitionKey
  object name extends StringColumn
  object ud extends IntColumn
  object zob extends ListColumn[Double]
}
abstract class CMetrics extends AucMetrics with RootConnector {
  def store(metric: AucLog): Future[ResultSet] = {
    insert.value(_.id, metric.timestamp)
      .value(_.name, metric.modelname)
      .value(_.ud, metric.good)
      .value(_.zob, metric.list)
      .consistencyLevel_=(ConsistencyLevel.ONE)
      .future()
  }
}
DmpDatabase.create()
AucDatabase.create()

val pd = DmpDatabase.users.myselect()
val timeout = new Timeout(500000)
val result = Await.result(pd, timeout.duration)
// <--- this attempt to read from my database is working - no problemo --->

val todf = result.records.map { elem => elem.idcat }
val rdd = spark.sparkContext.parallelize(todf)
import spark.implicits._
rdd.toDF().show(100)
I'm storing one line in my database to be sure that it is not empty when I read it:
AucDatabase.users.store(new AucLog(UUIDs.timeBased(), "tyron", 0, List(0.1)))
val second = AucDatabase.users.myselect()
val resultmetric = Await.result(second, timeout.duration)
This line causes the exception:
val r = spark.sparkContext.parallelize(resultmetric.records).toDF().show()
What I do not understand is that I'm doing basically the same thing with both databases, yet one throws the following error: UnsupportedOperationException: No encoder found for com.outworkers.phantom.dsl.UUID.
Thank you.

First of all, the store method is macro-generated, so you don't need to create one. The problem you are having is likely not related to phantom at all, but to some kind of Spark construct.
The phantom UUID is nothing more than a type alias for java.util.UUID, so I'm quite surprised there is no encoder for such a standard type out of the box. If you give me the full name of the Encoder class, including the package, I can figure out exactly what is broken.
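For what it's worth, here is a minimal workaround sketch, assuming the failure really is Spark's missing Encoder for java.util.UUID: convert the UUID field to a String before building the DataFrame (the column names below are just illustrative).
// Sketch only: Spark has no built-in Encoder for java.util.UUID, so map the phantom
// records to field types Spark can encode before calling toDF().
import spark.implicits._

val rows = resultmetric.records.map { m =>
  (m.timestamp.toString, m.modelname, m.good, m.list)
}

spark.sparkContext
  .parallelize(rows)
  .toDF("timestamp", "modelname", "good", "list")
  .show()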

Related

Using Phantom 2 with an existing Cassandra session

I am trying to migrate our current implementation from Phantom 1.28.16 to 2.16.4, but I'm running into problems with the setup.
Our framework provides the Cassandra session object during startup, which doesn't seem to fit with Phantom. I am trying to get Phantom to accept that already instantiated session instead of going through the regular CassandraConnection object.
I'm assuming that we can't use the Phantom Database class because of this but I am hoping that there still is some way to set up and use the Tables without using that class.
Is this doable?
I ended up doing the following to be able to use Phantom with an existing connection:
Defined a new trait PhantomTable to be used instead of Phantom's Table trait. They are identical except for the removal of RootConnector:
trait PhantomTable[T <: PhantomTable[T, R], R] extends CassandraTable[T, R] with TableAliases[T, R]
Defined my tables by extending the PhantomTable trait and also turned them into objects. Here I had to import the TableHelper macro to get it to compile:
...
import com.outworkers.phantom.macros.TableHelper._

final case class Foo(id: String, name: Option[String])

sealed class FooTable extends PhantomTable[FooTable, Foo] {
  override val tableName = "foo"

  object id extends StringColumn with PartitionKey
  object name extends OptionalStringColumn
}

object FooTable extends FooTable
After that it is possible to use all the desired methods on the FooTable object, as long as an implicit KeySpace and Session exist in scope.
This is a simple main program that shows how the tables can be used:
object Main extends App {
  val ks = "foo_keyspace"
  val cluster = Cluster.builder().addContactPoints("127.0.0.1").build()

  implicit val keyspace: KeySpace = KeySpace(ks)
  implicit val session: Session = cluster.connect(ks)

  val res = for {
    _   <- FooTable.create.ifNotExists.future
    _   <- FooTable.insert.value(_.id, "1").value(_.name, Some("data")).future
    row <- FooTable.select.where(_.id eqs "1").one
  } yield row

  val r = Await.result(res, 10.seconds)
  println(s"Row: $r")
}

scala-cass generic read from cassandra table as case class

I am attempting to use scala-cass in order to read from cassandra and convert the resultset to a case class using resultSet.as[CaseClass]. This works great when running the following.
import com.weather.scalacass.syntax._
case class TestTable(id: String, data1: Int, data2: Long)
val resultSet = session.execute(s"select * from test.testTable limit 10")
resultSet.one.as[TestTable]
Now I am attempting to make this more generic and I am unable to find the proper type constraint for the generic class.
import com.weather.scalacass.syntax._
case class TestTable(id: String, data1: Int, data2: Long)
abstract class GenericReader[T] {
  val table: String
  val keyspace: String

  def getRows(session: Session): T = {
    val resultSet = session.execute(s"select * from $keyspace.$table limit 10")
    resultSet.one.as[T]
  }
}
I implement this class with the desired case class and attempt to call getRows on the created object:
object TestTable extends GenericReader[TestTable] {
  val keyspace = "test"
  val table = "TestTable"
}
TestTable.getRows(session)
This throws an exception: could not find implicit value for parameter ccd: com.weather.scalacass.CCCassFormatDecoder[T].
I am trying to add a type constraint to GenericReader in order to ensure the implicit conversion will work. However, I am unable to find the proper type. I am attempting to read through scala-cass in order to find the proper constraint but I have had no luck so far.
I would also be happy to use any other library that can achieve this.
Looks like as[T] requires an implicit value that you don't have in scope, so you'll need to require that implicit parameter in the getRows method as well.
def getRows(session: Session)(implicit cfd: CCCassFormatDecoder[T]): T
You could express this as a type constraint (what you were looking for in the original question) using context bounds:
abstract class GenericReader[T:CCCassFormatDecoder]
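Spelled out, the context-bound version could look like this (a sketch; it is equivalent to passing the implicit explicitly, as shown next):
// The bound [T: CCCassFormatDecoder] is sugar for an implicit constructor parameter,
// so resultSet.one.as[T] can find its decoder.
abstract class GenericReader[T: CCCassFormatDecoder] {
  val table: String
  val keyspace: String

  def getRows(session: Session): T = {
    val resultSet = session.execute(s"select * from $keyspace.$table limit 10")
    resultSet.one.as[T]
  }
}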
Rather than try to bound your generic T type, it might be easier to just pass through the missing implicit parameter:
abstract class GenericReader[T](implicit ccd: CCCassFormatDecoder[T]) {
  val table: String
  val keyspace: String

  def getRows(session: Session): T = {
    val resultSet = session.execute(s"select * from $keyspace.$table limit 10")
    resultSet.one.as[T]
  }
}
Finding a concrete value for that implicit can then be deferred until you narrow that T to a specific class (like object TestTable extends GenericReader[TestTable]).

Reusing Slick's DB driver code in data access layer

I'm trying to wrap my head around data access with Slick 3.0. After consulting various GitHub examples, I've come up with the following design.
A singleton Slick object where the DataSource and Driver instances are injected:
class Slick(dataSource: DataSource, val driver: JdbcDriver) {
  val db = driver.api.Database.forDataSource(dataSource)
}
A trait per DB table where the mappings are defined. The trait is mixed into the upper layer where the queries are constructed:
trait RecipeTable {
  protected val slick: Slick

  // the ugly import that has to be added whenever the Slick API is used
  import slick.driver.api._

  type RecipeRow = (Option[Long], String)

  class RecipeTable(tag: Tag) extends Table[RecipeRow](tag, "recipe") {
    def id = column[Option[Long]]("id", O.PrimaryKey, O.AutoInc)
    def name = column[String]("name")
    def * = (id, name)
  }

  protected val recipes = TableQuery[RecipeTable]
}
Now there's an obvious drawback: in every *Table trait, and also in every place where it is mixed in, I need to duplicate import slick.driver.api._ in order to have all of Slick's types in scope.
This is something I'd like to avoid. Ideally the import would be defined only once and reused in downstream components.
Could you please suggest a design that addresses this duplication?
I was mainly inspired by this example; however, the imports are duplicated there as well.
That "ugly" import is actually a good thing about slick's design. But your way of slick usage can be improved as following,
Create a trait which will provide JdbcDriver
package demo.slick.dbl

trait SlickDriverComponent {
  val driver: JdbcDriver
}

trait SlickDBComponent extends SlickDriverComponent {
  val db: driver.api.Database
}
Now define your DAO traits as traits depending on this one:
package demo.slick.dao

import demo.slick.dbl.SlickDBComponent

trait RecipeDAO { self: SlickDBComponent =>
  import driver.api._

  type RecipeRow = (Option[Long], String)

  class RecipeTable(tag: Tag) extends Table[RecipeRow](tag, "recipe") {
    def id = column[Option[Long]]("id", O.PrimaryKey, O.AutoInc)
    def name = column[String]("name")
    def * = (id, name)
  }

  val recipes = TableQuery[RecipeTable]

  def get5Future = db.run(recipes.take(5).result)
}
When it comes to actually connecting to the DB and doing things:
package demo.slick.dbl

trait MySqlDriverProvider extends SlickDriverComponent {
  val driver = slick.driver.MySQLDriver
}

object MySqlDBConnection extends MySqlDriverProvider {
  val connection = driver.api.Database.forConfig("mysql")
}

trait MySqlDBProvider extends SlickDBComponent {
  val driver = slick.driver.MySQLDriver
  val db: Database = MySqlDBConnection.connection
}

trait PostgresDriverProvider extends SlickDriverComponent {
  val driver = slick.driver.PostgresDriver
}

object PostgresDBConnection extends PostgresDriverProvider {
  val connection = driver.api.Database.forConfig("postgres")
}

trait PostgresDBProvider extends SlickDBComponent {
  val driver = slick.driver.PostgresDriver
  val db: Database = PostgresDBConnection.connection
}
Finally, define your DAO objects as follows:
package demo.slick.dao

import demo.slick.dbl.{MySqlDBProvider, PostgresDBProvider}

object MySqlRecipeDAO extends RecipeDAO with MySqlDBProvider
object PostgresRecipeDAO extends RecipeDAO with PostgresDBProvider
Now you can use these as follows:
package demo.slick

import scala.util.{Failure, Success, Try}
import scala.concurrent.ExecutionContext.Implicits.global
import demo.slick.dao.MySqlRecipeDAO

object App extends Application {
  val recipesFuture = MySqlRecipeDAO.get5Future

  recipesFuture.onComplete {
    case Success(seq) => println("Success :: found :: " + seq)
    case Failure(ex)  => println("Failure :: failed :: " + ex.getMessage)
  }
}
Now, as we all know, different databases have different sets of functionality, and hence the "things" available to you depend on the driver being used.
So the need for that "ugly" import every time is there so that you can write your DAO traits once and then use them with whatever database-specific driver implementation you want.

phantom cassandra multiple tables throw exceptions

I'm using phantom to connect to Cassandra in the Play framework. I created the first class following the tutorial, and everything works fine.
case class User(id: String, page: Map[String, String])

sealed class Users extends CassandraTable[Users, User] {
  object id extends StringColumn(this) with PartitionKey[String]
  object page extends MapColumn[String, String](this)

  def fromRow(row: Row): User = {
    User(
      id(row),
      page(row)
    )
  }
}

abstract class ConcreteUsers extends Users with RootConnector {
  def getById(page: String): Future[Option[User]] = {
    select.where(_.id eqs id).one()
  }

  def create(id: String, kv: (String, String)): Future[ResultSet] = {
    insert.value(_.id, id).value(_.page, Map(kv)).consistencyLevel_=(ConsistencyLevel.QUORUM).future()
  }
}

class UserDB(val keyspace: KeySpaceDef) extends Database(keyspace) {
  object users extends ConcreteUsers with keyspace.Connector
}

object UserDB extends ResourceAuthDB(conn) {
  def createTable() {
    Await.ready(users.create.ifNotExists().future(), 3.seconds)
  }
}
However, when I try to create another table following exactly the same approach, Play throws this exception at compile time:
overriding method session in trait RootConnector of type => com.datastax.driver.core.Session;
How can I create another table? Also, can someone explain what causes the exception? Thanks.
EDIT
I moved the connection parts together into one class:
class UserDB(val keyspace: KeySpaceDef) extends Database(keyspace) {
  object users extends ConcreteUsers with keyspace.Connector
  object auth extends ConcreteAuthInfo with keyspace.Connector
}
This time the error message is:
overriding object session in class AuthInfo; lazy value session in trait Connector of
type com.datastax.driver.core.Session cannot override final member
Hope the message helps identify the problem.
The only problem I can see here has nothing to do with connectors; it's here:
def getById(page: String): Future[Option[User]] = {
  select.where(_.id eqs id).one()
}
This should be:
def getById(page: String): Future[Option[User]] = {
  select.where(_.id eqs page).one()
}
Try this; I was able to compile it. Is RootConnector the default one, or do you define another one yourself?
It took me six hours to figure out the problem. It is caused by a column named "session" in the other table. It turns out that you need to be careful when choosing column names: "session" obviously produces the exception above. Cassandra also has a long list of reserved keywords; if you accidentally use one of them as a column name, phantom will not throw any exception (maybe it should?). I don't know whether any other names are reserved in phantom; a list of them would be really helpful.
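For reference, a hedged sketch of one way to avoid the clash, using a hypothetical Auth case class since the AuthInfo table is not shown: give the Scala column object a name that does not collide with the session member provided by the Connector trait.
case class Auth(id: String, sessionToken: String)

sealed class AuthInfo extends CassandraTable[AuthInfo, Auth] {
  object id extends StringColumn(this) with PartitionKey[String]
  // Renamed from `session`: an object called `session` would override the `session`
  // member that the keyspace Connector trait already defines.
  object sessionToken extends StringColumn(this)

  def fromRow(row: Row): Auth = Auth(id(row), sessionToken(row))
}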

How to define schema for custom type in Spark SQL?

The following example code tries to put some case objects into a dataframe. The code includes the definition of a case object hierarchy and a case class using this trait:
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext

sealed trait Some
case object AType extends Some
case object BType extends Some

case class Data(name: String, t: Some)

object Example {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Example")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(Data("a", AType), Data("b", BType)), 4).toDF()
    df.show()
  }
}
When executing the code, I unfortunately encounter the following exception:
java.lang.UnsupportedOperationException: Schema for type Some is not supported
Questions
Is there a possibility to add or define a schema for certain types (here type Some)?
Does another approach exist to represent this kind of enumerations?
I tried to use Enumeration directly, but also without success (see below).
Code for Enumeration:
object Some extends Enumeration {
  type Some = Value
  val AType, BType = Value
}
Thanks in advance. I hope the best approach is not to simply use strings instead.
Spark 2.0.0+:
UserDefinedType has been made private in Spark 2.0.0 and, as of now, it has no Dataset-friendly replacement.
See: SPARK-14155 (Hide UserDefinedType in Spark 2.0)
Most of the time a statically typed Dataset can serve as a replacement.
There is a pending JIRA, SPARK-7768, to make the UDT API public again with target version 2.4.
See also How to store custom objects in Dataset?
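For illustration, a minimal Spark 2.0+ sketch of the Kryo-encoder fallback commonly suggested for that linked question, reusing the sealed trait Some from the example above (the resulting column is opaque binary, so you lose the columnar representation):
// Sketch only: with UserDefinedType private, fall back to a binary Kryo encoder.
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

val spark = SparkSession.builder().appName("Example").master("local[*]").getOrCreate()

// `Some` here is the sealed trait from the question, not scala.Some.
implicit val someEncoder: Encoder[Some] = Encoders.kryo[Some]

val ds = spark.createDataset(Seq[Some](AType, BType))
ds.show()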
Spark < 2.0.0
Is there a possibility to add or define a schema for certain types (here type Some)?
I guess the answer depends on how badly you need this. It looks like it is possible to create a UserDefinedType, but it requires access to DeveloperApi and is not exactly straightforward or well documented.
import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[SomeUDT])
sealed trait Some
case object AType extends Some
case object BType extends Some

class SomeUDT extends UserDefinedType[Some] {
  override def sqlType: DataType = IntegerType

  override def serialize(obj: Any) = {
    obj match {
      case AType => 0
      case BType => 1
    }
  }

  override def deserialize(datum: Any): Some = {
    datum match {
      case 0 => AType
      case 1 => BType
    }
  }

  override def userClass: Class[Some] = classOf[Some]
}
You should probably override hashCode and equals as well.
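With the annotation in place, the original example should then work unchanged; a quick usage sketch, reusing sc and the sqlContext.implicits._ import from the question's main method:
// Usage sketch (Spark < 2.0): the annotated trait now has a schema, so toDF() succeeds
// and the `t` column is backed by SomeUDT (stored as an integer).
val df = sc.parallelize(Seq(Data("a", AType), Data("b", BType)), 4).toDF()
df.printSchema()
df.show()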
Its PySpark counterpart can look like this:
from enum import Enum, unique
from pyspark.sql.types import UserDefinedType, IntegerType

class SomeUDT(UserDefinedType):
    @classmethod
    def sqlType(self):
        return IntegerType()

    @classmethod
    def module(cls):
        return cls.__module__

    @classmethod
    def scalaUDT(cls):  # Required in Spark < 1.5
        return 'net.zero323.enum.SomeUDT'

    def serialize(self, obj):
        return obj.value

    def deserialize(self, datum):
        return {x.value: x for x in Some}[datum]

@unique
class Some(Enum):
    __UDT__ = SomeUDT()
    AType = 0
    BType = 1
In Spark < 1.5 a Python UDT requires a paired Scala UDT, but it looks like this is no longer the case in 1.5.
For a simple UDT like this you can use simple types (for example IntegerType instead of a whole struct).