I am trying to write an RDD containing instances of a public class to Cassandra with Spark:
import java.time.Instant

class Test(private var id: String, private var randomNumber: Integer, private var lastUpdate: Instant) {
  def setId(id: String): Unit = { this.id = id }
  def getId: String = this.id
  def setLastUpdate(lastUpdate: Instant): Unit = { this.lastUpdate = lastUpdate }
  def getLastUpdate: Instant = this.lastUpdate
  def setRandomNumber(number: Integer): Unit = { this.randomNumber = number }
  def getRandomNumber: Integer = this.randomNumber
}
This class has all the setters and getters needed to maintain encapsulation, and it cannot be a case class because I have to modify the values during the transformations.
The table corresponding to this entity in Cassandra has slightly different names for the fields:
CREATE TABLE IF NOT EXISTS test.test (
  id uuid,
  random_number int,
  last_update timestamp,
  PRIMARY KEY (id)
);
I am trying to write this RDD with the saveToCassandra method:
implicit val connector = CassandraConnector(sc.getConf)
val rdd: RDD[Test]
rdd.saveToCassandra("test", "test")
but the method throws an exception because the attribute names of the class do not match the column names of the table:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Columns not found in entity.Test: [id, random_number, last_update]
at scala.Predef$.require(Predef.scala:277)
at com.datastax.spark.connector.mapper.DefaultColumnMapper.columnMapForWriting(DefaultColumnMapper.scala:106)
at com.datastax.spark.connector.mapper.MappedToGettableDataConverter$$anon$1.<init>(MappedToGettableDataConverter.scala:35)
at com.datastax.spark.connector.mapper.MappedToGettableDataConverter$.apply(MappedToGettableDataConverter.scala:26)
at com.datastax.spark.connector.writer.DefaultRowWriter.<init>(DefaultRowWriter.scala:16)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:30)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:28)
at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:433)
at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:417)
at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:35)
How can I write the entity to Cassandra without naming the class attributes exactly like the table columns, while keeping the attributes private in the class?
saveToCassandra allows you to provide an optional ColumnSelector:
def saveToCassandra(
  keyspaceName: String,
  tableName: String,
  columns: ColumnSelector = AllColumns,
  writeConf: WriteConf = WriteConf.fromSparkConf(sparkContext.getConf))(...): Unit
In your case you could use the following selector:
def selector = SomeColumns(
  ColumnName("id"),
  ColumnName("random_number", alias = Some("randomNumber")),
  ColumnName("last_update", alias = Some("lastUpdate"))
)
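You can then pass the selector as the third argument of saveToCassandra; a minimal usage sketch, reusing the keyspace and table names from the question:
rdd.saveToCassandra("test", "test", selector)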
Btw, while it is not the typical (or recommended) use of a case class, you could absolutely define its fields as vars and benefit from using a typed Dataset. That makes it very easy to rename fields before writing to Cassandra.
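For illustration, here is a rough sketch of that Dataset route; the TestRow case class, the column renames, and the write options are assumptions for the example, not code from the question:

import org.apache.spark.sql.SparkSession

// A case class with var fields still gets a Product encoder
case class TestRow(var id: String, var randomNumber: Int, var lastUpdate: java.sql.Timestamp)

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val ds = spark.createDataset(Seq(
  TestRow(java.util.UUID.randomUUID().toString, 1, new java.sql.Timestamp(System.currentTimeMillis()))
))

// Rename the fields to the Cassandra column names and write through the DataFrame API
ds.toDF()
  .withColumnRenamed("randomNumber", "random_number")
  .withColumnRenamed("lastUpdate", "last_update")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "test"))
  .mode("append")
  .save()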
I'm pretty new to Slick and I am now facing the issue of how to retrieve data from two tables.
I have one table
class ExecutionTable(tag: Tag) extends Table[ExecTuple](tag, "execution") {
  val id: Rep[String] = column[String]("id")
  val executionDefinitionId: Rep[Long] = column[Long]("executionDefinitionId")
  // other fields are omitted
  def * = ???
}
and another table
class ServiceStatusTable(tag: Tag)
  extends Table[(String, Option[String])](tag, "serviceStatus") {
  def serviceId: Rep[String] = column[String]("serviceId")
  def detail: Rep[String] = column[String]("detail")
  def * = (serviceId, detail.?)
}
In the DAO I convert data from these two tables into a business object
case class ServiceStatus(
  id: String,
  detail: Option[String] = None // other fields omitted
)
like this
private lazy val getServiceStatusCompiled = Compiled {
  (id: Rep[String], tenantId: Rep[String]) =>
    for {
      exec <- getExecutionById(id, tenantId)
      status <- serviceStatuses if exec.id === status.serviceId
    } yield mapToServiceStatus(exec, status)
}
and later
def getServiceStatus(id: String, tenantId: String): Future[Option[ServiceStatus]] = db
  .run(getServiceStatusCompiled(id, tenantId).result.transactionally)
  .map(_.headOption)
The problem is that not every entry in the execution table has a corresponding entry in the serviceStatus table. I cannot modify the execution table and add a detail field to it, because that information is service specific.
When I run the query and a matching entry exists in serviceStatus, everything works as expected. But if there is no matching entry, Future[None] is returned.
Question: is there any way to obtain the status in the for comprehension as an Option, depending on whether an entry exists in serviceStatus, or some other workaround?
Usually, when the join condition may not find a corresponding record in the "right" table but the result should still contain the row from the "left" table, a left join is used.
In your case you can do something like:
Execution
  .filter(...execution table filter...)
  .joinLeft(ServiceStatus).on(_.id === _.serviceId)
This gives you a pair of
(Execution, Rep[Option[ServiceStatus]])
and after query execution:
(Execution, Option[ServiceStatus])
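Applied to your compiled query, a minimal sketch could look like this (serviceStatuses and getExecutionById are taken from your snippets; how exec is mapped into the business object is an assumption):

private lazy val getServiceStatusCompiled = Compiled {
  (id: Rep[String], tenantId: Rep[String]) =>
    getExecutionById(id, tenantId)
      .joinLeft(serviceStatuses).on(_.id === _.serviceId)
}

def getServiceStatus(id: String, tenantId: String): Future[Option[ServiceStatus]] =
  db.run(getServiceStatusCompiled(id, tenantId).result.headOption)
    .map(_.map { case (exec, statusOpt) =>
      // statusOpt is None when no serviceStatus row matches the execution
      ServiceStatus(id = id, detail = statusOpt.flatMap(_._2))
    })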
I'm using the Play 2.1 framework for Scala and the MongoDB Salat plugin.
When I update an Enumeration.Value I get an exception:
java.lang.IllegalArgumentException: can't serialize class scala.Enumeration$Val
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:270) ~[mongo-java-driver-2.11.1.jar:na]
at org.bson.BasicBSONEncoder.putIterable(BasicBSONEncoder.java:295) ~[mongo-java-driver-2.11.1.jar:na]
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:234) ~[mongo-java-driver-2.11.1.jar:na]
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:174) ~[mongo-java-driver-2.11.1.jar:na]
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:120) ~[mongo-java-driver-2.11.1.jar:na]
at com.mongodb.DefaultDBEncoder.writeObject(DefaultDBEncoder.java:27) ~[mongo-java-driver-2.11.1.jar:na]
Inserting the Enumeration.Value works fine. My case class looks like:
case class User(
  @Key("_id") id: ObjectId = new ObjectId,
  username: String,
  email: String,
  @EnumAs language: Language.Value = Language.DE,
  balance: Double,
  added: Date = new Date)
and my update code:
object UserDAO extends ModelCompanion[User, ObjectId] {
  val dao = new SalatDAO[User, ObjectId](collection = mongoCollection("users")) {}

  def update(id: String): WriteResult = {
    UserDAO.dao.update(q = MongoDBObject("_id" -> new ObjectId(id)), o = MongoDBObject("$set" -> MongoDBObject("language" -> Language.EN)))
  }
}
Any ideas how to get that working?
EDIT:
workaround: it works if I convert the Enumeration.Value to a String with toString, but that's not how it should be...
UserDAO.dao.update(q = MongoDBObject("_id" -> new ObjectId(id)), o = MongoDBObject("$set" -> MongoDBObject("language" -> Language.EN.toString)))
It is possible to add a BSON encoding hook for Enumeration, so the conversion is done transparently.
Here is the code
import org.bson.{BSON, Transformer}
import com.mongodb.casbah.commons.conversions.scala._

RegisterConversionHelpers()
custom()

def custom() {
  val transformer = new Transformer {
    def transform(o: AnyRef): AnyRef = o match {
      case e: Enumeration#Value => e.toString
      case _ => o
    }
  }
  // Enumeration.Val is not visible from user code, so register the hook for its runtime class by name
  BSON.addEncodingHook(Class.forName("scala.Enumeration$Val"), transformer)
}
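Once the hook is registered, the update from the question should work without converting to a String by hand (reusing the id variable from the question):

UserDAO.dao.update(
  q = MongoDBObject("_id" -> new ObjectId(id)),
  o = MongoDBObject("$set" -> MongoDBObject("language" -> Language.EN))
)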
At the time of writing, MongoDB doesn't play nicely with Scala enums, so I use a decorator method as a workaround.
Say you have this enum:
object EmployeeType extends Enumeration {
  type EmployeeType = Value
  val Manager, Worker = Value
}
and this MongoDB record:
import EmployeeType._

case class Employee(
  id: ObjectId = new ObjectId
)
In MongoDB, store the integer id of the enum value instead of the enum itself:
case class Employee(
  id: ObjectId = new ObjectId,
  var employeeTypeIndex: Integer = 0
) {
  def employeeType = EmployeeType(employeeTypeIndex) /* getter */
  def employeeType_=(v: EmployeeType) = { employeeTypeIndex = v.id } /* setter */
}
The extra methods implement getters and setters for the employee type enum.
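A small usage sketch of the decorator, assuming the Employee definition above (note that employeeTypeIndex must be a var for the setter to compile):

val employee = Employee()
employee.employeeType = EmployeeType.Manager            // stores EmployeeType.Manager.id in employeeTypeIndex
val currentType: EmployeeType = employee.employeeType   // rebuilds the enum value from the stored index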
Salat only does its work when you serialize to and from your model objects with the grater, not when you build MongoDBObject queries yourself. The Mongo driver API knows nothing about the @EnumAs annotation. (Besides, even if Salat could be used for that, how would it know that a generic key -> value MongoDBObject refers to User.language?)
So you have to do what you describe in your workaround: provide the value of the enum yourself when you build queries.
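For example, a hypothetical helper on UserDAO that keeps the conversion in one place (the method name and signature are made up for illustration, mirroring the update call from the question):

def updateLanguage(id: ObjectId, language: Language.Value): WriteResult =
  dao.update(
    q = MongoDBObject("_id" -> id),
    o = MongoDBObject("$set" -> MongoDBObject("language" -> language.toString))
  )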
I'm trying to understand how Slick works and how to use it. Looking at the examples on GitHub, I ended up with this code snippet in MultiDBCakeExample.scala:
trait PictureComponent { this: Profile => //requires a Profile to be mixed in...
  import profile.simple._ //...to be able to import profile.simple._

  object Pictures extends Table[(String, Option[Int])]("PICTURES") {
    ...
    def * = url ~ id

    val autoInc = url returning id into { case (url, id) => Picture(url, id) }

    def insert(picture: Picture)(implicit session: Session): Picture = {
      autoInc.insert(picture.url)
    }
  }
}
I suppose the * method returns a row of the table, while autoInc should somehow provide the functionality for automatically incrementing the entity's id... but to be honest I have some trouble understanding this piece of code. What does returning refer to? What does autoInc return?
I looked at the Slick documentation but I was unable to find helpful information. Any help would be really appreciated ;-)
Because autoInc can be confusing, I will provide a working example (please note that my DB is PostgreSQL, so I need the forInsert hack to make the PostgreSQL driver generate the auto-increment values).
case class GeoLocation(id: Option[Int], latitude: Double, longitude: Double, altitude: Double)

/**
 * Define table "geo_location".
 */
object GeoLocations extends RichTable[GeoLocation]("geo_location") {
  def latitude = column[Double]("latitude")
  def longitude = column[Double]("longitude")
  def altitude = column[Double]("altitude")

  def * = id.? ~ latitude ~ longitude ~ altitude <> (GeoLocation, GeoLocation.unapply _)

  def forInsert = latitude ~ longitude ~ altitude <> ({ (lat, long, alt) => GeoLocation(None, lat, long, alt) },
    { g: GeoLocation => Some((g.latitude, g.longitude, g.altitude)) })
}
My RichTable is an abstract class so that I don't have to declare an id for every table; I just extend it:
abstract class RichTable[T](name: String) extends Table[T](name) {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  val byId = createFinderBy(_.id)
}
and use it like:
GeoLocations.forInsert.insert(GeoLocation(None, 22.23, 25.36, 22.22))
Since you pass None for the id, it will be generated automatically by PostgreSQL when Slick inserts the new entity.
I started with Slick only a few weeks ago and I really recommend it!
UPDATE: If you don't want to use forInsert projections, another approach is the following (in my case the entity is Address).
Create a sequence for each table when the schema is created:
session.withTransaction {
  DBSchema.tables.drop
  DBSchema.tables.create
  // Create the sequences that generate the ids too.
  Q.updateNA("create sequence address_seq")
}
Define a method that generates the ids using the sequences (I defined this once, in the RichTable class):
def getNextId(seqName: String) = Database { implicit db: Session =>
  Some((Q[Int] + "select nextval('" + seqName + "_seq') ").first)
}
and in the mapper, override the insert method like this:
def insert(model: Address) = Database { implicit db: Session =>
  *.insert(model.copy(id = getNextId(classOf[Address].getSimpleName())))
}
And now you can pass None when you do an insert, and these methods will do the work for you...
How does one insert records into PostgreSQL using AutoInc keys with Slick mapped tables? If I use an Option for the id in my case class and set it to None, then PostgreSQL complains on insert that the field cannot be null. This works for H2, but not for PostgreSQL:
//import scala.slick.driver.H2Driver.simple._
//import scala.slick.driver.BasicProfile.SimpleQL.Table
import scala.slick.driver.PostgresDriver.simple._
import Database.threadLocalSession

object TestMappedTable extends App {

  case class User(id: Option[Int], first: String, last: String)

  object Users extends Table[User]("users") {
    def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
    def first = column[String]("first")
    def last = column[String]("last")
    def * = id.? ~ first ~ last <> (User, User.unapply _)
    def ins1 = first ~ last returning id
    val findByID = createFinderBy(_.id)
    def autoInc = id.? ~ first ~ last <> (User, User.unapply _) returning id
  }

  // implicit val session = Database.forURL("jdbc:h2:mem:test1", driver = "org.h2.Driver").createSession()
  implicit val session = Database.forURL("jdbc:postgresql:test:slicktest",
    driver = "org.postgresql.Driver",
    user = "postgres",
    password = "xxx")

  session.withTransaction {
    Users.ddl.create

    // insert data
    print(Users.insert(User(None, "Jack", "Green")))
    print(Users.insert(User(None, "Joe", "Blue")))
    print(Users.insert(User(None, "John", "Purple")))
    val u = Users.insert(User(None, "Jim", "Yellow"))
    // println(u.id.get)
    print(Users.autoInc.insert(User(None, "Johnathan", "Seagul")))
  }

  session.withTransaction {
    val queryUsers = for {
      user <- Users
    } yield (user.id, user.first)
    println(queryUsers.list)

    Users.where(_.id between (1, 2)).foreach(println)
    println("ID 3 -> " + Users.findByID.first(3))
  }
}
Using the above with H2 succeeds, but if I comment it out and change to PostgreSQL, then I get:
[error] (run-main) org.postgresql.util.PSQLException: ERROR: null value in column "id" violates not-null constraint
org.postgresql.util.PSQLException: ERROR: null value in column "id" violates not-null constraint
This is working here:
object Application extends Table[(Long, String)]("application") {
  def idlApplication = column[Long]("idlapplication", O.PrimaryKey, O.AutoInc)
  def appName = column[String]("appname")
  def * = idlApplication ~ appName
  def autoInc = appName returning idlApplication
}
var id = Application.autoInc.insert("App1")
This is how my SQL looks:
CREATE TABLE application
(idlapplication BIGSERIAL PRIMARY KEY,
appName VARCHAR(500));
Update:
The specific problem with regard to a mapped table with User (as in the question) can be solved as follows:
def forInsert = first ~ last <>
({ (f, l) => User(None, f, l) }, { u:User => Some((u.first, u.last)) })
This is from the test cases in the Slick git repository.
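With that projection in place, the inserts from the question can go through forInsert so the id column is left out entirely (a small usage sketch):

Users.forInsert.insert(User(None, "Jack", "Green"))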
I tackled this problem in a different way. Since I expect my User objects to always have an id in my application logic, and the only point where one would not have it is during insertion into the database, I use an auxiliary NewUser case class which doesn't have an id.
case class User(id: Int, first: String, last: String)
case class NewUser(first: String, last: String)
object Users extends Table[User]("users") {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def first = column[String]("first")
  def last = column[String]("last")

  def * = id ~ first ~ last <> (User, User.unapply _)
  def autoInc = first ~ last <> (NewUser, NewUser.unapply _) returning id
}
val id = Users.autoInc.insert(NewUser("John", "Doe"))
Again, User maps 1:1 to the database entry/row while NewUser could be replaced by a tuple if you wanted to avoid having the extra case class, since it is only used as a data container for the insert invocation.
EDIT:
If you want more safety (with somewhat increased verbosity) you can make use of a trait for the case classes like so:
trait UserT {
  def first: String
  def last: String
}
case class User(id: Int, first: String, last: String) extends UserT
case class NewUser(first: String, last: String) extends UserT
// ... the rest remains intact
In this case you would apply your model changes to the trait first (including any mixins you might need), and optionally add default values to the NewUser.
Author's opinion: I still prefer the no-trait solution as it is more compact and changes to the model are a matter of copy-pasting the User params and then removing the id (auto-inc primary key), both in case class declaration and in table projections.
We're using a slightly different approach. Instead of creating a further projection, we request the next id for a table, copy it into the case class and use the default projection '*' for inserting the table entry.
For PostgreSQL it looks like this:
Let your table objects implement this trait:
trait TableWithId { this: Table[_] =>
  /**
   * Can be overridden if the plural of the table name is irregular.
   **/
  val idColName: String = s"${tableName.dropRight(1)}_id"
  def id = column[Int](s"${idColName}", O.PrimaryKey, O.AutoInc)
  def getNextId = (Q[Int] + s"""select nextval('"${tableName}_${idColName}_seq"')""").first
}
All your entity case classes need a method like this (should also be defined in a trait):
case class Entity(...) {
  def withId(newId: Id): Entity = this.copy(id = Some(newId))
}
New entities can now be inserted this way:
object Entities extends Table[Entity]("entities") with TableWithId {
  override val idColName: String = "entity_id"
  ...
  def save(entity: Entity) = this insert entity.withId(getNextId)
}
The code is still not DRY, because you need to define the withId method for each table. Furthermore, you have to request the next id before you insert an entity, which might have a performance impact, but it shouldn't be noticeable unless you insert thousands of entries at a time.
The main advantage is that there is no need for a second projection, which makes the code less error-prone, in particular for tables with many columns.
The simplest solution was to use the SERIAL type like this:
def id = column[Long]("id", SqlType("SERIAL"), O.PrimaryKey, O.AutoInc)
Here's a more concrete block:
// A case class to be used as the table mapping
case class CaseTable(id: Long = 0L, dataType: String, strBlob: String)

// Class for our table
class MyTable(tag: Tag) extends Table[CaseTable](tag, "mytable") {
  // Define the columns
  def dataType = column[String]("datatype")
  def strBlob = column[String]("strblob")

  // Auto-increment the id primary key column
  def id = column[Long]("id", SqlType("SERIAL"), O.PrimaryKey, O.AutoInc)

  // the * projection (e.g. select * ...) auto-transforms the tupled column values
  def * = (id, dataType, strBlob) <> (CaseTable.tupled, CaseTable.unapply _)
}

// Insert and get the auto-incremented primary key
def insertData(dataType: String, strBlob: String, id: Long = 0L): Long = {
  // DB connection
  val db = Database.forURL(jdbcUrl, pgUser, pgPassword, driver = driverClass)
  // Variable to run queries on our table
  val myTable = TableQuery[MyTable]
  val insert = try {
    // Form the query
    val query = myTable returning myTable.map(_.id) += CaseTable(id, dataType, strBlob)
    // Execute it and wait for the result
    val autoId = Await.result(db.run(query), maxWaitMins)
    // Return the generated id
    autoId
  } catch {
    case e: Exception =>
      logger.error("Error in inserting using Slick: " + e.getMessage, e)
      e.printStackTrace()
      -1L
  }
  insert
}
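A hypothetical call site, assuming jdbcUrl, pgUser, pgPassword, driverClass, maxWaitMins and logger are defined elsewhere as in the snippet above:

val newId = insertData("json", """{"key": "value"}""")
println(s"Inserted row with generated id $newId")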
I faced the same problem when trying to make the computer-database sample from play-slick-3.0 work after changing the DB to Postgres. What solved the problem was to change the id column (primary key) type to SERIAL in the evolution file /conf/evolutions/default/1.sql (originally it was BIGINT). Take a look at https://groups.google.com/forum/?fromgroups=#%21topic/scalaquery/OEOF8HNzn2U for the whole discussion.
Cheers,
ReneX
Another trick is making the id of the case class a var
case class Entity(var id: Long)
To insert an instance, create it like below
Entity(null.asInstanceOf[Long])
I've tested that it works.
The solution I've found is to use SqlType("Serial") in the column definition. I haven't tested it extensively yet, but it seems to work so far.
So instead of
def id: Rep[PK[SomeTable]] = column[PK[SomeTable]]("id", O.PrimaryKey, O.AutoInc)
You should do:
def id: Rep[PK[SomeTable]] = column[PK[SomeTable]]("id", SqlType("SERIAL"), O.PrimaryKey, O.AutoInc)
Where PK is defined like the example in the "Essential Slick" book:
final case class PK[A](value: Long = 0L) extends AnyVal with MappedTo[Long]
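For context, a sketch of how such a table might look with the PK wrapper; SomeTable, SomeRow, and the column names are placeholders for illustration, not from the original answer:

import slick.jdbc.PostgresProfile.api._
import slick.sql.SqlProfile.ColumnOption.SqlType

final case class SomeRow(id: PK[SomeTable], name: String)

final class SomeTable(tag: Tag) extends Table[SomeRow](tag, "some_table") {
  def id   = column[PK[SomeTable]]("id", SqlType("SERIAL"), O.PrimaryKey, O.AutoInc)
  def name = column[String]("name")
  def *    = (id, name) <> (SomeRow.tupled, SomeRow.unapply)
}

val someTable = TableQuery[SomeTable]
// The SERIAL column generates the id; the PK(0L) supplied here is ignored on insert
val insertAction = someTable returning someTable.map(_.id) += SomeRow(PK(0L), "example")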