Write RDD[entity] in cassandra from Spark - scala

I am trying to write an RDD that contains public classes to Cassandra with Spark:
import java.time.Instant

class Test(private var id: String, private var randomNumber: Integer, private var lastUpdate: Instant) {
  def setId(id: String): Unit = { this.id = id }
  def getId: String = { this.id }
  def setLastUpdate(lastUpdate: Instant): Unit = { this.lastUpdate = lastUpdate }
  def getLastUpdate: Instant = { this.lastUpdate }
  def setRandomNumber(number: Integer): Unit = { this.randomNumber = number }
  def getRandomNumber: Integer = { this.randomNumber }
}
This class has all the setters and getters to maintain encapsulation, and I need it not to be a case class because I have to modify the values during the transformations.
The table corresponding to this entity in Cassandra has slightly different names for the fields:
CREATE TABLE IF NOT EXISTS test.test (
id uuid,
random_number int,
last_update timestamp,
PRIMARY KEY (id)
)
I am trying to write this RDD with the method saveToCassandra
implicit val connector = CassandraConnector(sc.getConf)
val rdd: RDD[Test]
rdd.saveToCassandra("test", "test")
but the method throws an exception because the attribute names of the class do not match the column names in the table:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Columns not found in entity.Test: [id, random_number, last_update]
at scala.Predef$.require(Predef.scala:277)
at com.datastax.spark.connector.mapper.DefaultColumnMapper.columnMapForWriting(DefaultColumnMapper.scala:106)
at com.datastax.spark.connector.mapper.MappedToGettableDataConverter$$anon$1.<init>(MappedToGettableDataConverter.scala:35)
at com.datastax.spark.connector.mapper.MappedToGettableDataConverter$.apply(MappedToGettableDataConverter.scala:26)
at com.datastax.spark.connector.writer.DefaultRowWriter.<init>(DefaultRowWriter.scala:16)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:30)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:28)
at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:433)
at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:417)
at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:35)
How can I write the entity to Cassandra without having to name the class attributes exactly like the table columns, given that the attributes are private in the class?

saveToCassandra allows you to provide an optional ColumnSelector:
def saveToCassandra(
keyspaceName: String,
tableName: String,
columns: ColumnSelector = AllColumns,
writeConf: WriteConf = WriteConf.fromSparkConf(sparkContext.getConf))(...): Unit
In your case you could use the following selector:
def selector = SomeColumns(
ColumnName("id"),
ColumnName("random_number", alias = Some("randomNumber")),
ColumnName("last_update", alias = Some("lastUpdate"))
)
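and pass it explicitly when saving (a sketch, using the keyspace and table names from the question):

rdd.saveToCassandra("test", "test", selector)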
Btw, while not the typical (or recommended) use of a case class, you could absolutely define its fields as vars and benefit from using a typed Dataset. That makes it very easy to rename fields before writing to Cassandra.
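For illustration, a minimal sketch of that approach (the TestRow name is hypothetical; it assumes a SparkSession named spark, the connector's Data Source format "org.apache.spark.sql.cassandra", and a Spark version whose encoders handle java.time.Instant, i.e. 3.0+):

import java.time.Instant
import spark.implicits._

// Hypothetical case class mirroring Test, with mutable fields.
case class TestRow(var id: String, var randomNumber: Int, var lastUpdate: Instant)

val ds = rdd.map(t => TestRow(t.getId, t.getRandomNumber, t.getLastUpdate)).toDS()

// Rename the fields to the Cassandra column names, then write through the connector.
ds.withColumnRenamed("randomNumber", "random_number")
  .withColumnRenamed("lastUpdate", "last_update")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "test"))
  .mode("append")
  .save()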

Related

Future and Option in for comprehension in slick

I am pretty new to using Slick and I am now facing the issue of how to retrieve some data from two tables.
I have one table
class ExecutionTable(tag: Tag) extends Table[ExecTuple](tag, "execution") {
val id: Rep[String] = column[String]("id")
val executionDefinitionId: Rep[Long] = column[Long]("executionDefinitionId")
// other fields are omitted
def * = ???
}
and another table
class ServiceStatusTable(tag: Tag)
extends Table[(String, Option[String])](tag, "serviceStatus") {
def serviceId: Rep[String] = column[String]("serviceId")
def detail: Rep[String] = column[String]("detail")
def * = (serviceId, detail.?)
}
In the DAO I convert data from these two tables to a business object
case class ServiceStatus(
id: String,
detail: Option[String] = None, //other fields
)
like this
private lazy val getServiceStatusCompiled = Compiled {
(id: Rep[String], tenantId: Rep[String]) =>
for {
exec <- getExecutionById(id, tenantId)
status <- serviceStatuses if exec.id === status.serviceId
} yield mapToServiceStatus(exec, status)
}
and later
def getServiceStatus(id: String, tenantId: String)
: Future[Option[ServiceStatus]] = db
.run(getServiceStatusCompiled(id, tenantId).result.transactionally)
.map(_.headOption)
The problem is that not every entry in the execution table has a corresponding entry in serviceStatus. I cannot modify the execution table and add a detail field to it, as that is service specific.
When I run the query for an execution entry that has a matching serviceStatus entry, everything works as expected. But if there is no entry in serviceStatus, the Future completes with None.
Question: Is there any way to obtain the status in the for comprehension as an Option, depending on whether an entry exists in the serviceStatus table, or some other workaround?
Usually, when the join condition does not find a corresponding record in the "right" table but the result should still contain the row from the "left" table, a left join is used.
In your case you can do something like:
Execution
  .filter(...execution table filter...)
  .joinLeft(ServiceStatus).on(_.id === _.serviceId)
This gives you a pair of
(Execution, Rep[Option[ServiceStatus]])
and after query execution:
(Execution, Option[ServiceStatus])
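Applied to the compiled query from the question, that could look roughly like this (a sketch: it reuses getExecutionById and serviceStatuses, carries only the id and detail columns through the query, and ignores the other fields of ServiceStatus):

private lazy val getServiceStatusCompiled = Compiled {
  (id: Rep[String], tenantId: Rep[String]) =>
    getExecutionById(id, tenantId)
      .joinLeft(serviceStatuses)
      .on(_.id === _.serviceId)
      // the right-hand side is lifted to an Option, so its columns come back as Options
      .map { case (exec, status) => (exec.id, status.map(_.detail)) }
}

def getServiceStatus(id: String, tenantId: String): Future[Option[ServiceStatus]] =
  db.run(getServiceStatusCompiled(id, tenantId).result.headOption)
    .map(_.map { case (execId, detail) => ServiceStatus(id = execId, detail = detail) })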

Generic update with mapping in Slick

I'm writing a CRUD app using Slick, and I want my update queries to update only a specific set of columns, so I use .map().update() for that.
I have a function in my table definition that returns a tuple of the fields that can be updated (def writableFields), and a function that returns a tuple of the values to write there, extracted from a case class.
It works fine, but it's annoying to create a repo and write the whole update function for every table. I want to create a generic form of this function and make my table and its companion object extend some trait, but I cannot come up with the correct type definitions.
Slick expects the output of map() to be compatible with the argument passed to update(), and I don't know how to express a generic type for these tuples.
Is it even possible to accomplish this? Or is there an alternative way to limit code duplication? Ideally I want to avoid writing repos at all and just instantiate a generic class or call a generic method.
object ProjectsRepo extends BaseRepository[Projects, Project] {
protected val query = lifted.TableQuery[Projects]
def update(id: Long, c: Project): Future[Option[Project]] = {
val q = filterByIdQuery(id).map(_.writableFields)
.update(Projects.mapFormToTable(c))
(db run q).flatMap(
affected =>
if (affected > 0) {
findOneById(id)
} else {
Future(None)
}
)
}
}
class Projects(tag: Tag) extends Table[Project](tag, "projects") with IdentifiableTable[Long] {
val id = column[Long]("id", O.PrimaryKey, O.AutoInc)
val title = column[String]("title")
val slug = column[String]("slug")
val created_at = column[Timestamp]("created_at")
val updated_at = column[Timestamp]("updated_at")
def writableFields =
(
title,
slug
)
def readableFields =
(
id,
created_at,
updated_at
)
def allFields = writableFields ++ readableFields // shapeless
def * = allFields <> (Projects.mapFromTable, (_: Project) => None)
}
object Projects {
def mapFormToTable(c: Project): FormFields =
(
c.title,
c.slug
)
}

scala.slick.SlickException: JdbcProfile has no JdbcType for type UnassignedType - on Option fields

I have successfully mapped case classes without Option fields to a Postgres db via classes that extend Table.
Now I need to use a case class with Option[String] and Option[DateTime] fields.
I found how to declare the mapping for it:
case class Issue(id: Int,
key: String,
...
resolutionName: Option[String],
resolutionDate: Option[DateTime]
)
case class Issues(tag: Tag) extends Table[Issue](tag, "Issues") {
// This is the primary key column:
def id = column[Int]("id", O.PrimaryKey)
def key = column[String]("key")
...
def resolutionName = column[String]("resolutionName")
def resolutionDate = column[DateTime]("resolutionDate")
def * = (id, key, resolutionName.?, resolutionDate.?) <> (Issue.tupled, Issue.unapply)
}
The code compiles fine but at runtime I get an exception:
Exception in thread "main" scala.slick.SlickException: JdbcProfile has no JdbcType for type UnassignedType
at scala.slick.driver.JdbcTypesComponent$class.jdbcTypeFor(JdbcTypesComponent.scala:66)
at scala.slick.driver.PostgresDriver$.jdbcTypeFor(PostgresDriver.scala:151)
at scala.slick.driver.JdbcTypesComponent$JdbcType$.unapply(JdbcTypesComponent.scala:49)
What shall I do to make it work?
The columns must be defined as Options too:
def resolutionName = column[Option[String]]("resolutionName")
def resolutionDate = column[Option[DateTime]]("resolutionDate")
Also, you can drop the .? in the projection function since the values are already mapped as options:
def * = (id, key, resolutionName, resolutionDate) <> (Issue.tupled, Issue.unapply)
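Putting it together, the table definition becomes (a consolidated sketch of the same fix; it relies on the implicit column type for DateTime that your original mapping already uses):

class Issues(tag: Tag) extends Table[Issue](tag, "Issues") {
  def id = column[Int]("id", O.PrimaryKey)
  def key = column[String]("key")
  // ... other columns omitted, as in the question
  def resolutionName = column[Option[String]]("resolutionName")
  def resolutionDate = column[Option[DateTime]]("resolutionDate")
  def * = (id, key, resolutionName, resolutionDate) <> (Issue.tupled, Issue.unapply)
}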

Slick 2.0: How to convert lifted query results to a case class?

In order to implement a RESTful API stack, I need to convert data extracted from a DB to JSON. I think the best way is to extract the data from the DB and then convert the row set to JSON using Json.toJson(), passing a case class as the argument after having defined an implicit serializer (Writes).
Here's my case class and companion object:
package deals.db.interf.slick2
import scala.slick.driver.MySQLDriver.simple._
import play.api.libs.json.Json
case class PartnerInfo(
id: Int,
name: String,
site: String,
largeLogo: String,
smallLogo: String,
publicationSite: String
)
object PartnerInfo {
def toCaseClass( ?? ) = { // what type are the arguments to be passed?
PartnerInfo( fx(??) ) // how to transform the input types (slick) to Scala types?
}
// Notice I'm using slick 2.0.0 RC1
class PartnerInfoTable(tag: Tag) extends Table[(Int, String, String, String, String, String)](tag, "PARTNER"){
def id = column[Int]("id")
def name = column[String]("name")
def site = column[String]("site")
def largeLogo = column[String]("large_logo")
def smallLogo = column[String]("small_logo")
def publicationSite = column[String]("publication_site")
def * = (id, name, site, largeLogo, smallLogo, publicationSite)
}
val partnerInfos = TableQuery[PartnerInfoTable]
def qPartnerInfosForPuglisher(publicationSite: String) = {
for (
pi <- partnerInfos if ( pi.publicationSite === publicationSite )
) yield toCaseClass( _ ) // Pass all the table columns to toCaseClass()
}
implicit val partnerInfoWrites = Json.writes[PartnerInfo]
}
What I cannot figure out is how to implement the toCaseClass() method in order to transform the column types from Slick 2 to Scala types - notice that the function fx() in the body of toCaseClass() is only meant to emphasize that.
I'm wondering whether it is possible to get the Scala type from the Slick column type, since it is clearly given in the table definition, but I cannot find how to get it.
Any idea?
I believe the simplest method here would be to map PartnerInfo in the table schema:
class PartnerInfoTable(tag: Tag) extends Table[PartnerInfo](tag, "PARTNER"){
def id = column[Int]("id")
def name = column[String]("name")
def site = column[String]("site")
def largeLogo = column[String]("large_logo")
def smallLogo = column[String]("small_logo")
def publicationSite = column[String]("publication_site")
def * = (id, name, site, largeLogo, smallLogo, publicationSite) <> (PartnerInfo.tupled, PartnerInfo.unapply)
}
val partnerInfos = TableQuery[PartnerInfoTable]
def qPartnerInfosForPuglisher(publicationSite: String) = {
for (
pi <- partnerInfos if ( pi.publicationSite === publicationSite )
) yield pi
}
Otherwise PartnerInfo.tupled should do the trick:
def toCaseClass(pi:(Int, String, String, String, String, String)) = PartnerInfo.tupled(pi)
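For the JSON part of the question, once the query yields PartnerInfo instances you can serialize the result with the Writes you already defined (a sketch, assuming a Slick 2.x Database value named db is in scope and that the publication site string is just an example value):

db.withSession { implicit session =>
  val partners: List[PartnerInfo] = qPartnerInfosForPuglisher("somePublicationSite").list
  val json = Json.toJson(partners) // uses the implicit partnerInfoWrites
}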

ClassCastException when trying to insert with Squeryl

This may be due to my misunderstanding of how Squeryl works. My entity is defined as:
case class Wallet(userid: Int, amount: Long)
extends KeyedEntity[Int] with Optimistic {
def id = userid
}
My table variable is defined as:
val walletTable = table[Wallet]("wallets")
on(walletTable) {
w =>
declare {
w.userid is (primaryKey)
}
}
Then I'm just calling a method to try to add money to the wallet:
val requestedWallet = wallet.copy(amount = wallet.amount + amount)
try {
inTransaction {
walletTable.update(requestedWallet)
}
On the line where I call update, an exception is getting thrown:
[ClassCastException: java.lang.Integer cannot be cast to org.squeryl.dsl.CompositeKey]
I'm not using composite keys at all, so this is very confusing. Does it have to do with the fact that my id field is not called "id", but instead "userid"?
I get the same behavior when I try what you are doing. It seems that for some reason id can't be an alias unless it is a composite key (at least in 0.9.5). You can work around that and get the same result with something like this:
import org.squeryl.annotations.Column

case class Wallet(@Column("userid") id: Int, amount: Long)
  extends KeyedEntity[Int] with Optimistic {
  def userid = id
}
The @Column annotation tells Squeryl to use the userid column in the database, while id becomes the actual field; you can then keep a userid alias for consistency.