Slick3 with SQLite - autocommit seems to not be working - scala

I'm trying to write some basic queries with Slick for an SQLite database.
Here is my code:
class MigrationLog(name: String) {
val migrationEvents = TableQuery[MigrationEventTable]
lazy val db: Future[SQLiteDriver.backend.DatabaseDef] = {
val db = Database.forURL(s"jdbc:sqlite:$name.db", driver = "org.sqlite.JDBC")
val setup = DBIO.seq(migrationEvents.schema.create)
val createFuture = for {
tables <- db.run(MTable.getTables)
createResult <- if (tables.length == 0) db.run(setup) else Future.successful(())
} yield createResult
createFuture.map(_ => db)
}
val addEvent: (String, String) => Future[String] = (aggregateId, eventType) => {
val id = java.util.UUID.randomUUID().toString
val command = DBIO.seq(migrationEvents += (id, aggregateId, None, eventType, "CREATED", System.currentTimeMillis, None))
db.flatMap(_.run(command).map(_ => id))
}
val eventSubmitted: (String, String) => Future[Unit] = (id, batchId) => {
val q = for { e <- migrationEvents if e.id === id } yield (e.batchId, e.status, e.updatedAt)
val updateAction = q.update(Some(batchId), "SUBMITTED", Some(System.currentTimeMillis))
db.map(_.run(updateAction))
}
val eventMigrationCompleted: (String, String, String) => Future[Unit] = (batchId, id, status) => {
val q = for { e <- migrationEvents if e.batchId === batchId && e.id === id} yield (e.status, e.updatedAt)
val updateAction = q.update(status, Some(System.currentTimeMillis))
db.map(_.run(updateAction))
}
val allEvents = () => {
db.flatMap(_.run(migrationEvents.result))
}
}
Here is how I'm using it:
val migrationLog = MigrationLog("test")
val events = for {
id <- migrationLog.addEvent("aggregateUserId", "userAccessControl")
_ <- migrationLog.eventSubmitted(id, "batchID_generated_from_idam")
_ <- migrationLog.eventMigrationCompleted("batchID_generated_from_idam", id, "Successful")
events <- migrationLog.allEvents()
} yield events
events.map(_.foreach(event => event match {
case (id, aggregateId, batchId, eventType, status, submitted, updatedAt) => println(s"$id $aggregateId $batchId $eventType $status $submitted $updatedAt")
}))
The idea is to add the event first, then update it with a batchId (which also updates the status), and then update the status again when the job is done. events should then contain events with status Successful.
What happens is that after running this code it prints events with status SUBMITTED. If I wait a while and run the same allEvents query again, or check the db from the command line using sqlite3, the data is updated correctly.
I'm properly waiting for each future to resolve before starting the next operation, and auto-commit should be enabled by default.
Am I missing something?

Turns out the problem was with db.map(_.run(updateAction)), which returns Future[Future[Int]], meaning the update had not finished by the time the next query ran.
Replacing it with db.flatMap(_.run(updateAction)) solved the issue.
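For illustration, here is a minimal sketch of one of the update helpers rewritten with flatMap, reusing the table and names from the question (the implicit ExecutionContext from the original class is assumed to be in scope):
val eventSubmitted: (String, String) => Future[Unit] = (id, batchId) => {
  val q = for { e <- migrationEvents if e.id === id } yield (e.batchId, e.status, e.updatedAt)
  val updateAction = q.update(Some(batchId), "SUBMITTED", Some(System.currentTimeMillis))
  // flatMap flattens the nested future, so the returned Future[Unit]
  // completes only after the UPDATE has actually run
  db.flatMap(_.run(updateAction)).map(_ => ())
}
eventMigrationCompleted needs the same flatMap treatment.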

Related

Transactions With JSONCollection instead of BSONCollection

I have a problem making a transaction via JSONCollection; I get the following error:
JsResultException(errors:List((,List(JsonValidationError(List(CommandError[code=14, errmsg=BSON field 'OperationSessionInfo.txnNumber' is the wrong type 'int', expected type 'long', doc: {"operationTime":{"$time":1596894245,"$i":5,"$timestamp":{"t":1596894245,"i":5}},"ok":0,"errmsg":"BSON field 'OperationSessionInfo.txnNumber' is the wrong type 'int', expected type 'long'","code":14,"codeName":"TypeMismatch","$clusterTime":{"clusterTime":{"$time":1596894245,"$i":5,"$timestamp":{"t":1596894245,"i":5}},"signature":{"hash":{"$binary":"0000000000000000000000000000000000000000","$type":"00"},"keyId":0}}}]),WrappedArray())))))
I tried to change my project to BSONCollection but ran into some trouble; maybe there is a way to overcome the above error with JSONCollection.
The exception also occurs when testing the update method, but insertOneViaTransaction and setRuleAsInactiveViaTransaction finish successfully.
This is my code for the transaction:
Update:
def update(oldRule: ExistRuleDto): Future[UpdateResult] = {
val transaction = (collection: JSONCollection) => for {
newRule <- dao.insertOneViaTransaction(collection,oldRule.toUpdatedRule) // insert new with ref to old
oldRule <- dao.setRuleAsInactiveViaTransaction(collection,oldRule.id)
} yield UpdateResult(oldRule, newRule)
makeTransaction[UpdateResult](transaction)
}
makeTransaction:
def makeTransaction[Out](block: JSONCollection => Future[Out]): Future[Out] = for {
dbWithSession <- dao.collection.db.startSession()
dbWithTx <- dbWithSession.startTransaction(None)
coll = dbWithTx.collection[JSONCollection](dao.collection.name)
// Operations:
res <- block(coll)
_ <- dbWithTx.commitTransaction()
_ <- dbWithSession.endSession()
} yield res
insertOneViaTransaction:
def insertOneViaTransaction(collection: JSONCollection, rule: Rule): Future[Rule] = {
collection.insert.one(rule).map {
case DefaultWriteResult(true, 1, _, _, _, _) => rule
case err => throw GeneralDBError(s"$rule was not inserted, something went wrong: $err")
}.recover {
case WriteResult.Code(11000) => throw DuplicationError(s"$rule exist on DB")
case err => throw GeneralDBError(err.getMessage)
}
}
setRuleAsInactiveViaTransaction:
def setRuleAsInactiveViaTransaction(collection: JSONCollection, ruleId: BSONObjectID): Future[Rule] = {
collection.findAndUpdate(
Json.obj(s"${Rule.ID}" -> ruleId),
Json.obj(
"$set" -> Json.obj(s"${Rule.Metadata}.${Metadata.Active}" -> false),
"$unset" -> Json.obj(s"${Rule.Metadata}.${Metadata.LastVersionExists}" -> "")),
fetchNewObject = true, upsert = false, sort = None, fields = None, bypassDocumentValidation = false, writeConcern = WriteConcern.Acknowledged, maxTime = None, collation = None, arrayFilters = Nil
).map(el => el.result[Rule].getOrElse {
val msg = s"Operation fail for updating ruleId = $ruleId"
logger.error(msg)
throw GeneralUpdateError(msg)
})
}
I'm using the following dependencies:
Play:
"com.typesafe.play" % "sbt-plugin" % "2.7.2
Reactivemongo:
"org.reactivemongo" %% "play2-reactivemongo" % "0.18.8-play27"
Solved it (not with compact).
Serializers:
implicit object JsValueHandler extends BSONHandler[BSONValue, JsValue] {
override def read(bson: BSONValue): JsValue = BSONFormats.toJSON(bson)
override def write(j: JsValue): BSONValue = BSONFormats.toBSON(j).get
}
asTransaction:
def asTransaction[Out](block: BSONCollection => Future[Out]): Future[Out] = {
for {
dbWithSession <- collection.db.startSession()
dbWithTx <- dbWithSession.startTransaction(None)
collectionWithTx = dbWithTx.collection[BSONCollection](collection.name)
out <- block(collectionWithTx)
_ <- dbWithTx.commitTransaction()
_ <- dbWithSession.endSession()
} yield out
}.recover {
case ex: Exception =>
logger.warn(s"asTransaction failed with ex: ${ex.getMessage}, rollback to previous state...")
throw GeneralDBErrorOnTx(ex.getMessage)
}
transaction example:
def `change visibility of ExistsRules and insert UpdateEvents`(oldRules: List[Rule], active: Boolean): Future[Unit] = {
ruleDao.asTransaction { collectionTx =>
for {
// (1) - $active old Rules
_ <- ruleDao.updateManyWithBsonCollection(
collectionTx,
filter = BSONDocument(s"${Rule.ID}" -> BSONDocument("$in" -> oldRules.map(_._id))),
update = BSONDocument("$set" -> BSONDocument(s"${Rule.Metadata}.${Metadata.Active}" -> active)))
// (2) - Sync Cache with Update Events
_ <- eventsService.addEvents(oldRules.map(rule => RuleEvent(rule.metadata.cacheKey, Update)))
} yield ()
}
}
Enjoy!

Scala slick update table after querying with a join

I want to update a table, but the row needs to be selected based on certain conditions. The following code compiles fine but throws a run-time exception:
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[SlickException: A query for an UPDATE statement must resolve to a comprehension with a single table -- Unsupported shape: Comprehension s2, Some(Apply Function =), None, ConstArray(), None, None, None, None]]
Here is the function (the intention is to allow the update only for a qualified user):
def updateRecord(record: Record)(implicit loggedInUserId: User) = {
val q = records.withFilter(_.id === record.id).join(companies).on(_.companyId === _.id).filter(_._2.userid === loggedInUserId)
val recordToUpdate = q.map { case (r, c) => r }
val action = for {
c <- recordToUpdate.update(record)
} yield (c)
... // there are other actions in the for comprehension, removed them for clarity
I thought the result of the map would be a row from the records table (not a tuple), but the error seems to indicate that I am not updating a "single" table.
Or is there a better way of doing query + update?
Yes, you seem to be trying to update both tables.
Maybe you should try something like
def updateRecord(record: Record)(implicit loggedInUserId: User): Future[Int] = {
val recordToUpdate = records.filter(_.id === record.id)
val q = recordToUpdate
.join(companies).on(_.companyId === _.id)
.filter(_._2.userid === loggedInUserId)
.exists
val action = for {
c <- recordToUpdate.update(record)
// ...
} yield c
for {
isLoggedIn <- db.run(q.result)
if isLoggedIn
c <- db.run(action)
} yield c
}
You can also try
def updateRecord(record: Record)(implicit loggedInUserId: User):
DBIOAction[Int, NoStream, Read with Write with Transactional] = {
val recordToUpdate = records.filter(_.id === record.id)
val action =
recordToUpdate
.join(companies).on(_.companyId === _.id)
.filter(_._2.userid === loggedInUserId)
.exists
.result
(for {
isLoggedIn <- action
if isLoggedIn
c <- recordToUpdate.update(record)
// ...
} yield c).transactionally
}
Here is a variant that should work without the NoSuchElementException: Action.withFilter failed error (the if isLoggedIn guard in the versions above fails the whole computation with that exception when the condition is false). Based on the answer.
def updateRecord(record: Record)(implicit loggedInUserId: User):
DBIOAction[Int, NoStream, Read with Write with Transactional] = {
val recordToUpdate = records.filter(_.id === record.id)
val action =
recordToUpdate
.join(companies).on(_.companyId === _.id)
.filter(_._2.userid === loggedInUserId)
.exists
.result
action.flatMap {
case true => for {
c <- recordToUpdate.update(record)
// ...
} yield c
case false => DBIO.successful(0) /*DBIO.failed(new IllegalStateException)*/
}.transactionally
}
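For completeness, a hypothetical usage sketch (the db handle, the record value and the implicit loggedInUserId are assumed to be in scope; they are not shown in the original post):
import scala.concurrent.ExecutionContext.Implicits.global

// run the transactional action and report how many rows were updated
db.run(updateRecord(record)).foreach { rowsAffected =>
  println(s"updated $rowsAffected row(s)")
}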

MongoSpark duplicate key error only on massive datasets

Using MongoSpark, running the same code on two different datasets of differing sizes causes one of them to throw the E11000 duplicate key error.
Here is the code:
object ScrapeHubCompanyImporter {
def importData(path: String, companyMongoUrl: String): Unit = {
val spark = SparkSession.builder()
.master("local[*]")
.config("spark.mongodb.input.uri", companyMongoUrl)
.config("spark.mongodb.output.uri", companyMongoUrl)
.config("spark.mongodb.input.partitionerOptions.partitionKey", "profileUrl")
.getOrCreate()
import spark.implicits._
val websiteToDomainTransformer = udf((website: String) => {
val tldExtract = SplitHost.fromURL(website)
if (tldExtract.domain == "") {
null
} else {
tldExtract.domain + "." + tldExtract.tld
}
})
val jsonDF =
spark
.read
.json(path)
.filter { row =>
row.getAs[String]("canonical_url") != null
}
.dropDuplicates(Seq("canonical_url"))
.select(
toHttpsUdf($"canonical_url").as("profileUrl"),
$"city",
$"country",
$"founded",
$"hq".as("headquartes"),
$"industry",
$"company_id".as("companyId"),
$"name",
$"postal",
$"size",
$"specialties",
$"state",
$"street_1",
$"street_2",
$"type",
$"website"
)
.filter { row => row.getAs[String]("website") != null }
.withColumn("domain", websiteToDomainTransformer($"website"))
.filter(row => row.getAs[String]("domain") != null)
.as[ScrapeHubCompanyDataRep]
val jsonColsSet = jsonDF.columns.toSet
val mongoData = MongoSpark
.load[LinkedinCompanyRep](spark)
.withColumn("companyUrl", toHttpsUdf($"companyUrl"))
.as[CompanyRep]
val mongoColsSet = mongoData.columns.toSet
val union = jsonDF.joinWith(
mongoData,
jsonDF("companyUrl") === mongoData("companyUrl"),
joinType = "left")
.map { t =>
val scrapeHub = t._1
val liCompanyRep = if (t._2 != null ) {
t._2
} else {
CompanyRep(domain = scrapeHub.domain)
}
CompanyRep(
_id = pickValue(liCompanyRep._id, None),
city = pickValue(scrapeHub.city, liCompanyRep.city),
country = pickValue(scrapeHub.country, liCompanyRep.country),
postal = pickValue(scrapeHub.postal, liCompanyRep.postal),
domain = scrapeHub.domain,
founded = pickValue(scrapeHub.founded, liCompanyRep.founded),
headquartes = pickValue(scrapeHub.headquartes, liCompanyRep.headquartes),
headquarters = liCompanyRep.headquarters,
industry = pickValue(scrapeHub.industry, liCompanyRep.industry),
linkedinId = pickValue(scrapeHub.companyId, liCompanyRep.companyId),
companyUrl = Option(scrapeHub.companyUrl),
name = pickValue(scrapeHub.name, liCompanyRep.name),
size = pickValue(scrapeHub.size, liCompanyRep.size),
specialties = pickValue(scrapeHub.specialties, liCompanyRep.specialties),
street_1 = pickValue(scrapeHub.street_1, liCompanyRep.street_1),
street_2 = pickValue(scrapeHub.street_2, liCompanyRep.street_2),
state = pickValue(scrapeHub.state, liCompanyRep.state),
`type` = pickValue(scrapeHub.`type`, liCompanyRep.`type`),
website = pickValue(scrapeHub.website, liCompanyRep.website),
updatedDate = None,
scraped = Some(true)
)
}
val idToMongoId = udf { st: String =>
if (st != null) {
ObjectId(st)
} else {
null
}
}
val saveReady =
union
.map { rep =>
rep.copy(
updatedDate = Some(new Timestamp(System.currentTimeMillis)),
scraped = Some(true),
headquarters = generateCompanyHeadquarters(rep)
)
}
.dropDuplicates(Seq("companyUrl"))
MongoSpark.save(
saveReady.withColumn("_id", idToMongoId($"_id")),
WriteConfig(Map(
"uri" -> companyMongoUrl
)))
}
def generateCompanyHeadquarters(companyRep: CompanyRep): Option[CompanyHeadquarters] = {
val hq = CompanyHeadquarters(
country = companyRep.country,
geographicArea = companyRep.state,
city = companyRep.city,
postalCode = companyRep.postal,
line1 = companyRep.street_1,
line2 = companyRep.street_2
)
CompanyHeadquarters
.unapply(hq)
.get
.productIterator.toSeq.exists {
case a: Option[_] => a.isDefined
case _ => false
} match {
case true => Some(hq)
case false => None
}
}
def pickValue(left: Option[String], right: Option[String]): Option[String] = {
def _noneIfNull(opt: Option[String]): Option[String] = {
if (opt != null) {
opt
} else {
None
}
}
val lOpt = _noneIfNull(left)
val rOpt = _noneIfNull(right)
lOpt match {
case Some(l) => Option(l)
case None => rOpt match {
case Some(r) => Option(r)
case None => None
}
}
}
}
This issue is around companyUrl, which is one of the unique keys in the collection, the other being the _id key. The issue is that there are tons of duplicates that Spark will attempt to save on a 700 GB dataset, but if I run a very small dataset locally I'm never able to replicate the issue. I'm trying to understand what's going on, and how I can make sure to group all the existing companies on companyUrl and make sure that duplicates really are removed globally across the dataset.
EDIT
Here are some scenarios that arise:
Company is in Mongo, and the file that's read has updated data -> Duplicate key error can occur here
Company not in Mongo but in the file -> Duplicate key error can occur here as well.
EDIT2
The duplication error occurs around the companyUrl field.
EDIT 3
I've narrowed this down to an issue in the merging stage. Looking through records that have been flagged as having duplicate companyUrls, some of those records are not in the target collection, yet somehow a duplicate record is still being written to the collection. In other cases, the _id field of the new record doesn't match the old record that had the same companyUrl.
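One thing worth noting is that dropDuplicates(Seq("companyUrl")) keeps an arbitrary row per key, so which duplicate survives is not deterministic. Below is a sketch (not from the original post) of deduplicating with a window function instead, preferring a row that already carries a Mongo _id so the save targets the existing document rather than inserting a new one with the same companyUrl; it assumes the saveReady dataset and the companyUrl / _id columns from the code above, with spark.implicits._ already imported:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// keep exactly one row per companyUrl across the whole dataset,
// putting rows that already have an _id first within each group
val byCompanyUrl = Window.partitionBy($"companyUrl").orderBy($"_id".desc_nulls_last)
val globallyDeduped =
  saveReady
    .withColumn("rowNumber", row_number().over(byCompanyUrl))
    .filter($"rowNumber" === 1)
    .drop("rowNumber")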

How transaction works in case of failure

In the method below, how could I roll back all the inserts when one of them fails?
Am I able to write all the insert statements in a single transaction?
def save(types: List[admin]): Unit = {
try {
DB.withTransaction { implicit c =>
val update = SQL("insert IGNORE into table1 (user_id, full_name, user_name) values ({user_id},{full_name},{user_name})")
val batch = (update.asBatch /: types)(
(sql, _type) => sql.addBatchParams(_type.user.id, _type.user.name, _type.user.login_id))
batch.execute
}
DB.withTransaction { implicit c =>
val update1 = SQL("INSERT IGNORE INTO table2 (user_id, role_id, is_enabled) values ({user_id},{role_id},{is_enabled})")
val batch1 = (update1.asBatch /: types)(
(sql, _type) => sql.addBatchParams(_type.user.id, _type.role_id, 1))
batch1.execute
}
DB.withTransaction { implicit c =>
val update2 = SQL("INSERT IGNORE INTO table3 (user_id, bu_id, role_id, is_enabled) values ({user_id},{bu_id},{role_id},{is_enabled})")
val batch2 = (update2.asBatch /: types)(
(sql, _type) => sql.addBatchParams(_type.user.id, 1, _type.role_id, 1))
batch2.execute
}
} catch {
case ex: Exception =>
Logger.error("Exception : " + ex.getMessage)
play.Logger.info("Exception" + ex.getMessage)
}
}
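One way to get all-or-nothing behaviour is to run all three batches inside a single DB.withTransaction block: withTransaction rolls the connection back if the block throws, so a failure in any batch undoes the earlier ones as well. A sketch under that assumption, reusing the statements from the question:
def save(types: List[admin]): Unit = {
  try {
    DB.withTransaction { implicit c =>
      // all three batches share one connection and one transaction,
      // so an exception in any of them rolls everything back
      val batch1 = (SQL("insert IGNORE into table1 (user_id, full_name, user_name) values ({user_id},{full_name},{user_name})").asBatch /: types)(
        (sql, _type) => sql.addBatchParams(_type.user.id, _type.user.name, _type.user.login_id))
      batch1.execute

      val batch2 = (SQL("INSERT IGNORE INTO table2 (user_id, role_id, is_enabled) values ({user_id},{role_id},{is_enabled})").asBatch /: types)(
        (sql, _type) => sql.addBatchParams(_type.user.id, _type.role_id, 1))
      batch2.execute

      val batch3 = (SQL("INSERT IGNORE INTO table3 (user_id, bu_id, role_id, is_enabled) values ({user_id},{bu_id},{role_id},{is_enabled})").asBatch /: types)(
        (sql, _type) => sql.addBatchParams(_type.user.id, 1, _type.role_id, 1))
      batch3.execute
    }
  } catch {
    case ex: Exception =>
      Logger.error("Exception : " + ex.getMessage)
  }
}
Note too that INSERT IGNORE converts some errors (such as duplicate keys) into warnings on the MySQL side, so those cases will not throw and therefore will not trigger a rollback.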

Insert if not exists in Slick 3.0.0

I'm trying to insert if not exists; I found this post for 1.0.1 and 2.0.
I found this snippet using transactionally in the 3.0.0 docs:
val a = (for {
ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
_ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally
val f: Future[Unit] = db.run(a)
I'm struggling to write the insert-if-not-exists logic with this structure. I'm new to Slick and have little experience with Scala. This is my attempt to do insert-if-not-exists outside a transaction...
val result: Future[Boolean] = db.run(products.filter(_.name==="foo").exists.result)
result.map { exists =>
if (!exists) {
products += Product(
None,
productName,
productPrice
)
}
}
But how do I put this in the transactionally block? This is the furthest I can go:
val a = (for {
exists <- products.filter(_.name==="foo").exists.result
//???
// _ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally
Thanks in advance
It is possible to use a single insert ... if not exists query. This avoids multiple database round-trips and race conditions (transactions may not be enough depending on isolation level).
def insertIfNotExists(name: String) = users.forceInsertQuery {
val exists = (for (u <- users if u.name === name.bind) yield u).exists
val insert = (name.bind, None) <> (User.apply _ tupled, User.unapply)
for (u <- Query(insert) if !exists) yield u
}
Await.result(db.run(DBIO.seq(
// create the schema
users.schema.create,
users += User("Bob"),
users += User("Bob"),
insertIfNotExists("Bob"),
insertIfNotExists("Fred"),
insertIfNotExists("Fred"),
// print the users (select * from USERS)
users.result.map(println)
)), Duration.Inf)
Output:
Vector(User(Bob,Some(1)), User(Bob,Some(2)), User(Fred,Some(3)))
Generated SQL:
insert into "USERS" ("NAME","ID") select ?, null where not exists(select x2."NAME", x2."ID" from "USERS" x2 where x2."NAME" = ?)
Here's the full example on github
This is the version I came up with:
val a = (
products.filter(_.name==="foo").exists.result.flatMap { exists =>
if (!exists) {
products += Product(
None,
productName,
productPrice
)
} else {
DBIO.successful(None) // no-op
}
}
).transactionally
It is a bit lacking though; for example, it would be useful to return the inserted or existing object.
For completeness, here is the table definition:
case class DBProduct(id: Int, uuid: String, name: String, price: BigDecimal)
class Products(tag: Tag) extends Table[DBProduct](tag, "product") {
def id = column[Int]("id", O.PrimaryKey, O.AutoInc) // This is the primary key column
def uuid = column[String]("uuid")
def name = column[String]("name")
def price = column[BigDecimal]("price", O.SqlType("decimal(10, 4)"))
def * = (id, uuid, name, price) <> (DBProduct.tupled, DBProduct.unapply)
}
val products = TableQuery[Products]
I'm using a mapped table; the solution also works for tuples, with minor changes.
Note also that it's not necessary to define the id as optional; according to the documentation, it's ignored in insert operations:
When you include an AutoInc column in an insert operation, it is silently ignored, so that the database can generate the proper value
And here is the method:
def insertIfNotExists(productInput: ProductInput): Future[DBProduct] = {
val productAction = (
products.filter(_.uuid===productInput.uuid).result.headOption.flatMap {
case Some(product) =>
mylog("product was there: " + product)
DBIO.successful(product)
case None =>
mylog("inserting product")
val productId =
(products returning products.map(_.id)) += DBProduct(
0,
productInput.uuid,
productInput.name,
productInput.price
)
val product = productId.map { id => DBProduct(
id,
productInput.uuid,
productInput.name,
productInput.price
)
}
product
}
).transactionally
db.run(productAction)
}
(Thanks to Matthew Pocock from the Google group thread for orienting me to this solution.)
I've run into a solution that looks more complete. Section 3.1.7, More Control over Inserts, of the Essential Slick book has the example.
At the end you get something like:
val entity = UserEntity(UUID.random, "jay", "jay#localhost")
val exists =
users
.filter(
u =>
u.name === entity.name.bind
&& u.email === entity.email.bind
)
.exists
val selectExpression = Query(
(
entity.id.bind,
entity.name.bind,
entity.email.bind
)
).filterNot(_ => exists)
val action = users
.map(u => (u.id, u.name, u.email))
.forceInsertQuery(selectExpression)
exec(action)
// res17: Int = 1
exec(action)
// res18: Int = 0
According to the Slick 3.0 manual's insert query section (http://slick.typesafe.com/doc/3.0.0/queries.html), the inserted value can be returned together with its id as below:
def insertIfNotExists(productInput: ProductInput): Future[DBProduct] = {
val productAction = (
products.filter(_.uuid===productInput.uuid).result.headOption.flatMap {
case Some(product) =>
mylog("product was there: " + product)
DBIO.successful(product)
case None =>
mylog("inserting product")
(products returning products.map(_.id)
into ((prod,id) => prod.copy(id=id))) += DBProduct(
0,
productInput.uuid,
productInput.name,
productInput.price
)
}
).transactionally
db.run(productAction)
}