Yield nothing in for ... yield when inserting into DB using DBAction - scala

When inserting objects into the DB using a for ... yield construct (see below), I want to be able to 'yield nothing' for certain conditions. I am still learning Scala and have little experience with Slick.
val usersTable = TableQuery[dbuserTable]
val inserts = for (user <- x.userList) yield {
  if (user.someCondition) {
    val userRow = userClass(user.name, user.id)
    usersTable += userRow
  } else {
    //yield nothing
  }
}
DBAccess.db.run(DBIO.seq(inserts: _*))
I'm afraid I'm missing something about how the for ... yield loop leads to a sequence of DBIO actions.

Try
val inserts = for (user <- x.userList) yield {
  if (user.someCondition) {
    val userRow = userClass(user.name, user.id)
    usersTable += userRow
  } else {
    DBIO.successful(0)
  }
}
The Int in DBIOAction[Int, NoStream, Effect.Write] is the number of rows inserted, so DBIO.successful(0) acts as a no-op with the same result type as the insert.
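Alternatively, you can filter the list before yielding, so no placeholder action is needed at all. A minimal sketch using the names from the question:

val inserts = for (user <- x.userList if user.someCondition) yield {
  usersTable += userClass(user.name, user.id)
}
DBAccess.db.run(DBIO.seq(inserts: _*))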

Related

Check whether a database query returned any rows

I have a method that makes a query to the database.
I want to understand how I can check whether there are rows in the table that meet my condition.
def getMessage() = {
  val query = messages.filter(_.status === true)
  val action = query.result.head
  val result = db.run(action)
  //val sql = action.statements.head
  val temp = result.map(_.phone)
  Thread.sleep(3000)
  println(temp.toString)
}
For example, here is the kind of logic I want the method to implement:
def getMessage(): String = {
  val query = messages.filter(_.status === true)
  val action = query.result.head
  val result = db.run(action)
  //val sql = action.statements.head
  val temp = result.map(_.phone)
  Thread.sleep(3000)
  if (temp != "null") return temp
  else return "null"
}
db.run returns a Future, so you need to check the result when it completes:
import scala.util.{Failure, Success}

db.run(action).onComplete {
  case Success(res) =>
    // Process result in res
  case Failure(e) =>
    // Handle error case
}
You should never sleep or block waiting for a Future, so getMessage should return Future[String] and the calling code can handle the result when it is ready.
def getMessage(): Future[String] = {
  val query = messages.filter(_.status === true)
  val action = query.result.head
  db.run(action).map(_.phone) // extract the phone field once the query completes
}
More generally, you need to look at how Future works and how to transform results (map), chain multiple Futures (flatMap), or combine multiple Futures into a single Future (Future.sequence).
In general, you should keep the processing inside the Future for as long as possible.
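Note that query.result.head fails the Future when no row matches. A safer variant (a sketch, assuming the same messages table and an implicit ExecutionContext in scope) uses headOption and returns an Option:

def getMessageOpt(): Future[Option[String]] = {
  val action = messages.filter(_.status === true).result.headOption
  db.run(action).map(_.map(_.phone)) // None when no row has status = true
}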

Scala slick update table after querying with a join

I want to update a table, but the row needs to be selected based on certain conditions. The following code compiles fine but throws a run-time exception:
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[SlickException: A query for an UPDATE statement must resolve to a comprehension with a single table -- Unsupported shape: Comprehension s2, Some(Apply Function =), None, ConstArray(), None, None, None, None]]
Here is the function (the intention is to allow the update only for a qualified user):
def updateRecord(record: Record)(implicit loggedInUserId: User) = {
  val q = records.withFilter(_.id === record.id)
    .join(companies).on(_.companyId === _.id)
    .filter(_._2.userid === loggedInUserId)
  val recordToUpdate = q.map { case (r, c) => r }
  val action = for {
    c <- recordToUpdate.update(record)
  } yield (c)
  ... // there are other actions in the for comprehension, removed them for clarity
I thought the result of the map was a row from the records table (not a tuple), but the error seems to indicate that I am not updating a "single" table.
Or is there a better way of doing query + update?
Yes, you seem to be trying to update both tables.
Maybe you should try something like
def updateRecord(record: Record)(implicit loggedInUserId: User): Future[Int] = {
  val recordToUpdate = records.filter(_.id === record.id)
  val q = recordToUpdate
    .join(companies).on(_.companyId === _.id)
    .filter(_._2.userid === loggedInUserId)
    .exists
  val action = for {
    c <- recordToUpdate.update(record)
    // ...
  } yield c
  for {
    isLoggedIn <- db.run(q.result)
    if isLoggedIn
    c <- db.run(action)
  } yield c
}
You can also try
def updateRecord(record: Record)(implicit loggedInUserId: User):
    DBIOAction[Int, NoStream, Read with Write with Transactional] = {
  val recordToUpdate = records.filter(_.id === record.id)
  val action =
    recordToUpdate
      .join(companies).on(_.companyId === _.id)
      .filter(_._2.userid === loggedInUserId)
      .exists
      .result
  (for {
    isLoggedIn <- action
    if isLoggedIn
    c <- recordToUpdate.update(record)
    // ...
  } yield c).transactionally
}
Here is a variant of the previous snippet that should work without failing with NoSuchElementException: Action.withFilter failed when the guard is false:
def updateRecord(record: Record)(implicit loggedInUserId: User):
    DBIOAction[Int, NoStream, Read with Write with Transactional] = {
  val recordToUpdate = records.filter(_.id === record.id)
  val action =
    recordToUpdate
      .join(companies).on(_.companyId === _.id)
      .filter(_._2.userid === loggedInUserId)
      .exists
      .result
  action.flatMap {
    case true => for {
      c <- recordToUpdate.update(record)
      // ...
    } yield c
    case false => DBIO.successful(0) /*DBIO.failed(new IllegalStateException)*/
  }.transactionally
}
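The caller can then run the composed action in a single transaction. A minimal usage sketch, assuming a Record value and an implicit User in scope:

val rowsUpdated: Future[Int] = db.run(updateRecord(record))
rowsUpdated.foreach(n => println(s"$n row(s) updated"))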

MongoSpark duplicate key error only on massive datasets

Using MongoSpark, running the same code on two datasets of differing sizes causes one of them to throw the E11000 duplicate key error.
Before we proceed, here is the code:
object ScrapeHubCompanyImporter {
  def importData(path: String, companyMongoUrl: String): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.mongodb.input.uri", companyMongoUrl)
      .config("spark.mongodb.output.uri", companyMongoUrl)
      .config("spark.mongodb.input.partitionerOptions.partitionKey", "profileUrl")
      .getOrCreate()
    import spark.implicits._

    val websiteToDomainTransformer = udf((website: String) => {
      val tldExtract = SplitHost.fromURL(website)
      if (tldExtract.domain == "") {
        null
      } else {
        tldExtract.domain + "." + tldExtract.tld
      }
    })

    val jsonDF =
      spark
        .read
        .json(path)
        .filter { row =>
          row.getAs[String]("canonical_url") != null
        }
        .dropDuplicates(Seq("canonical_url"))
        .select(
          toHttpsUdf($"canonical_url").as("profileUrl"),
          $"city",
          $"country",
          $"founded",
          $"hq".as("headquartes"),
          $"industry",
          $"company_id".as("companyId"),
          $"name",
          $"postal",
          $"size",
          $"specialties",
          $"state",
          $"street_1",
          $"street_2",
          $"type",
          $"website"
        )
        .filter { row => row.getAs[String]("website") != null }
        .withColumn("domain", websiteToDomainTransformer($"website"))
        .filter(row => row.getAs[String]("domain") != null)
        .as[ScrapeHubCompanyDataRep]
    val jsonColsSet = jsonDF.columns.toSet

    val mongoData = MongoSpark
      .load[LinkedinCompanyRep](spark)
      .withColumn("companyUrl", toHttpsUdf($"companyUrl"))
      .as[CompanyRep]
    val mongoColsSet = mongoData.columns.toSet

    val union = jsonDF.joinWith(
        mongoData,
        jsonDF("companyUrl") === mongoData("companyUrl"),
        joinType = "left")
      .map { t =>
        val scrapeHub = t._1
        val liCompanyRep = if (t._2 != null) {
          t._2
        } else {
          CompanyRep(domain = scrapeHub.domain)
        }
        CompanyRep(
          _id = pickValue(liCompanyRep._id, None),
          city = pickValue(scrapeHub.city, liCompanyRep.city),
          country = pickValue(scrapeHub.country, liCompanyRep.country),
          postal = pickValue(scrapeHub.postal, liCompanyRep.postal),
          domain = scrapeHub.domain,
          founded = pickValue(scrapeHub.founded, liCompanyRep.founded),
          headquartes = pickValue(scrapeHub.headquartes, liCompanyRep.headquartes),
          headquarters = liCompanyRep.headquarters,
          industry = pickValue(scrapeHub.industry, liCompanyRep.industry),
          linkedinId = pickValue(scrapeHub.companyId, liCompanyRep.companyId),
          companyUrl = Option(scrapeHub.companyUrl),
          name = pickValue(scrapeHub.name, liCompanyRep.name),
          size = pickValue(scrapeHub.size, liCompanyRep.size),
          specialties = pickValue(scrapeHub.specialties, liCompanyRep.specialties),
          street_1 = pickValue(scrapeHub.street_1, liCompanyRep.street_1),
          street_2 = pickValue(scrapeHub.street_2, liCompanyRep.street_2),
          state = pickValue(scrapeHub.state, liCompanyRep.state),
          `type` = pickValue(scrapeHub.`type`, liCompanyRep.`type`),
          website = pickValue(scrapeHub.website, liCompanyRep.website),
          updatedDate = None,
          scraped = Some(true)
        )
      }

    val idToMongoId = udf { st: String =>
      if (st != null) {
        ObjectId(st)
      } else {
        null
      }
    }

    val saveReady =
      union
        .map { rep =>
          rep.copy(
            updatedDate = Some(new Timestamp(System.currentTimeMillis)),
            scraped = Some(true),
            headquarters = generateCompanyHeadquarters(rep)
          )
        }
        .dropDuplicates(Seq("companyUrl"))

    MongoSpark.save(
      saveReady.withColumn("_id", idToMongoId($"_id")),
      WriteConfig(Map(
        "uri" -> companyMongoUrl
      )))
  }

  def generateCompanyHeadquarters(companyRep: CompanyRep): Option[CompanyHeadquarters] = {
    val hq = CompanyHeadquarters(
      country = companyRep.country,
      geographicArea = companyRep.state,
      city = companyRep.city,
      postalCode = companyRep.postal,
      line1 = companyRep.street_1,
      line2 = companyRep.street_2
    )
    CompanyHeadquarters
      .unapply(hq)
      .get
      .productIterator.toSeq.exists {
        case a: Option[_] => a.isDefined
        case _ => false
      } match {
      case true => Some(hq)
      case false => None
    }
  }

  def pickValue(left: Option[String], right: Option[String]): Option[String] = {
    def _noneIfNull(opt: Option[String]): Option[String] = {
      if (opt != null) {
        opt
      } else {
        None
      }
    }
    val lOpt = _noneIfNull(left)
    val rOpt = _noneIfNull(right)
    lOpt match {
      case Some(l) => Option(l)
      case None => rOpt match {
        case Some(r) => Option(r)
        case None => None
      }
    }
  }
}
The issue is around companyUrl, which is one of the unique keys in the collection (the other being _id). There are tons of duplicates that Spark will attempt to save on a 700 GB dataset, but when I run a very small dataset locally I'm never able to replicate the issue. I'm trying to understand what's going on, and how I can group all the existing companies on companyUrl so that duplicates really are removed globally across the dataset.
EDIT
Here are some scenarios that arise:
The company is in Mongo and the file being read has updated data -> a duplicate key error can occur here.
The company is not in Mongo but is in the file -> a duplicate key error can occur here as well.
EDIT 2
The duplication error occurs around the companyUrl field.
EDIT 3
I've narrowed this down to the merging stage. Looking through records that have been marked as having duplicate companyUrls, some of those records are not in the target collection, yet somehow a duplicate record is still being written to the collection. In other situations, the _id field of the new record doesn't match the old record that had the same companyUrl.
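A quick way to verify whether duplicates survive the dropDuplicates step is to count them just before MongoSpark.save (a small diagnostic sketch, not part of the original pipeline):

val dupes = saveReady
  .groupBy("companyUrl")
  .count()
  .filter($"count" > 1)
dupes.show(20, truncate = false) // any rows here reach MongoSpark.save as duplicates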

Slick3 with SQLite - autocommit seems to not be working

I'm trying to write some basic queries with Slick for a SQLite database.
Here is my code:
class MigrationLog(name: String) {
  val migrationEvents = TableQuery[MigrationEventTable]

  lazy val db: Future[SQLiteDriver.backend.DatabaseDef] = {
    val db = Database.forURL(s"jdbc:sqlite:$name.db", driver = "org.sqlite.JDBC")
    val setup = DBIO.seq(migrationEvents.schema.create)
    val createFuture = for {
      tables <- db.run(MTable.getTables)
      createResult <- if (tables.length == 0) db.run(setup) else Future.successful(())
    } yield createResult
    createFuture.map(_ => db)
  }

  val addEvent: (String, String) => Future[String] = (aggregateId, eventType) => {
    val id = java.util.UUID.randomUUID().toString
    val command = DBIO.seq(migrationEvents += (id, aggregateId, None, eventType, "CREATED", System.currentTimeMillis, None))
    db.flatMap(_.run(command).map(_ => id))
  }

  val eventSubmitted: (String, String) => Future[Unit] = (id, batchId) => {
    val q = for { e <- migrationEvents if e.id === id } yield (e.batchId, e.status, e.updatedAt)
    val updateAction = q.update(Some(batchId), "SUBMITTED", Some(System.currentTimeMillis))
    db.map(_.run(updateAction))
  }

  val eventMigrationCompleted: (String, String, String) => Future[Unit] = (batchId, id, status) => {
    val q = for { e <- migrationEvents if e.batchId === batchId && e.id === id } yield (e.status, e.updatedAt)
    val updateAction = q.update(status, Some(System.currentTimeMillis))
    db.map(_.run(updateAction))
  }

  val allEvents = () => {
    db.flatMap(_.run(migrationEvents.result))
  }
}
Here is how I'm using it:
val migrationLog = new MigrationLog("test")

val events = for {
  id <- migrationLog.addEvent("aggregateUserId", "userAccessControl")
  _ <- migrationLog.eventSubmitted(id, "batchID_generated_from_idam")
  _ <- migrationLog.eventMigrationCompleted("batchID_generated_from_idam", id, "Successful")
  events <- migrationLog.allEvents()
} yield events

events.map(_.foreach {
  case (id, aggregateId, batchId, eventType, status, submitted, updatedAt) =>
    println(s"$id $aggregateId $batchId $eventType $status $submitted $updatedAt")
})
The idea is to add the event first, then update it with a batchId (which also updates the status), and then update the status again when the job is done. events should contain events with the status Successful.
What happens is that after running this code it prints events with the status SUBMITTED. If I wait a while and run the same allEvents query, or check the db from the command line using sqlite3, it is updated correctly.
I'm properly waiting for the futures to resolve before starting the next operation, and auto-commit should be enabled by default.
Am I missing something?
It turns out the problem was db.map(_.run(updateAction)), which returns Future[Future[Int]]; the command had not finished by the time I tried to run the next query.
Replacing it with db.flatMap(_.run(updateAction)) solved the issue.
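For reference, here is a sketch of the question's eventSubmitted with the fix applied (note the return type becomes Future[Int]):

val eventSubmitted: (String, String) => Future[Int] = (id, batchId) => {
  val q = for { e <- migrationEvents if e.id === id } yield (e.batchId, e.status, e.updatedAt)
  val updateAction = q.update(Some(batchId), "SUBMITTED", Some(System.currentTimeMillis))
  db.flatMap(_.run(updateAction)) // flatMap flattens Future[Future[Int]] into Future[Int]
}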

Insert if not exists in Slick 3.0.0

I'm trying to insert if not exists; I found this post covering Slick 1.0.1 and 2.0.
I found a snippet using transactionally in the docs for 3.0.0:
val a = (for {
  ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
  _ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally

val f: Future[Unit] = db.run(a)
I'm struggling to write the insert-if-not-exists logic with this structure. I'm new to Slick and have little experience with Scala. This is my attempt to do insert-if-not-exists outside a transaction:
val result: Future[Boolean] = db.run(products.filter(_.name === "foo").exists.result)
result.map { exists =>
  if (!exists) {
    products += Product(
      None,
      productName,
      productPrice
    )
  }
}
But how do I put this into the transactionally block? This is the furthest I can get:
val a = (for {
  exists <- products.filter(_.name === "foo").exists.result
  //???
  // _ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally
Thanks in advance
It is possible to use a single insert ... if not exists query. This avoids multiple database round-trips and race conditions (transactions may not be enough, depending on the isolation level).
def insertIfNotExists(name: String) = users.forceInsertQuery {
  val exists = (for (u <- users if u.name === name.bind) yield u).exists
  val insert = (name.bind, None) <> (User.apply _ tupled, User.unapply)
  for (u <- Query(insert) if !exists) yield u
}
Await.result(db.run(DBIO.seq(
  // create the schema
  users.schema.create,
  users += User("Bob"),
  users += User("Bob"),
  insertIfNotExists("Bob"),
  insertIfNotExists("Fred"),
  insertIfNotExists("Fred"),
  // print the users (select * from USERS)
  users.result.map(println)
)), Duration.Inf)
Output:
Vector(User(Bob,Some(1)), User(Bob,Some(2)), User(Fred,Some(3)))
Generated SQL:
insert into "USERS" ("NAME","ID") select ?, null where not exists(select x2."NAME", x2."ID" from "USERS" x2 where x2."NAME" = ?)
Here's the full example on github
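For reference, the example above assumes a mapping along these lines (a sketch; the exact definitions are in the linked example):

case class User(name: String, id: Option[Int] = None)

class Users(tag: Tag) extends Table[User](tag, "USERS") {
  def id = column[Int]("ID", O.PrimaryKey, O.AutoInc)
  def name = column[String]("NAME")
  def * = (name, id.?) <> ((User.apply _).tupled, User.unapply)
}
val users = TableQuery[Users]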
This is the version I came up with:
val a = (
  products.filter(_.name === "foo").exists.result.flatMap { exists =>
    if (!exists) {
      products += Product(
        None,
        productName,
        productPrice
      )
    } else {
      DBIO.successful(None) // no-op
    }
  }
).transactionally
It is a bit lacking though; for example, it would be useful to return the inserted or existing object.
For completeness, here is the table definition:
case class DBProduct(id: Int, uuid: String, name: String, price: BigDecimal)

class Products(tag: Tag) extends Table[DBProduct](tag, "product") {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc) // This is the primary key column
  def uuid = column[String]("uuid")
  def name = column[String]("name")
  def price = column[BigDecimal]("price", O.SqlType("decimal(10, 4)"))
  def * = (id, uuid, name, price) <> (DBProduct.tupled, DBProduct.unapply)
}
val products = TableQuery[Products]
I'm using a mapped table; the solution also works for tuples, with minor changes.
Note also that it's not necessary to define the id as optional; according to the documentation, it is ignored in insert operations:
When you include an AutoInc column in an insert operation, it is silently ignored, so that the database can generate the proper value
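A minimal illustration of that behaviour, assuming the products table above:

db.run(products += DBProduct(0, "some-uuid", "foo", BigDecimal("9.99"))) // the 0 id is ignored; the database generates the key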
And here is the method:
def insertIfNotExists(productInput: ProductInput): Future[DBProduct] = {
  val productAction = (
    products.filter(_.uuid === productInput.uuid).result.headOption.flatMap {
      case Some(product) =>
        mylog("product was there: " + product)
        DBIO.successful(product)
      case None =>
        mylog("inserting product")
        val productId =
          (products returning products.map(_.id)) += DBProduct(
            0,
            productInput.uuid,
            productInput.name,
            productInput.price
          )
        val product = productId.map { id =>
          DBProduct(
            id,
            productInput.uuid,
            productInput.name,
            productInput.price
          )
        }
        product
    }
  ).transactionally
  db.run(productAction)
}
(Thanks to Matthew Pocock from the Google group thread for orienting me to this solution.)
I've run into a solution that looks more complete. Section 3.1.7, More Control over Inserts, of the Essential Slick book has the example.
In the end you get something like:
val entity = UserEntity(UUID.randomUUID(), "jay", "jay@localhost")

val exists =
  users
    .filter(u =>
      u.name === entity.name.bind
        && u.email === entity.email.bind
    )
    .exists

val selectExpression = Query(
  (
    entity.id.bind,
    entity.name.bind,
    entity.email.bind
  )
).filterNot(_ => exists)

val action = users
  .map(u => (u.id, u.name, u.email))
  .forceInsertQuery(selectExpression)

exec(action)
// res17: Int = 1
exec(action)
// res18: Int = 0
According to the Slick 3.0 manual's insert query section (http://slick.typesafe.com/doc/3.0.0/queries.html), the inserted values can be returned together with the generated id, as below:
def insertIfNotExists(productInput: ProductInput): Future[DBProduct] = {
  val productAction = (
    products.filter(_.uuid === productInput.uuid).result.headOption.flatMap {
      case Some(product) =>
        mylog("product was there: " + product)
        DBIO.successful(product)
      case None =>
        mylog("inserting product")
        (products returning products.map(_.id)
          into ((prod, id) => prod.copy(id = id))) += DBProduct(
          0,
          productInput.uuid,
          productInput.name,
          productInput.price
        )
    }
  ).transactionally
  db.run(productAction)
}
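Usage is then straightforward; a minimal sketch, assuming a productInput value and an implicit ExecutionContext in scope:

insertIfNotExists(productInput).foreach { product =>
  println(s"inserted or existing product id: ${product.id}")
}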