I am looking for a way to generate an UPDATE query over multiple columns that are only known at runtime.
For instance, given a List[(String, Int)], how would I go about generating a query of the form UPDATE <table> SET k1=v1, k2=v2, ..., kn=vn for all key/value pairs in the list?
I have found that, given a single key/value pair, a plain SQL query can be built as sqlu"UPDATE <table> SET #$key=$value" (where the key comes from a trusted source to avoid injection), but I've been unsuccessful in generalizing this to a list of updates without running a separate query for each pair.
Is this possible?
This is one way to do it. I define a table class T that takes the table and column names (TableDesc) as an implicit argument. I would have thought it should be possible to set them explicitly, but I couldn't find a way. For the example I create two table query instances, aTable and bTable. Then I insert and select some values, and at the end I update a value in bTable.
import slick.driver.H2Driver.api._
import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success}
val db = Database.forURL("jdbc:h2:mem:test1;DB_CLOSE_DELAY=-1", "sa", "", null, "org.h2.Driver")
case class TableDesc(tableName: String, stringColumnName: String, intColumnName: String)
class T(tag: Tag)(implicit tableDesc: TableDesc) extends Table[(String, Int)](tag, tableDesc.tableName) {
def stringColumn = column[String](tableDesc.stringColumnName)
def intColumn = column[Int](tableDesc.intColumnName)
def * = (stringColumn, intColumn)
}
val aTable = {
implicit val tableDesc = TableDesc("TABLE_A", "sa", "ia")
TableQuery[T]
}
val bTable = {
implicit val tableDesc = TableDesc("TABLE_B", "sb", "ib")
TableQuery[T]
}
val future = for {
_ <- db.run(aTable.schema.create)
_ <- db.run(aTable += ("Hi", 1))
resultA <- db.run(aTable.result)
_ <- db.run(bTable.schema.create)
_ <- db.run(bTable ++= Seq(("Test1", 1), ("Test2", 2)))
_ <- db.run(bTable.filter(_.stringColumn === "Test1").map(_.intColumn).update(3))
resultB <- db.run(bTable.result)
} yield (resultA, resultB)
Await.result(future, Duration.Inf)
future.onComplete {
case Success(a) => println(s"OK $a")
case Failure(f) => println(s"DOH $f")
}
Thread.sleep(500)
I've added the sleep at the end to make sure that Future.onComplete gets time to finish before the application ends. Is there any other way?
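To answer the original question of updating several runtime-determined columns in one statement more directly, here is a minimal sketch that drops down to JDBC through Slick's SimpleDBIO (the helper name updateColumns is hypothetical; it assumes the column names come from a trusted source, while the values are bound as parameters):

import slick.driver.H2Driver.api._

// Sketch: build "k1 = ?, k2 = ?, ..." for trusted column names and bind the
// values positionally, so a single UPDATE covers every pair in the list.
def updateColumns(table: String, pairs: List[(String, Int)]): DBIO[Int] =
  SimpleDBIO { ctx =>
    val setClause = pairs.map { case (k, _) => s"$k = ?" }.mkString(", ")
    val stmt = ctx.connection.prepareStatement(s"UPDATE $table SET $setClause")
    try {
      pairs.map(_._2).zipWithIndex.foreach { case (v, i) => stmt.setInt(i + 1, v) }
      stmt.executeUpdate()
    } finally stmt.close()
  }

// Usage: db.run(updateColumns("TABLE_B", List("ib" -> 3)))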
I am using Scala with the Cassandra4io library. I am trying to perform a SELECT ... IN query, where the IN parameter is a tuple-like list of comma-separated string values. It has not worked for me, and I have tried different approaches.
// keys (List[String])
val clientIdCommaSepValues = keys.mkString(",")
val selectValue = selectQuery(clientIdCommaSepValues)
private def selectQuery(clientids: String) =
cql"select * from clientinformation WHERE (clientid IN ( ${clientids} ))".as[CassandraClientInfoRow]
This worked only when there is a single value (the length of keys is 1).
or
private val selectQuery =
cqlt"select * from clientinformation WHERE (clientid IN ${Put[String]}) ".as[CassandraClientInfoRow]
I also tried wrapping the strings in single quotes.
Sorry for the delay on this. It turns out that the extra set of parentheses around your value (IN (${clientids}) in the example above) throws off the string interpolator, leading it to select the wrong Binder, the datatype used to serialize the value in your query before it is sent off to Cassandra (ouch!).
This selected the Binder for TEXT instead of List[TEXT].
What you want to do instead is reformulate the query like so:
val keys: List[String] = ???
val selectValue = selectQuery(keys)
private def selectQuery(clientids: List[String]) =
cql"select * from clientinformation WHERE clientid IN ${clientids}".as[CassandraClientInfoRow]"""
I was able to reproduce this on my end and confirm that dropping the parentheses fixes it. Here's what I did:
CREATE KEYSPACE example WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE IF NOT EXISTS test_data (
id TEXT,
data INT,
PRIMARY KEY ((id))
);
package com.ringcentral.cassandra4io
import cats.effect._
import com.datastax.oss.driver.api.core.CqlSession
import com.ringcentral.cassandra4io.cql._
import fs2._
import java.net.InetSocketAddress
import scala.jdk.CollectionConverters._
object Investigation extends IOApp {
final case class TestDataRow(id: String, data: Int)
def insert(in: TestDataRow, session: CassandraSession[IO]): IO[Boolean] =
cql"INSERT INTO test_data (id, data) VALUES (${in.id}, ${in.data})"
.execute(session)
override def run(args: List[String]): IO[ExitCode] = {
val rSession = {
val builder =
CqlSession
.builder()
.addContactPoints(List(InetSocketAddress.createUnresolved("localhost", 9042)).asJava)
.withLocalDatacenter("dc1")
.withKeyspace("example")
CassandraSession.connect[IO](builder)
}
rSession.use { session =>
val insertData: Stream[IO, INothing] =
Stream.eval(insert(TestDataRow("test", 1), session) *> insert(TestDataRow("test2", 2), session)).drain
def query(ids: List[String]): Stream[IO, TestDataRow] =
cql"SELECT id, data FROM test_data WHERE id IN $ids"
.as[TestDataRow]
.select(session)
(insertData ++ query(List("test", "test2")))
.evalTap(i => IO(println(i)))
.compile
.drain
.as(ExitCode.Success)
}
}
}
This works great, since it now selects the right Binder, which is List[TEXT], as you can see above. Sorry for the trouble you had and the cryptic error messages, and thank you for using this library :D
My objective is to run a number of Spark ML regression models (thousands of runs) on one dataset, and I want to do this using ZIO instead of Future, because the Future version is running too slowly. Below is the working example using Future.
A distinct list of keys is used to filter the partitioned dataset by key and run the model on each partition. I've set up a thread pool with 8 executors to manage it, but performance quickly degrades.
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.duration._
import org.apache.spark.sql.SaveMode
val pool = Executors.newFixedThreadPool(8)
implicit val xc: ExecutionContextExecutorService = ExecutionContext.fromExecutorService(pool)
case class Result(key: String, coeffs: String)
try {
import spark.implicits._
val tasks = {
for (x <- keys)
yield Future {
Seq(
Result(
x.group,
runModel(input.filter(col("group")===x)).mkString(",")
)
).toDS()
.write.mode(SaveMode.Overwrite).option("header", false).csv(
s"hdfs://namenode:8020/results/$x.csv"
)
}
}.toSeq
Await.result(Future.sequence(tasks), Duration.Inf)
}
finally {
pool.shutdown()
pool.awaitTermination(Long.MaxValue, TimeUnit.NANOSECONDS)
}
I've tried to implement this in ZIO, but I don't know how to implement queues and cap the number of executors the way I did with futures.
Below is my failed attempt so far...
import zio._
import zio.console._
import zio.stm._
import org.apache.spark.sql.{Dataset, SaveMode, SparkSession}
import org.apache.spark.sql.functions.col
//example data/signatures
case class ModelResult(key: String, coeffs: String)
case class Data(key: String, sales: Double)
val keys: Array[String] = Array("100_1", "100_2")
def runModel[T](ds: Dataset[T]): Vector[Double]
object MyApp1 extends App {
val spark = SparkSession
.builder()
.getOrCreate()
import spark.implicits._
val input: Dataset[Data] = Seq(Data("100_1", 1d), Data("100_2", 2d)).toDS
def run(args: List[String]): ZIO[ZEnv, Nothing, Int] = {
for {
queue <- Queue.bounded[Int](8)
_ <- ZIO.foreach(1 to 8) (i => queue.offer(i)).fork
_ <- ZIO.foreach(keys) { k => queue.take.flatMap(_ => readWrite(k, input, queue)) }
} yield 0
}
def writecsv(k: String, v: String) = {
Seq(ModelResult(k, v))
.toDS
.write
.mode(SaveMode.Overwrite).option("header", value = false)
.csv(s"hdfs://namenode:8020/results/$k.csv")
}
def readWrite[T](key: String, ds: Dataset[T], queue: Queue[Int]): ZIO[ZEnv, Nothing, Int] = {
(for {
result <- runModel(ds.filter(col("key")===key)).mkString(",")
_ <- writecsv(key, result)
_ <- queue.offer(1)
_ <- putStrLn(s"successfully wrote output for $key")
} yield 0)
}
}
//to run
MyApp1.run(List[String]())
What is the best way to compute this in ZIO?
To parallelize some workload across, say, 8 threads, all you need is:
ZIO.foreachParN(8)(1 to 100)(id => zio.blocking.blocking(Task{yourClusterJob(id)}))
But don't expect much of a boost by switching from Future to ZIO here:
1) The actual workload dominates the coordination overhead, so the difference between ZIO and Future should be marginal.
2) You may not get any boost at all, because the 8 tasks will be fighting for the same resource pool in the Spark cluster.
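For reference, a minimal sketch of how the per-key loop from the question could look with that approach (assuming ZIO 1.x and the keys, input, runModel and writecsv definitions from the question; effectBlocking shifts the blocking Spark calls onto the blocking pool):

def run(args: List[String]): ZIO[ZEnv, Nothing, Int] =
  ZIO.foreachParN(8)(keys.toList) { key =>
    // Run the model and write the CSV for one key on the blocking thread pool.
    zio.blocking.effectBlocking {
      val coeffs = runModel(input.filter(col("key") === key)).mkString(",")
      writecsv(key, coeffs)
    }
  }.fold(_ => 1, _ => 0)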
I'm new to Slick and struggling to find a good canonical example for the following.
I'd like to insert a row into two tables. The first table has a primary key which auto-increments. The second table is related to the first via its primary key.
So I'd like to:
Start a transaction
Insert a row into table 1, which generates a key
Insert a row into table 2, with a foreign key generated in the previous step
End transaction (rollback steps 2 & 3 if either fail)
Would appreciate a canonical example for the above logic, and any related suggestions on my definitions below (I'm very new to Slick!). Thanks!
Insert logic for table 1
private def insertAndReturn(entry: Entry) =
entries returning entries.map(_.id)
into ((_, newId) => entry.copy(id = newId))
def insert(entry: Entry): Future[Entry] =
db.run(insertAndReturn(entry) += entry)
(similar for table 2)
Table 1
class EntryTable(tag: Tag) extends Table[Entry](tag, "tblEntry") {
def id = column[EntryId]("entryID", O.PrimaryKey, O.AutoInc)
...
def * = (id, ...).shaped <> (Entry.tupled, Entry.unapply)
}
Table 2
class UsernameChangeTable(tag: Tag) extends Table[UserNameChange](tag, "tblUserNameChange") {
def entryId = column[EntryId]("entryID")
...
def entry = foreignKey("ENTRY_FK", entryId, entryDao.entries)(
_.id, onUpdate = Restrict, onDelete = Cascade
)
I'm using a MySQL database and Slick 3.1.0.
All that you have to do is
val tx =
insertAndReturn(entry).flatMap { id =>
insertUserNameChange(UserNameChange(id, ...))
}.transactionally
db.run(tx)
Note that insertUserNameChange is the function which inserts the UserNameChange instance into the database. It needs the EntryId which you get back from the previous insertion action.
Compose actions using flatMap and use transactionally to run the whole query in a transaction.
Your Slick tables look fine.
Here is a canonical example implementing this functionality
package models
import scala.concurrent.{Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import slick.backend.DatabasePublisher
import slick.driver.H2Driver.api._
case class Supplier1(id:Int,name:String)
class Suppliers1(tag:Tag) extends Table[Supplier1](tag,"SUPPLIERS") {
def id:Rep[Int] = column[Int]("SUP_ID",O.PrimaryKey,O.AutoInc)
def name:Rep[String] = column[String]("NAME")
def * = (id,name) <>
(Supplier1.tupled,Supplier1.unapply)
}
case class Coffee1(id:Int,name:String,suppId:Int)
class Coffees1(tag:Tag) extends Table[Coffee1](tag,"COFFEES"){
def id:Rep[Int] = column[Int]("C_ID",O.PrimaryKey,O.AutoInc)
def name:Rep[String] = column[String]("COFFEE_NAME")
def suppId:Rep[Int] = column[Int]("SUP_ID")
def * = (id,name,suppId) <> (Coffee1.tupled,Coffee1.unapply)
def supplier = foreignKey("supp_fk", suppId, TableQuery[Suppliers1])(_.id)
}
object HelloSlick1 extends App{
val db = Database.forConfig("h2mem1")
val suppliers = TableQuery[Suppliers1]
val coffees = TableQuery[Coffees1]
val setUpF = (suppliers.schema ++ coffees.schema).create
val insertSupplier = suppliers returning suppliers.map(_.id)
//val tx = (insertSupplier += Supplier1(0,"SUPP 1")).flatMap(id=>(coffees += Coffee1(0,"COF",id))).transactionally
val tx = (for{
supId <- insertSupplier += Supplier1(0,"SUPP 1")
cId <- coffees += Coffee1(0,"COF",supId)
} yield ()).transactionally
def exec[T](action: DBIO[T]): T =
Await.result(db.run(action), Duration.Inf)
exec(setUpF)
exec(tx)
exec(suppliers.result.map(println))
exec(coffees.result.map(println))
}
I'm trying to wrap some blocking calls in a Future. The return type is Seq[User], where User is a case class. The following just wouldn't compile, complaining about various overloaded versions being present. Any suggestions? I tried almost all the variations of Source.apply without any luck.
// All I want is Seq[User] => Future[Seq[User]]
def findByFirstName(firstName: String) = {
val users: Seq[User] = userRepository.findByFirstName(firstName)
val sink = Sink.fold[User, User](null)((_, elem) => elem)
val src = Source(users) // doesn't compile
src.runWith(sink)
}
First of all, I assume that you are using version 1.0 of akka-http-experimental, since the API may have changed from previous releases.
The reason your code does not compile is that akka.stream.scaladsl.Source.apply() requires a
scala.collection.immutable.Seq instead of a scala.collection.mutable.Seq.
Therefore you have to convert from the mutable sequence to an immutable one using the to[T] method.
Document: akka.stream.scaladsl.Source
Additionally, as you can see in the documentation, Source.apply() also accepts () => Iterator[T], so you can pass () => users.iterator as the argument instead.
Since Sink.fold(...) returns the last evaluated expression, you can give an empty Seq() as the initial value, iterate over the users appending each element to the sequence, and finally get the result.
However, there might be a better solution that creates a Sink which collects each element into a Seq directly, but I could not find it (see the note after the example below).
The following code works.
import akka.actor._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source,Sink}
import scala.concurrent.ExecutionContext.Implicits.global
case class User(name:String)
object Main extends App{
implicit val system = ActorSystem("MyActorSystem")
implicit val materializer = ActorMaterializer()
val users = Seq(User("alice"),User("bob"),User("charlie"))
val sink = Sink.fold[Seq[User], User](Seq())(
(seq, elem) =>
{println(s"elem => ${elem} \t| seq => ${seq}");seq:+elem})
val src = Source(users.to[scala.collection.immutable.Seq])
// val src = Source(()=>users.iterator) // this also works
val fut = src.runWith(sink) // Future[Seq[User]]
fut.onSuccess({
case x=>{
println(s"result => ${x}")
}
})
}
The output of the code above is
elem => User(alice) | seq => List()
elem => User(bob) | seq => List(User(alice))
elem => User(charlie) | seq => List(User(alice), User(bob))
result => List(User(alice), User(bob), User(charlie))
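As a side note on the "better Sink" mentioned above: later akka-stream releases provide a built-in Sink.seq that collects all elements into an immutable Seq, so the fold is not needed. A sketch, assuming a version where Sink.seq is available and reusing the users value from the example above:

// Sink.seq materializes a Future[Seq[User]] directly.
val futUsers: scala.concurrent.Future[Seq[User]] =
  Source(users.to[scala.collection.immutable.Seq]).runWith(Sink.seq)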
If you need just a Future[Seq[User]], don't use Akka Streams but plain futures:
import scala.concurrent._
import ExecutionContext.Implicits.global
val session = socialNetwork.createSessionFor("user", credentials)
val f: Future[List[Friend]] = Future {
session.getFriends()
}
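Applied to the question's repository call, that is simply (a sketch, assuming userRepository.findByFirstName is the blocking call to wrap and an implicit ExecutionContext is in scope):

def findByFirstName(firstName: String): Future[Seq[User]] =
  Future {
    // The blocking repository call runs on the implicit ExecutionContext.
    userRepository.findByFirstName(firstName)
  }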
I have methods in my Play app that query database tables with over a hundred columns. I can't define a case class for each such query, because it would be ridiculously big and would have to be changed with every alteration of the table in the database.
I'm using this approach, where the result of the query looks like this:
Map(columnName1 -> columnVal1, columnName2 -> columnVal2, ...)
Example of the code:
implicit val getListStringResult = GetResult[List[Any]] (
r => (1 to r.numColumns).map(_ => r.nextObject).toList
)
def getSomething(): Map[String, Any] = DB.withSession {
val columns = MTable.getTables(None, None, None, None).list.filter(_.name.name == "myTable").head.getColumns.list.map(_.column)
val result = sql"""SELECT * FROM myTable LIMIT 1""".as[List[Any]].firstOption.map(columns zip _ toMap).get
}
This is not a problem when the query only involves a single database and a single table. I need to be able to use multiple tables and databases in my query, like this:
def getSomething(): Map[String, Any] = DB.withSession {
//The line below is no longer valid because of multiple tables/databases
val columns = MTable.getTables(None, None, None, None).list.filter(_.name.name == "table1").head.getColumns.list.map(_.column)
val result = sql"""
SELECT *
FROM db1.table1
LEFT JOIN db2.table2 ON db2.table2.col1 = db1.table1.col1
LIMIT 1
""".as[List[Any]].firstOption.map(columns zip _ toMap).get
}
The same approach can no longer be used to retrieve column names. This problem doesn't exist with something like PHP's PDO or Java's JdbcTemplate, which retrieve column names without any extra effort.
My question is: how do I achieve this with Slick?
import scala.slick.jdbc.{GetResult,PositionedResult}
object ResultMap extends GetResult[Map[String,Any]] {
def apply(pr: PositionedResult) = {
val rs = pr.rs // <- jdbc result set
val md = rs.getMetaData();
val res = (1 to pr.numColumns).map{ i=> md.getColumnName(i) -> rs.getObject(i) }.toMap
pr.nextRow // <- use Slick's advance method to avoid endless loop
res
}
}
val result = sql"select * from ...".as(ResultMap).firstOption
Another variant that produces a map with only the non-null columns (keys in lowercase):
private implicit val getMap = GetResult[Map[String, Any]](r => {
val metadata = r.rs.getMetaData
(1 to r.numColumns).flatMap(i => {
val columnName = metadata.getColumnName(i).toLowerCase
val columnValue = r.nextObjectOption
columnValue.map(columnName -> _)
}).toMap
})
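With that implicit in scope, the multi-table query from the question can be run directly; a sketch, using the same join as above and assuming an implicit session as in the question's DB.withSession block:

// The implicit GetResult[Map[String, Any]] drives the row-to-map conversion.
val row: Option[Map[String, Any]] =
  sql"""
    SELECT *
    FROM db1.table1
    LEFT JOIN db2.table2 ON db2.table2.col1 = db1.table1.col1
    LIMIT 1
  """.as[Map[String, Any]].firstOption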