inserting a list of objects not working in slick - scala

I am using Slick 3.1.1.
I want to insert a list of objects into the DB (Postgres).
I have written the following code, which works:
for (user <- x.userList) {
  val users = userClass(user.name, user.id)
  val userAction = DBAccess.userTable.insertOrUpdate(users)
  val f = DBAccess.db.run(DBIO.seq(userAction))
  Await.result(f, Duration.Inf)
}
However, this runs one DB query per user, so I was looking for a way to make only a single db.run call. I wrote something like the below:
val userQuery = TableQuery[dbuserTable]
for (user <- x.userList) {
  val users = userClass(user.name, user.id)
  userQuery += users
}
val f = DBAccess.db.run(DBIO.seq(userQuery.result))
Await.result(f, Duration.Inf)
However, this second piece does not write to the DB. Can someone point out where I am going wrong?

I know this is old, but since I just stumbled on it I'll give an updated answer.
You can use ++= to insert a list:
val usersTable = TableQuery[dbuserTable]
val listToInsert = x.userList.map(user => userClass(user.name, user.id))
val action = usersTable ++= listToInsert
DBAccess.db.run(action)
This issues only one request that inserts everything at once.
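If the whole batch should succeed or fail together, the bulk insert can also be wrapped in a transaction. A minimal sketch, assuming the profile's api import is in scope (slick.driver.PostgresDriver.api._ on Slick 3.1):

// .transactionally rolls back every row if any single insert fails
val transactionalInsert = (usersTable ++= listToInsert).transactionally
DBAccess.db.run(transactionalInsert)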

+= doesn't mutate your userQuery. Every iteration of your for loop creates an insert action and then promptly discards it.
Try accumulating the insert actions instead of discarding them (note the use of yield):
val usersTable = TableQuery[dbuserTable]
val inserts = for (user <- x.userList) yield {
  val userRow = userClass(user.name, user.id)
  usersTable += userRow
}
DBAccess.db.run(DBIO.seq(inserts: _*))
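If you want the per-row results back instead of discarding them, DBIO.sequence is a drop-in alternative to DBIO.seq here. A small sketch reusing the inserts built above, assuming x.userList is an ordinary Scala collection:

// DBIO.sequence keeps each action's result (the inserted row count)
val f: Future[Seq[Int]] = DBAccess.db.run(DBIO.sequence(inserts))
Await.result(f, Duration.Inf)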

Related

Print out all the data within a TableQuery[Restaurants]

def displayTable(table: TableQuery[Restaurants]): Unit = {
  val tablequery = table.map(_.id)
  val action = tablequery.result
  val result = db.run(action)
  result.foreach(id => id.foreach(new_id => println(new_id)))
  total_points = total_points + 10
}
I have tried to print all the data to the screen but have gotten nowhere. My question is: why does nothing print out? I am using Scala and a JDBC connection, aka Slick. If you replace the inner new_id => println(new_id) with a plain println(id), you get:
def displayTable(table: TableQuery[Restaurants]): Unit = {
  val tablequery = table.map(_.id)
  val action = tablequery.result
  val result = db.run(action)
  result.foreach(id => println(id))
  total_points = total_points + 10
}
This code produces output like the following: "Vector()". Can someone please help me print all the data out? I loaded it in using the following code:
def fillTable(): TableQuery[Restaurants] = {
  println("Table filled.")
  val restaurants = TableQuery[Restaurants]
  val setup = DBIO.seq(
    restaurants.schema.create
  )
  val setupFuture = db.run(setup)
  val bufferedSource = io.Source.fromFile("src/main/scala/Restaurants.csv")
  for (line <- bufferedSource.getLines) {
    val cols = line.split(",").map(_.trim)
    var restaurant = new Restaurant(s"${cols(0)}", s"${cols(1)}", s"${cols(2)}",
      s"${cols(3)}", s"${cols(4)}", s"${cols(5)}", s"${cols(6)}",
      s"${cols(7)}", s"${cols(8)}", s"${cols(9)}")
    restaurants.forceInsert(s"${cols(0)}", s"${cols(1)}", s"${cols(2)}",
      s"${cols(3)}", s"${cols(4)}", s"${cols(5)}", s"${cols(6)}",
      s"${cols(7)}", s"${cols(8)}", s"${cols(9)}")
    total_rows = total_rows + 1
  }
  restaurants
}
This is my first question so I apologize for the format.
The fact that Vector() is your output in the second version of displayTable is a strong hint that your query is returning an empty result, and therefore has no ids to print. I haven't run your code myself, but I suspect this is because restaurants.forceInsert returns an action, and you need to db.run() it to actually execute the query.
I'm also curious why you create var restaurant = ... but then ignore it and call forceInsert, recreating the tuple from the CSV values again. Why not restaurants.forceInsert(restaurant)?
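To make the suggested fix concrete, here is a rough sketch of the insert loop with the actions actually run. It reuses db, restaurants, and the CSV parsing from the question, assumes the table's row type is Restaurant so that forceInsert(restaurant) compiles, and adds the usual Await/Duration imports:

import scala.concurrent.Await
import scala.concurrent.duration.Duration

// Build one forceInsert action per CSV line instead of discarding them
val insertActions = for (line <- bufferedSource.getLines().toSeq) yield {
  val cols = line.split(",").map(_.trim)
  restaurants.forceInsert(new Restaurant(s"${cols(0)}", s"${cols(1)}", s"${cols(2)}",
    s"${cols(3)}", s"${cols(4)}", s"${cols(5)}", s"${cols(6)}",
    s"${cols(7)}", s"${cols(8)}", s"${cols(9)}"))
}
// Run them as a single DBIO and wait until the rows are actually in the table
Await.result(db.run(DBIO.seq(insertActions: _*)), Duration.Inf)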

Spark: show and collect-println giving different outputs

I am using Spark 2.2
I feel like I have something odd going on here. The basic premise is that:
- I have a set of KIE/Drools rules running through a Dataset of profile objects
- I am then trying to show/collect-print the resulting output
- I then cast the output as a tuple to flatMap it later
Code below:
implicit val mapEncoder = Encoders.kryo[java.util.HashMap[String, Any]]
implicit val recommendationEncoder = Encoders.kryo[Recommendation]
val mapper = new ObjectMapper()
val kieOuts = uberDs.map(profileRow => {
  val map = mapper.convertValue(profileRow, classOf[java.util.HashMap[String, Any]])
  val profile = Profile(map)

  // set up the kie session
  val ks = KieServices.Factory.get
  val kContainer = ks.getKieClasspathContainer
  val kSession = kContainer.newKieSession() // TODO: stateful session, how to do stateless?

  // insert profile object into kie session
  val kCmds = ks.getCommands
  val cmds = new java.util.ArrayList[Command[_]]()
  cmds.add(kCmds.newInsert(profile))
  cmds.add(kCmds.newFireAllRules("outFired"))

  // fire kie rules
  val results = kSession.execute(kCmds.newBatchExecution(cmds))
  val fired = results.getValue("outFired").toString.toInt

  // collect the inserted recommendation objects and create the uid string
  import scala.collection.JavaConversions._
  var gresults = kSession.getObjects
  gresults = gresults.drop(1) // drop the inserted profile object, which also gets collected
  val recommendations = scala.collection.mutable.ListBuffer[Recommendation]()
  gresults.toList.foreach(reco => {
    val recommendation = reco.asInstanceOf[Recommendation]
    recommendations += recommendation
  })
  kSession.dispose

  val uIds = StringBuilder.newBuilder
  if (recommendations.size > 0) {
    recommendations.foreach(recommendation => {
      uIds.append(recommendation.getOfferId + "_" + recommendation.getScore)
      uIds.append(";")
    })
    uIds.deleteCharAt(uIds.size - 1)
  }
  new ORecommendation(profile.getAttributes().get("cId").toString.toLong, fired, uIds.toString)
})
println("======================Output#1======================")
kieOuts.show(1000, false)
println("======================Output#2======================")
kieOuts.collect.foreach(println)
// separating cId and each uid into individual rows
val kieOutsDs = kieOuts.as[(Long, Int, String)]
println("======================Output#3======================")
kieOutsDs.show(1000, false)
(I have sanitized/shortened the ids below; they are much longer but have a similar format.)
What I am seeing as outputs:
Output#1 has a set of uIds (as a String) come up:
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|cId |rulesFired | eligibleUIds |
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|842 | 17|123-25_2.0;12345678-48_9.0;28a-ad_5.0;123-56_10.0;123-27_2.0;123-32_3.0;c6d-e5_5.0;123-26_2.0;123-51_10.0;8e8-c1_5.0;123-24_2.0;df8-ad_5.0;123-36_5.0;123-16_2.0;123-34_3.0|
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Output#2 has mostly the same set of uIds show up (usually off by one element):
ORecommendation(842,17,123-36_5.0;123-24_2.0;8e8-c1_5.0;df8-ad_5.0;28a-ad_5.0;660-73_5.0;123-34_3.0;123-48_9.0;123-16_2.0;123-51_10.0;123-26_2.0;c6d-e5_5.0;123-25_2.0;123-56_10.0;123-32_3.0)
Output#3 is the same as Output#1:
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|842 | 17 |123-32_3.0;8e8-c1_5.0;123-51_10.0;123-48_9.0;28a-ad_5.0;c6d-e5_5.0;123-27_2.0;123-16_2.0;123-24_2.0;123-56_10.0;123-34_3.0;123-36_5.0;123-6_2.0;123-25_2.0;660-73_5.0|
Every time I run it, the difference between Output#1 and Output#2 is one element, but never the same element (in the above example, Output#1 has 123-27_2.0 but Output#2 has 660-73_5.0).
Should they not be the same? I am still new to Scala/Spark and feel like I am missing something very fundamental.
I think I figured this out: adding cache to kieOuts at least got me identical outputs between show and collect.
I will be looking into why KIE gives me a different output for every run of the same input, but that is a different issue.
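For reference, a minimal sketch of that fix with the names from the question (the .cache() call is the only change):

// Caching materializes the non-deterministic map once, so show and collect
// read the same computed rows instead of re-running the KIE session per action
val cachedOuts = kieOuts.cache()
cachedOuts.show(1000, false)
cachedOuts.collect.foreach(println)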

How can this be done concurrently in scala

So I have this chunk of code:
dbs.foreach({
  var map = scala.collection.mutable.Map[String, mutable.MutableList[String]]()
  db =>
    val resultList = getTables(hive, db)
    map += (db -> resultList)
})
What this does is loop through a list of dbs, run a show tables in db call for each db, and then add db -> tables to a map. How can this be done concurrently, given that each hive query takes about 5 seconds to return?
Update: working code --
def getAllTablesConcurrent(hive: JdbcHive, dbs: mutable.MutableList[String]): Map[String, mutable.MutableList[String]] = {
  implicit val context: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
  val futures = dbs.map { db =>
    Future(db, getTables(hive, db))
  }
  val map = Await.result(Future.sequence(futures), Duration(10, TimeUnit.SECONDS)).toMap
  map
}
Don't use vars and mutable state, especially if you want concurrency.
val result: Future[Map[String, Seq[String]]] = Future
  .traverse(dbs) { name =>
    Future(name -> getTables(hive, name))
  }.map(_.toMap)
If you want more control (how long to wait, how many threads to use, what happens if all your threads are busy, etc.) you can use a ThreadPoolExecutor with Future:
implicit val context: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
val dbs = List("db1", "db2", "db3")
val futures = dbs.map { name =>
  Future(name, getTables(hive, name))
}
val result = Await.result(Future.sequence(futures), Duration(TIMEOUT, TimeUnit.MILLISECONDS)).toMap
Just remember not to create a new ExecutionContext every time you need one.
You can use .par on any Scala collection to perform the next transformation in parallel (using the default parallelism, which depends on the number of cores).
Also, it is easier and cleaner to map into an (immutable) map instead of updating a mutable one:
val result = dbs.par.map(db => db -> getTables(hive, db)).toMap
To have more control over the number of concurrent threads used, see https://docs.scala-lang.org/overviews/parallel-collections/configuration.html
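For example, a sketch of capping a parallel collection at four threads via its task support, as described on that page (the pool size is arbitrary here, and on older Scala versions the pool class comes from scala.concurrent.forkjoin instead of java.util.concurrent):

import scala.collection.parallel.ForkJoinTaskSupport
import java.util.concurrent.ForkJoinPool

val parDbs = dbs.par
// Swap the default task support for a bounded fork-join pool
parDbs.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))
val result = parDbs.map(db => db -> getTables(hive, db)).toMap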

Unable to convert a ResultSet into a List in Cassandra datastax driver

In the following Cassandra code, I am querying a database and expect multiple values. The function takes an id and should return Option[List[M]], where M is my model. I have a function rowToModel(row: Row): MyModel which takes a row from the ResultSet and converts it into an instance of my model.
My issue is that the List I am returning is always empty, even though the ResultSet has data. I checked this by adding debug prints in rowToModel.
def getRowsByPartitionKeyId(id: I): Option[List[M]] = {
  val whereClause = whereConditions(tablename, id)
  val resultSet = session.execute(whereClause) // resultSet is an iterator
  val it = resultSet.iterator()
  val resultList: List[M] = List()
  if (it.hasNext) {
    while (it.hasNext) {
      val item: M = rowToModel(it.next())
      resultList.:+(item)
    }
    Some(resultList) // THIS IS ALWAYS List()
  }
  else
    None
}
I suspect that because resultList is a val, its value is not getting changed in the while loop. I probably should use yield or something else, but I don't know what or how.
Solved it by converting the Java iterator to a Scala one and then using toList:
import collection.JavaConverters._

val it = resultSet.iterator()
if (it.hasNext) {
  val resultSetAsList: List[Row] = asScalaIterator(it).toList
  val resultSetAsModelList = resultSetAsList.map((row: Row) => rowToModel(row))
  Some(resultSetAsModelList)
}
else
  None
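For completeness, the same thing can be written more compactly (a sketch reusing the question's session, whereConditions, and rowToModel; nonEmpty replaces the explicit hasNext check):

import scala.collection.JavaConverters._

def getRowsByPartitionKeyId(id: I): Option[List[M]] = {
  val resultSet = session.execute(whereConditions(tablename, id))
  // asScala wraps the Java iterator; toList forces it into an immutable List
  val models = resultSet.iterator().asScala.map(rowToModel).toList
  if (models.nonEmpty) Some(models) else None
}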

Slick 2.1: Return query results as a map [duplicate]

I have methods in my Play app that query database tables with over a hundred columns. I can't define a case class for each such query, because it would be ridiculously big and would have to be changed with every alteration of the table in the database.
I'm using this approach, where the result of the query looks like this:
Map(columnName1 -> columnVal1, columnName2 -> columnVal2, ...)
Example of the code:
implicit val getListStringResult = GetResult[List[Any]](
  r => (1 to r.numColumns).map(_ => r.nextObject).toList
)

def getSomething(): Map[String, Any] = DB.withSession {
  val columns = MTable.getTables(None, None, None, None).list.filter(_.name.name == "myTable").head.getColumns.list.map(_.column)
  sql"""SELECT * FROM myTable LIMIT 1""".as[List[Any]].firstOption.map(columns zip _ toMap).get
}
This is not a problem when query only runs on a single database and single table. I need to be able to use multiple tables and databases in my query like this:
def getSomething(): Map[String, Any] = DB.withSession {
  // The line below is no longer valid because of multiple tables/databases
  val columns = MTable.getTables(None, None, None, None).list.filter(_.name.name == "table1").head.getColumns.list.map(_.column)
  sql"""
    SELECT *
    FROM db1.table1
    LEFT JOIN db2.table2 ON db2.table2.col1 = db1.table1.col1
    LIMIT 1
  """.as[List[Any]].firstOption.map(columns zip _ toMap).get
}
The same approach can no longer be used to retrieve the column names. This problem doesn't exist with something like PHP's PDO or Java's JdbcTemplate - these retrieve column names without any extra effort.
My question is: how do I achieve this with Slick?
import scala.slick.jdbc.{GetResult, PositionedResult}

object ResultMap extends GetResult[Map[String, Any]] {
  def apply(pr: PositionedResult) = {
    val rs = pr.rs // <- jdbc result set
    val md = rs.getMetaData()
    val res = (1 to pr.numColumns).map { i => md.getColumnName(i) -> rs.getObject(i) }.toMap
    pr.nextRow // <- use Slick's advance method to avoid an endless loop
    res
  }
}
val result = sql"select * from ...".as(ResultMap).firstOption
Another variant that produces a map of the non-null columns only (keys in lowercase):
private implicit val getMap = GetResult[Map[String, Any]](r => {
  val metadata = r.rs.getMetaData
  (1 to r.numColumns).flatMap(i => {
    val columnName = metadata.getColumnName(i).toLowerCase
    val columnValue = r.nextObjectOption
    columnValue.map(columnName -> _)
  }).toMap
})
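A usage sketch for this implicit with Slick 2.1's plain-SQL API (the table name is illustrative, and session handling follows the question's DB.withSession style):

def getRow(): Option[Map[String, Any]] = DB.withSession { implicit session =>
  // getMap is picked up implicitly by .as[Map[String, Any]]
  sql"SELECT * FROM db1.table1 LIMIT 1".as[Map[String, Any]].firstOption
}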