How to get optional result in insert statements with Doobie? - scala

I have an optional insert query:
val q = sql"insert into some_table (some_field) select 42 where ...(some condition)"
Running this query with:
q.update.withUniqueGeneratedKeys[Option[Long]]("id")
fails with
Result set exhausted: more rows expected
when the condition is false.
How can I get an Option[Long] result from insert statements with Doobie?
UPD
.withGeneratedKeys[Long]("id") gives just Long in for comprehension
val q = sql"insert into some_table (some_field) select 42 where ...(some condition)"
for {
id <- q.update.withGeneratedKeys[Long]("id") // id is long
_ <- if (<id is present>) <some other inserts> else <nothing>
} yield id
How to check id?

As @Thilo commented, you can use withGeneratedKeys, which gives you back a Stream[ConnectionIO, Long] (a Stream[F, Long] in your effect type F once transacted). The stream is simply empty when no row was inserted:
val result = q.update.withGeneratedKeys[Long]("id")
Here is the doc.
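To get from that stream to an Option[Long], and to run the follow-up inserts only when an id exists, one option is to compile the stream with .compile.last. The following is a minimal sketch, not a definitive implementation: the boolean parameter allowed stands in for the elided condition, and other_table, some_id and followUp are hypothetical names used only for illustration.

```scala
import cats.implicits._
import doobie._
import doobie.implicits._

// Hypothetical stand-in for the elided condition: the insert happens only when `allowed` is true.
def insertIfAllowed(allowed: Boolean): ConnectionIO[Option[Long]] =
  sql"insert into some_table (some_field) select 42 where $allowed"
    .update
    .withGeneratedKeys[Long]("id") // Stream[ConnectionIO, Long], empty if nothing was inserted
    .compile
    .last                          // ConnectionIO[Option[Long]]

// Hypothetical follow-up insert that needs the generated id.
def followUp(id: Long): ConnectionIO[Int] =
  sql"insert into other_table (some_id) values ($id)".update.run

val program: ConnectionIO[Option[Long]] =
  for {
    maybeId <- insertIfAllowed(allowed = true)
    _       <- maybeId.traverse_(followUp) // runs followUp only when an id is present
  } yield maybeId
```

Nothing touches the database until program is run with .transact(xa); inside the for comprehension, maybeId is an ordinary Option[Long] that can be pattern matched or folded.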

Related

Scala Doobie. Creating and Inserting into a Temp Table

I am relatively new to Scala and also new to Doobie. I am connecting to SQL Server 2014 and need to create a temp table and subsequently insert into that temp table. In SQL Server, when you create a temp table, and the connection is severed, the temp table is automatically deleted.
In the following snippet, I am getting this exception:
Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '#temp'
The snippet:
val create: doobie.ConnectionIO[Int] = sql"CREATE TABLE #temp (tmp CHAR(20))".update.run
val insert: doobie.ConnectionIO[Int] = sql"INSERT INTO #temp values ('abc'), ('def')".update.run
val query: doobie.ConnectionIO[List[String]] = sql"select * from #temp ".query[String].to[List]
def wrapper(): ConnectionIO[List[String]] = {
for {
c <- create
i <- insert
q <- query
} yield q
}
wrapper().transact(xa).debug.as(ExitCode.Success)
I believe this is telling me that Doobie is dropping the connection between the create and insert statements?
The expected/desired behavior is that it will return a List("abc","def").
Thanks in advance for any help!
Update:
Here's a small example of what I know is in fact working:
val create = sql"CREATE TABLE #temp (tmp CHAR(20))"
val insert = sql"INSERT INTO #temp values ('abc'), ('def')"
(create ++ insert).update.run.transact(xa).debug.as(ExitCode.Success)
(Note that it only works with the create and insert part and not the query part)
After a week of banging my head against my laptop, I finally figured it out. When the fragments are concatenated, Doobie sends them as a single statement, so the "update" commands are executed as part of .query as well:
val create: Fragment = sql"CREATE TABLE #temp (tmp CHAR(20))"
val insert: Fragment = sql"INSERT INTO #temp values ('abc'), ('def')"
val query: Fragment = sql"select * from #temp "
(create ++ insert ++ query).query[String].to[List].transact(xa).debug.as(ExitCode.Success)
Outputs:
List(abc ,def )

Scala Slick combining Rep sub queries into one re

I sum up totals in two different database tables:
val sum1Query: Rep[Int] = tableQuery1.map(_.amount).sum.ifNull(0)
val sum2Query: Rep[Int] = tableQuery2.map(_.amount).sum.ifNull(0)
for {
sum1 <- sum1Query.result
sum2 <- sum2Query.result
} yield {
sum1 + sum2
}
This runs 2 SQL queries to the database each time .result is called. I am looking for a way to make it use only one SQL query.
Something like this doesn't work:
for {
sum1 <- sum1Query
sum2 <- sum2Query
} yield {
sum1 + sum2
}.result
Any ideas on how to do it in Slick other than using plain SQL query?
Each call to .result creates a DBIO action which is a SQL statement. The trick to reducing the number of actions is to find a way to combine two queries (or two Reps) together into one action.
In your case you could zip the two queries:
val sum1 = table1.map(_.amount).sum.ifNull(0)
val sum2 = table2.map(_.amount).sum.ifNull(0)
val query = sum1.zip(sum2)
When you run query.result, you'll execute a single query, something like:
select ifnull(x2.x3,0), ifnull(x4.x5,0)
from
(select sum("amount") as x3 from "table_1") x2,
(select sum("amount") as x5 from "table_2") x4
...which will result in a tuple of the two values.
However, as you've already just got a Rep[Int] you can use + in the database:
val query = sum1 + sum2
...which will be a query along the lines of:
select ifnull(x2.x3,0) + ifnull(x4.x5,0)
from
(select sum("amount") as x3 from "table_1") x2,
(select sum("amount") as x5 from "table_2") x4
Just put it like this:
(sum1 + sum2).result

How to get table names from SQL query?

I want to get all the table names from a SQL query in Spark using Scala.
Let's say a user sends a SQL query which looks like:
select * from table_1 as a left join table_2 as b on a.id=b.id
I would like to get the list of all tables, i.e. table_1 and table_2.
Is regex the only option?
Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.
scala> spark.version
res0: String = 2.3.1
def getTables(query: String): Seq[String] = {
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}
val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2
Hope it will help you
Parse the given query with Spark's SQL parser (Spark does the same internally). You can get the sqlParser from the session state. It returns the logical plan of the query. Iterate over the logical plan, check whether each node is an instance of UnresolvedRelation (the leaf logical operator that represents a table reference in a logical query plan that has yet to be resolved), and get the table from it.
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
def getTables(query: String): Seq[String] = {
  val logical: LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
  val tables = scala.collection.mutable.LinkedHashSet.empty[String]
  var i = 0
  // logical(i) is the i-th node of the plan, or null once i is past the last node
  while (true) {
    if (logical(i) == null) {
      return tables.toSeq
    } else if (logical(i).isInstanceOf[UnresolvedRelation]) {
      val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
      tables += tableIdentifier.unquotedString.toLowerCase
    }
    i = i + 1
  }
  tables.toSeq
}
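I had some complicated SQL queries with nested queries and iterated on @Jacek Laskowski's answer to get this:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import scala.collection.mutable.ListBuffer
def getTables(spark: SparkSession, query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  val tables = new ListBuffer[String]()
  var i: Int = 0
  // walk the plan node by node until the index runs past the last node
  while (logicalPlan(i) != null) {
    logicalPlan(i) match {
      case t: UnresolvedRelation => tables += t.tableName
      case _ =>
    }
    i += 1
  }
  tables.toList
}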
import re

def __sqlparse2table(self, query):
    '''
    @description: get table names from a query
    '''
    plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
    plan_string = plan.toString().replace('`.`', '.')
    unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
    cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
    cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
    schema = set()
    tables = set()
    for table_name in unr:
        # skip names that refer to CTEs rather than real tables
        if table_name not in cte:
            schema.update([table_name.split('.')[0]])
            tables.update([table_name])
    return schema, tables
If you need to list the column names of table1 and table2, you can run show columns in db.table_name against your Hive database.
val tbl_column1 = sqlContext.sql("show columns in table1");
val tbl_column2 = sqlContext.sql("show columns in table2");
You will get the list of columns in both tables.
tbl_column1.show
name
id
data
Unix did the trick:
grep 'INTO\|FROM\|JOIN' *.sql | sed -r 's/.*?(FROM|INTO|JOIN)\s*?([^ ]*).*/\2/g' | sort -u
grep 'overwrite table' *.txt | sed -r 's/.*?(overwrite table)\s*?([^ ]*).*/\2/g' | sort -u
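For completeness, the same keyword-based idea can be sketched in plain Scala without Spark. This is only a rough approximation under stated assumptions: it takes whatever identifier follows FROM, JOIN or INTO, so it will also pick up CTE names and anything inside comments or string literals, which the parser-based answers handle better. The name tablesByRegex is made up for this example.

```scala
// Naive pattern: the identifier that follows FROM, JOIN or INTO (case-insensitive).
val TableRef = """(?i)\b(?:from|join|into)\s+([\w.]+)""".r

// Hypothetical helper: returns table names in order of first appearance, without duplicates.
def tablesByRegex(sql: String): List[String] =
  TableRef.findAllMatchIn(sql).map(_.group(1)).toList.distinct

val found = tablesByRegex("select * from table_1 as a left join table_2 as b on a.id=b.id")
// found == List("table_1", "table_2")
```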

How to make aggregations with slick

I want to force slick to create queries like
select max(price) from coffees where ...
But slick's documentation doesn't help
val q = Coffees.map(_.price) //this is query Query[Coffees.type, ...]
val q1 = q.min // this is Column[Option[Double]]
val q2 = q.max
val q3 = q.sum
val q4 = q.avg
Because those q1-q4 aren't queries, I can't get the results but can use them inside other queries.
This statement
for {
coffee <- Coffees
} yield coffee.price.max
generates the right query but is deprecated (it produces the warning "method max in class ColumnExtensionMethods is deprecated: Use Query.max instead").
How to generate such query without warnings?
Another issue is to aggregate with group by:
"select name, max(price) from coffees group by name"
Tried to solve it with
(for {
  coffee <- Coffees
} yield (coffee.name, coffee.price.max)).groupBy(x => x._1)
which generates
select x2.x3, x2.x3, x2.x4 from (select x5."COF_NAME" as x3, max(x5."PRICE") as x4 from "coffees" x5) x2 group by x2.x3
which causes an obvious database error:
column "x5.COF_NAME" must appear in the GROUP BY clause or be used in an aggregate function
How to generate such query?
As far as I can tell, the first one is simply
Query(Coffees.map(_.price).max).first
And the second one
val maxQuery = Coffees
.groupBy { _.name }
.map { case (name, c) =>
name -> c.map(_.price).max
}
maxQuery.list
or
val maxQuery = for {
(name, c) <- Coffees groupBy (_.name)
} yield name -> c.map(_.price).max
maxQuery.list
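The snippets above use the older API (.first, .list). On Slick 3.x the same two queries can, as far as I can tell, be turned into actions with .result. The following is a sketch under assumptions: the H2 profile import and the Coffees table definition (with only name and price columns) are made up here so that the example stands on its own.

```scala
import slick.jdbc.H2Profile.api._

// Hypothetical table definition, reduced to the two columns used in the question.
class Coffees(tag: Tag) extends Table[(String, Double)](tag, "coffees") {
  def name  = column[String]("COF_NAME")
  def price = column[Double]("PRICE")
  def *     = (name, price)
}
val coffees = TableQuery[Coffees]

// select max(price) from coffees
val maxPrice: DBIO[Option[Double]] =
  coffees.map(_.price).max.result

// select name, max(price) from coffees group by name
val maxByName: DBIO[Seq[(String, Option[Double])]] =
  coffees
    .groupBy(_.name)
    .map { case (name, rows) => name -> rows.map(_.price).max }
    .result
```

Both values are only descriptions of work; they run when passed to db.run on a configured Database.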

scalaquery retrieve values

I have a few tables, let's say 2 for simplicity. I can create them this way:
...
val tableA = new Table[(Int,Int)]("tableA"){
def a = column[Int]("a")
def b = column[Int]("b")
}
val tableB = new Table[(Int,Int)]("tableB"){
def a = column[Int]("a")
def b = column[Int]("b")
}
I want a query that retrieves value 'a' from tableA together with the matching 'a' values from tableB as a list nested inside each result.
My result should be:
List[(a,List(b))]
So far I have come up to this point in the query:
def createSecondItr(b1: NamedColumn[Int]) = for (
  b2 <- tableB if b1 === b2.b
) yield b2.a
val q1 = for {
  a1 <- tableA
  listB = createSecondItr(a1.b)
} yield (a1.a, listB)
I didn't test the code, so there might be errors in it. My problem is that I cannot retrieve data from the results.
To understand the question, think of trains and their classes: you search for trains after 12pm, and you need a result set containing each train's name with the classes that train has as a list inside the train's result.
I don't think you can do this directly in ScalaQuery. What I would do is to do a normal join and then manipulate the result accordingly:
import scala.collection.mutable.{HashMap, Set, MultiMap}
def list2multimap[A, B](list: List[(A, B)]) =
list.foldLeft(new HashMap[A, Set[B]] with MultiMap[A, B]){(acc, pair) => acc.addBinding(pair._1, pair._2)}
val q = for {
  a <- tableA
  b <- tableB
  if a.b === b.b
} yield (a.a, b.a)
list2multimap(q.list)
The list2multimap is taken from https://stackoverflow.com/a/7210191/66686
The code is written without assistance of an IDE, compiler or similar. Consider the debugging free training :-)
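If a mutable MultiMap is not needed (it is deprecated as of Scala 2.13), the same regrouping of the joined rows can be sketched with groupBy from the standard library. The helper name groupPairs and the sample rows are made up for illustration; in the answer above the input would be q.list.

```scala
// Hypothetical helper: groups (key, value) pairs into key -> list of values,
// keeping the values in their original order within each key.
def groupPairs[A, B](pairs: List[(A, B)]): Map[A, List[B]] =
  pairs.groupBy(_._1).map { case (key, group) => key -> group.map(_._2) }

// Sample rows standing in for the joined (train, class) result.
val rows = List(("train1", "first"), ("train1", "second"), ("train2", "first"))
val grouped = groupPairs(rows)
// grouped == Map("train1" -> List("first", "second"), "train2" -> List("first"))
```

Calling grouped.toList then yields the List[(a, List(b))] shape asked for in the question.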