Scala Doobie: Creating and Inserting into a Temp Table

I am relatively new to Scala and also new to Doobie. I am connecting to SQL Server 2014 and need to create a temp table and subsequently insert into it. In SQL Server, a temp table is automatically dropped once the connection that created it is severed.
In the following snippet, I am getting this exception:
Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '#temp'
The snippet:
val create: doobie.ConnectionIO[Int] = sql"CREATE TABLE #temp (tmp CHAR(20))".update.run
val insert: doobie.ConnectionIO[Int] = sql"INSERT INTO #temp values ('abc'), ('def')".update.run
val query: doobie.ConnectionIO[List[String]] = sql"select * from #temp ".query[String].to[List]
def wrapper(): ConnectionIO[List[String]] = {
  for {
    c <- create
    i <- insert
    q <- query
  } yield q
}
wrapper().transact(xa).debug.as(ExitCode.Success)
Does this mean that Doobie is dropping the connection between the create and insert statements?
The expected/desired behavior is that it will return a List("abc","def").
Thanks in advance for any help!
Update:
Here's a small example of what I know is in fact working:
val create: Fragment = sql"CREATE TABLE #temp (tmp CHAR(20))"
val insert: Fragment = sql"INSERT INTO #temp values ('abc'), ('def')"
(create ++ insert).update.run.transact(xa).debug.as(ExitCode.Success)
(Note that it only works with the create and insert part and not the query part)

After 1 week of banging my head against my laptop... I finally figured it out. Doobie will actually perform "update" commands when doing .query, so the whole script can be composed into one fragment. My understanding is that each separate .update.run reaches SQL Server as its own prepared-statement batch, and a local #temp table created inside such a batch is dropped as soon as that batch finishes; composing the fragments sends everything as a single statement on one connection, so the table is still in scope when the query runs:
val create: Fragment = sql"CREATE TABLE #temp (tmp CHAR(20))"
val insert: Fragment = sql"INSERT INTO #temp values ('abc'), ('def')"
val query: Fragment = sql"select * from #temp "
(create ++ insert ++ query).query[String].to[List].transact(xa).debug.as(ExitCode.Success)
Outputs:
List(abc ,def )
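An alternative worth noting (a sketch based on SQL Server's documented scoping rules, not something tested in the original post): a global temp table, written ##temp, lives until the creating session ends rather than being dropped at the end of the batch that created it, so the original three-step ConnectionIO program should work unchanged with it:
// Sketch only: ##temp is a global temp table, which SQL Server keeps
// until the creating session ends, unlike a local #temp table that is
// dropped when the prepared-statement batch that created it finishes.
val create = sql"CREATE TABLE ##temp (tmp CHAR(20))".update.run
val insert = sql"INSERT INTO ##temp values ('abc'), ('def')".update.run
val query  = sql"SELECT * FROM ##temp".query[String].to[List]
val program = for {
  _  <- create
  _  <- insert
  xs <- query
} yield xs
// transact runs the whole program on a single connection/session,
// so ##temp stays visible across all three statements.
program.transact(xa).debug.as(ExitCode.Success)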

Related

How to execute hql file in spark with arguments

I have an hql file which accepts several arguments, and in a standalone Spark application I am calling this hql script to create a dataframe.
This is a sample hql code from my script:
select id , name, age, country , created_date
from ${db1}.${table1} a
inner join ${db2}.${table2} b
on a.id = b.id
And this is how I am calling it in my Spark script:
import scala.io.Source
val queryFile = "path/to/my/file"
val db1 = "cust_db"
val db2 = "cust_db2"
val table1 = "customer"
val table2 = "products"
val query = Source.fromFile(queryFile).mkString
val df = spark.sql(query)
When I run it this way, I am getting:
org.apache.spark.sql.catalyst.parser.ParseException
Is there a way to pass arguments directly to my hql file and then create a df out of the hive code?
Parameters can be injected with such code:
val parametersMap = Map("db1" -> db1, "db2" -> db2, "table1" -> table1, "table2" -> table2)
val injectedQuery = parametersMap.foldLeft(query)((acc, cur) => acc.replace("${" + cur._1 + "}", cur._2))
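The substituted text, not the raw file contents, is then what goes to Spark:
val df = spark.sql(injectedQuery) // the parser now sees real names instead of ${db1} placeholders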

How to get optional result in insert statements with Doobie?

I have an optional insert query:
val q = sql"insert into some_table (some_field) select 42 where ...(some condition)"
Running this query with:
q.update.withUniqueGeneratedKeys[Option[Long]]("id")
fails with
Result set exhausted: more rows expected
when the condition is false.
How do I get an Option[Long] result from an insert statement with Doobie?
UPD
.withGeneratedKeys[Long]("id") gives just a Long in a for comprehension:
val q = sql"insert into some_table (some_field) select 42 where ...(some condition)"
for {
  id <- q.update.withGeneratedKeys[Long]("id") // id is Long
  _  <- if (<id is present>) <some other inserts> else <nothing>
} yield id
How to check id?
As @Thilo commented, you can use withGeneratedKeys, which gives you back a Stream[F, Long] (where F is your effect type):
val result = q.update.withGeneratedKeys[Long]("id")
Here is the doc.
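For the "How to check id?" part: since withGeneratedKeys returns an fs2 Stream, one option (a sketch, assuming an fs2-based doobie where .compile is available on the stream) is to compile the stream to its last element, which yields exactly the Option the question asks for:
val maybeId: ConnectionIO[Option[Long]] =
  q.update.withGeneratedKeys[Long]("id").compile.last
val program = for {
  idOpt <- maybeId
  _     <- idOpt match {
             case Some(id) => someOtherInserts(id)   // hypothetical helper for the follow-up inserts
             case None     => ().pure[ConnectionIO]  // needs cats.syntax.applicative._
           }
} yield idOpt
An insert whose condition matches no rows emits no generated keys, so .compile.last yields None.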

Insert from several temporary tables

I want to rewrite the following query using jooq:
with first_temp as (
  select a.id as lie_id
  from first_table a
  where a.some_field = 100160
), second_temp as (
  select b.id as ben_id
  from second_table b
  where b.email = 'some.email@gmail.com'
)
insert into third_table (first_table_id, second_table_id)
select a.lie_id, b.ben_id from first_temp a, second_temp b;
I was trying something like the following:
DriverManager.getConnection(url, login, password).use {
    val create = DSL.using(it, SQLDialect.POSTGRES)
    create.with("first_temp").`as`(create.select(FIRST_TABLE.ID.`as`("lie_id")))
        .with("second_temp").`as`(create.select(SECOND_TABLE.ID.`as`("ben_id")))
        .insertInto(THIRD_TABLE, THIRD_TABLE.FIRST_TABLE_ID, THIRD_TABLE.SECOND_TABLE_ID)
        .select(create.select().from("first_temp", "second_temp"), create.select().from("second_temp"))
}
But without success.
Your fixed query
// You forgot FROM and WHERE clauses in your CTEs!
create.with("first_temp").`as`(
          create.select(FIRST_TABLE.ID.`as`("lie_id"))
                .from(FIRST_TABLE)
                .where(FIRST_TABLE.SOME_FIELD.eq(100160)))
      .with("second_temp").`as`(
          create.select(SECOND_TABLE.ID.`as`("ben_id"))
                .from(SECOND_TABLE)
                .where(SECOND_TABLE.EMAIL.eq("some.email@gmail.com")))
      .insertInto(THIRD_TABLE, THIRD_TABLE.FIRST_TABLE_ID, THIRD_TABLE.SECOND_TABLE_ID)
      // You had too many queries in this part of the statement, and
      // didn't project the two columns you were interested in
      .select(create.select(
                  field(name("first_temp", "lie_id")),
                  field(name("second_temp", "ben_id")))
              .from("first_temp", "second_temp"))
      // Don't forget this ;-)
      .execute();
But frankly, why even use CTE at all? Your query would be much simpler like this, both in SQL and in jOOQ (assuming that you really want this cartesian product):
Better SQL Version
insert into third_table (first_table_id, second_table_id)
select a.id, b.id
from first_table a, second_table b
where a.some_field = 100160
and b.email = 'some.email@gmail.com';
Better jOOQ Version
create.insertInto(THIRD_TABLE, THIRD_TABLE.FIRST_TABLE_ID, THIRD_TABLE.SECOND_TABLE_ID)
      .select(create.select(FIRST_TABLE.ID, SECOND_TABLE.ID)
                    .from(FIRST_TABLE, SECOND_TABLE)
                    .where(FIRST_TABLE.SOME_FIELD.eq(100160))
                    .and(SECOND_TABLE.EMAIL.eq("some.email@gmail.com")))
      .execute();

How to get table names from SQL query?

I want to get all the tables names from a sql query in Spark using Scala.
Lets say user sends a SQL query which looks like:
select * from table_1 as a left join table_2 as b on a.id=b.id
I would like to get all tables list like table_1 and table_2.
Is regex the only option ?
Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.
scala> spark.version
res0: String = 2.3.1
def getTables(query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
  logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}
val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2
Hope it will help you
Parse the given query using spark sql parser (spark internally does same). You can get sqlParser from session's state. It will give Logical plan of query. Iterate over logical plan of query & check whether it is instance of UnresolvedRelation (leaf logical operator to represent a table reference in a logical query plan that has yet to be resolved) & get table from it.
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

def getTables(query: String): Seq[String] = {
  val logical: LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
  val tables = scala.collection.mutable.LinkedHashSet.empty[String]
  var i = 0
  while (true) {
    if (logical(i) == null) {
      return tables.toSeq
    } else if (logical(i).isInstanceOf[UnresolvedRelation]) {
      val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
      tables += tableIdentifier.unquotedString.toLowerCase
    }
    i = i + 1
  }
  tables.toSeq
}
I had some complicated sql queries with nested queries and iterated on @Jacek Laskowski's answer to get this:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import scala.collection.mutable.ListBuffer

def getTables(spark: SparkSession, query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  val tables = new ListBuffer[String]()
  var i: Int = 0
  while (logicalPlan(i) != null) {
    logicalPlan(i) match {
      case t: UnresolvedRelation => tables += t.tableName
      case _ =>
    }
    i += 1
  }
  tables.toList
}
import re

def __sqlparse2table(self, query):
    '''
    #description: get table names from a query via the Spark SQL parser
    '''
    plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
    plan_string = plan.toString().replace('`.`', '.')
    unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
    cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
    cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
    schema = set()
    tables = set()
    for table_name in unr:
        if table_name not in cte:
            schema.update([table_name.split('.')[0]])
            tables.update([table_name])
    return schema, tables
Since you need to list all the column names in table1 and table2, what you can do is run a show columns statement against each table in your Hive db:
val tbl_column1 = sqlContext.sql("show columns in table1");
val tbl_column2 = sqlContext.sql("show columns in table2");
You will get the list of columns in both tables.
tbl_column1.show
name
id
data
unix did the trick:
grep 'INTO\|FROM\|JOIN' *.sql | sed -r 's/.*?(FROM|INTO|JOIN)\s*?([^ ]*).*/\2/g' | sort -u
grep 'overwrite table' *.txt | sed -r 's/.*?(overwrite table)\s*?([^ ]*).*/\2/g' | sort -u

How to make aggregations with slick

I want to force slick to create queries like
select max(price) from coffees where ...
But slick's documentation doesn't help
val q = Coffees.map(_.price) //this is query Query[Coffees.type, ...]
val q1 = q.min // this is Column[Option[Double]]
val q2 = q.max
val q3 = q.sum
val q4 = q.avg
Because those q1-q4 aren't queries, I can't get the results but can use them inside other queries.
This statement
for {
  coffee <- Coffees
} yield coffee.price.max
generates the right query but is deprecated (warning: "method max in class ColumnExtensionMethods is deprecated: Use Query.max instead").
How to generate such query without warnings?
Another issue is aggregating with group by:
"select name, max(price) from coffees group by name"
I tried to solve it with:
(for {
  coffee <- Coffees
} yield (coffee.name, coffee.price.max)).groupBy(x => x._1)
which generates
select x2.x3, x2.x3, x2.x4 from (select x5."COF_NAME" as x3, max(x5."PRICE") as x4 from "coffees" x5) x2 group by x2.x3
which causes obvious db error
column "x5.COF_NAME" must appear in the GROUP BY clause or be used in an aggregate function
How to generate such query?
As far as I can tell, the first one is simply
Query(Coffees.map(_.price).max).first
And the second one
val maxQuery = Coffees
  .groupBy { _.name }
  .map { case (name, c) =>
    name -> c.map(_.price).max
  }

maxQuery.list
or
val maxQuery = for {
  (name, c) <- Coffees groupBy (_.name)
} yield name -> c.map(_.price).max

maxQuery.list
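A closing note not from the original answers: .first and .list are Slick 2.x API; under Slick 3.x the same aggregations are run through db.run (a sketch, assuming a configured db: Database and the profile's api._ import):
// Slick 3.x sketch: queries become DBIO actions executed with db.run.
val maxPrice: Future[Option[Double]] =
  db.run(Coffees.map(_.price).max.result)
val maxPerName: Future[Seq[(String, Option[Double])]] =
  db.run(
    Coffees
      .groupBy(_.name)
      .map { case (name, c) => name -> c.map(_.price).max }
      .result)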