Combining Slick queries into a single query - Scala

Using Slick 3.1, how do I combine multiple Queries into a single query for the same type? This is not a join or a union, but combining query "segments" to create a single query request. These "segments" can be any individually valid query.
val query = TableQuery[SomeThingValid]
// build up pieces of the query in various parts of the application logic
val q1 = query.filter(_.value > 10)
val q2 = query.filter(_.value < 40)
val q3 = query.sortBy(_.date.desc)
val q4 = query.take(5)
// how to combine these into a single query ?
val finalQ = ??? q1 q2 q3 q4 ???
// in order to run in a single request
val result = DB.connection.run(finalQ.result)
EDIT:
the expected sql should be something like:
SELECT * FROM "SomeThingValid" WHERE "SomeThingValid"."value" > 10 AND "SomeThingValid"."value" < 40 ORDER BY "SomeThingValid"."date" DESC LIMIT 5

val q1 = query.filter(_.value > 10)
val q2 = q1.filter(_.value < 40)
val q3 = q2.sortBy(_.date.desc)
val q4 = q3.take(5)
I think you should do something like the above (and pass around Query values), but if you insist on passing around query "segments", something like this could work:
type QuerySegment = Query[SomeThingValid, SomeThingValid, Seq] => Query[SomeThingValid, SomeThingValid, Seq]
val q1: QuerySegment = _.filter(_.value > 10)
val q2: QuerySegment = _.filter(_.value < 40)
val q3: QuerySegment = _.sortBy(_.date.desc)
val q4: QuerySegment = _.take(5)
val finalQ = Function.chain(Seq(q1, q2, q3, q4))(query)
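Since the segments are plain functions, they can also be collected conditionally before chaining; a minimal sketch (includeUpperBound is a hypothetical flag from your application logic):
val includeUpperBound = true // hypothetical flag
val segments: Seq[QuerySegment] = Seq(
  Some(q1),
  if (includeUpperBound) Some(q2) else None, // drop a segment when its flag is off
  Some(q3),
  Some(q4)
).flatten
val finalQ = Function.chain(segments)(query)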

I've used this "pattern" with Slick 2.0:
val query = TableQuery[SomeThingValid]
val flag1, flag3 = false
val flag2, flag4 = true
val bottomFilteredQuery = if(flag1) query.filter(_.value > 10) else query
val topFilteredQuery = if(flag2) bottomFilteredQuery.filter(_.value < 40) else bottomFilteredQuery
val sortedQuery = if(flag3) topFilteredQuery.sortBy(_.date.desc) else topFilteredQuery
val finalQ = if(flag4) sortedQuery.take(5) else sortedQuery
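Since every branch still returns a Query, the final run (db.run(finalQ.result) in Slick 3, or finalQ.list with a session in Slick 2.0) issues a single SELECT containing only the enabled clauses.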

It's just a remark worth mentioning here, from the Essential Slick book: it seems you should prefer reducing multiple queries into one single query where you can.
Combining actions to sequence queries is a powerful feature of Slick. However, you may be able to reduce multiple queries into a single database query. If you can do that, you're probably better off doing it.

I think this should work, but I haven't tested it yet.
val users = TableQuery[Users]
val filter1: Query[Users, User, Seq] = users.filter(condition1)
val filter2: Query[Users, User, Seq] = users.filter(condition2)
(filter1 ++ filter2).result.headOption
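Note that ++ produces a union of the two result sets, which is OR-like, rather than the single WHERE ... AND ... the question's expected SQL shows. A minimal sketch for inspecting the SQL either approach generates, assuming a hypothetical Users table:
import slick.driver.H2Driver.api._ // Slick 3.1; slick.jdbc.H2Profile.api._ on later versions

class Users(tag: Tag) extends Table[(Long, Int)](tag, "users") {
  def id = column[Long]("id", O.PrimaryKey)
  def age = column[Int]("age")
  def * = (id, age)
}
val users = TableQuery[Users]

// chained filters compile to one SELECT with WHERE ... AND ...
users.filter(_.age > 10).filter(_.age < 40).result.statements.foreach(println)
// ++ compiles to a union of two SELECTs
(users.filter(_.age > 10) ++ users.filter(_.age < 40)).result.statements.foreach(println)
This lets you confirm the generated SQL without hitting a database.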

Related

Scala Slick combining Rep sub queries into one re

I sum up totals in two different database tables:
val sum1Query: Rep[Int] = tableQuery1.map(_.amount).sum.ifNull(0)
val sum2Query: Rep[Int] = tableQuery2.map(_.amount).sum.ifNull(0)
for {
  sum1 <- sum1Query.result
  sum2 <- sum2Query.result
} yield {
  sum1 + sum2
}
This runs 2 SQL queries to the database each time .result is called. I am looking for a way to make it use only one SQL query.
Something like this doesn't work:
for {
  sum1 <- sum1Query
  sum2 <- sum2Query
} yield {
  sum1 + sum2
}.result
Any ideas on how to do it in Slick other than using plain SQL query?
Each call to .result creates a DBIO action which is a SQL statement. The trick to reducing the number of actions is to find a way to combine two queries (or two Reps) together into one action.
In your case you could zip the two queries:
val sum1 = table1.map(_.amount).sum.ifNull(0)
val sum2 = table2.map(_.amount).sum.ifNull(0)
val query = sum1.zip(sum2)
When you run query.result you'll execute a single query, something like:
select ifnull(x2.x3, 0), ifnull(x4.x5, 0)
from (select sum("amount") as x3 from "table_1") x2,
     (select sum("amount") as x5 from "table_2") x4
...which will result in a tuple of the two values.
However, as you've already just got a Rep[Int] you can use + in the database:
val query = sum1 + sum2
...which will be a query along the lines of:
select ifnull(x2.x3, 0) + ifnull(x4.x5, 0)
from (select sum("amount") as x3 from "table_1") x2,
     (select sum("amount") as x5 from "table_2") x4
Just put it like this:
(sum1 + sum2).result
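For completeness, a minimal sketch of running it in one round trip (db is assumed to be your Database instance):
val total = db.run((sum1 + sum2).result) // one SQL statement; yields a Future[Int]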

How to get table names from SQL query?

I want to get all the table names from a SQL query in Spark using Scala.
Let's say a user sends a SQL query which looks like:
select * from table_1 as a left join table_2 as b on a.id=b.id
I would like to get all tables list like table_1 and table_2.
Is regex the only option ?
Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.
scala> spark.version
res0: String = 2.3.1
def getTables(query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
  logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}
val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2
Hope it will help you
Parse the given query using the Spark SQL parser (Spark internally does the same). You can get sqlParser from the session's state; it will give you the logical plan of the query. Iterate over the logical plan of the query, check whether each node is an instance of UnresolvedRelation (the leaf logical operator representing a table reference that has yet to be resolved), and get the table from it.
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

def getTables(query: String): Seq[String] = {
  val logical: LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
  val tables = scala.collection.mutable.LinkedHashSet.empty[String]
  var i = 0
  // tree nodes are numbered; apply(i) returns null once we run past the last node
  while (logical(i) != null) {
    if (logical(i).isInstanceOf[UnresolvedRelation]) {
      val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
      tables += tableIdentifier.unquotedString.toLowerCase
    }
    i = i + 1
  }
  tables.toSeq
}
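With the example query from earlier, getTables("select * from table_1 as a left join table_2 as b on a.id=b.id") should then return Seq(table_1, table_2), lowercased by the toLowerCase call.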
I had some complicated SQL queries with nested queries and iterated on @Jacek Laskowski's answer to get this:
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

def getTables(spark: SparkSession, query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  val tables = new ListBuffer[String]()
  var i: Int = 0
  while (logicalPlan(i) != null) {
    logicalPlan(i) match {
      case t: UnresolvedRelation => tables += t.tableName
      case _ =>
    }
    i += 1
  }
  tables.toList
}
import re  # the method below relies on regex extraction from the plan string

def __sqlparse2table(self, query):
    '''
    #description: get table names from a query
    '''
    plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
    plan_string = plan.toString().replace('`.`', '.')
    unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
    cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
    cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
    schema = set()
    tables = set()
    for table_name in unr:
        if table_name not in cte:
            schema.update([table_name.split('.')[0]])
            tables.update([table_name])
    return schema, tables
Since you need to list all the column names in table1 and table2, what you can do is run show tables in db.table_name in your Hive db.
val tbl_column1 = sqlContext.sql("show tables in table1")
val tbl_column2 = sqlContext.sql("show tables in table2")
You will get the list of columns in both tables.
tbl_column1.show
name
id
data
Unix did the trick:
grep 'INTO\|FROM\|JOIN' *.sql | sed -r 's/.*?(FROM|INTO|JOIN)\s*?([^ ]*).*/\2/g' | sort -u
grep 'overwrite table' *.txt | sed -r 's/.*?(overwrite table)\s*?([^ ]*).*/\2/g' | sort -u

SPARK SQL: Implement AND condition inside a CASE statement

I am aware of how to implement a simple CASE-WHEN-THEN clause in SPARK SQL using Scala. I am using Version 1.6.2. But, I need to specify AND condition on multiple columns inside the CASE-WHEN clause. How to achieve this in SPARK using Scala ?
Thanks in advance for your time and help!
Here's the SQL query that I have:
select sd.standardizationId,
       case when sd.numberOfShares = 0 and
                 isnull(sd.derivatives, 0) = 0 and
                 sd.holdingTypeId not in (3, 10)
            then 8
            else holdingTypeId
       end as holdingTypeId
from sd;
First, read the table as a DataFrame:
val table = sqlContext.table("sd")
Then select with an expression, aligning the syntax to your database:
val result = table.selectExpr("standardizationId","case when numberOfShares = 0 and isnull(derivatives,0) = 0 and holdingTypeId not in (3,10) then 8 else holdingTypeId end as holdingTypeId")
And show the result:
result.show
An alternative option, if you want to avoid using the full string expression, is the following:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
val sd = sqlContext.table("sd")
val conditionedColumn: Column = when(
  (sd("numberOfShares") === 0) and
  (coalesce(sd("derivatives"), lit(0)) === 0) and
  (!sd("holdingTypeId").isin(Seq(3, 10): _*)), 8
).otherwise(sd("holdingTypeId")).as("holdingTypeId")
val result = sd.select(sd("standardizationId"), conditionedColumn)
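As with the selectExpr version, result.show should then display standardizationId alongside the computed holdingTypeId.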

How to make aggregations with slick

I want to force Slick to create queries like
select max(price) from coffees where ...
but Slick's documentation doesn't help:
val q = Coffees.map(_.price) //this is query Query[Coffees.type, ...]
val q1 = q.min // this is Column[Option[Double]]
val q2 = q.max
val q3 = q.sum
val q4 = q.avg
Because those q1-q4 aren't queries, I can't get the results but can use them inside other queries.
This statement
for {
coffee <- Coffees
} yield coffee.price.max
generates the right query but is deprecated (it produces the warning "method max in class ColumnExtensionMethods is deprecated: Use Query.max instead").
How can I generate such a query without warnings?
Another issue is to aggregate with group by:
"select name, max(price) from coffees group by name"
I tried to solve it with
(for {
  coffee <- Coffees
} yield (coffee.name, coffee.price.max)).groupBy(x => x._1)
which generates
select x2.x3, x2.x3, x2.x4 from (select x5."COF_NAME" as x3, max(x5."PRICE") as x4 from "coffees" x5) x2 group by x2.x3
which causes an obvious db error:
column "x5.COF_NAME" must appear in the GROUP BY clause or be used in an aggregate function
How can I generate such a query?
As far as I can tell, the first one is simply
Query(Coffees.map(_.price).max).first
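(On later Slick versions the equivalent would presumably be something like db.run(Coffees.map(_.price).max.result).)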
And the second one
val maxQuery = Coffees
  .groupBy { _.name }
  .map { case (name, c) =>
    name -> c.map(_.price).max
  }
maxQuery.list
or
val maxQuery = for {
  (name, c) <- Coffees groupBy (_.name)
} yield name -> c.map(_.price).max
maxQuery.list

scalaquery retrieve values

I have a few tables, let's say 2 for simplicity. I can create them this way:
...
val tableA = new Table[(Int, Int)]("tableA") {
  def a = column[Int]("a")
  def b = column[Int]("b")
  def * = a ~ b // default projection, required for the table to be queried
}
val tableB = new Table[(Int, Int)]("tableB") {
  def a = column[Int]("a")
  def b = column[Int]("b")
  def * = a ~ b
}
I'm going to have a query that retrieves value 'a' from tableA and the matching values 'a' from tableB as a list inside each result.
My result should be:
List[(a, List(b))]
So far I came up to this point with the query:
def createSecondItr(b1: NamedColumn[Int]) = for {
  b2 <- tableB if b1 === b2.b
} yield b2.a

val q1 = for {
  a1 <- tableA
  listB = createSecondItr(a1.b)
} yield (a1.a, listB)
I didn't test the code so there might be errors in it. My problem is that I cannot retrieve the data from the results.
To understand the question, think of trains and their classes: you search for trains after 12pm and you need a result set where each train name carries, as a list, the classes that train has.
I don't think you can do this directly in ScalaQuery. What I would do is to do a normal join and then manipulate the result accordingly:
import scala.collection.mutable.{HashMap, Set, MultiMap}

def list2multimap[A, B](list: List[(A, B)]) =
  list.foldLeft(new HashMap[A, Set[B]] with MultiMap[A, B]) { (acc, pair) => acc.addBinding(pair._1, pair._2) }
val q = for {
  a <- tableA
  b <- tableB
  if a.b === b.b
} yield (a.a, b.a)
list2multimap(q.list)
The list2multimap is taken from https://stackoverflow.com/a/7210191/66686
The code was written without the assistance of an IDE, compiler or similar. Consider the debugging free training :-)
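A quick usage sketch with made-up data:
val grouped = list2multimap(List((1, 10), (1, 11), (2, 20)))
// grouped: HashMap(1 -> Set(10, 11), 2 -> Set(20))
val asLists = grouped.map { case (a, bs) => (a, bs.toList) }.toList
// roughly the List[(a, List(b))] shape the question asks for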