I would expect to always receive a resultset with one row on a SELECT COUNT, but results.next() always returns false. This is on HSQLDB 2.5.1.
The code below prints:
number of columns: 1. First column C1 with type INTEGER
No COUNT results
statement = connection.createStatement();
// check if table empty
statement.executeQuery("SELECT COUNT(*) FROM mytable");
ResultSet results = statement.getResultSet();
System.out.println("number of columns: " + results.getMetaData().getColumnCount() + ". First column " +results.getMetaData().getColumnName(1) + " with type " +results.getMetaData().getColumnTypeName(1) );
int numberOfRows = 0;
boolean hasResults = results.next();
if (hasResults){
numberOfRows = results.getInt(1);
System.out.println("Table size " + numberOfRows );
}else{
System.out.println("No COUNT results" );
}
statement.close();
Executing the same SQL statement in my SQL console works fine:
C1
104
Other JDBC actions on this database work fine as well.
Is there something obvious I'm missing?
The getResultSet method is applicable to execute, but not executeQuery which returns a ResultSet. That is the one you need to refer to, at the moment you are losing it as you are not assigning it to anything.
See https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeQuery(java.lang.String) and https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()
ResultSet results = statement.executeQuery("SELECT COUNT(*) FROM mytable");
again, I'm struggeling with Linq or EF6.
I have a table with columns (DateTime)[BeginOfWork] and (DateTime)[EndOfWork].
Now, I need to substract the BeginOfWork-value from the EndOfWork-value. From this result, I need to build a total sum.
In SQL, it looks like this:
SELECT Timesheets.EmployeeId, Sum(DateDiff("h",[Timesheets].[BeginOfWork],[Timesheets].[EndOfWork])) AS HoursOfWork
FROM Timesheets
GROUP BY Timesheets.EmployeeId
HAVING (((Timesheets.EmployeeId)=1));
How do I do this in EF6?
Actually, not working some variations of that:
int employeeId = 1;
var timesheetSum = (from ts in db.Timesheets
where (ts.EmployeeId == employeeId)
select new
{
hoursOfWork = DbFunctions.DiffHours(ts.BeginOfWork, ts.EndOfWork)
}).Sum(ts => ts.hoursOfWork);
The above results in integer rounded hoursOfWork, that don't include minutes.
So I tried that:
var timesheetSum = (from ts in db.Timesheets
where (ts.EmployeeId == employeeId)
select new
{
hoursOfWork = DbFunctions.DiffMinutes(ts.BeginOfWork, ts.EndOfWork) / 60
}).Sum(ts => ts.hoursOfWork);
But here, hoursOfWork seems to be rounded (integer), so 2,5 hours will result in 2. Perhaps a conversion of the result would work but I don't get it run.
`(double)hoursOfWork` results in an error.
Perhaps someone has a link to a complete guide, how to convert SQL to Linq-queries.
SOLUTION
var timesheetSum = (from ts in db.Timesheets
where (ts.EmployeeId == employeeId && ts.TimesheetDate <= DateTime.Today)
select new
{
hoursOfWork = DbFunctions.DiffMinutes(ts.BeginOfWork, ts.EndOfWork)/60m
}).Sum(ts => ts.hoursOfWork);
decimal result = 0;
if(timesheetSum!=null)
result = (decimal)timesheetSum;
Thanks a lot
Carsten
I want to get all the tables names from a sql query in Spark using Scala.
Lets say user sends a SQL query which looks like:
select * from table_1 as a left join table_2 as b on a.id=b.id
I would like to get all tables list like table_1 and table_2.
Is regex the only option ?
Thanks a lot #Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.
scala> spark.version
res0: String = 2.3.1
def getTables(query: String): Seq[String] = {
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}
val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2
Hope it will help you
Parse the given query using spark sql parser (spark internally does same). You can get sqlParser from session's state. It will give Logical plan of query. Iterate over logical plan of query & check whether it is instance of UnresolvedRelation (leaf logical operator to represent a table reference in a logical query plan that has yet to be resolved) & get table from it.
def getTables(query: String) : Seq[String] ={
val logical : LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
val tables = scala.collection.mutable.LinkedHashSet.empty[String]
var i = 0
while (true) {
if (logical(i) == null) {
return tables.toSeq
} else if (logical(i).isInstanceOf[UnresolvedRelation]) {
val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
tables += tableIdentifier.unquotedString.toLowerCase
}
i = i + 1
}
tables.toSeq
}
I had some complicated sql queries with nested queries and iterated on #Jacek Laskowski's answer to get this
def getTables(spark: SparkSession, query: String): Seq[String] = {
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
var tables = new ListBuffer[String]()
var i: Int = 0
while (logicalPlan(i) != null) {
logicalPlan(i) match {
case t: UnresolvedRelation => tables += t.tableName
case _ =>
}
i += 1
}
tables.toList
}
def __sqlparse2table(self, query):
'''
#description: get table name from table
'''
plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
plan_string = plan.toString().replace('`.`', '.')
unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
schema = set()
tables = set()
for table_name in unr:
if table_name not in cte:
schema.update([table_name.split('.')[0]])
tables.update([table_name])
return schema, tables
Since you need to list all the columns names listed in table1 and table2, what you can do is to show tables in db.table_name in your hive db.
val tbl_column1 = sqlContext.sql("show tables in table1");
val tbl_column2 = sqlContext.sql("show tables in table2");
You will get list of columns in both the table.
tbl_column1.show
name
id
data
unix did the trick, grep 'INTO\|FROM\|JOIN' .sql | sed -r 's/.?(FROM|INTO|JOIN)\s?([^ ])./\2/g' | sort -u
grep 'overwrite table' .txt | sed -r 's/.?(overwrite table)\s?([^ ])./\2/g' | sort -u
I'm trying to select multiple matches from a field, I have the query working as SQL but am at a loss when it comes to the jOOQ version. Here's the SQL:
SELECT
array_to_string(array(select array_to_string(regexp_matches(m.content, '#([a-zA-Z0-9]+)', 'g'), ' ')), ' ') hashtags
FROM tweets
But I can't work out how to pass in the 'g' flag to regexp_matches, and Postgres doesn't support g: style embedded flags. At the moment I'm using (in Scala):
val hashtags = DSL.field(
"array_to_string(array(select array_to_string(regexp_matches(tweets.content, '#([a-zA-Z0-9]+)', 'g'), ' ')), ' ')",
classOf[String])
but that seems kind of gross (but it works, so there's that! 😀).
Current Approach
I have an enum with fields associated with it, like this:
MentionUri(MENTIONS.URL),
Content(MENTIONS.CONTENT),
Hashtags(DSL.field(
"array(select array_to_string(regexp_matches(mentions.content, '#([a-zA-Z0-9]+)', 'g'), ' '))").as("hashtags")),
MediaLinks(DSL.field("json_agg(twitter_media)").as("media_links")),
Location(MENTIONS.LOCATION),
the 'g' flag is needed because I want to get all of the matching fragments from the target text (all of the hashtags from the mention content in this case, where mentions are Tweets, Facebook posts, &c.)
and a class with a bunch of optional search parameters, like this:
minFollowing: Option[Int] = None,
maxFollowing: Option[Int] = None,
workflowStates: Set[WorkflowState] = WorkflowState.values.toSet,
onlyVerifiedUsers: Boolean = false,
and then I build up the query based on a list of these fields
private val fields = v.fields.toSet
override def query(sql: DSLContext): Query = {
val fields = v.fields.map(_.field) ++ Seq(M.PROJECT_ID)
var query = (sql
select fields.toSet.asJava
from M
leftJoin MTM on (M.PROJECT_ID === MTM.PROJECT_ID) and (M.ID === MTM.MENTION_ID)
leftJoin TM on (MTM.URL === TM.URL)
where (M.PROJECT_ID in v.criteria.projectIds.asJava))
if (v.criteria.channels.size < numChannels)
query = query and (M.CHANNEL in v.criteria.channels.asJava)
if (v.criteria.mentionTypes.size < numMentionTypes)
query = query and (M.MENTION_TYPE in v.criteria.mentionTypes.asJava)
// ... more critera get added here, finally ...
if (v.max.isDefined)
query groupBy (M.PROJECT_ID, M.ID, M.CHANNEL, M.USERNAME) limit v.max.get
else
query groupBy (M.PROJECT_ID, M.ID, M.CHANNEL, M.USERNAME)
I am currently facing a weird problem.
Whenever a user types something into the search bar that starts with an 's', the request crashes.
What you see next is a sample sql code generated by the search engine I programmed for this project.
SELECT Profiles.ProfileID,Profiles.Nickname,Profiles.Email,Profiles.Status,Profiles.Role,Profiles.Credits, Profiles.Language,Profiles.Created,Profiles.Modified,Profiles.Cover,Profiles.Prename, Profiles.Lastname,Profiles.BirthDate,Profiles.Country,Profiles.City,Profiles.Phone,Profiles.Website, Profiles.Description, Profiles.Affair,Scores.AvgScore, coalesce(Scores.NumScore, 0) AS NumScore, coalesce(Scores.NumScorer, 0) AS NumScorer, (
(SELECT count(*)
FROM Likes
JOIN Comments using(CommentID)
WHERE Comments.ProfileID = Profiles.ProfileID)) NumLikes, (
(SELECT count(*)
FROM Likes
JOIN Comments using(CommentID)
WHERE Comments.ProfileID = Profiles.ProfileID) /
(SELECT coalesce(nullif(count(*), 0), 1)
FROM Comments
WHERE Comments.ProfileID = Profiles.ProfileID)) AvgLikes, Movies.MovieID, Movies.Caption, Movies.Description, Movies.Language, Movies.Country, Movies.City, Movies.Kind, Movies.Integration,
(SELECT cast(least(25 + 5.000000 * round((75 * ((0.500000 * SIZE/1024.0/1024.0 * 0.001250) + (0.500000 * Duration/60.0 * 0.050000))) / 5.000000), 100) AS signed int)
FROM Streams
WHERE MovieID = Movies.MovieID
AND Tag = "main"
AND ENCODING = "mp4") AS ChargeMain,
(SELECT cast(least(25 + 10.000000 * round((75 * ((0.200000 * SIZE/1024.0/1024.0 * 0.001000) + (0.800000 * Duration/60.0 * 0.016667))) / 10.000000), 100) AS signed int)
FROM Streams
WHERE MovieID = Movies.MovieID
AND Tag = "notes"
AND ENCODING = "mp4") AS ChargeNotes,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "main") AS MainViews,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "notes") AS NotesViews,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "trailer") AS TrailerViews,
(SELECT coalesce(greatest(
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "trailer"),
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "main")), 0)) AS MaxMainTrailerViews,
(SELECT avg(Score)
FROM Scores
WHERE Scores.MovieID = Movies.MovieID) AS Score,
(SELECT coalesce(group_concat(cast(Score AS signed int)), "")
FROM Scores
WHERE Scores.MovieID = Movies.MovieID) AS Scores, Movies.Cover, Movies.Locked, Movies.Created, Movies.Modified,
(SELECT coalesce(group_concat(name separator ','),"")
FROM Tags
JOIN TagLinks using(TagID)
WHERE TagLinks.MovieID = Movies.MovieID
ORDER BY name ASC) AS Tags,
(SELECT count(*)
FROM Purchases
WHERE MovieID = Movies.MovieID
AND ProfileID = %s
AND TYPE = "main") AS PurchasedMain,
(SELECT count(*)
FROM Purchases
WHERE MovieID = Movies.MovieID
AND ProfileID = %s
AND TYPE = "notes") AS PurchasedNotes,
(SELECT count(*)
FROM Watchlist
WHERE MovieID = Movies.MovieID
AND ProfileID = %s) AS Watchlist,
(SELECT count(*)
FROM Scores
WHERE MovieID = Movies.MovieID
AND ProfileID = %s) AS Rated,
(SELECT count(*)
FROM Comments
WHERE MovieID = Movies.MovieID
AND Deleted IS NULL) AS Comments,
(SELECT sum(Duration)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Tag IN ("main",
"notes")
AND Streams.ENCODING = "mp4") AS Runtime,
(SELECT cast(count(*) AS signed int)
FROM Movies
JOIN Profiles ON Profiles.ProfileID = Movies.ProfileID
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
AS Movies,
(SELECT cast(ceil(count(*) / %s) AS signed int)
FROM Movies
JOIN Profiles using(ProfileID)
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
AS Pages
FROM Movies
JOIN Profiles using(ProfileID)
LEFT JOIN
(SELECT Movies.ProfileID AS ProfileID,
avg(Scores.Score) AS AvgScore,
count(*) AS NumScore,
count(DISTINCT Scores.ProfileID) AS NumScorer
FROM Scores
JOIN Movies using(MovieID)
GROUP BY Movies.ProfileID) AS Scores using(ProfileID)
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
ORDER BY Score DESC LIMIT %s,
%s
After countless hours of investigating and comparing possible user inputs with the generated sql code I finally nailed the problem down to some really strange behaviour of the JDBC driver which I consider a serious bug - yet I am not sure:
I spent another few hours trying to reproduce the problem with as less sql code as possible and ended up with the following:
SQL("""select * from Movies where "s" like "%s" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[SQLException: Parameter index out of range (1 > number of parameters, which is 0).]
SQL("""select * from Movies where "s" like "%samuel" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[SQLException: Parameter index out of range (1 > number of parameters, which is 0).]
SQL("""select * from Movies where "s" like "%flower" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where "s" like "%samuel" and MovieID = 1 """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where "s" like "%s" and MovieID = "{a}" """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where MovieID = {a} and "s" like "%s" """)
.on('a -> 1).as(scalar[Long]*)
[OK]
I believe to see a pattern here:
Under the exact condition that there is a %s sequence (quoted or unquoted) anywhere in a sql code, followed by a non quoted named parameter with arbitrary name and arbitrary distance
to the %s sequence, jdbc (or anorm) crashes. The crash seems to occur in JDBC, however its also possible that Anorm submits invalid values to JDBC.
Do you guys have any suggestions?
I think I found an enduring solution for the problem meanwhile. Since my sql generator needs to stay very flexible I somehow need a way to pass along sql fragments with their corresponding parameters without evaluating them right away. Instead the generator must be able to assemble and compose various sql fragments into bigger fragments at any time - just as he does now - but now with the acompanying, not yet evaluated parameters. I came up with this prototype:
DB.withConnection("betterdating") { implicit connection =>
case class SqlFragment(Fragment: String, Args: NamedParameter*)
val aa = SqlFragment("select MovieID from Movies")
val bb = SqlFragment("join Profiles using(ProfileID)")
val cc = SqlFragment("where Caption like \"%{a}\" and MovieID = {b}", 'a -> "s", 'b -> 5)
// combine all fragments
val v1 = SQL(Seq(aa, bb, cc).map(_.Fragment).mkString(" "))
.on((aa.Args ++ bb.Args ++ cc.Args): _*)
// better solution
val v2 = Seq(aa, bb, cc).unzip(frag => (frag.Fragment, frag.Args)) match {
case (frags, args) => SQL(frags.mkString(" ")).on(args.flatten: _*)
}
// works
println(v1.as(scalar[Long].singleOpt))
println(v2.as(scalar[Long].singleOpt))
}
It seems to work great! :-)
I then rewrote the last part of the freetext filter as follow:
// finally transform the expression
// list a single sql fragment
expressions.zipWithIndex.map { case (expr, index) =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "") from Tags join TagLinks using (TagID)
where TagLinks.MovieID = Movies.MovieID)) like "%{expr$index}%"))
""" -> (s"expr$index" -> expr)
}.unzip match { case (frags, args) => SqlFragment(frags.mkString(" and "), args.flatten: _*)
What do you think?
This is how it is being implemented right now:
/**
* This private helper method transforms a content filter string into an sql expression
* for searching within movies, owners and kinds and tags.
* #author Samuel Lörtscher
*/
private def contentFilterToSql(value: String) = {
// trim and clean and the parametric value from any possible anomalies
// (those include strange spacing and non closed quotes)
val cleaned = value.trim match {
case trimmed if trimmed.count(_ == '"') % 2 != 0 =>
if (trimmed.last == '"') trimmed.dropRight(1).trim
else trimmed + '"'
case trimmed =>
trimmed
};
// transform the cleaned value into a list of expressions
// (words between quotes are considered being one expression)
// empty expressions between quotes are being removed
// expressions will contain no quotes as they are being stripped during evaluation -
// thus counter measures for sql injection should be obsolete
// (we put an empty space at the end because it makes the lexer algorithm much
// more efficient as it will not need to check for end of file in every iteration)
val expressions = (cleaned + " ").foldLeft((List[String](), "", false)) { case ((list, expr, quoted), char) =>
// perform the lexer operation for the current character
if (char == ' ' && !quoted) (expr :: list, "", false)
else if (char == '"') (expr :: list, "", !quoted)
else (list, expr + char, quoted)
}._1.filter(_.nonEmpty).map(_.trim)
// finally transform the expression
// list into a variable length sql condition statement
expressions.map { expr =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "")
from Tags join TagLinks using (TagID) where TagLinks.MovieID = Movies.MovieID)) like "%$expr%")
"""
}.mkString(" and ")
}
Since the number of search expressions is variable, I cannot use Anorm arguments here. :-/
I found a simple solution now, but I am not exactly happy being forced to apply such crappy hacks.
Since putting a %s character sequence seems to trigger the bug, I was looking for possibilities to submit the same semantical outcome without directly passing this character sequence. I finally ended up replacing like "%$expr%" by like concat("%", "$expr%"). Since concat is being evaluated by the MySql Server engine BEFORE "like", he will put the original pattern back together before processing it by "like" - and without the sequence %s ever being transmitted through the anorm, jdbc data processors.
// finally transform the expression
// list into a variable length sql condition statement
// (freaking concat("%", "$expr%")) is required due to a freaking bug in either anorm or JDBC
// which results into a crash when %s is anyway submitted)
expressions.map { expr =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "")
from Tags join TagLinks using (TagID) where TagLinks.MovieID = Movies.MovieID)) like concat("%", "$expr%"))
"""
}.mkString(" and ")