Possible Bug in JDBC? - scala

I am currently facing a weird problem.
Whenever a user types something into the search bar that starts with an 's', the request crashes.
What follows is a sample of the SQL generated by the search engine I wrote for this project.
SELECT Profiles.ProfileID,Profiles.Nickname,Profiles.Email,Profiles.Status,Profiles.Role,Profiles.Credits, Profiles.Language,Profiles.Created,Profiles.Modified,Profiles.Cover,Profiles.Prename, Profiles.Lastname,Profiles.BirthDate,Profiles.Country,Profiles.City,Profiles.Phone,Profiles.Website, Profiles.Description, Profiles.Affair,Scores.AvgScore, coalesce(Scores.NumScore, 0) AS NumScore, coalesce(Scores.NumScorer, 0) AS NumScorer, (
(SELECT count(*)
FROM Likes
JOIN Comments using(CommentID)
WHERE Comments.ProfileID = Profiles.ProfileID)) NumLikes, (
(SELECT count(*)
FROM Likes
JOIN Comments using(CommentID)
WHERE Comments.ProfileID = Profiles.ProfileID) /
(SELECT coalesce(nullif(count(*), 0), 1)
FROM Comments
WHERE Comments.ProfileID = Profiles.ProfileID)) AvgLikes, Movies.MovieID, Movies.Caption, Movies.Description, Movies.Language, Movies.Country, Movies.City, Movies.Kind, Movies.Integration,
(SELECT cast(least(25 + 5.000000 * round((75 * ((0.500000 * SIZE/1024.0/1024.0 * 0.001250) + (0.500000 * Duration/60.0 * 0.050000))) / 5.000000), 100) AS signed int)
FROM Streams
WHERE MovieID = Movies.MovieID
AND Tag = "main"
AND ENCODING = "mp4") AS ChargeMain,
(SELECT cast(least(25 + 10.000000 * round((75 * ((0.200000 * SIZE/1024.0/1024.0 * 0.001000) + (0.800000 * Duration/60.0 * 0.016667))) / 10.000000), 100) AS signed int)
FROM Streams
WHERE MovieID = Movies.MovieID
AND Tag = "notes"
AND ENCODING = "mp4") AS ChargeNotes,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "main") AS MainViews,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "notes") AS NotesViews,
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "trailer") AS TrailerViews,
(SELECT coalesce(greatest(
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "trailer"),
(SELECT coalesce(count(*), 0)
FROM Views
WHERE Views.MovieID = Movies.MovieID
AND Tag = "main")), 0)) AS MaxMainTrailerViews,
(SELECT avg(Score)
FROM Scores
WHERE Scores.MovieID = Movies.MovieID) AS Score,
(SELECT coalesce(group_concat(cast(Score AS signed int)), "")
FROM Scores
WHERE Scores.MovieID = Movies.MovieID) AS Scores, Movies.Cover, Movies.Locked, Movies.Created, Movies.Modified,
(SELECT coalesce(group_concat(name separator ','),"")
FROM Tags
JOIN TagLinks using(TagID)
WHERE TagLinks.MovieID = Movies.MovieID
ORDER BY name ASC) AS Tags,
(SELECT count(*)
FROM Purchases
WHERE MovieID = Movies.MovieID
AND ProfileID = %s
AND TYPE = "main") AS PurchasedMain,
(SELECT count(*)
FROM Purchases
WHERE MovieID = Movies.MovieID
AND ProfileID = %s
AND TYPE = "notes") AS PurchasedNotes,
(SELECT count(*)
FROM Watchlist
WHERE MovieID = Movies.MovieID
AND ProfileID = %s) AS Watchlist,
(SELECT count(*)
FROM Scores
WHERE MovieID = Movies.MovieID
AND ProfileID = %s) AS Rated,
(SELECT count(*)
FROM Comments
WHERE MovieID = Movies.MovieID
AND Deleted IS NULL) AS Comments,
(SELECT sum(Duration)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Tag IN ("main",
"notes")
AND Streams.ENCODING = "mp4") AS Runtime,
(SELECT cast(count(*) AS signed int)
FROM Movies
JOIN Profiles ON Profiles.ProfileID = Movies.ProfileID
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
AS Movies,
(SELECT cast(ceil(count(*) / %s) AS signed int)
FROM Movies
JOIN Profiles using(ProfileID)
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
AS Pages
FROM Movies
JOIN Profiles using(ProfileID)
LEFT JOIN
(SELECT Movies.ProfileID AS ProfileID,
avg(Scores.Score) AS AvgScore,
count(*) AS NumScore,
count(DISTINCT Scores.ProfileID) AS NumScorer
FROM Scores
JOIN Movies using(MovieID)
GROUP BY Movies.ProfileID) AS Scores using(ProfileID)
WHERE ((Movies.Locked = 0
AND
(SELECT count(*)
FROM Streams
WHERE Streams.MovieID = Movies.MovieID
AND Streams.Status <> "ready") = 0
AND Profiles.Status = "active")
OR (%s = 1)
OR (Movies.ProfileID = %s))
ORDER BY Score DESC LIMIT %s,
%s
After countless hours of investigating and comparing possible user inputs with the generated SQL, I finally narrowed the problem down to some really strange behaviour of the JDBC driver, which I consider a serious bug, although I am not sure.
I spent another few hours trying to reproduce the problem with as little SQL as possible and ended up with the following:
SQL("""select * from Movies where "s" like "%s" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[SQLException: Parameter index out of range (1 > number of parameters, which is 0).]
SQL("""select * from Movies where "s" like "%samuel" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[SQLException: Parameter index out of range (1 > number of parameters, which is 0).]
SQL("""select * from Movies where "s" like "%flower" and MovieID = {a} """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where "s" like "%samuel" and MovieID = 1 """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where "s" like "%s" and MovieID = "{a}" """)
.on('a -> 1).as(scalar[Long]*)
[OK]
SQL("""select * from Movies where MovieID = {a} and "s" like "%s" """)
.on('a -> 1).as(scalar[Long]*)
[OK]
I believe I see a pattern here:
Whenever a %s sequence (quoted or unquoted) appears anywhere in the SQL and is followed, at any distance, by an unquoted named parameter of any name, JDBC (or Anorm) crashes. The crash seems to occur in JDBC, but it is also possible that Anorm passes an invalid statement to JDBC.
Do you guys have any suggestions?
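One way to make sense of the six results above is to model how a library might turn named parameters into JDBC placeholders. The sketch below is an assumption, not code taken from Anorm or from any JDBC driver: it supposes that every {name} token is first turned into an internal %s marker and that the markers are then replaced with ? from left to right. Under that assumption, a literal %s that appears earlier in the statement gets replaced instead of the real marker, the ? ends up inside a quoted string, and the driver would then find no bindable parameter at all. PlaceholderClash, toMarkers and rewrite are made-up names.

```scala
object PlaceholderClash {
  // Matches named parameters such as {a} or {expr0}.
  private val Named = """\{\w+\}""".r

  // Step 1 (assumed): replace every {name} with the internal marker %s
  // and remember how many named parameters were found.
  def toMarkers(sql: String): (String, Int) =
    (Named.replaceAllIn(sql, "%s"), Named.findAllIn(sql).size)

  // Step 2 (assumed): replace the first remaining %s with ?, once per parameter.
  def rewrite(sql: String): String = {
    val (marked, count) = toMarkers(sql)
    (1 to count).foldLeft(marked) { (stmt, _) =>
      val idx = stmt.indexOf("%s")
      if (idx < 0) stmt
      else stmt.substring(0, idx) + "?" + stmt.substring(idx + 2)
    }
  }

  def main(args: Array[String]): Unit = {
    // Literal %s before the parameter: the ? lands inside the quotes.
    println(rewrite("""select * from Movies where "s" like "%s" and MovieID = {a} """))
    // Parameter before the literal %s: the ? lands where it should.
    println(rewrite("""select * from Movies where MovieID = {a} and "s" like "%s" """))
    // No %s inside the literal: nothing gets confused.
    println(rewrite("""select * from Movies where "s" like "%flower" and MovieID = {a} """))
  }
}
```

In this model the first statement becomes `... like "?" and MovieID = %s`, which would fit the "number of parameters, which is 0" message, while the other two end up with a regular ? placeholder. Whether the real library works this way would need to be checked against its source.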

I think I have found a lasting solution to the problem in the meantime. Since my SQL generator needs to stay very flexible, I need a way to pass along SQL fragments together with their corresponding parameters without evaluating them right away. The generator must be able to assemble and compose various SQL fragments into bigger fragments at any time, just as it does now, but with the accompanying, not yet evaluated parameters. I came up with this prototype:
DB.withConnection("betterdating") { implicit connection =>
  case class SqlFragment(Fragment: String, Args: NamedParameter*)
  val aa = SqlFragment("select MovieID from Movies")
  val bb = SqlFragment("join Profiles using(ProfileID)")
  val cc = SqlFragment("where Caption like \"%{a}\" and MovieID = {b}", 'a -> "s", 'b -> 5)
  // combine all fragments
  val v1 = SQL(Seq(aa, bb, cc).map(_.Fragment).mkString(" "))
    .on((aa.Args ++ bb.Args ++ cc.Args): _*)
  // better solution
  val v2 = Seq(aa, bb, cc).unzip(frag => (frag.Fragment, frag.Args)) match {
    case (frags, args) => SQL(frags.mkString(" ")).on(args.flatten: _*)
  }
  // works
  println(v1.as(scalar[Long].singleOpt))
  println(v2.as(scalar[Long].singleOpt))
}
It seems to work great! :-)
I then rewrote the last part of the freetext filter as follows:
// finally transform the expression
// list into a single sql fragment
expressions.zipWithIndex.map { case (expr, index) =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "") from Tags join TagLinks using (TagID)
where TagLinks.MovieID = Movies.MovieID)) like "%{expr$index}%"))
""" -> (s"expr$index" -> expr)
}.unzip match { case (frags, args) => SqlFragment(frags.mkString(" and "), args.flatten: _*) }
What do you think?
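A possible variation on this idea, sketched without Anorm so that it runs on its own: keep only a bare {exprN} placeholder in the SQL text and move the % wildcards into the bound value. That way neither quotes nor a %s sequence ever appear in the generated SQL. Fragment, escapeLike and freetext are hypothetical names, and the escaping assumes backslash is the LIKE escape character (the MySQL default).

```scala
object LikeFragments {
  // Hypothetical stand-in for a fragment type: SQL text plus named values.
  final case class Fragment(sql: String, args: Map[String, String])

  // Escape LIKE wildcards so user input is matched literally.
  // Assumes backslash is the escape character.
  def escapeLike(term: String): String =
    term.flatMap {
      case c @ ('%' | '_' | '\\') => "\\" + c
      case c                      => c.toString
    }

  // One "column like {exprN}" condition per search term, joined with "and".
  def freetext(column: String, terms: Seq[String]): Fragment = {
    val parts = terms.zipWithIndex.map { case (term, i) =>
      val name = s"expr$i"
      (s"$column like {$name}", name -> s"%${escapeLike(term)}%")
    }
    Fragment(parts.map(_._1).mkString(" and "), parts.map(_._2).toMap)
  }
}
```

For example, `LikeFragments.freetext("Movies.Caption", Seq("s", "50%"))` yields the SQL text `Movies.Caption like {expr0} and Movies.Caption like {expr1}` with the values `%s%` and `%50\%%` carried separately, ready to be handed to whatever binding mechanism is in use.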

This is how it is being implemented right now:
/**
 * This private helper method transforms a content filter string into an SQL expression
 * for searching within movies, owners, kinds and tags.
 * @author Samuel Lörtscher
 */
private def contentFilterToSql(value: String) = {
// trim the parameter value and clean it of any possible anomalies
// (those include strange spacing and unclosed quotes)
val cleaned = value.trim match {
case trimmed if trimmed.count(_ == '"') % 2 != 0 =>
if (trimmed.last == '"') trimmed.dropRight(1).trim
else trimmed + '"'
case trimmed =>
trimmed
};
// transform the cleaned value into a list of expressions
// (words between quotes are considered being one expression)
// empty expressions between quotes are being removed
// expressions will contain no quotes as they are being stripped during evaluation -
// thus counter measures for sql injection should be obsolete
// (we put an empty space at the end because it makes the lexer algorithm much
// more efficient as it will not need to check for end of file in every iteration)
val expressions = (cleaned + " ").foldLeft((List[String](), "", false)) { case ((list, expr, quoted), char) =>
// perform the lexer operation for the current character
if (char == ' ' && !quoted) (expr :: list, "", false)
else if (char == '"') (expr :: list, "", !quoted)
else (list, expr + char, quoted)
}._1.map(_.trim).filter(_.nonEmpty)
// finally transform the expression
// list into a variable length sql condition statement
expressions.map { expr =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "")
from Tags join TagLinks using (TagID) where TagLinks.MovieID = Movies.MovieID)) like "%$expr%")
"""
}.mkString(" and ")
}
Since the number of search expressions is variable, I cannot use Anorm arguments here. :-/
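To make the lexer step easier to try out in isolation, here is a minimal standalone sketch of the same idea: balance the quotes, then split on unquoted spaces while keeping quoted text together. SearchLexer, clean and tokenize are illustrative names, and unlike the helper above this version returns the expressions in input order.

```scala
object SearchLexer {
  // Balance an odd number of quotes: drop a trailing quote, otherwise append one.
  def clean(value: String): String = {
    val trimmed = value.trim
    if (trimmed.count(_ == '"') % 2 == 0) trimmed
    else if (trimmed.last == '"') trimmed.dropRight(1).trim
    else trimmed + '"'
  }

  // Split on unquoted spaces; text between quotes stays together.
  def tokenize(value: String): List[String] = {
    val start = (List.empty[String], "", false)
    // The trailing space flushes the last expression without an end-of-input check.
    val (tokens, _, _) = (clean(value) + " ").foldLeft(start) {
      case ((list, expr, quoted), ' ') if !quoted => (expr :: list, "", false)
      case ((list, expr, quoted), '"')            => (expr :: list, "", !quoted)
      case ((list, expr, quoted), char)           => (list, expr + char, quoted)
    }
    // Tokens were prepended, so reverse them; drop blanks after trimming.
    tokens.map(_.trim).filter(_.nonEmpty).reverse
  }
}
```

With this sketch, `SearchLexer.tokenize("""foo "bar baz" qux""")` gives `List("foo", "bar baz", "qux")`, and an unclosed quote as in `sam "big fish` is closed automatically, giving `List("sam", "big fish")`.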
I have found a simple workaround now, but I am not exactly happy about being forced to apply such crappy hacks.
Since a %s character sequence seems to trigger the bug, I looked for a way to get the same semantic outcome without passing this sequence directly. I ended up replacing like "%$expr%" with like concat("%", "$expr%"). Because the MySQL server evaluates concat BEFORE like, it puts the original pattern back together before like processes it, and the sequence %s is never transmitted through the Anorm and JDBC layers.
// finally transform the expression
// list into a variable length sql condition statement
// (freaking concat("%", "$expr%")) is required due to a freaking bug in either anorm or JDBC
// which results into a crash when %s is anyway submitted)
expressions.map { expr =>
s"""
(concat(Movies.Caption, " ", Movies.Description, " ", Movies.Kind, " ", Profiles.Nickname, " ",
(select coalesce(group_concat(Tags.Name), "")
from Tags join TagLinks using (TagID) where TagLinks.MovieID = Movies.MovieID)) like concat("%", "$expr%"))
"""
}.mkString(" and ")

Related

How do I access the VALUE using CosmosQueryableExtensions

EFCore Cosmos provider does not implement subquery yet and so I have implemented the query using the following FromRawSql as per this post:
SqlParameter userMasterGuidParam = new("userMasterGuid", userMasterGuid);
SqlParameter statusNewParam = new("statusNew", CaseStatusGuids.New);
SqlParameter statusInProgressParam = new("statusInProgress", CaseStatusGuids.InProgress);
SqlParameter statusOnHoldParam = new("statusOnHold", CaseStatusGuids.OnHold);
const string TICKET_SQL =
@"SELECT * FROM c " +
"WHERE c.StatusGuid IN (@statusNewParam, @statusInProgress, @statusOnHold) " +
"AND EXISTS ( " +
"SELECT VALUE n FROM n IN c.caseservicepartner_collection " +
"WHERE n.PartnerAssignedUserGuid = @userMasterGuid) ";
// Use CosmosQueryableExtensions instead of _context.Cases.FromSqlRaw to avoid ambiguous namespace.
// https://github.com/dotnet/efcore/issues/26502
return CosmosQueryableExtensions
.FromSqlRaw(_contextCosmos.Tickets, TICKET_SQL, statusNewParam, statusInProgressParam, statusOnHoldParam, userMasterGuidParam)
.OrderByDescending(t => t.CreatedDateTime)
.ToListAsync();
When I execute this query in the Cosmos Data Explorer I get a valid result - an array of items.
SELECT * FROM c WHERE c.StatusGuid IN ('63295b5e-de34-4555-b736-408dae18aaa0', '55d05dde-6b71-475f-8ee5-5549e2187423', 'e5267754-d416-4d1f-b42f-700dc5bb13d3') AND EXISTS ( SELECT VALUE n FROM n IN c.caseservicepartner_collection WHERE n.PartnerAssignedUserGuid = 'f3e9dd05-c580-4390-8998-61ce915d2da3')
[
{
"CreatedDateTime": "2022-08-17T08:22:54.017000+00:00",
"CaseNumber": 111,
"AssignedTeamGuid": null,
"TicketTypeGuid": "18ba2bba-557f-4bbd-9b45-029194761980",
...
},
{
...
}
]
However, when I execute this using EFCore, it returns no data. Looking at the EFCore log, it seems to wrap this query in an outer select, as follows:
-- EFCore adds this
SELECT c
FROM (
-- My Query
SELECT * FROM c WHERE c.StatusGuid IN (@statusNewParam, @statusInProgress, @statusOnHold) AND EXISTS ( SELECT VALUE n FROM n IN c.caseservicepartner_collection WHERE n.PartnerAssignedUserGuid = @userMasterGuid)
) c
...which when I plug into the Data Explorer, returns a nested structure like this:
[
{
"c": {
"CreatedDateTime": "2022-08-17T08:22:54.017000+00:00",
"CaseNumber": 111,
"AssignedTeamGuid": null,
"TicketTypeGuid": "18ba2bba-557f-4bbd-9b45-029194761980",
...
}
},
]
I suspect this is why the data is not being returned, perhaps due to a type mismatch.
Is there a way to fix this so the array is returned at the root, rather than nested within the c value?
Thanks
UPDATE
I removed the SqlParameters and instead used the string-format-like option to pass parameters. That sorted out my issue and data is being returned now.
string TICKET_SQL =
"SELECT * FROM c " +
"WHERE c.StatusGuid IN ({0}, {1}, {2}) " +
"AND EXISTS (SELECT VALUE n FROM n IN c.caseservicepartner_collection WHERE n.PartnerAssignedUserGuid = {3})";
return CosmosQueryableExtensions
.FromSqlRaw(contextCosmos.Tickets, TICKET_SQL, CaseStatusGuids.New, CaseStatusGuids.InProgress, CaseStatusGuids.OnHold, userMasterGuid)
.OrderByDescending(t => t.CreatedDateTime)
.ToList();

converting sql statement back to lambda expression

I have the query below, along with its SQL. It was running really slowly, so it was rewritten in SQL, and now I'm just not sure how to convert the SQL back to a lambda expression.
This is the part of the expression giving me problems, somewhere in r.RecordProducts.Any():
records = records
.Include(r => r.Employer)
.Include(r => r.Contractor)
.Include(r => r.RecordProducts)
.ThenInclude(rp => rp.ProductDefendant.Defendant)
.Where(r => EF.Functions.Like(r.Employer.DefendantCode, "%" + input.DefendantCode + "%")
|| EF.Functions.Like(r.Contractor.DefendantCode, "%" + input.DefendantCode + "%")
|| r.RecordProducts.Any(rp => EF.Functions.Like(rp.ProductDefendant.Defendant.DefendantCode, "%" + input.DefendantCode + "%") && rp.IsActive == true));
The Any clause turns into an EXISTS and some funky stuff in the WHERE clause of the SQL below:
SELECT [t].[Id], [t].[StartDate], [t].[EndDate], [t].[WitnessName], [t].[SourceCode], [t].[JobsiteName], [t].[ShipName], [t].[EmployerCode]
FROM (
SELECT DISTINCT [r].[RecordID] AS [Id], [r].[StartDate], [r].[EndDate], [r.Witness].[FullName] AS [WitnessName], CASE
WHEN [r].[SourceID] IS NOT NULL
THEN [r.Source].[SourceCode] ELSE N'zzzzz'
END AS [SourceCode], CASE
WHEN [r].[JobsiteID] IS NOT NULL
THEN [r.Jobsite].[JobsiteName] ELSE N'zzzzz'
END AS [JobsiteName], CASE
WHEN [r].[ShipID] IS NOT NULL
THEN [r.Ship].[ShipName] ELSE N'zzzzz'
END AS [ShipName], CASE
WHEN [r].[EmployerID] IS NOT NULL
THEN [r.Employer].[DefendantCode] ELSE N'zzzzz'
END AS [EmployerCode]
FROM [Records] AS [r]
LEFT JOIN [Ships] AS [r.Ship] ON [r].[ShipID] = [r.Ship].[ShipID]
LEFT JOIN [Jobsites] AS [r.Jobsite] ON [r].[JobsiteID] = [r.Jobsite].[JobsiteID]
LEFT JOIN [Sources] AS [r.Source] ON [r].[SourceID] = [r.Source].[SourceID]
LEFT JOIN [Witnesses] AS [r.Witness] ON [r].[WitnessID] = [r.Witness].[WitnessID]
LEFT JOIN [Defendants] AS [r.Contractor] ON [r].[ContractorID] = [r.Contractor].[DefendantID]
LEFT JOIN [Defendants] AS [r.Employer] ON [r].[EmployerID] = [r.Employer].[DefendantID]
WHERE ([r].[IsActive] = 1) AND (([r.Employer].[DefendantCode] LIKE (N'%' + 'cert') + N'%' OR [r.Contractor].[DefendantCode] LIKE (N'%' + 'cert') + N'%') OR EXISTS (
SELECT 1
FROM [Records_Products] AS [rp]
INNER JOIN [Product_Defendant] AS [rp.ProductDefendant] ON [rp].[DefendantProductID] = [rp.ProductDefendant].[DefendantProductID]
INNER JOIN [Defendants] AS [rp.ProductDefendant.Defendant] ON [rp.ProductDefendant].[DefendantID] = [rp.ProductDefendant.Defendant].[DefendantID]
WHERE ([rp.ProductDefendant.Defendant].[DefendantCode] LIKE (N'%' + 'cert') + N'%' AND ([rp].[IsActive] = 1)) AND ([r].[RecordID] = [rp].[RecordID])))
) AS [t]
ORDER BY [t].[SourceCode]
OFFSET 0 ROWS FETCH NEXT 500 ROWS ONLY
Here is the new SQL that works better; I'm just not sure how to convert it back to a lambda expression:
SELECT [t].[Id]
,[t].[StartDate]
,[t].[EndDate]
,[t].[WitnessName]
,[t].[SourceCode]
,[t].[JobsiteName]
,[t].[ShipName]
,[t].[EmployerCode]
FROM (
SELECT DISTINCT [r].[RecordID] AS [Id]
,[r].[StartDate]
,[r].[EndDate]
,[r.Witness].[FullName] AS [WitnessName]
,CASE
WHEN [r].[SourceID] IS NOT NULL
THEN [r.Source].[SourceCode]
ELSE N'zzzzz'
END AS [SourceCode]
,CASE
WHEN [r].[JobsiteID] IS NOT NULL
THEN [r.Jobsite].[JobsiteName]
ELSE N'zzzzz'
END AS [JobsiteName]
,CASE
WHEN [r].[ShipID] IS NOT NULL
THEN [r.Ship].[ShipName]
ELSE N'zzzzz'
END AS [ShipName]
,CASE
WHEN [r].[EmployerID] IS NOT NULL
THEN [r.Employer].[DefendantCode]
ELSE N'zzzzz'
END AS [EmployerCode]
FROM [Records] AS [r]
LEFT JOIN [Ships] AS [r.Ship] ON [r].[ShipID] = [r.Ship].[ShipID]
LEFT JOIN [Jobsites] AS [r.Jobsite] ON [r].[JobsiteID] = [r.Jobsite].[JobsiteID]
LEFT JOIN [Sources] AS [r.Source] ON [r].[SourceID] = [r.Source].[SourceID]
LEFT JOIN [Witnesses] AS [r.Witness] ON [r].[WitnessID] = [r.Witness].[WitnessID]
LEFT JOIN [Defendants] AS [r.Contractor] ON [r].[ContractorID] = [r.Contractor].[DefendantID]
LEFT JOIN [Defendants] AS [r.Employer] ON [r].[EmployerID] = [r.Employer].[DefendantID]
LEFT JOIN (
SELECT [rp].[RecordID]
FROM [Records_Products] AS [rp]
INNER JOIN [Product_Defendant] AS [rp.ProductDefendant] ON [rp].[DefendantProductID] = [rp.ProductDefendant].[DefendantProductID]
INNER JOIN [Defendants] AS [rp.ProductDefendant.Defendant] ON [rp.ProductDefendant].[DefendantID] = [rp.ProductDefendant.Defendant].[DefendantID]
WHERE (
[rp.ProductDefendant.Defendant].[DefendantCode] LIKE (N'%' + 'cert') + N'%'
AND ([rp].[IsActive] = 1)
)
) AS RecordProduct ON [r].[RecordID] = RecordProduct.[RecordID]
WHERE ([r].[IsActive] = 1)
AND (
(
[r.Employer].[DefendantCode] LIKE (N'%' + 'cert') + N'%'
OR [r.Contractor].[DefendantCode] LIKE (N'%' + 'cert') + N'%'
)
OR RecordProduct.RecordID IS NOT NULL
--OR EXISTS (
--    SELECT 1
--    FROM [Records_Products] AS [rp]
--    INNER JOIN [Product_Defendant] AS [rp.ProductDefendant] ON [rp].[DefendantProductID] = [rp.ProductDefendant].[DefendantProductID]
--    INNER JOIN [Defendants] AS [rp.ProductDefendant.Defendant] ON [rp.ProductDefendant].[DefendantID] = [rp.ProductDefendant.Defendant].[DefendantID]
--    WHERE ([rp.ProductDefendant.Defendant].[DefendantCode] LIKE (N'%' + 'cert') + N'%'
--    AND ([rp].[IsActive] = 1)) AND ([r].[RecordID] = [rp].[RecordID])
--    )
)
) AS [t]
ORDER BY [t].[SourceCode]
OFFSET 0 ROWS FETCH NEXT 500 ROWS ONLY
The LINQ expression you supplied and the generated SQL do not match. For one, the LINQ expression performs an Include on the various related tables, which would have put all of those entity columns in the top-level SELECT, and they are not present in your example SQL. I also don't see anything in the LINQ expression for the Take 500 and OrderBy, or for the IsActive assertion on Record.
To be able to help determine the source of any performance concern we need to see the complete Linq expression and the resulting SQL.
Looking at the basis of the Linq expression you provided:
records = records
.Include(r => r.Employer)
.Include(r => r.Contractor)
.Include(r => r.RecordProducts)
.ThenInclude(rp => rp.ProductDefendant.Defendant)
.Where(r => EF.Functions.Like(r.Employer.DefendantCode, "%" + input.DefendantCode + "%")
|| EF.Functions.Like(r.Contractor.DefendantCode, "%" + input.DefendantCode + "%")
|| r.RecordProducts.Any(rp => EF.Functions.Like(rp.ProductDefendant.Defendant.DefendantCode, "%" + input.DefendantCode + "%") && rp.IsActive == true));
There are a few suggestions I can make:
There is no need for the Functions.Like. You should be able to achieve the same with Contains.
Avoid using Include and instead utilize Select to retrieve the columns from the resulting structure that you actually need. Populate these into ViewModels or consume them in the code. The less data you pull back, the better optimized the SQL can be for indexing, and the less data pulled across the wire. Consuming entities also leads to unexpected lazy-load scenarios as systems mature and someone forgets to Include a new relation.
records = records
    .Where(r => r.IsActive
        && (r.Employer.DefendantCode.Contains(input.DefendantCode)
            || r.Contractor.DefendantCode.Contains(input.DefendantCode)
            || r.RecordProducts.Any(rp => rp.IsActive
                && rp.ProductDefendant.Defendant.DefendantCode.Contains(input.DefendantCode))))
    .OrderBy(r => r.SourceCode)
    .Select(r => new RecordViewModel
    {
        // Populate the data you want here.
    }).Take(500).ToList();
This also adds the IsActive check, OrderBy, and Take(500) based on your sample SQL.

List in the Case-When Statement in Spark SQL

I'm trying to convert a dataframe from long to wide as suggested at How to pivot DataFrame?
However, the SQL seems to treat the entries of the countries list as columns of the table. Below are the messages I saw in the console, followed by the sample data and code from the link above. Does anyone know how to resolve this?
Messages from the scala console:
scala> val myDF1 = sqlc2.sql(query)
org.apache.spark.sql.AnalysisException: cannot resolve 'US' given input columns id, tag, value;
id tag value
1 US 50
1 UK 100
1 Can 125
2 US 75
2 UK 150
2 Can 175
and I want:
id US UK Can
1 50 100 125
2 75 150 175
I can create a list with the value I want to pivot and then create a string containing the sql query I need.
val countries = List("US", "UK", "Can")
val numCountries = countries.length - 1
var query = "select *, "
for (i <- 0 to numCountries-1) {
query += "case when tag = " + countries(i) + " then value else 0 end as " + countries(i) + ", "
}
query += "case when tag = " + countries.last + " then value else 0 end as " + countries.last + " from myTable"
myDataFrame.registerTempTable("myTable")
val myDF1 = sqlContext.sql(query)
Country codes are literals and should be enclosed in quotes; otherwise the SQL parser will treat them as column names:
val caseClause = countries.map(
x => s"""CASE WHEN tag = '$x' THEN value ELSE 0 END as $x"""
).mkString(", ")
val aggClause = countries.map(x => s"""SUM($x) AS $x""").mkString(", ")
val query = s"""
SELECT id, $aggClause
FROM (SELECT id, $caseClause FROM myTable) tmp
GROUP BY id"""
sqlContext.sql(query)
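To see what the quoting changes, it can help to build the query text without Spark and simply print it. The following is a minimal sketch under that assumption; PivotQuery and build are made-up names, and the identifier check is an extra precaution added here because the country codes are also used as column aliases.

```scala
object PivotQuery {
  // Build the pivot query text; country codes are quoted where they are
  // compared as values and left bare where they name output columns.
  def build(table: String, countries: Seq[String]): String = {
    // Only allow plain identifiers, since the codes end up as column aliases.
    require(countries.forall(_.matches("[A-Za-z_][A-Za-z0-9_]*")), "unexpected country code")
    val caseClause = countries
      .map(c => s"CASE WHEN tag = '$c' THEN value ELSE 0 END AS $c")
      .mkString(", ")
    val aggClause = countries.map(c => s"SUM($c) AS $c").mkString(", ")
    s"SELECT id, $aggClause FROM (SELECT id, $caseClause FROM $table) tmp GROUP BY id"
  }

  def main(args: Array[String]): Unit =
    println(build("myTable", Seq("US", "UK", "Can")))
}
```

In the printed text every comparison reads `tag = 'US'` (a string literal), whereas the original loop produced `tag = US`, which the parser has to resolve as a column.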
Question is why even bother with building SQL strings from scratch?
def genCase(x: String) = {
when($"tag" <=> lit(x), $"value").otherwise(0).alias(x)
}
def genAgg(f: Column => Column)(x: String) = f(col(x)).alias(x)
df
.select($"id" :: countries.map(genCase): _*)
.groupBy($"id")
.agg($"id".alias("dummy"), countries.map(genAgg(sum)): _*)
.drop("dummy")

Mysql Aliased tables in Anorm are not recognized

Update
Created a runnable demo for this problem.
https://github.com/narayanjr/anorm_test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am unable to access fields on an aliased table. I keep getting error messages saying the field is not an option and that the available fields are the base field name or `table_name`.field_name, but not the aliased field name. This makes it impossible to JOIN the same table twice and access all the fields.
var vendor_client_parser_1 = SqlParser.long("vid") ~ SqlParser.str("vname") ~ SqlParser.long("cid") ~ SqlParser.str("cname") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
var vendor_client_parser_2 = SqlParser.long("v.business_id") ~ SqlParser.str("v.name") ~ SqlParser.long("c.business_id") ~ SqlParser.str("c.name") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
var vendor_client_parser_3 = SqlParser.long(1) ~ SqlParser.str(2) ~ SqlParser.long(3) ~ SqlParser.str(4) map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
DB.withConnection
{
implicit c =>
var results =
SQL"""
SELECT v.business_id AS vid, v.name AS vname, c.business_id AS cid, c.name AS cname
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".as(vendor_client_parser.*)
}
Expected result:
1, Vendor A, 10, Vendor K
2, Vendor B, 11, Vendor L
2, Vendor B, 1, Vendor A
12, Vendor M, 3, Vendor C
Result from vendor_client_parser_1:
10, Vendor K, 10, Vendor K
11, Vendor L, 11, Vendor L
1, Vendor A, 1, Vendor A
3, Vendor C, 3, Vendor C
Result from vendor_client_parser_2:
Execution exception[[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]]
Result from vendor_client_parser_3: (Same as expected)
1, Vendor A, 10, Vendor K
2, Vendor B, 11, Vendor L
2, Vendor B, 1, Vendor A
12, Vendor M, 3, Vendor C
vendor_client_parser_3 works, but it relies on indexes instead of names. I dislike using indexes because if I mess up an index I might still get a valid response back and not notice. If I mess up a name, the column won't exist and I will know something is wrong.
Is there something I am missing? Is there any way to achieve the results I need without having to rely on using the index?
Play Scala 2.4.1
Anorm 2.5.0
Update:
If I do not alias the columns and use vendor_client_parser_2, I get the same result as when the columns are aliased.
Modified Query:
SQL"""
SELECT v.business_id, v.name, c.business_id, c.name
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".as(vendor_client_parser_2.*)
Result with vendor_client_parser_2.*:
Execution exception[[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]]
I also tested it with a single aliased table, and it refuses to see the aliased table name.
Single Table Test:
SQL"""
SELECT v.business_id, v.name, business_id, name
FROM #$BUSINESS_TABLE AS v
LIMIT 20
""".as(test_parser.*)
test_parser:
var test_parser = SqlParser.long("v.business_id") ~ SqlParser.str("v.name") ~ SqlParser.long("business_id") ~ SqlParser.str("name") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
Result:
[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]
I then tested if aliased columns are accessible by both their original and aliased names.
Test Aliased columns:
SQL"""
SELECT business_id AS vid, name AS vname
FROM #$BUSINESS_TABLE
LIMIT 20
""".as(test_parser_2.*)
test_parser_2:
var test_parser_2 = SqlParser.long("business_id") ~ SqlParser.str("name") ~ SqlParser.long("vid") ~ SqlParser.str("vname") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
This test did not error out, and it properly pulled in values as both business_id and vid, as well as name and vname.
I forced it to error out so it would give me a list of column names. It seems that Anorm doesn't list the non-aliased names as suggestions, but they do work in this case.
[AnormException: 'forceError' not found, available columns: business.business_id, vid, business.name, vname]
I also tried not using SqlParser.
var businesses = SQL"""
SELECT v.business_id AS vid, v.name AS vname, c.business_id AS cid, c.name AS cname
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".fold(List[(Long, String, Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name"), row[Long]("c.business_id"), row[String]("c.name")) //attempt_1
//list :+ (row[Long]("vid"), row[String]("vname"), row[Long]("cid"), row[String]("cname")) //attempt_2
}
If I use attempt_1 I get this error; as you suggested, this shouldn't work.
Left('v.business_id' not found, available columns: business.business_id, vid, business.name, vname, business.business_id, cid, business.name, cname)))
If I use attempt_2, I get the same results as vendor_client_parser_1
10, Vendor K, 10, Vendor K
11, Vendor L, 11, Vendor L
1, Vendor A, 1, Vendor A
3, Vendor C, 3, Vendor C
If I do not alias the columns and use this same method
SQL"""
SELECT v.business_id, v.name, c.business_id, c.name
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".fold(List[(Long, String, Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name"), row[Long]("c.business_id"), row[String]("c.name")) //Attempt_3
}
Using this query without aliasing the columns causes this error,
Left('v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name)))
I then tested a simple aliased table using this method
SQL"""
SELECT v.business_id, v.name
FROM #$BUSINESS_TABLE AS v
LIMIT 20
""".fold(List[(Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name")) //simple_attempt_1
}
I get the same error
Left(List(java.lang.RuntimeException: Left('v.business_id' not found, available columns: business.business_id, business_id, business.name, name)))
So as far as I can tell, when the same table is used twice it isn't possible to access the fields of an aliased table by name instead of by index.
Update 2:
I tried reversing the order of fields in the SQL so it was c.business_id AS cid, c.name AS cname, v.business_id AS vid, v.name AS vname and rerunning vendor_client_parser_1. It gave me the inverse results
Result from vendor_client_parser_1 with mysql fields switched:
1, Vendor A, 1, Vendor A
2, Vendor B, 2, Vendor B
2, Vendor B, 2, Vendor B
12, Vendor M, 12, Vendor M
When I force an error and it shows me possible fields I get these,
Fields in original order:
Left('forceError' not found, available columns: business.business_id, vid, business.name, vname, business.business_id, cid, business.name, cname)
Fields in switched order:
Left('forceError' not found, available columns: business.business_id, cid, business.name, cname, business.business_id, vid, business.name, vname)
This makes me think the scenario described in the Anorm documentation is happening:
In case several columns are found with same name in query result, for example columns named code in both Country and CountryLanguage tables, there can be ambiguity. By default a mapping like following one will use the last column:
https://www.playframework.com/documentation/2.4.1/ScalaAnorm
If you look at the suggested fields business.business_id and business.name occur twice because the table is referenced twice. It seems like Anorm is associating the last occurrence of business.business_id and business.name with both aliases.
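The effect can be reproduced with a toy model. This is only a guess at the kind of lookup that would give the output above, not Anorm's actual implementation: if a name is first resolved to its qualified column name and values are then stored in a map keyed by that qualified name, the later duplicate overwrites the earlier one, while positional access is unaffected. Col, byName and byIndex are hypothetical names.

```scala
object LastColumnWins {
  // One result-set column: qualified name, alias, and the value in this row.
  final case class Col(qualified: String, alias: String, value: Any)

  // Hypothetical name lookup: alias -> qualified name -> value.
  // Building a Map from pairs keeps the LAST value for a duplicated key.
  def byName(row: List[Col], name: String): Option[Any] = {
    val aliasToQualified = row.map(c => c.alias -> c.qualified).toMap
    val qualifiedToValue = row.map(c => c.qualified -> c.value).toMap
    aliasToQualified.get(name).flatMap(qualifiedToValue.get)
  }

  // Positional lookup (1-based, like JDBC) never goes through names.
  def byIndex(row: List[Col], oneBased: Int): Option[Any] =
    row.lift(oneBased - 1).map(_.value)
}
```

For the first expected row (1, Vendor A, 10, Vendor K), this model returns 10 for "vid" by name but 1 for index 1, which is the same mismatch seen between vendor_client_parser_1 and vendor_client_parser_3.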
I have not tried it in 2.4, but you could possibly use a parser method to generate the column name for you, rather than a variable. My approach would be something like the following:
case class BusinessConnection(vendor_id: Int, client_id: Int)
case class Business(id: Int, name: String)
case class VendorClient(vendor: Business, client: Business)
object VendorClient {
val businessConnectionP =
get[Int]("vendor_id") ~
get[Int]("client_id") map {
case vendor_id ~ client_id =>
BusinessConnection(vendor_id, client_id)
}
def businessP(alias: String) =
getAliased[Int](alias + ".id") ~
getAliased[String](alias + ".name") map {
case id ~ name =>
Business(id, name)
}
val vendorClientP =
businessP("v") ~ businessP("c") map {
case business ~ client =>
VendorClient(business, client)
}
val sqlSelector = "v.id, v.name, c.id, c.name"
def all() = DB.withConnection { implicit c =>
SQL(s"""
select $sqlSelector
from businessConnection
join business as v on vendor_id=v.id
join business as c on client_id=c.id
""").as(vendorClientP *)
}
}

T-SQL: returning VARCHAR in a derived column

I am having problems returning a VARCHAR out of a derived column.
Below are extremely simplified code examples.
I have been able to do this before:
SELECT *, message =
CASE
WHEN (status = 0)
THEN 'aaa'
END
FROM products
But when I introduce a Common Table Expression or Derived Table:
WITH CTE_products AS (SELECT * from products)
SELECT *, message =
CASE WHEN (status = 0)
THEN 'aaa'
END
FROM CTE_products
this seems to fail with the following message:
Conversion failed when converting the varchar value 'aaa' to data type int.
When I tweak the line to say:
WITH CTE_products AS (SELECT * from products)
SELECT *, message =
CASE WHEN (status = 0)
THEN '123'
END
FROM CTE_products
It returns correctly.
...
When I remove all the other clauses prior to it, it also works fine returning 'aaa'.
My preference would be to keep this as a single, stand-alone query.
The problem is that the column is an integer datatype and SQL Server is trying to convert 'aaa' to an integer.
One way to handle it:
WITH CTE_products AS (SELECT * from products)
SELECT *, message =
CASE WHEN (status = 0)
THEN 'aaa' else convert(varchar(50),status)
END
FROM CTE_products
I actually ended up finding the answer.
One of my CASE/WHEN clauses used a derived column from the CTE and that ended up causing the confusion.
Before:
WITH CTE_products AS (SELECT *, qty_a + qty_b as qty_total FROM products)
SELECT *, message =
CASE WHEN (status = 0)
THEN 'Status is 0, the total is: ' + qty_total + '!'
END
FROM CTE_products
Corrected:
WITH CTE_products AS (SELECT *, qty_a + qty_b as qty_total FROM products)
SELECT *, message =
CASE WHEN (status = 0)
THEN 'Status is 0, the total is: ' + CAST(qty_total AS VARCHAR) + '!'
END
FROM CTE_products
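The difference between the two versions comes down to data type precedence: in T-SQL, int ranks above varchar, so when + sees a string and an int it converts the string to int and adds, and only concatenates when both sides are strings. Below is a small model of that rule, written as a sketch rather than a faithful reimplementation of SQL Server; Value, Str, Num, plus and cast are illustrative names.

```scala
import scala.util.Try

object TypePrecedence {
  sealed trait Value
  final case class Str(s: String) extends Value
  final case class Num(n: Int) extends Value

  // Simplified '+': two strings are concatenated; if either side is an int,
  // both sides are converted to int and added.
  def plus(a: Value, b: Value): Either[String, Value] = (a, b) match {
    case (Str(x), Str(y)) => Right(Str(x + y))
    case _ =>
      for { x <- toInt(a); y <- toInt(b) } yield Num(x + y)
  }

  // Implicit conversion to int; fails for non-numeric strings.
  private def toInt(v: Value): Either[String, Int] = v match {
    case Num(n) => Right(n)
    case Str(s) =>
      Try(s.toInt).toOption.toRight(
        s"Conversion failed when converting the varchar value '$s' to data type int.")
  }

  // Explicit conversion, standing in for CAST(... AS VARCHAR).
  def cast(n: Num): Str = Str(n.n.toString)
}
```

In this model `plus(Str("Status is 0, the total is: "), Num(5))` fails with the conversion message, `plus(Str("123"), Num(5))` succeeds as a numeric addition, and `plus(Str("total: "), cast(Num(5)))` concatenates as intended, mirroring the Before and Corrected queries.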
I found this while removing WHEN/THEN clauses from the CASE statement to check for a flukey parentheses error: once none of the remaining WHEN/THEN clauses included the derived column from the CTE, the query was able to return VARCHAR.