Dynamic FROM clause in Postgres - postgresql

Using PostgreSQL 9.1.13 I've written the followed query to calculate some data:
WITH windowed AS (
SELECT a.person_id, a.category_id,
CAST(dense_rank() OVER w AS float) / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances s
-- Want to insert a FROM clause here
INNER JOIN person p ON s.person_id = p.ident
ORDER BY person_id, category_id, created DESC
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
)
SELECT category_id,percentile FROM windowed
WHERE person_id = 1;
I now want to turn this into a stored procedure but my issue is that in the middle there, where I showed the comment, I need to place a dynamic WHERE clause. For example, I'd like to add something like:
WHERE p.weight > 110 OR p.weight IS NULL
The calling application let's people pick filters and so I want to be able to pass the appropriate filters into the query. There could be 0 or many filters, depending on the caller, but I could pass it all in as a properly formatted where clause as a string parameter, for example.
The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue.

The calling application just sends values to a webservice, which then
builds the string and calls the stored procedure, so SQL injection
attacks won't really be an issue.
Too many cooks spoil the broth.
Either let your webserive build the SQL statement or let Postgres do it. Don't use both on the same query. That leaves two possible weak spots for SQL injection attacks and makes debugging and maintenance a lot harder.
Here is full code example for a plpgsql function that builds and executes an SQL statement dynamically while making SQL injection impossible (just from two days ago):
Robust approach for building SQL queries programmatically
Details heavily depend on exact requirements.

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries - use a query to create the list of partnames and then use this to "generate" a second query that populates the IN list. So something needs issue these queries and generated the second. This can be some code external to Redshift (lots of options) or a stored procedure in Redshift. This code, no matter where it exists, should understand that Redshift has a max number of columns limit - 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html

Feasibility of recreating complex SQL query in Crystal Reports XI

I have about 10 fairly complex SQL queries on SQL Server 2008 - but the client wants to be able to run them from their internal network (as opposed to from the non-local web app) through Crystal Reports XI.
The client's internal network does not allow us to (a) have write access to their proprietary db, nor (b) allow us to set up an intermediary SQL server (meaning we can not set up stored procedures or other data cleaning).
The SQL contains multiple instances of row_number() over (partition by col1, col2), group by col1, col2 with cube|rollup, and/or (multiple) pivots.
Can this even be done? Everything I've read seems to indicate that this is only feasible via stored procedure and I would still need to pull the data from the proprietary db first.
Following is a stripped back version of one of the queries (eg, JOINs not directly related to functionality, WHERE clauses, and half a dozen columns have been removed)...
select sum(programID)
, sum([a.Asian]) as [Episodes - Asian], sum([b.Asian]) as [Eps w/ Next Svc - Asian], sum([c.Asian])/sum([b.Asian]) as [Avg Days to Next Svc - Asian]
, etc... (repeats for each ethnicity)
from (
select programID, 'a.' + ethnicity as ethnicityA, 'b.' + ethnicity as ethnicityB, 'c.' + ethnicity as ethnicityC
, count(*) as episodes, count(daysToNextService) as episodesWithNextService, sum(daysToNextService) as daysToNextService
from (
select programID, ethnicity, datediff(dateOfDischarge, nextDateOfService) as daysToNextService from (
select t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService, min(t2.dateOfService) as nextDateOfService
from TABLE1 as t1 left join TABLE1 as t2
on datediff(d, t1.dateOfService, t2.dateOfService) between 1 and 31 and t1.userID = t2.userID
group by t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService
) as a
) as a
group by programID
) as a
pivot (
max(episodes) for ethnicityA in ([A.Asian],[A.Black],[A.Hispanic],[A.Native American],[A.Native Hawaiian/ Pacific Isl.],[A.White],[A.Unknown])
) as pA
pivot (
max(episodesWithNextService) for ethnicityB in ([B.Asian],[B.Black],[B.Hispanic],[B.Native American],[B.Native Hawaiian/ Pacific Isl.],[B.White],[B.Unknown])
) as pB
pivot (
max(daysToNextService) for ethnicityC in ([C.Asian],[C.Black],[C.Hispanic],[C.Native American],[C.Native Hawaiian/ Pacific Isl.],[C.White],[C.Unknown])
) as pC
group by programID with rollup
Sooooooo.... can something like this even be translated into Crystal Reports XI?
Thanks!
When you create your report instead of selecting a table or stored procedure choose add command
This will allow you to put whatever valid TSQL statement in there that you want. Using Common Table Expressions (CTE's) and inline Views I've managed to create some rather large complex statements (excess of 400 lines) against Oracle and SQL Server so it is indeed feasible, however if you use parameters you should consider using sp_executesql you'll have to figure out how to avoid SQL injection.

SQL DateTime Conversion Fails when No Conversion Should be Taking Place

I'm modifying an existing query for a client, and I've encountered a somewhat baffling issue.
Our client uses SQL Server 2008 R2 and the database in question provides the user the ability to specify custom fields for one of its tables by making use of an EAV structure. All of the values stored in this structure are varchar(255), and several of the fields are intended to store dates. The query in question is being modified to use two of these fields and compare them (one is a start, the other is an end) against the current date to determine which row is "current".
The issue I'm having is that part of the query does a CONVERT(DateTime, eav.Value) in order to turn the varchar into a DateTime. The conversions themselves all succedd and I can include the value as part of the SELECT clause, but part of the question is giving me a conversion error:
Conversion failed when converting date and/or time from character string.
The real kicker is this: if I define the base for this query (getting a list of entities with the two custom field values flattened into a single row) as a view and select against the view and filter the view by getdate(), then it works correctly, but it fails if I add a join to a second table using one of the (non-date) fields from the view. I realize that this might be somewhat hard to follow, so I can post an example query if desired, but this question is already getting a little long.
I've tried recreating the basic structure in another database and including sample data, but the new database behaves as expected, so I'm at a loss here.
EDIT In case it's useful, here's the statement for the view:
create view Festival as
select
e.EntityId as FestivalId,
e.LookupAs as FestivalName,
convert(Date, nvs.Value) as ActivityStart,
convert(Date, nve.Value) as ActivityEnd
from tblEntity e
left join CustomControl ccs on ccs.ShortName = 'Activity Start Date'
left join CustomControl cce on cce.ShortName = 'Activity End Date'
left join tblEntityNameValue nvs on nvs.CustomControlId = ccs.IdCustomControl and nvs.EntityId = e.EntityId
left join tblEntityNameValue nve on nve.CustomControlId = cce.IdCustomControl and nve.EntityId = e.EntityId
where e.EntityType = 'Festival'
The failing query is this:
select *
from Festival f
join FestivalAttendeeAll fa on fa.FestivalId = f.FestivalId
where getdate() between f.ActivityStart and f.ActivityEnd
Yet this works:
select *
from Festival f
where getdate() between f.ActivityStart and f.ActivityEnd
(EntityId/FestivalId are int columns)
I've encountered this type of error before, it's due to the "order of operations" performed by the execution plan.
You are getting that error message because the execution plan for your statement (generated by the optimizer) is performing the CONVERT() operation on rows that contain string values that can't be converted to DATETIME.
Basically, you do not have control over which rows the optimizer performs that conversion on. You know that you only need that conversion done on certain rows, and you have predicates (WHERE or ON clauses) that exclude those rows (limit the rows to those that need the conversion), but your execution plan is performing the CONVERT() operation on rows BEFORE those rows are excluded.
(For example, the optimizer may be electing to a do a table scan, and performing that conversion on every row, before any predicate is being applied.)
I can't give a specific answer, without a specific question and specific SQL that is generating the error.
One simple approach to addressing the problem would be to use the ISDATE() function to test whether the string value can be converted to a date.
That is, replace:
CONVERT(DATETIME,eav.Value)
with:
CASE WHEN ISDATE(eav.Value) > 0 THEN CONVERT(DATETIME, eav.Value) ELSE NULL END
or:
CONVERT(DATETIME, CASE WHEN ISDATE(eav.Value) > 0 THEN eav.Value ELSE NULL END)
Note that the ISDATE() function is subject to some significant limitations, such as being affected by the DATEFORMAT and LANGUAGE settings of the session.
If there is some other indication on the eav row, you could use some other test, to conditionally perform the conversion.
CASE WHEN eav.ValueIsDateTime=1 THEN CONVERT(DATETIME, eav.Value) ELSE NULL END
The other approach I've used is to try to gain some modicum of control over the order of operations of the optimizer, using inline views or Common Table Expressions, with operations that force the optimizer to materialize them and apply predicates, so that happens BEFORE any conversion in the outer query.

Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, blJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When i try to execute the query it results in the following error.
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution. The only way is to truncate (substring())the "tblJobAnnouncement.JobClassDesc" in the query which has column size of around 8000.
Do we have any work around so that i need not truncate the values. Or Can this query be optimised? Any setting in SQL Server 2000?
The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the fashion in which the DISTINCT requirement is satisfied varies with
the data context (expected number of rows, presence/absence of index, size of row...)
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts)
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan "fist" produces the list of qualifying rows (records) (the list of records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), only retaining the very first occurrence of each distinct row. In other cases, for example when only a few columns are selected and when some index(es) covering these columns is(are) available, no explicit sorting step is used in the query plan but the reliance on an index implicitly implies the "sortability" of the underlying columns. In other cases yet, steps involving various forms of merging or hashing are selected by the query optimizer, and these too, eventually, imply the ability of comparing two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server and preventing the completion of the query is that "Sorting is not possible on rows bigger than..." AND, the DISTINCT keyword is the only apparent reason for the query to require any sorting (BTW many other SQL constructs imply sorting: for example UNION) hence the idea of removing the DISTINCT (if it is logically possible).
In fact you should remove it, for test purposes, to assert that, without DISTINCT, the query completes OK (if only including some duplicates). Once this fact is confirmed, and if effectively the query could produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
An unrelated hint, is to use table aliases, using a short string to avoid repeating these long table names. For example (only did a few tables, but you get the idea...)
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0,[dbo.TableName])
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx
This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT ID, ColumnA, ColumnB FROM TableA JOIN TableB
SELECT ID, ColumnC, ColumnD FROM TableA JOIN TableB
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB, --IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000

Maximum amount of where clauses in stored procedure

Me and my colleagues have a question regarding SQL Server 2008 query length and the SQL Server Optimizer.
We are planning to generate some stored procedures that potentially have a lot of parameters. Inside of our stored procedure we will simply select some values from a table joining other tables.
Our stored procedures will look like this
CREATE PROCEDURE QueryTable
#Parameter001 nvarchar(20),
#Parameter002 int,
#Parameter003 datetime,
#Parameter004 decimal(11,2),
#Parameter005 date,
#Parameter006 varchar(150),
#Parameter007 int,
#Parameter008 decimal(5,2),
#Parameter009 nvarchar(10),
#Parameter010 nvarchar(200),
#Parameter011 nvarchar(50) --,
--...and so on, there are probably 50 to 100 parameters here
AS
BEGIN
SET NOCOUNT ON;
SELECT ID, COL01, COL02, COL03, COL04, COL05 from TestTable T
LEFT JOIN AnotherTable A On T.SomeColomn = A.SomeColumn
LEFT JOIN AThirdTable ATT On A.ThirdTableID = ATT.Id
--and so on, probably 5-10 Tables joined here
WHERE
T.Col02 = #Parameter001 AND
T.Col05 = #Parameter004 AND
ATT.SomeColumnContainingData = #Parameter027
A.AnotherID = #Parameter050
--probably 50 to 100 conditions here (Number of conditions equals number of parameters)
END
GO
Our questions:
Is there a limit on the amount of where-conditions that the Query Optimizer and the SQL Server Cache can take into account?
If there is not such a technical limit, is there a best practice on how many conditions can and should be used in such cases?
A limit of the number of WHERE clauses is not going to be your problem.
Parameter sniffing and poor (or incorrect) query plans cached might be.
This can be somewhat avoided using OPTIMIZE FOR
Obviously, the less complex you can make the WHERE clause, the better.
The SQL Server Books On Line has a list of implementation limits, including aspects of queries. This is the list of SQL Server 2005.
In case anyone is interested...
We solved this issue by generating dynamic sql that is executed on the server. In this way only relevant where clauses are used and the statement is relatively short.