I have about 10 fairly complex SQL queries on SQL Server 2008 - but the client wants to be able to run them from their internal network (as opposed to from the non-local web app) through Crystal Reports XI.
The client's internal network does not allow us to (a) have write access to their proprietary db, nor (b) set up an intermediary SQL server (meaning we cannot create stored procedures or do other data cleaning).
The SQL contains multiple instances of row_number() over (partition by col1, col2), group by col1, col2 with cube|rollup, and/or (multiple) pivots.
Can this even be done? Everything I've read seems to indicate that this is only feasible via a stored procedure, and that I would still need to pull the data from the proprietary db first.
Following is a stripped-back version of one of the queries (e.g., JOINs not directly related to functionality, WHERE clauses, and half a dozen columns have been removed)...
select sum(programID)
, sum([a.Asian]) as [Episodes - Asian], sum([b.Asian]) as [Eps w/ Next Svc - Asian], sum([c.Asian])/sum([b.Asian]) as [Avg Days to Next Svc - Asian]
, etc... (repeats for each ethnicity)
from (
select programID, 'a.' + ethnicity as ethnicityA, 'b.' + ethnicity as ethnicityB, 'c.' + ethnicity as ethnicityC
, count(*) as episodes, count(daysToNextService) as episodesWithNextService, sum(daysToNextService) as daysToNextService
from (
select programID, ethnicity, datediff(d, dateOfDischarge, nextDateOfService) as daysToNextService from (
select t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService, min(t2.dateOfService) as nextDateOfService
from TABLE1 as t1 left join TABLE1 as t2
on datediff(d, t1.dateOfService, t2.dateOfService) between 1 and 31 and t1.userID = t2.userID
group by t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService
) as a
) as a
group by programID
) as a
pivot (
max(episodes) for ethnicityA in ([A.Asian],[A.Black],[A.Hispanic],[A.Native American],[A.Native Hawaiian/ Pacific Isl.],[A.White],[A.Unknown])
) as pA
pivot (
max(episodesWithNextService) for ethnicityB in ([B.Asian],[B.Black],[B.Hispanic],[B.Native American],[B.Native Hawaiian/ Pacific Isl.],[B.White],[B.Unknown])
) as pB
pivot (
max(daysToNextService) for ethnicityC in ([C.Asian],[C.Black],[C.Hispanic],[C.Native American],[C.Native Hawaiian/ Pacific Isl.],[C.White],[C.Unknown])
) as pC
group by programID with rollup
Sooooooo.... can something like this even be translated into Crystal Reports XI?
Thanks!
When you create your report, instead of selecting a table or stored procedure, choose Add Command.
This allows you to enter any valid T-SQL statement you want. Using common table expressions (CTEs) and inline views, I've managed to create some rather large, complex statements (in excess of 400 lines) against both Oracle and SQL Server, so it is indeed feasible. However, if you use parameters, you should consider using sp_executesql, and you'll have to figure out how to avoid SQL injection.
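For example, a command along these lines (just a sketch, reusing the table and column names from the question above) can be pasted straight into the Add Command dialog; the nested derived tables become a CTE and the rollup stays in plain T-SQL:
-- Sketch only: TABLE1 and its columns are taken from the question above
WITH nextService AS (
    SELECT t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService,
           MIN(t2.dateOfService) AS nextDateOfService
    FROM TABLE1 AS t1
    LEFT JOIN TABLE1 AS t2
           ON t1.userID = t2.userID
          AND DATEDIFF(d, t1.dateOfService, t2.dateOfService) BETWEEN 1 AND 31
    GROUP BY t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService
)
SELECT programID, ethnicity,
       COUNT(*) AS episodes,
       -- COUNT(expr) and SUM(expr) ignore NULLs, so episodes with no next service drop out here
       COUNT(DATEDIFF(d, dateOfDischarge, nextDateOfService)) AS episodesWithNextService,
       SUM(DATEDIFF(d, dateOfDischarge, nextDateOfService)) AS daysToNextService
FROM nextService
GROUP BY programID, ethnicity WITH ROLLUP;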
The Redshift documentation provides an example of using the PIVOT function:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a single, simple query. It would mean the query compiler has to work without knowing how many output columns will be produced, and I don't think it can do that.
You can do it with multiple queries: use one query to build the list of partnames, then use that list to "generate" a second query that populates the IN list. So something needs to issue the first query and generate the second. That something can be code external to Redshift (lots of options) or a stored procedure in Redshift. Wherever it lives, it should also be aware that Redshift has a maximum of 1,600 columns per table.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
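A rough sketch of that two-step approach as a Redshift stored procedure (the procedure name and the part_pivot temp table are made up, and the quoting assumes partname values contain no single quotes):
-- Sketch only: build the IN list with LISTAGG, then EXECUTE the generated PIVOT
CREATE OR REPLACE PROCEDURE pivot_all_parts()
AS $$
DECLARE
    col_list VARCHAR(65535);
BEGIN
    -- first query: every distinct partname as a quoted, comma-separated list
    SELECT INTO col_list LISTAGG(DISTINCT '''' || partname || '''', ', ') FROM part;

    -- second (generated) query: run the PIVOT and land the result in a temp table
    EXECUTE 'DROP TABLE IF EXISTS part_pivot';
    EXECUTE 'CREATE TEMP TABLE part_pivot AS
             SELECT *
             FROM (SELECT partname, price FROM part)
             PIVOT (AVG(price) FOR partname IN (' || col_list || '))';
END;
$$ LANGUAGE plpgsql;

CALL pivot_all_parts();
SELECT * FROM part_pivot;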
I've just started working with PostgreSQL. I used to work with SQL Server, and I'm currently migrating some of the existing processes.
The current issue I'm facing is the performance of an UPDATE statement.
I'm trying to update all records from one table (e.g. MyTable_History) and set new values for some columns.
In SQL Server I used the following syntax:
declare #NewEndDate datetime = (select dateadd(minute, -1, getdate()))
update MyTable_History
set isLastestVersion=0, ValidTo=#NewEndDate , ModifiedBy='TestSCriptSql',ModifiedTime=GETDATE()
The code I could come up with for PostgreSQL (since I don't know how to simply use variables, I used a temp table) is:
CREATE TEMP TABLE dates AS VALUES (current_timestamp + (-1 ||' minutes')::interval);
with d as (
select th.validto as validto, th.islatestversion as islatestversion,
th.modifiedby as modifiedby, th.modifiedtime as modifiedtime, d.column1 as newvalidto
from MyTable_History th, dates d
)
update MyTable_History
set validto = d.newvalidto, islatestversion=false, modifiedby='test_update_script', modifiedtime=current_timestamp
from d
SQL Server runs locally on my laptop (not a super config) and the PostgreSQL server runs on AWS as RDS (I don't know the exact specs).
My question is: am I doing something wrong in the PostgreSQL update statement? On a sample of 5000+ rows, SQL Server performs the statement instantly, while PostgreSQL takes around 50 seconds to finish.
Also, it seems I've over-engineered this, since on SQL Server I had 3 lines of code, while on PostgreSQL I'm using a CTE.
Regards,
I don't see why you would need a variable to begin with. current_timestamp returns the same value throughout a transaction as documented in the manual and thus will have the same value for all updated rows.
update mytable_history
set islatestversion = false,
validto = current_timestamp - interval '1 minute',
modifiedby = 'test_update_script',
modifiedtime = current_timestamp;
But your usage of FROM in the UPDATE statement is wrong. The semantics of FROM in an UPDATE statement are very different between Postgres and SQL Server.
The way you use it creates a cross join between the CTE and mytable_history (so essentially a cross join of the table with itself).
You need to have a join condition in the WHERE clause on the primary key:
with d as (...)
update MyTable_History
set validto = d.newvalidto, islatestversion=false,
modifiedby='test_update_script', modifiedtime=current_timestamp
from d
where d.pk_column = MyTable_History.pk_column;
But if you really want to simulate something like variables, you don't need the CTE:
update mytable_history
set islatestversion = false,
validto = t.newvalidto,
modifiedby = 'test_update_script',
modifiedtime = current_timestamp
from (
values (current_timestamp - interval '1 minute')
) t (newvalidto);
The above still creates a "cross join" but as the joined table (from (values ...)) only contains a single row, it's not really a cross join.
Using PostgreSQL 9.1.13, I've written the following query to calculate some data:
WITH windowed AS (
SELECT a.person_id, a.category_id,
CAST(dense_rank() OVER w AS float) / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances s
-- Want to insert a WHERE clause here
INNER JOIN person p ON s.person_id = p.ident
ORDER BY person_id, category_id, created DESC
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
)
SELECT category_id,percentile FROM windowed
WHERE person_id = 1;
I now want to turn this into a stored procedure, but my issue is that in the middle there, where I've put the comment, I need to place a dynamic WHERE clause. For example, I'd like to add something like:
WHERE p.weight > 110 OR p.weight IS NULL
The calling application lets people pick filters, so I want to be able to pass the appropriate filters into the query. There could be 0 or many filters, depending on the caller, but I could pass it all in as a properly formatted WHERE clause in a string parameter, for example.
The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue.
Too many cooks spoil the broth.
Either let your web service build the SQL statement or let Postgres do it. Don't use both on the same query. That leaves two possible weak spots for SQL injection attacks and makes debugging and maintenance a lot harder.
Here is a full code example for a plpgsql function that builds and executes an SQL statement dynamically while making SQL injection impossible (from just two days ago):
Robust approach for building SQL queries programmatically
Details heavily depend on exact requirements.
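In the same spirit, here is a minimal sketch for this particular query (the function name, parameter names, and the int type of category_id are assumptions; only the weight filter from the example is handled, and its value is passed with USING instead of being concatenated into the string):
CREATE OR REPLACE FUNCTION category_percentiles(_person_id int, _max_weight numeric DEFAULT NULL)
  RETURNS TABLE (category_id int, percentile float) AS
$func$
DECLARE
   _where text := '';
BEGIN
   IF _max_weight IS NOT NULL THEN
      -- only the structure is concatenated; the value stays a bind parameter ($2)
      _where := ' WHERE p.weight > $2 OR p.weight IS NULL';
   END IF;

   RETURN QUERY EXECUTE '
      WITH windowed AS (
         SELECT a.person_id, a.category_id,
                CAST(dense_rank() OVER w AS float)
                   / COUNT(*) OVER (ORDER BY a.category_id) * 100.0 AS percentile
         FROM (
            SELECT DISTINCT ON (s.person_id, s.category_id) *
            FROM performances s
            INNER JOIN person p ON s.person_id = p.ident'
            || _where || '
            ORDER BY s.person_id, s.category_id, s.created DESC
         ) a
         WINDOW w AS (PARTITION BY a.category_id ORDER BY a.score)
      )
      SELECT category_id, percentile FROM windowed WHERE person_id = $1'
   USING _person_id, _max_weight;
END
$func$ LANGUAGE plpgsql;
Additional optional filters would be appended to _where the same way, always putting the values into USING parameters rather than into the SQL text.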
I have a table with roughly 7,000,000 records.
It's very flat and, for the sake of argument, it has 3 columns I wish to aggregate on. This aggregation should very simply create a count/pivot of each instance of those values.
E.g.
Company   Status   Year
Hatstand  Open     2011
Hatstand  Closed   2011
Moonbase  Open     2011
Would produce
Count of Hatstand: 2
Count of Hatstand Open: 1
Count of Hatstand Open 2011: 1
So it's a very simple count of each "branch" of data.
My first choice was to use an SSRS Matrix control, which worked really well when testing with a small dataset. However, when using the full dataset it would not run.
What is the "correct" way to approach this problem?
Should I pre-aggregate via a stored procedure or SSIS job?
Or should I continue with the SSRS route and try to refine my query?
Thanks
T-SQL is the preferred way to go if you are using SQL Server 2005 or 2008. You can try:
SELECT COMPANY, STATUS, YEAR, COUNT(*) FROM TBL
GROUP BY COMPANY, STATUS, YEAR
WITH ROLLUP -- or WITH CUBE
If you are using SQL Server 2008, you can also try:
SELECT COMPANY, STATUS, YEAR, COUNT(*) FROM TBL
GROUP BY
GROUPING SETS (
(COMPANY),
(COMPANY, STATUS),
(COMPANY, STATUS, YEAR)
)
For details about GROUP BY WITH ROLLUP/CUBE and GROUPING SETS, take a look at GROUP BY.
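For the three sample rows in the question, the GROUPING SETS version returns one row per "branch" (row order not guaranteed), which matches the counts asked for:
COMPANY   STATUS   YEAR   COUNT(*)
Hatstand  NULL     NULL   2
Moonbase  NULL     NULL   1
Hatstand  Open     NULL   1
Hatstand  Closed   NULL   1
Moonbase  Open     NULL   1
Hatstand  Open     2011   1
Hatstand  Closed   2011   1
Moonbase  Open     2011   1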
I'm trying to help my power users have more access to our data so I don't have to interrupt my work (playing Pac-Man) 25 times a day writing Ad Hoc Queries and such.
I'm trying to use Data Source Views, Data Models, and Report Builder 2 and 3 to allow them to have access to cleansed data in which they can safely do their own basic analysis. I want to create generic Report Models covering business processes rather than a specific report model for each ad hoc report they would need.
I have to create the Data Source View (DSV) with a named query because the source database lacks primary keys, but does have unique clustered indexes on identity_columns.
Here's my problem. When I use a relatively simple query like this:
SELECT SOM.FSONO AS SalesNo
, SOM.FCUSTNO AS CustNo
,SLC.fcompany as CustName
, SOM.FCUSTPONO AS CustPONo
, SOM.fsoldby AS SalesPerson
, SOR.FENUMBEr AS ItemNo
, SOR.finumber AS IntItemNo
, SOR.frelease AS Rels
, SOI.fprodcl AS ProdClass
, SOI.fgroup AS GroupCode
, rtrim(SOR.FPARTNO) AS PartNo
, SOR.fpartrev AS PartRev
, cast(SOI.fdesc AS VARCHAR(20)) AS PartDescription
,SOM.forderdate as OrderDate
,SOR.fduedate as DueDate
, SOR.FORDERQTY AS QtyOrd
, SOR.FUNETPRICE AS NetUnitPrice
, (SOR.FORDERQTY * SOR.funetprice) AS NetAmountOrdered
FROM slcdpm SLC inner join
somast SOM on SLC.fcustno = SOM.fcustno
LEFT OUTER JOIN soitem SOI
ON (SOM.fsono = SOI.fsono)
LEFT OUTER JOIN sorels SOR
ON (SOI.fsono = SOR.fsono)
AND (SOI.finumber = SOR.finumber)
Let's assume the user takes the Report Model in Report Builder 3 and only requests SalesNo, PartNo, PartRev, OrderDate, and TotalNetAmount for their dataset.
The SQL Generated to pull that data is:
SET DATEFIRST 7
SELECT
CAST(1 AS BIT) [c0_is_agg],
CAST(1 AS BIT) [c1_is_agg],
CAST(1 AS BIT) [c2_is_agg],
CAST(1 AS BIT) [c3_is_agg],
4 [agg_row_count],
[CustomerSales].[TotalNetAmountOrdered] [TotalNetAmountOrdered],
[CustomerSales].[SalesNo] [SalesNo],
[CustomerSales].[PartNo] [PartNo],
[CustomerSales].[PartRev] [PartRev],
[CustomerSales].[OrderDate] [OrderDate]
FROM
(
SELECT
SUM([CustomerSales].[NetAmountOrdered]) [TotalNetAmountOrdered],
[CustomerSales].[SalesNo] [SalesNo],
[CustomerSales].[PartNo] [PartNo],
[CustomerSales].[PartRev] [PartRev],
[CustomerSales].[OrderDate] [OrderDate]
FROM
(
SELECT SOM.fsono AS SalesNo, SOM.fcustno AS CustNo, SLC.fcompany AS CustName, SOM.fcustpono AS CustPONo, SOM.fsoldby AS SalesPerson,
SOR.fenumber AS ItemNo, SOR.finumber AS IntItemNo, SOR.frelease AS Rels, SOI.fprodcl AS ProdClass, SOI.fgroup AS GroupCode, RTRIM(SOR.fpartno) AS PartNo,
SOR.fpartrev AS PartRev, CAST(SOI.fdesc AS VARCHAR(20)) AS PartDescription, SOM.forderdate AS OrderDate, SOR.fduedate AS DueDate, SOR.forderqty AS QtyOrd,
SOR.funetprice AS NetUnitPrice, SOR.forderqty * SOR.funetprice AS NetAmountOrdered
FROM slcdpm AS SLC INNER JOIN
somast AS SOM ON SLC.fcustno = SOM.fcustno LEFT OUTER JOIN
soitem AS SOI ON SOM.fsono = SOI.fsono LEFT OUTER JOIN
sorels AS SOR ON SOI.fsono = SOR.fsono AND SOI.finumber = SOR.finumber
) [CustomerSales]
WHERE
CAST(1 AS BIT) = 1
GROUP BY
[CustomerSales].[SalesNo], [CustomerSales].[PartNo], [CustomerSales].[PartRev], [CustomerSales].[OrderDate]
) [CustomerSales]
ORDER BY
[SalesNo], [PartNo], [PartRev], [OrderDate]
I would have expected only the fields the user requests in the report to be pulled, not every single field in the DSV. Also, if parameters are created that constrain the data, such as a beginning and ending date for OrderDate, the full data set is returned anyway.
Am I doing something wrong here?
Is there a better way to approach this?
Do other administrators find themselves with performance issues when using Report Models?
There are sometimes performance issues when dealing with Report Models. This is one of the reasons that report models are not meant to be rolled out to all of your users to replace all reports. The queries generated by the semantic query engine behind report models are not tunable and are often totally NOT the way you yourself would write them.
The engine essentially treats the named query as a view, which it expands into the underlying query, just as it would a view. This is often an issue when building a model directly overlaying your database.
The ideal situation, from my perspective, is to have a separate database (possibly a data warehouse), preferably housed on a separate server. This DW would be flattened out so that you could optimize it for read performance. Then you could use those tables directly in your data source view, and the semantic query engine behind the model should be able to produce better queries.
This ideal is often not possible due to economic or other constraints. Could you try having a job that more or less ETLs from your base tables into a new set of tables that you could optimize for reporting, to support your model?
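For instance, a scheduled job could rebuild a flattened reporting table with something along these lines (a sketch only: the rpt schema, the table name, and the index are illustrative, and the column list would be whatever the model needs):
-- Sketch: pre-flatten the sales data so the DSV can point at a plain indexed table
IF OBJECT_ID('rpt.CustomerSales', 'U') IS NOT NULL
    DROP TABLE rpt.CustomerSales;

SELECT SOM.fsono AS SalesNo, SOM.fcustno AS CustNo, SLC.fcompany AS CustName,
       RTRIM(SOR.fpartno) AS PartNo, SOR.fpartrev AS PartRev,
       SOM.forderdate AS OrderDate,
       SOR.forderqty * SOR.funetprice AS NetAmountOrdered
INTO rpt.CustomerSales
FROM slcdpm AS SLC
     INNER JOIN somast AS SOM ON SLC.fcustno = SOM.fcustno
     LEFT OUTER JOIN soitem AS SOI ON SOM.fsono = SOI.fsono
     LEFT OUTER JOIN sorels AS SOR ON SOI.fsono = SOR.fsono
                                  AND SOI.finumber = SOR.finumber;

-- index on the columns users filter on most (OrderDate in this example)
CREATE CLUSTERED INDEX IX_CustomerSales_OrderDate
    ON rpt.CustomerSales (OrderDate, SalesNo);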