How to use variable in OVER clause in SQL Server - tsql

I would like to use a variable for the number of rows in the frame of an OVER clause. So far I have only got it working by building the SQL statement in a string and then executing it.
Because the final goal is to also use the query in SSIS, this does not work: SSIS does not recognize the fields returned by the dynamic query.
What works is:
select
[GUID_Fund], [Date], [Close],
avg([Close]) over (order by [GUID_Fund], [Date] rows 7 preceding) as MA_Low
from fundrates
group by [GUID_Fund], [Date], [Close]
order by [GUID_Fund] asc, [Date] desc;
The number 7 needs to be a variable, so I tried something like this:
declare @var_MA_Low as int;
select distinct @var_MA_Low = [Value1]
from Variables
where [Name]='MA_Low';
select
[GUID_Fund], [Date], [Close],
avg([Close]) over (order by [GUID_Fund], [Date] rows @var_MA_Low preceding) as MA_Low
from fundrates
group by [GUID_Fund], [Date], [Close]
order by [GUID_Fund] asc, [Date] desc;
This results in a syntax error at @var_MA_Low just after 'rows'.
What works is the same statement built as dynamic SQL, but then I cannot use it as a source in SSIS:
declare @MA as nvarchar(max);
declare @var_MA_Low as nvarchar(max);
select distinct @var_MA_Low = [Value1] from Variables where [Name]='MA_Low';
set @MA = N'select [GUID_Fund], [Date], [Close], avg([Close])
over (order by [GUID_Fund], [Date] rows '+@var_MA_Low+' preceding) as MA_Low
from fundrates
group by [GUID_Fund], [Date], [Close] order by [GUID_Fund] asc, [Date] desc;'
execute sp_executesql @MA;
Does anybody have an idea how to pass the number of rows as a variable into the second option?

What if you create a stored procedure with the working query and use that SP as the source?

I might try to improve this answer later, but you can take your working dynamic SQL solution and combine it with a temp table and the "insert into ... exec ..." syntax (https://stackoverflow.com/a/24073229/3591870), then return to SSIS just "select * from #holdertable". SSIS should then be able to determine the columns being returned and generate your source. I don't really like that you are required to use dynamic SQL to solve this, however.
According to the docs (http://msdn.microsoft.com/en-us/library/ms189461(v=sql.120).aspx), the number of rows really must be an "unsigned integer literal", so I think dynamic SQL is going to be the only way.
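The "insert into ... exec ..." idea above could look roughly like the following. This is a minimal sketch, not tested against SSIS: the column types of #holdertable are assumptions (adjust them to the real fundrates schema), and the value read from Variables is cast to int before it is concatenated into the statement.

```sql
-- Holder table with a fixed shape; the column types are assumed.
CREATE TABLE #holdertable (
    [GUID_Fund] uniqueidentifier,
    [Date]      date,
    [Close]     decimal(18, 4),
    MA_Low      decimal(18, 4)
);

DECLARE @var_MA_Low int;
DECLARE @MA nvarchar(max);

-- Casting to int ensures only a number ends up in the dynamic statement.
SELECT @var_MA_Low = CAST([Value1] AS int)
FROM Variables
WHERE [Name] = 'MA_Low';

SET @MA = N'select [GUID_Fund], [Date], [Close],
    avg([Close]) over (order by [GUID_Fund], [Date] rows '
    + CAST(@var_MA_Low AS nvarchar(10)) + N' preceding) as MA_Low
from fundrates
group by [GUID_Fund], [Date], [Close];';

-- Capture the dynamic result set in the holder table.
INSERT INTO #holdertable ([GUID_Fund], [Date], [Close], MA_Low)
EXECUTE sp_executesql @MA;

-- The final statement is a plain SELECT with known columns.
SELECT [GUID_Fund], [Date], [Close], MA_Low
FROM #holdertable
ORDER BY [GUID_Fund] ASC, [Date] DESC;
```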

Related

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of Pivot SQL [including simple Pivot sql and sql using the XML function?] but I have not been able to figure this out - or to understand if it is even possible.
I would appreciate any help or pointers.
Thanks!
Try it like this:
DECLARE @tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO @tbl VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
SELECT p.*
FROM
(
SELECT *
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM @tbl
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
)p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
will use a partitioned ROW_NUMBER in order to create numbered column names per code. The rest is simple PIVOT...
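To make the role of the partitioned ROW_NUMBER more concrete, the same reshaping can be sketched without PIVOT, using conditional aggregation. This is an alternative formulation added for illustration, not part of the answer itself; the limit of three columns is chosen only to fit the sample data.

```sql
DECLARE @tbl TABLE([Name] VARCHAR(100), Code VARCHAR(100));
INSERT INTO @tbl VALUES
 ('A','A-One'),('A','A-Two')
,('B','B-One')
,('C','C-One'),('C','C-Two'),('C','C-Three');

-- rn numbers the codes within each Name (in alphabetical order of Code);
-- each MAX(CASE ...) then picks the code with that number, or NULL if none.
SELECT t.[Name],
       MAX(CASE WHEN t.rn = 1 THEN t.Code END) AS Code1,
       MAX(CASE WHEN t.rn = 2 THEN t.Code END) AS Code2,
       MAX(CASE WHEN t.rn = 3 THEN t.Code END) AS Code3
FROM
(
    SELECT [Name], Code,
           ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code) AS rn
    FROM @tbl
) AS t
GROUP BY t.[Name]
ORDER BY t.[Name];
```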
UPDATE: A dynamic approach to reflect the max amount of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE @cols VARCHAR(MAX);
WITH GetMaxCount(mc) AS(SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC)
SELECT @cols=STUFF(
(
SELECT CONCAT(',Code',Nmbr)
FROM
(SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
FOR XML PATH('')
),1,1,'');
DECLARE @sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
SELECT *
,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM TblTest
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (' + @cols + ')
)p;';
EXEC(@sql);
GO
DROP TABLE TblTest;
As you can see, the only part that changes to reflect the actual number of columns is the list in PIVOT's IN() clause.
You can create a string that looks like Code1,Code2,Code3,...,CodeN and build the statement dynamically. The statement can then be run with EXEC().
I'd prefer the first approach. Dynamically created SQL is very powerful, but it can be a pain in the neck too...

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and will return all the values in the same row, with each column keeping its original name.
For more columns, just add more subqueries, such as:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could also achieve this with window functions. However, it would be no better in performance or readability.
Starting from PG 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
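Applied to the users table from the question, that aggregate can cover several columns in a single pass over the table. A short sketch, assuming PostgreSQL 9.4 or later and that country and city are the columns of interest:

```sql
-- One mode() call per column; add further columns the same way.
SELECT
    mode() WITHIN GROUP (ORDER BY country) AS country,
    mode() WITHIN GROUP (ORDER BY city)    AS city
FROM users;
```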
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one:
-- each branch is parenthesized so that ORDER BY and LIMIT apply to that branch only
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
This window function version will read the users table and the computed table once each. The subquery version will read the users table once for each of the columns. If there are many columns, as in the OP's case, then my guess is that this is faster.
-- first_value picks, for each column independently, the value with the highest count
select distinct
first_value(country) over(order by country_count desc) as country,
first_value(age) over(order by age_count desc) as age
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
limit 1

Oracle conversion to SQL Server for DECODE

Hi all, I have a query in Oracle as follows:
DECLARE in_variable Varchar;
Select Row_Number()
OVER
(
Order By
Decode(in_variable,'column_name ASC',t.column_name) Asc) b
From table t
I converted it to SQL Server as follows:
DECLARE @in_variable NVARCHAR(100)
SELECT ROW_NUMBER() OVER
(
ORDER BY
IIF ( @in_sort_by <> '', 'column_name ASC', t.column_name ) ASC )
FROM table t
Is this correct, or am I doing something wrong? When I give a value for @in_variable I get a conversion exception in SQL Server. Can someone help me?
Rather than using either DECODE or IIF, you'd be better off using CASE. For SQL Server, this would be:
SELECT ROW_NUMBER() OVER
( ORDER BY
CASE WHEN @in_sort_by <> ''
THEN 'column_name ASC'
ELSE t.column_name END ASC )
FROM table t
If you're getting a type conversion error, that would imply that t.column_name is an int. SQL Server will try to convert the static string 'column_name ASC' to match the data type of the column it is being used in place of. To fix this, you can try using CAST to convert the column to VARCHAR:
SELECT ROW_NUMBER() OVER
( ORDER BY
CASE WHEN @in_sort_by <> ''
THEN 'column_name ASC'
ELSE CAST(t.column_name as varchar) END ASC )
FROM table t
However, I think you're probably pursuing the wrong solution here. It looks like you're trying to make the analytic function sort differently based on the variable provided. Providing the alternate column name and sort order as a string is not going to do that. You should probably look at questions related to dynamic sorting for how to do this correctly.
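As a rough sketch of what such dynamic sorting usually looks like: instead of returning the sort specification as a string, each CASE returns the column itself when the variable matches and NULL otherwise, which is also what the Oracle DECODE in the question does. The table name dbo.some_table and the two accepted values of @in_sort_by are placeholders assumed for illustration.

```sql
DECLARE @in_sort_by NVARCHAR(100) = N'column_name ASC';

SELECT
    t.column_name,
    ROW_NUMBER() OVER
    ( ORDER BY
        -- Only the CASE that matches the variable yields the column;
        -- the other one yields NULL for every row and has no effect.
        CASE WHEN @in_sort_by = N'column_name ASC'  THEN t.column_name END ASC,
        CASE WHEN @in_sort_by = N'column_name DESC' THEN t.column_name END DESC
    ) AS b
FROM dbo.some_table AS t;  -- hypothetical table name
```

Because each CASE returns either the column or NULL, no string is ever mixed with the column's data type, so the conversion error does not arise.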

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
This code seems to indicate sum(distinct ) and sum() return different values.
with t as (
select 1 as a
union all
select '1'
union all
select '2'
union all
select '4'
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
You can use a sub-query (SQL Server requires an alias on the derived table):
select sum(users)
from (select distinct users from table_name) as d;
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
Try here yourself
TEST
DECLARE @table_name Table (Users INT );
INSERT INTO @table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM @table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
This may be a duplicate of:
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ

How to write a multi-parameter CTE script?

I am trying to write a TSQL script for an SSRS report that uses a CTE to select records based on the parameters chosen. I'm looking for the most efficient way to do this, either all in TSQL and/or SSRS. I have 4 parameters which can be set to NULL (All values) or one specific value. Then in my CTE, I have the following line:
ROW_NUMBER() over(partition by G.[program_providing_service],G.people_id
order by G.[actual_date] desc) as rowID
This above CTE is for the case when Program is NULL and People is not null. My 4 parameters are:
Program, Facility, Staff, and People.
So I only want to partition values when they are NULL. Currently I implement this by one CTE depending on the parameter values. For example, if they choose NULL for all parameters except People, then this CTE would look like:
ROW_NUMBER() over(partition by G.people_id
order by G.[actual_date] desc) as rowID
Or if all 5 parameters are null:
ROW_NUMBER() over(partition by G.[program_providing_service], G.[site_providing_service], G.staff_id, G.people_id
order by G.[actual_date] desc) as rowID
If they do not choose NULL for any of the 4 parameters, then I probably do not need to partition by any field since I just want the top 1 record ordered by actual_date descending. This is what my CTE looks like:
;with cte as
(
Select distinct
G.[actual_date],
G.[site_providing_service],
p.[program_name],
G.[staff_id],
G.program_providing_service,
ROW_NUMBER() over(partition by G.[program_providing_service],G.people_id
order by G.[actual_date] desc) as rowID
From
event_log_rv G With (NoLock)
WHERE
...
AND (@ClientID Is Null OR [people_id]=@ClientID)
AND (@StaffID Is Null OR [staff_id] = @StaffID)
AND (@FacilityID Is Null OR [site_providing_service] = @FacilityID)
AND (@ProgramID Is Null OR [program_providing_service] = @ProgramID)
and (@SupervisorID is NULL OR staff_id in (select staff_id from #supervisors))
)
SELECT
[actual_date],
[site_providing_service],
[program_name],
[staff_id],
program_providing_service,
people_id,
rowID
FROM cte WHERE rowid = 1
ORDER BY [Client_FullName]
where the ROW_NUMBER line varies depending on the parameters chosen. Currently I have 5 IF statements in this TSQL script that look like:
IF @ProgramID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
with one CTE in each of these IF statements:
IF @FacilityID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @ProgramID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @StaffID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @ClientID IS NOT NULL
BEGIN
...
END
How can I code for all possible options, whether they choose NULL or else specific values?
It took me a long time to understand what you want to do. There is a contradiction in your description, so please revisit it. You say you only want to partition on values when they are NULL, but you also say that when NULL is chosen for all parameters except People, you partition on People.
Whichever way you want it, partitioning on 'null' or 'not null', you can construct dynamic SQL to achieve this instead of adding a lot of [if...else] blocks.
The following code is pseudocode and definitely not tested; it is just a hint. It makes one assumption: your parameters have a priority in the partition order, for example, if Program is null (or not null), Program comes first.
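declare @sql nvarchar(max)
declare @partition nvarchar(max) = N''
-- add a column to the partition list for every parameter that is NULL
if(@ProgramID is null)
set @partition = @partition + N'G.[program_providing_service],'
if(@FacilityID is null)
set @partition = @partition + N'G.[site_providing_service],'
if(@StaffID is null)
set @partition = @partition + N'G.staff_id,'
if(@ClientID is null)
set @partition = @partition + N'G.people_id,'
-- strip the trailing comma; leave out "partition by" when nothing was added
if(@partition <> N'')
set @partition = N'partition by ' + left(@partition, len(@partition) - 1)
set @sql = N'
;with cte as
(
Select distinct
G.[actual_date],
G.[site_providing_service],
p.[program_name],
G.[staff_id],
G.program_providing_service,
ROW_NUMBER() over(' + @partition + N'
order by G.[actual_date] desc) as rowID
From
event_log_rv G With (NoLock)
WHERE
...
AND (@ClientID Is Null OR [people_id]=@ClientID)
AND (@StaffID Is Null OR [staff_id] = @StaffID)
AND (@FacilityID Is Null OR [site_providing_service] = @FacilityID)
AND (@ProgramID Is Null OR [program_providing_service] = @ProgramID)
and (@SupervisorID is NULL OR staff_id in (select staff_id from #supervisors))
)
SELECT
[actual_date],
[site_providing_service],
[program_name],
[staff_id],
program_providing_service,
people_id,
rowID
FROM cte WHERE rowid = 1
ORDER BY [Client_FullName]
'
-- the outer parameters are not visible inside exec(@sql), so pass them explicitly
-- (the int types are assumed; use the real parameter types)
exec sp_executesql @sql,
N'@ClientID int, @StaffID int, @FacilityID int, @ProgramID int, @SupervisorID int',
@ClientID, @StaffID, @FacilityID, @ProgramID, @SupervisorID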
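If dynamic SQL is undesirable, a static variant may also be worth trying: put a CASE expression in each PARTITION BY slot so that a column only takes part in the partitioning when its parameter is NULL. This is a sketch under the "partition on a column when its parameter is NULL" reading of the question; swap IS NULL for IS NOT NULL for the other reading. The parameter types and sample values are assumptions, and the joins and the elided WHERE conditions from the question are left out.

```sql
-- Hypothetical declarations; in SSRS these would be the report parameters.
DECLARE @ProgramID int = NULL, @FacilityID int = NULL,
        @StaffID int = NULL,   @ClientID int = 42;

WITH cte AS
(
    SELECT
        G.[actual_date],
        G.[site_providing_service],
        G.[staff_id],
        G.[program_providing_service],
        G.[people_id],
        ROW_NUMBER() OVER (
            PARTITION BY
                -- Each CASE is NULL for every row when its parameter is set,
                -- so that column then has no effect on the partitioning.
                CASE WHEN @ProgramID  IS NULL THEN G.[program_providing_service] END,
                CASE WHEN @FacilityID IS NULL THEN G.[site_providing_service] END,
                CASE WHEN @StaffID    IS NULL THEN G.[staff_id] END,
                CASE WHEN @ClientID   IS NULL THEN G.[people_id] END
            ORDER BY G.[actual_date] DESC) AS rowID
    FROM event_log_rv AS G
    WHERE (@ClientID   IS NULL OR G.[people_id] = @ClientID)
      AND (@StaffID    IS NULL OR G.[staff_id] = @StaffID)
      AND (@FacilityID IS NULL OR G.[site_providing_service] = @FacilityID)
      AND (@ProgramID  IS NULL OR G.[program_providing_service] = @ProgramID)
)
SELECT [actual_date], [site_providing_service], [staff_id],
       [program_providing_service], [people_id], rowID
FROM cte
WHERE rowID = 1;
```

This keeps the query as a single static statement, which avoids maintaining one CTE per IF branch.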