A problem with SELECT TOP (case [...]) in T-SQL

I have a query like this (as you can see, I'd like to retrieve 50% of the total rows, or the first 100 rows, etc.):
-- @AllRowsSelectType is INT
SELECT TOP (
    CASE @AllRowsSelectType
        WHEN 1 THEN 100 PERCENT
        WHEN 2 THEN 50 PERCENT
        WHEN 3 THEN 25 PERCENT
        WHEN 4 THEN 33 PERCENT
        WHEN 5 THEN 50
        WHEN 6 THEN 100
        WHEN 7 THEN 200
    END
) ROW_NUMBER() OVER (ORDER BY [id]) AS row_num, a, b, c, etc.
Why do I get the error "Incorrect syntax near the keyword 'PERCENT'." on the line "when 1 [...]"?

The syntax for TOP is:
TOP (expression) [PERCENT]
    [ WITH TIES ]
The reserved keyword PERCENT cannot appear inside the expression. Instead you can run two different queries: one for when you want PERCENT and another for when you don't.
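For example, a minimal IF/ELSE sketch of the two-query approach (assuming a hypothetical myTable, with the column list abbreviated as in the question):
IF @AllRowsSelectType <= 4
    SELECT TOP (
        CASE @AllRowsSelectType
            WHEN 1 THEN 100
            WHEN 2 THEN 50
            WHEN 3 THEN 25
            WHEN 4 THEN 33
        END) PERCENT
        ROW_NUMBER() OVER (ORDER BY [id]) AS row_num, a, b, c
    FROM myTable
ELSE
    SELECT TOP (
        CASE @AllRowsSelectType
            WHEN 5 THEN 50
            WHEN 6 THEN 100
            WHEN 7 THEN 200
        END)
        ROW_NUMBER() OVER (ORDER BY [id]) AS row_num, a, b, c
    FROM myTable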
If you need this to be a single statement, you can run both queries and combine the results with UNION ALL; the ELSE 0 makes whichever branch doesn't match return zero rows:
SELECT TOP (
    CASE @AllRowsSelectType
        WHEN 1 THEN 100
        WHEN 2 THEN 50
        WHEN 3 THEN 25
        WHEN 4 THEN 33
        ELSE 0
    END) PERCENT
    ROW_NUMBER() OVER (ORDER BY [id]) AS row_num, a, b, c, ...
UNION ALL
SELECT TOP (
    CASE @AllRowsSelectType
        WHEN 5 THEN 50
        WHEN 6 THEN 100
        WHEN 7 THEN 200
        ELSE 0
    END)
    ROW_NUMBER() OVER (ORDER BY [id]) AS row_num, a, b, c, ...

You're also mixing two different kinds of TOP usage: percentages and absolute row counts. Another approach is to compute the row limit up front:
DECLARE @ROW_LIMIT int
IF @AllRowsSelectType < 5
    SELECT @ROW_LIMIT = COUNT(*) / @AllRowsSelectType FROM myTable -- 100%, 50%, 33%, 25%
ELSE
    SELECT @ROW_LIMIT = 50 * POWER(2, @AllRowsSelectType - 5) -- 50, 100, 200...
;WITH OrderedMyTable AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rowNum
    FROM myTable
)
SELECT * FROM OrderedMyTable
WHERE rowNum <= @ROW_LIMIT

You could do:
SELECT TOP (CASE @FilterType WHEN 2 THEN 50 WHEN 3 THEN 25 WHEN 4 THEN 33 ELSE 100 END) PERCENT * FROM
(SELECT TOP (CASE @FilterType WHEN 5 THEN 50 WHEN 6 THEN 100 WHEN 7 THEN 200 ELSE 2147483647 END) * FROM
(<your query here>) t
) t
which may be easier to read. The ELSE 2147483647 (the maximum int) effectively means "no limit" for the inner query, just as the ELSE 100 with PERCENT does for the outer one.
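For concreteness, a runnable version of the nested-TOP idea, assuming a hypothetical myTable with an id column in place of the elided query:
DECLARE @FilterType int = 5;

SELECT TOP (CASE @FilterType WHEN 2 THEN 50 WHEN 3 THEN 25 WHEN 4 THEN 33 ELSE 100 END) PERCENT *
FROM (
    SELECT TOP (CASE @FilterType WHEN 5 THEN 50 WHEN 6 THEN 100 WHEN 7 THEN 200 ELSE 2147483647 END) *
    FROM myTable
    ORDER BY id -- TOP without ORDER BY would return an arbitrary subset
) t
ORDER BY id;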


Taking N-samples from each group in PostgreSQL

I have a table with a column named id that looks like the following:
id   value 1  value 2  value 3
1    244      550      1000
1    251      551      700
1    540      60       1200
...  ...      ...      ...
2    19       744      2000
2    10       903      100
2    44       231      600
2    120      910      1100
...  ...      ...      ...
I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points.
For example I would like a maximum 50 data points randomly selected from id = 1, id = 2 etc...
I cannot find any previous questions similar to this, but I have tried to at least logically work through a solution where I could iterate by id, limit each query to 50, and UNION ALL the results:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But this type of solution doesn't work: UNION ALL requires combining the output for one id with the output for the next, and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering the list of unique id values and unioning all the results, or is there a more optimal way to do it?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
select *, row_number() over (partition by id order by random()) as u
from schema.table
) as a
where u <= 50;
Example (limiting to 3, with a row number within each id so you can see the randomness of the selection):
setup
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection
select * from (
select *, row_number() over (partition by id order by random()) as u
from foo
) as a
where u <= 3;
Output:
id  value1  idrow  u
1   542     6      1
1   24      86     2
1   155     74     3
2   505     95     1
2   100     46     2
2   422     33     3
3   966     88     1
3   747     89     2
3   664     19     3
In case you are looking to get 50 (or fewer) rows from each group of IDs, you can use windowing.
From the question: "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query:
with data as (
select row_number() over (partition by id order by random()) rn,
* from table_name)
select * from data where rn<=50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.* from uniq cross join lateral
(select * from test where test.id=uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.

PostgreSQL cumulative sum with max condition

I have the following table:
account
size id name
100 1 John
200 2 Mary
300 3 Jane
400 4 Anne
100 5 Mike
600 6 Joanne
I want to partition the rows in groups, where the sum of size <= 600.
Expected result:
account
group size id name
1 100 1 John
1 200 2 Mary
1 300 3 Jane
2 400 4 Anne
2 100 5 Mike
3 600 6 Joanne
I don't know how to do the partition and add the condition.
I cannot think of how to do this without recursion. I left the running_total in the result to make it easier to follow:
with recursive rns as ( -- Assign row numbers as rn in case of gaps in id
select *, row_number() over (order by id) as rn
from account
), sumgrp as ( -- Start with first row
select size, id, name, rn, 1 as grp, size as running_total
from rns
where rn = 1
union all
select n.size, n.id, n.name, n.rn,
case -- Increment the grp when adding this row's size would push the running_total past 600
when p.running_total + n.size > 600 then p.grp + 1
else p.grp
end as grp,
case -- Reset the running_total when a new grp starts
when p.running_total + n.size > 600 then n.size
else p.running_total + n.size
end as running_total
from sumgrp p
join rns n on n.rn = p.rn + 1
)
select grp, size, id, name, running_total
from sumgrp
order by id;
db<>fiddle here
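Tracing the recursion over the sample data shows how grp and running_total evolve:
grp  size  id  name    running_total
1    100   1   John    100
1    200   2   Mary    300
1    300   3   Jane    600
2    400   4   Anne    400  (600 + 400 > 600: grp increments, total resets)
2    100   5   Mike    500
3    600   6   Joanne  600  (500 + 600 > 600: grp increments again)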

How to enumerate rows by division?

I have the following table
id num sub_id
1 3 1
1 5 2
1 1 1
1 4 2
2 1 5
2 2 5
I want to get this result
id num sub_id number
1 3 1 1
1 5 2 2
1 1 1 1
1 4 2 2
2 1 5 1
2 2 5 1
I tried row_number() over (partition by id order by num, sub_id DESC), but the result is obviously different.
I don't fully understand your business logic, since you haven't explained it, but maybe this query helps:
Result and info: dbfiddle
with recursive
cte_r as (
select id,
num,
sub_id,
row_number() over () as rn
from test),
cte as (
select id,
num,
sub_id,
rn,
rn as grp
from cte_r
where rn = 1
union all
select cr.id,
cr.num,
cr.sub_id,
cr.rn,
case
when cr.id != c.id then 1
when cr.id = c.id and cr.sub_id = c.sub_id then c.grp
when cr.id = c.id and cr.sub_id > c.sub_id then c.grp + 1
when cr.id = c.id and cr.sub_id < c.sub_id then 1
end
from cte c,
cte_r cr
where c.rn = cr.rn - 1)
select id,
num,
sub_id,
grp
from cte
order by id
It looks like you actually want to ignore the num column and use DENSE_RANK on sub_id:
SELECT *, dense_rank() OVER (PARTITION BY id ORDER BY sub_id) AS number FROM …;

Moving grouped MEDIAN / Get the MEDIAN of specific months from the past in T-SQL

Let's say I have a table:
DATE     ID   VALUE
01.2010  1    100
02.2010  1    200
...      ...  ...
12.2010  1    300
01.2011  1    150
02.2011  1    250
...      ...  ...
12.2011  1    350
01.2012  1    200
02.2012  1    300
...      ...  ...
12.2012  1    400
I want to get a running median of VALUE per calendar month across the years so far, i.e. something like:
DATE     ID   VALUE  MEDIAN
01.2010  1    100    100
02.2010  1    200    200
...      ...  ...    ...
12.2010  1    300    300
01.2011  1    150    125 = (100+150)/2
02.2011  1    250    225 = (200+250)/2
...      ...  ...    ...
12.2011  1    350    325 = (300+350)/2
01.2012  1    200    150
02.2012  1    300    250
...      ...  ...    ...
12.2012  1    400    350
I have more than one ID in the table, so I would like to get this result for every ID.
I have tried:
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY VALUE) OVER (PARTITION BY Id, MONTH(Date) ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
but I get the error "The function 'PERCENTILE_CONT' may not have a window frame."
I've also tried the following (but also without any results):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY VALUE)
OVER (PARTITION BY Id, MONTH(Date))
FROM tab1 LEFT JOIN tab2
ON tab1.key = tab2.key
WHERE tab1.Date BETWEEN Min(Date) AND tab2.Date
EDIT
So far I have resolved it with:
SELECT (CASE WHEN Date = 2010 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date = 2010 THEN VALUE ELSE NULL END) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END) +
       (CASE WHEN Date = 2011 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date <= 2011 THEN VALUE ELSE NULL END) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END) +
       (CASE WHEN Date = 2012 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date <= 2012 THEN VALUE ELSE NULL END) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END)
FROM tab1
But to be honest, I would like a solution that doesn't assume a priori knowledge of the dates. I've thought about a WHILE loop updating the column while @MinYear <= @MaxYear (incrementing @MinYear on every iteration), but in that case I would have to create temporary tables, which I'm trying to avoid.
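One date-agnostic sketch (my own assumption, not taken from the question or the answers, and assuming Date is a real DATE column in a table tab1): since PERCENTILE_CONT may not have a window frame, the running median can instead be computed per row with a correlated CROSS APPLY:
SELECT t.[Date], t.Id, t.[VALUE], m.median
FROM tab1 AS t
CROSS APPLY (
    -- Median of this calendar month's values up to and including the current date.
    -- OVER () gives every row of the filtered set the same value,
    -- and DISTINCT collapses them to a single row.
    SELECT DISTINCT PERCENTILE_CONT(0.5)
           WITHIN GROUP (ORDER BY t2.[VALUE]) OVER () AS median
    FROM tab1 AS t2
    WHERE t2.Id = t.Id
      AND MONTH(t2.[Date]) = MONTH(t.[Date])
      AND t2.[Date] <= t.[Date]
) AS m;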
My idea is to use (value1 + value2)/2 as the median, since your requirement is a little complicated.
CREATE TABLE MedianData
(
[Date] VARCHAR(100)
,ID INT
,[Value] INT
)
INSERT INTO MedianData VALUES ('01.2010', 1, 100)
,('02.2010', 1, 200)
,('12.2010', 1, 300)
,('01.2011', 1, 150)
,('02.2011', 1, 250)
,('12.2011', 1, 350)
,('01.2012', 1, 200)
,('02.2012', 1, 300)
,('12.2012', 1, 400)
SELECT *
,ROW_NUMBER() OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [row]
,Substring([Date],1,2 ) as [MONTH]
INTO #Temp_tbl2
FROM MedianData
SELECT
A.Date
,A.ID
,A.[Value]
--Logic is applied here. I used (Value1+value2)/2 as median
,CASE WHEN A.[row] = 3 THEN ( A.[Value] + ( SELECT T.[Value] FROM #Temp_tbl2
T where T.[MONTH] = Substring(A.[Date],1,2 ) AND T.[row] = 1 ) )/2
WHEN A.[row] != 1 THEN (A.total/2)
ELSE A.total END as [Median]
INTO #Temp_table
FROM
(
SELECT *
,ROW_NUMBER() OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [row]
,SUM ([Value] ) OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [total]
FROM MedianData
) AS A
-- join back to present the results in the original table order
SELECT MedianData.*, #Temp_table.Median
FROM MedianData
INNER JOIN #Temp_table
ON MedianData.[Date] = #Temp_table.[Date]
drop table #Temp_table
drop table #Temp_tbl2
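For the sample data this reproduces the expected output: for month 01 the medians come out as 100, then (100+150)/2 = 125, then (200+100)/2 = 150, matching the table in the question.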

Postgres - Update running count whenever row meets a certain condition

I have a table with the following entries:
id  price  quantity
1   10     75
2   10     75
3   10     -150
4   10     75
5   10     -75
What I need to do is to update each row with a number that is the number of times the running total has been 0. In the above example, the cumulative totals would be
id  cum_total
1   750
2   1500
3   0
4   750
5   0
Desired result
id  price  quantity  seq
1   10     75        1
2   10     75        1
3   10     -150      1
4   10     75        2
5   10     -75       2
I'm now lost in a spiral of CTEs and window functions and figured I'd ask the experts.
Thanks in advance :-)
Here is one option using analytic functions:
WITH cte AS (
SELECT *, CASE WHEN SUM(price*quantity) OVER (ORDER BY id) = 0 THEN 1 ELSE 0 END AS price_sum
FROM yourTable
),
cte2 AS (
SELECT *, LAG(price_sum, 1, 0) OVER (ORDER BY id) price_sum_lag
FROM cte
)
SELECT id, price, quantity, 1 + SUM(price_sum_lag) OVER (ORDER BY id) cumulative_total
FROM cte2
ORDER BY id;
Demo
You may try running each CTE in succession to see how the logic is working.
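For the sample data, the intermediate columns look like this (price_sum flags the rows where the running total is 0; lagging it by one row makes the batch number increment on the row after each zero):
id  running_total  price_sum  price_sum_lag  1 + SUM(price_sum_lag)
1   750            0          0              1
2   1500           0          0              1
3   0              1          0              1
4   750            0          1              2
5   0              1          0              2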
With window functions (the frame ending at 1 PRECEDING plays the same role as the LAG above, so each zero-crossing row stays in its own batch):
SELECT id, price, quantity,
coalesce(
sum(CASE WHEN iszero THEN 1 ELSE 0 END)
OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
) + 1 AS batch
FROM (SELECT id, price, quantity,
sum(price * quantity) OVER (ORDER BY id) = 0 AS iszero
FROM mytable) AS subq;