T-SQL function like FILTER for sum(x) filter(condition) over (partition by ...)

I'm trying to sum over a window with a filter. I saw something like
sum(x) filter(condition) over (partition by...)
but it does not seem to work in T-SQL (SQL Server 2017).
Essentially, I want to sum the last 5 rows that meet a condition on another column.
I've tried
sum(case when condition...) over (partition...)
and sum(cast(nullif(x))) over (partition...).
I've also tried left joining the table with a WHERE condition to filter out the rows that don't meet the condition.
All of the above only sum the matching values among the last 5 rows counted from the current row, so rows that fail the condition still take up places in the window.
What I want is: starting from the current row, add up the last 5 values (including the current one) that meet the condition.
Date | Value | Condition | Result
1-1  | 10    | 1         |
1-2  | 11    | 1         |
1-3  | 12    | 1         |
1-4  | 13    | 1         |
1-5  | 14    | 0         |
1-6  | 15    | 1         |
1-7  | 16    | 0         |
1-8  | 17    | 0         | sum(15+13+12+11+10)
1-9  | 18    | 1         | sum(18+15+13+12+11)
1-10 | 19    | 1         | sum(19+18+15+13+12)
In the above example the condition I want is Condition = 1: rows with 0 are ignored, but the "window" should still contain 5 non-zero values.

This can easily be achieved using a correlated subquery:
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
[Date] Date,
[Value] int,
Condition bit
)
INSERT INTO @T ([Date], [Value], Condition) VALUES
('2019-01-01', 10, 1),
('2019-01-02', 11, 1),
('2019-01-03', 12, 1),
('2019-01-04', 13, 1),
('2019-01-05', 14, 0),
('2019-01-06', 15, 1),
('2019-01-07', 16, 0),
('2019-01-08', 17, 0),
('2019-01-09', 18, 1),
('2019-01-10', 19, 1)
The query:
SELECT [Date], [Value], Condition,
(
SELECT Sum([Value])
FROM
(
SELECT TOP 5 [Value]
FROM @T AS t1
WHERE Condition = 1
AND t1.[Date] <= t0.[Date]
-- To show the sum only after a specific date (the question's Result starts at 1-8), uncomment the next line
--AND t0.[Date] > '2019-01-07'
ORDER BY [Date] DESC
) As t2
HAVING COUNT(*) = 5 -- only return a sum once there are at least 5 rows meeting the condition
) As Result
FROM @T As t0
ORDER BY [Date]
Results:
Date Value Condition Result
2019-01-01 10 1
2019-01-02 11 1
2019-01-03 12 1
2019-01-04 13 1
2019-01-05 14 0
2019-01-06 15 1 61
2019-01-07 16 0 61
2019-01-08 17 0 61
2019-01-09 18 1 69
2019-01-10 19 1 77
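To check the correlated-subquery idea outside SQL Server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Assumptions: SQLite stands in for SQL Server, so LIMIT 5 replaces TOP 5 and a CASE on COUNT(*) replaces the HAVING clause; the table `t` and the columns `d` and `cond` are illustrative names only.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server sample table (illustrative names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT, value INTEGER, cond INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2019-01-01", 10, 1), ("2019-01-02", 11, 1), ("2019-01-03", 12, 1),
        ("2019-01-04", 13, 1), ("2019-01-05", 14, 0), ("2019-01-06", 15, 1),
        ("2019-01-07", 16, 0), ("2019-01-08", 17, 0), ("2019-01-09", 18, 1),
        ("2019-01-10", 19, 1),
    ],
)

QUERY = """
SELECT t0.d, t0.value, t0.cond,
       (SELECT CASE WHEN COUNT(*) = 5 THEN SUM(t2.value) END  -- NULL until 5 rows qualify
          FROM (SELECT t1.value
                  FROM t AS t1
                 WHERE t1.cond = 1          -- only rows meeting the condition
                   AND t1.d <= t0.d         -- up to and including the current row
                 ORDER BY t1.d DESC
                 LIMIT 5) AS t2) AS result  -- the 5 most recent qualifying rows
  FROM t AS t0
 ORDER BY t0.d
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these ten rows the result column is NULL for the first five dates and then 61, 61, 61, 69, 77, the same as the SQL Server results above.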

Related

Getting ranking based on a number from CTE

I have a complex situation in PostgreSQL 11 where I need to generate a numbering based on a single figure that I get from a CTE.
Below is the CTE
WITH pending_orders_to_be_processed_details
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY so.create_date) AS queue_no
, so.name, so.create_date::TIMESTAMP
FROM picking sp
LEFT JOIN "order" so ON so.name = sp.origin
WHERE sp.state IN ('assigned', 'confirmed')
)
, orders_which_can_be_processed_today AS
(
-- This CTE gives me a count of orders
-- and their hourly average; let's say the count is 400 and the hourly avg is 3
)
Now I need to number the rows according to the hourly average: the first 3 orders should be ranked 1, the next 3 ranked 2, and so on, so that I can identify which orders can be processed based on this ranking.
The input is:
name queue_no create_date
so1 1 2021-03-11 12:00:00
so2 2 2021-03-11 13:00:00
so3 3 2021-03-11 14:00:00
so4 4 2021-03-11 15:00:00
so5 5 2021-03-11 16:00:00
so6 6 2021-03-11 17:00:00
so7 7 2021-03-11 18:00:00
so8 8 2021-03-11 19:00:00
so9 9 2021-03-11 20:00:00
The expected output will be
name rank
so1 1
so2 1
so3 1
so4 2
so5 2
so6 2
so7 3
so8 3
so9 3
Any help or suggestions would be appreciated.
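Edit: I recently learned about a function that fits well here.
You can use the ntile() window function for that. Note that ntile(n) splits the rows into n buckets of nearly equal size, so ntile(3) gives groups of exactly 3 here only because the sample has 9 rows; with 400 rows it would produce 3 groups of about 133, so for a fixed group size the arithmetic approach below is the safer choice: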
SELECT
*,
ntile(3) OVER (ORDER BY create_date)
FROM mytable
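ntile(n) fixes the number of buckets, not their size. The sketch below is a small pure-Python model of the bucket assignment, assuming the distribution rule Microsoft documents for NTILE (bucket sizes differ by at most one, larger buckets first); `ntile` here is a hypothetical helper, not a library call. It shows why ntile(3) matches the expected output for 9 rows but not in general.

```python
from collections import Counter
from typing import List


def ntile(num_rows: int, num_buckets: int) -> List[int]:
    """Bucket number (1-based) for each of num_rows ordered rows.

    Models the documented NTILE rule: bucket sizes differ by at most one,
    and the larger buckets come first in the ORDER BY order.
    """
    base, extra = divmod(num_rows, num_buckets)
    buckets: List[int] = []
    for bucket in range(1, num_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(9, 3))                                # 9 rows: groups of 3, as in the question
print(Counter(ntile(12, 3)))                      # 12 rows: groups of 4, not 3
print(sorted(Counter(ntile(400, 134)).values()))  # 132 groups of 3, then 2 groups of 2
```

Even choosing n = ceil(400 / 3) = 134 does not give "3, 3, ..., 3, 1": the remainder is spread as two groups of 2 at the end, which is why integer division is the better fit for a fixed group size.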
Since you already have a running row number (queue_no), you can use it to compute the expected rank:
SELECT
*,
floor((queue_no - 1) / 3) + 1 as rank
FROM my_cte
Subtract 1 from queue_no, so 1 to 3 become 0 to 2.
Divide by 3: 0 to 2 become 0.x, 3 to 5 become 1.x, ...
Round these results down (floor) to 0, 1, 2, ...
If you want to start with 1 instead of 0, add 1.
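The four steps above can be sketched by running the same arithmetic in an in-memory SQLite database from Python. Assumptions: SQLite stands in for PostgreSQL, `my_cte` is a plain table here, `HOURLY_AVG` is a hypothetical stand-in for the figure from the second CTE, and a 10th order is added to show the start of a 4th group. In both SQLite and PostgreSQL, dividing two integers truncates, so for an integer queue_no the floor() in the answer does not change the result.

```python
import sqlite3

HOURLY_AVG = 3  # hypothetical stand-in for the figure from the second CTE

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_cte (name TEXT, queue_no INTEGER)")
conn.executemany(
    "INSERT INTO my_cte VALUES (?, ?)",
    [(f"so{i}", i) for i in range(1, 11)],  # so1..so10 with queue_no 1..10
)

# Integer division truncates, so (queue_no - 1) / 3 maps 1-3 -> 0, 4-6 -> 1, ...
rows = conn.execute(
    "SELECT name, (queue_no - 1) / ? + 1 AS rnk FROM my_cte ORDER BY queue_no",
    (HOURLY_AVG,),
).fetchall()

for name, rnk in rows:
    print(name, rnk)
```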

How to get the last value with a condition in PostgreSQL?

I have a table in postgres with three columns, one with a group, one with a date and the last with a value.
grp mydate     value
A   2021-01-27 5
A   2021-01-23 10
A   2021-01-15 15
B   2021-01-26 7
B   2021-01-24 12
B   2021-01-15 17
I would like to create a view with a sequence of dates and, for each group, the most recent value in the table on or before each date.
grp mydate     value
A   2021-01-27 5
A   2021-01-26 10
A   2021-01-25 10
A   2021-01-24 10
A   2021-01-23 10
A   2021-01-22 15
A   2021-01-21 15
A   2021-01-20 15
A   2021-01-19 15
A   2021-01-18 15
A   2021-01-17 15
A   2021-01-16 15
A   2021-01-15 15
B   2021-01-27 7
B   2021-01-26 7
B   2021-01-25 12
B   2021-01-24 12
B   2021-01-23 17
B   2021-01-22 17
B   2021-01-21 17
B   2021-01-20 17
B   2021-01-19 17
B   2021-01-18 17
B   2021-01-17 17
B   2021-01-16 17
B   2021-01-15 17
SQL code to generate the table:
CREATE TABLE foo (
grp char(1),
mydate date,
value integer);
INSERT INTO foo VALUES
('A', '2021-01-27', 5),
('A', '2021-01-23', 10),
('A', '2021-01-15', 15),
('B', '2021-01-26', 7),
('B', '2021-01-24', 12),
('B', '2021-01-15', 17)
I have so far managed to generate a visualization with the sequence of dates joined with the distinct groups, but I am failing to get the most recent value.
SELECT DISTINCT(foo.grp), (date_trunc('day'::text, dd.dd))::date AS mydate
FROM foo, generate_series((( SELECT min(foo.mydate) AS min
FROM foo))::timestamp without time zone, (now())::timestamp without time zone, '1 day'::interval) dd(dd)
step-by-step demo:db<>fiddle
SELECT
grp,
gs::date AS mydate,
value
FROM (
SELECT
*,
COALESCE( -- 2
lead(mydate) OVER (PARTITION BY grp ORDER BY mydate) - 1, -- 1
(SELECT max(mydate) FROM foo)
) AS end_date
FROM foo
) s,
generate_series(mydate, end_date, interval '1 day') AS gs -- 3
ORDER BY grp, mydate DESC -- 4
1. lead() shifts the next value of an ordered group (= partition) into the current row. The group is defined by grp and the order is the date, so this gives the date on which the next record takes over. Since you don't want that date twice (as the end of one range and the beginning of the next), the end date is lead() - 1, one day before the next record starts.
2. The last record of each group has no following record, so lead() yields NULL. COALESCE() replaces it with the latest date in the table, so every group runs up to the same final date (2021-01-27 in your expected output, including group B, whose last record is 2021-01-26).
3. Now generate_series() creates one row per day, from the current record's date up to that end date.
4. Finally, order the result as required (in ORDER BY, the output column mydate takes precedence over the input column of the same name).
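The same range expansion can be modelled in plain Python as a sketch (`fill_dates` is a hypothetical helper). Assumption: as in the expected output, each group's last record is extended to the latest date in the whole table. The date lookup plays the role of lead() - 1, the inner loop the role of generate_series(), and the final sort the role of ORDER BY grp, mydate DESC.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (grp, mydate, value)
foo = [
    ("A", date(2021, 1, 27), 5), ("A", date(2021, 1, 23), 10),
    ("A", date(2021, 1, 15), 15), ("B", date(2021, 1, 26), 7),
    ("B", date(2021, 1, 24), 12), ("B", date(2021, 1, 15), 17),
]


def fill_dates(rows):
    """Expand each record into one row per day until the group's next record."""
    last_day = max(d for _, d, _ in rows)  # fallback end date: latest date in the table
    out = []
    for grp, recs in groupby(sorted(rows), key=lambda r: r[0]):
        recs = list(recs)  # ascending by date within the group
        for i, (_, start, value) in enumerate(recs):
            # Like lead(mydate) - 1; the group's last record falls back to last_day.
            if i + 1 < len(recs):
                end = recs[i + 1][1] - timedelta(days=1)
            else:
                end = last_day
            day = start
            while day <= end:  # like generate_series(mydate, end, '1 day')
                out.append((grp, day, value))
                day += timedelta(days=1)
    out.sort(key=lambda r: (r[0], -r[1].toordinal()))  # ORDER BY grp, mydate DESC
    return out


for row in fill_dates(foo):
    print(row)
```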

How can I evaluate data over time in PostgreSQL?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the Islands and Gaps problem; specifically, it's an islands problem with a date range. You should fix this question up and flag it to be migrated to dba.stackexchange.com.
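To solve this:
Aggregate the posts per user and calendar month (keeping the year), keeping only months with three or more posts.
Create a pseudo column with a window function that is 1 when the row's month does not directly follow the preceding row's month, i.e. a new island starts.
Turn that into group numbers with a running COUNT().
Check that count(*) for each group is greater than or equal to three.
Query:
SELECT owneruserid,
min(postmonth) AS first_month,
max(postmonth) AS last_month,
count(*) AS months_in_a_row
FROM (
SELECT owneruserid,
postmonth,
count(range_reset) OVER (PARTITION BY owneruserid ORDER BY postmonth) AS island
FROM (
SELECT owneruserid,
postmonth,
CASE
WHEN lag(postmonth) OVER (PARTITION BY owneruserid ORDER BY postmonth)
= postmonth - interval '1 month'
THEN NULL -- directly follows the previous month: same island
ELSE 1 -- first month or a gap: a new island starts
END AS range_reset
FROM (
-- months in which the user posted three times or more
SELECT owneruserid, date_trunc('month', creationdate) AS postmonth
FROM posts
GROUP BY owneruserid, date_trunc('month', creationdate)
HAVING count(id) >= 3
) AS m
) AS t
) AS l
GROUP BY owneruserid, island
HAVING count(*) >= 3;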
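The island steps above can be sketched in plain Python on already-aggregated (user, year, month) rows, to make the reset flag and the running count concrete. The helper names and the sample data are hypothetical. Months are numbered as year * 12 + month so that December and the following January count as consecutive, which a bare extract(month from ...) cannot express.

```python
from itertools import groupby


def month_index(year, month):
    """Months since year 0, so consecutive calendar months differ by exactly 1."""
    return year * 12 + (month - 1)


def month_from_index(idx):
    year, m0 = divmod(idx, 12)
    return (year, m0 + 1)


def streaks(active_months, min_len=3):
    """active_months: (user, year, month) for each month with >= 3 posts.

    Returns (user, first_month, last_month, length) for every run of at
    least min_len consecutive months.
    """
    ordered = sorted({(u, month_index(y, m)) for u, y, m in active_months})
    result = []
    for user, items in groupby(ordered, key=lambda x: x[0]):
        island, prev, runs = 0, None, {}
        for _, idx in items:
            if prev is None or idx != prev + 1:  # range_reset = 1: a new island starts
                island += 1                      # running COUNT(range_reset)
            runs.setdefault(island, []).append(idx)
            prev = idx
        for months in runs.values():             # GROUP BY user, island
            if len(months) >= min_len:           # HAVING count(*) >= 3
                result.append((user, month_from_index(months[0]),
                               month_from_index(months[-1]), len(months)))
    return result


# Hypothetical data: user 8 posts 3+ times in Dec 2019, Jan 2020 and Feb 2020.
sample = [(8, 2019, 12), (8, 2020, 1), (8, 2020, 2),
          (12, 2020, 4), (12, 2020, 5), (22, 2020, 2), (22, 2020, 4)]
print(streaks(sample))
```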

How to efficiently locate a value from one table among values from another table, with SQL

I have a problem in PostgreSQL that I find hard even to describe in the title: I have two tables, each containing a range of values that are very similar but not identical. Suppose I have values like 0, 10, 20, 30, ... in one, and 1, 5, 6, 9, 10, 12, 19, 25, 26, ... in the second one (these are milliseconds). For each value in the second one I want to find the values immediately lower and higher in the first one. So, for the value 12 it would give me 10 and 20. I'm doing it like this:
SELECT s.*, MAX(v1."millisec") AS low_v, MIN(v2."millisec") AS high_v
FROM "signals" AS s, "tracks" AS v1, "tracks" AS v2
WHERE v1."millisec" <= s."d_time"
AND v2."millisec" > s."d_time"
GROUP BY s."d_time", s."field2"; -- this is just an example
And it works... but it is very slow once I process several thousand rows, even with indexes on s."d_time" and v.millisec. I think there must be a much better way to do it, but I can't find one. Could anyone help me?
Try:
select s.*,
(select millisec
from tracks t
where t.millisec <= s.d_time
order by t.millisec desc
limit 1
) as low_v,
(select millisec
from tracks t
where t.millisec > s.d_time
order by t.millisec asc
limit 1
) as high_v
from signals s;
Be sure you have an index on tracks.millisec.
If you have just created the index, you may want to run ANALYZE on the table so the planner has up-to-date statistics and can take advantage of it.
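A minimal sketch that runs this answer's correlated LIMIT 1 subqueries against the sample values from the question, using an in-memory SQLite database via Python's sqlite3. Assumption: SQLite stands in for PostgreSQL; table and column names follow the question. The index on tracks(millisec) is created up front because it is what typically lets each subquery stop at the first matching index entry instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (millisec INTEGER)")
conn.execute("CREATE INDEX tracks_millisec ON tracks (millisec)")
conn.execute("CREATE TABLE signals (d_time INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?)", [(v,) for v in (0, 10, 20, 30)])
conn.executemany("INSERT INTO signals VALUES (?)",
                 [(v,) for v in (1, 5, 6, 9, 10, 12, 19, 25, 26)])

rows = conn.execute("""
    SELECT s.d_time,
           (SELECT t.millisec FROM tracks t
             WHERE t.millisec <= s.d_time           -- nearest value at or below
             ORDER BY t.millisec DESC LIMIT 1) AS low_v,
           (SELECT t.millisec FROM tracks t
             WHERE t.millisec > s.d_time            -- nearest value strictly above
             ORDER BY t.millisec ASC LIMIT 1) AS high_v
      FROM signals s
     ORDER BY s.d_time
""").fetchall()

for row in rows:
    print(row)
```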
Naive (trivial) way to find the preceding and next value.
-- the data (this could have been part of the original question)
CREATE TABLE table_one (id SERIAL NOT NULL PRIMARY KEY
, msec INTEGER NOT NULL -- an index might help
);
CREATE TABLE table_two (id SERIAL NOT NULL PRIMARY KEY
, msec INTEGER NOT NULL -- an index might help
);
INSERT INTO table_one(msec) VALUES (0), ( 10), ( 20), ( 30);
INSERT INTO table_two(msec) VALUES (1), ( 5), ( 6), ( 9), ( 10), ( 12), ( 19), ( 25), ( 26);
-- The query: find lower/higher values in table one
-- , but with no values between "us" and "them".
--
SELECT this.msec AS this
, prev.msec AS prev
, next.msec AS next
FROM table_two this
LEFT JOIN table_one prev ON prev.msec < this.msec AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec < this.msec AND nx.msec > prev.msec)
LEFT JOIN table_one next ON next.msec > this.msec AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec > this.msec AND nx.msec < next.msec)
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 4
INSERT 0 9
this | prev | next
------+------+------
1 | 0 | 10
5 | 0 | 10
6 | 0 | 10
9 | 0 | 10
10 | 0 | 20
12 | 10 | 20
19 | 10 | 20
25 | 20 | 30
26 | 20 | 30
(9 rows)
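The anti-join answer can be reproduced the same way in SQLite from Python to confirm its result table (assumption: SQLite instead of PostgreSQL; aliases renamed to `cur`, `prv`, `nxt` for the sketch). Because it uses strict < and >, a value that equals a track (10) gets 0 as its lower neighbour, whereas the LIMIT 1 answer, which uses <=, returns 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_one (msec INTEGER NOT NULL)")
conn.execute("CREATE TABLE table_two (msec INTEGER NOT NULL)")
conn.executemany("INSERT INTO table_one VALUES (?)", [(v,) for v in (0, 10, 20, 30)])
conn.executemany("INSERT INTO table_two VALUES (?)",
                 [(v,) for v in (1, 5, 6, 9, 10, 12, 19, 25, 26)])

rows = conn.execute("""
    SELECT cur.msec, prv.msec, nxt.msec
      FROM table_two AS cur
      LEFT JOIN table_one AS prv
        ON prv.msec < cur.msec
       AND NOT EXISTS (SELECT 1 FROM table_one nx      -- nothing between prv and cur
                        WHERE nx.msec < cur.msec AND nx.msec > prv.msec)
      LEFT JOIN table_one AS nxt
        ON nxt.msec > cur.msec
       AND NOT EXISTS (SELECT 1 FROM table_one nx      -- nothing between cur and nxt
                        WHERE nx.msec > cur.msec AND nx.msec < nxt.msec)
     ORDER BY cur.msec
""").fetchall()

for row in rows:
    print(row)
```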
Try this:
select * from signals s,
(select millisec low_value,
lead(millisec) over (order by millisec) high_value from tracks) intervals
where s.d_time between low_value and high_value-1
For this type of problem, window functions are ideal; see: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
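The lead() answer pairs each track with the next one to form half-open intervals [low, high) and joins each signal to the interval that contains it. Outside the database, the same lookup over a sorted list can be sketched with Python's bisect module; this binary search is a swapped-in technique, not the answer's SQL, and `enclosing_interval` is a hypothetical helper.

```python
from bisect import bisect_right


def enclosing_interval(sorted_tracks, d_time):
    """Return (low, high) with low <= d_time < high, or None if there is none.

    sorted_tracks must be sorted ascending; consecutive pairs play the role of
    (millisec, lead(millisec)) in the SQL answer.
    """
    i = bisect_right(sorted_tracks, d_time)  # number of tracks <= d_time
    if i == 0 or i == len(sorted_tracks):
        return None  # before the first track, or at/after the last one (lead() is NULL)
    return sorted_tracks[i - 1], sorted_tracks[i]


tracks = [0, 10, 20, 30]
for d in (1, 10, 12, 26, 35):
    print(d, enclosing_interval(tracks, d))
```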

How do I add totals/subtotals to a set of results without grouping the row data?

I'm constructing a SQL query for a business report. I need to have both subtotals (grouped by file number) and grand totals on the report.
I'm entering unknown SQL territory, so this is a bit of a first attempt. The query I made is almost working. The only problem is that the entries are being grouped -- I need them separated in the report.
Here is my sample data:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3B Mar 28/10 1 3
3B Mar 28/10 5 10
When I run this query
SELECT
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM SubtotalTesting
GROUP BY FileNumber, Date WITH ROLLUP
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
Date
I get the following:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 6 13 <--
3B NULL 6 13
NULL NULL 17 38
What I want is:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 1 3 <--
3B Mar 28/10 5 10 <--
3B NULL 6 13
NULL NULL 17 38
I can clearly see why the entries are being grouped, but I have no idea how to separate them while still returning the subtotals and grand total.
I'm a bit green when it comes to doing advanced SQL queries like this, so if I'm taking the wrong approach to the problem by using WITH ROLLUP, please suggest some preferred alternatives -- you don't have to write the whole query for me, I just need some direction. Thanks!
WITH SubtotalTesting (FileNumber, Date, Cost, Charge) AS
(
SELECT '3', CAST('20091222' AS DATETIME), 5, 10
UNION ALL
SELECT '3', '20100113', 6, 15
UNION ALL
SELECT '3B', '20100328', 1, 3
UNION ALL
SELECT '3B', '20100328', 5, 10
),
q AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY filenumber) AS rn
FROM SubTotalTesting
)
SELECT rn,
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM q
GROUP BY
FileNumber, Date, rn WITH ROLLUP
HAVING GROUPING(rn) <= GROUPING(Date)
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END),
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END),
Date
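This works because rn is unique per row: adding it to the ROLLUP keeps every detail row separate, and HAVING GROUPING(rn) <= GROUPING(Date) discards the intermediate level that only rolls rn up within one (FileNumber, Date) pair, leaving details, per-file subtotals and the grand total. SQLite has no WITH ROLLUP, so as a swapped-in technique the sketch below builds the same report with UNION ALL from Python; `lvl` is a hypothetical helper column used only for ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SubtotalTesting (FileNumber TEXT, Date TEXT, Cost INTEGER, Charge INTEGER)"
)
conn.executemany(
    "INSERT INTO SubtotalTesting VALUES (?, ?, ?, ?)",
    [
        ("3", "2009-12-22", 5, 10), ("3", "2010-01-13", 6, 15),
        ("3B", "2010-03-28", 1, 3), ("3B", "2010-03-28", 5, 10),
    ],
)

rows = conn.execute("""
    SELECT FileNumber, Date, Cost, Charge, lvl
      FROM (SELECT FileNumber, Date, Cost, Charge, 0 AS lvl       -- detail rows, ungrouped
              FROM SubtotalTesting
            UNION ALL
            SELECT FileNumber, NULL, SUM(Cost), SUM(Charge), 1    -- one subtotal per file
              FROM SubtotalTesting
             GROUP BY FileNumber
            UNION ALL
            SELECT NULL, NULL, SUM(Cost), SUM(Charge), 2          -- grand total
              FROM SubtotalTesting)
     ORDER BY lvl = 2, FileNumber, lvl, Date, Cost
""").fetchall()

for row in rows:
    print(row)
```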