T-SQL: Aggregate based on condition

There is a table:
start | end | zone
------|-----|-----
1 | 5 | 3
3 | 6 | 2
1 | 3 | 1
4 | 7 | 4
Start and end represent a range. I'd like to get the zone for a value of 4, like this:
declare @value as int = 4
select zone from table where @value >= start and @value <= end
This returns
3
2
4
but I need only one zone from a predefined set, for example 2 or 3, and 3 has priority: if both 2 and 3 are present, I'd like to get 3. So zone 3 is the desired result in my example.
How can I solve it?

If it's hard coded you can do something like this:
DECLARE @Value int = 4;
SELECT TOP 1 zone
FROM table
WHERE zone IN (3,2)
AND @Value >= start
AND @Value <= [end]
ORDER BY CASE WHEN zone = 3 THEN 0 ELSE 1 END
See a live demo on rextester
Update
SELECT TOP 1 CASE WHEN zone IN (3,2) THEN zone ELSE NULL END AS zone
FROM #T
WHERE @Value >= start
AND @Value <= [end]
ORDER BY CASE zone WHEN 3 THEN 0 WHEN 2 THEN 1 ELSE 2 END
Live demo for updated version
Update 2
Even though you wrote in your comment that you found a solution thanks to my post, I still want to post a solution here for future readers that might need it:
;WITH CTE AS
(
SELECT TOP 1 zone
FROM #T
WHERE zone IN (3,2)
AND @Value >= start
AND @Value <= [end]
ORDER BY CASE WHEN zone = 3 THEN 0 ELSE 1 END
)
SELECT zone
FROM CTE
UNION ALL
SELECT NULL
WHERE NOT EXISTS(
SELECT 1
FROM CTE
)
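For readers who want to check the selection logic outside the database, here is a minimal Python sketch of the same idea: collect the zones whose range contains the value, then walk a priority list and return the first hit, or None when no preferred zone matches. The names pick_zone, ROWS and priority are made up for this illustration and are not part of the original answer.

```python
from typing import Optional, Sequence, Tuple

# Rows are (start, end, zone) tuples taken from the question's sample table.
ROWS = [(1, 5, 3), (3, 6, 2), (1, 3, 1), (4, 7, 4)]


def pick_zone(rows: Sequence[Tuple[int, int, int]], value: int,
              priority: Sequence[int]) -> Optional[int]:
    """Return the highest-priority zone whose range contains value, or None."""
    # Zones whose inclusive [start, end] range contains the value.
    matching = {zone for start, end, zone in rows if start <= value <= end}
    # Walk the priority list in order; the first hit wins.
    for zone in priority:
        if zone in matching:
            return zone
    return None


print(pick_zone(ROWS, 4, priority=[3, 2]))  # prints 3
```

Passing the priority as an ordered list plays the role of the ORDER BY CASE expression, and the None result corresponds to the NULL row returned by the UNION ALL variant.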

Related

Select dates missing data in a range

I have a postgres table test_table that looks like this:
date | test_hour
------------+-----------
2000-01-01 | 1
2000-01-01 | 2
2000-01-01 | 3
2000-01-02 | 1
2000-01-02 | 2
2000-01-02 | 3
2000-01-02 | 4
2000-01-03 | 1
2000-01-03 | 2
I need to select all the dates which don't have test_hour = 1, 2, and 3, so it should return
date
------------
2000-01-03
Here is what I have tried:
SELECT date FROM test_table WHERE test_hour NOT IN (SELECT generate_series(1,3));
But that only returns dates that have extra hours beyond 1, 2, and 3.

You can use aggregation and conditional HAVING clauses, like so:
SELECT mydate
FROM mytable
GROUP BY mydate
HAVING
MAX(CASE WHEN test_hour = 1 THEN 1 ELSE 0 END) = 0
OR MAX(CASE WHEN test_hour = 2 THEN 1 ELSE 0 END) = 0
OR MAX(CASE WHEN test_hour = 3 THEN 1 ELSE 0 END) = 0

Another possibility is to join against the series (or another subquery containing the hours) and do a [distinct] count on the hours aggregated per date:
select date from tst
inner join (select generate_series(1,3) "hour") hours on hours.hour = tst.hour
group by tst.date
having count(distinct tst.hour) < 3;
or
select date from tst
where hour in (select generate_series(1,3))
group by date
having count(distinct tst.hour) < 3;
[You don't need the distinct if date/hour combinations in your table are unique.]

A solution using set difference, giving you exactly the rows that are missing:
(SELECT DISTINCT
date, all_hour
FROM test_table
CROSS JOIN generate_series(1,3) all_hour)
EXCEPT
(TABLE test_table)
And a solution using an array aggregate and the array contains operator:
SELECT date
FROM test_table
GROUP BY date
HAVING NOT array_agg(test_hour) @> ARRAY(SELECT generate_series(1,3))
(online demos)
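The "contains" idea behind the array answer can be sketched in Python as a subset test: group the hours per date, then keep the dates whose hours do not include every required hour. This is only an illustration using the question's sample rows; the names dates_missing_hours and ROWS are made up here.

```python
from collections import defaultdict

# (date, test_hour) pairs from the question's sample table.
ROWS = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]


def dates_missing_hours(rows, required=frozenset({1, 2, 3})):
    """Return sorted dates that lack at least one of the required hours."""
    hours_by_date = defaultdict(set)
    for day, hour in rows:
        hours_by_date[day].add(hour)
    # A date qualifies when the required set is not fully contained in its hours.
    return sorted(day for day, hours in hours_by_date.items()
                  if not required <= hours)


print(dates_missing_hours(ROWS))  # prints ['2000-01-03']
```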

Defining a custom week using days of the month or year in postgresql

I am running a simple query to get weekly revenue from our sales
SELECT date_trunc('week', payment_date) AS week, sum(payment_amount)
FROM payment
WHERE payment_date BETWEEN '2010-jan-1' AND '2016-dec-31'
GROUP BY week
Now I need my week start and end date to be static for every year. All 52 weeks of the year need to be accounted for e.g.
Week 1: Jan 1-7
Week 2: Jan 8-14
Week 3: Jan 15-21
Week 4: Jan 22-28
Week 5: Jan 29-Feb 4 and so forth
I did some investigation and figured out that I need a user defined function using the payment_date as argument and returning a week value. I can then call this function in the SQL query above, in place of the date_trunc() function.
How can I use an incremental loop to assign a week value to the payment_date?
Can I also use this return value in group by clause in the SQL query?
Some explanation with detailed examples will be highly appreciated since I have basic to intermediate knowledge of SQL.
---------------Edit--------------
I'm now trying to use two functions to take leap years into account, where I would still want March 4th to be included in the 9th week. I've tried to take the function by @klin and convert it to SQL, but I keep getting "syntax error at or near 'int'" on line 9. My code is below.
create or replace function is_leap_year(int)
returns boolean language sql as $$
select $1 % 4 = 0 and ($1 % 100 <> 0 or $1 % 400 = 0)
$$;
create or replace function week_no(timestamp)
returns int language sql as $body$
declare
y int;
day_shift int;
begin
y = extract(year from $1);
day_shift = 1 + (is_leap_year(y) and $1 > make_date(y, 2, 28))::int;
return ((extract(doy from $1)::int)- day_shift) / 7+ 1;
end
$body$;
SELECT week_no(payment_date) as week_number, sum(payment_amount)
from payment p join payment_event pe on p.payment_event_id =
pe.payment_event_id
where payment_date between '2016-jan-1' and '2017-jan-1'
and pe.payment_event_type_id != 2
group by week_number
order by week_number

First, there are problems with your requirements.
Now I need my week start and end date to be static for every year.
They can't be. Leap years happen. February 29 will either shift start and end dates one year out of every four, or you'll need to allow one week to have eight days.
All 52 weeks of the year need to be customized for . . .
I think you mean that all 52 weeks need to be accounted for. But 52 * 7 = 364. You're missing a day.
I think the simplest expression that calculates a week number from a date is (extract(doy from payment_date)::integer / 7) as week. I don't know whether it's worth putting that into a function. Instead, I might start with creating a view that uses that expression.
But a calculation won't do anything special about February 29, or about the fact that every year has more than 52 * 7 days.
I really think your best bet here is to build a table instead of using a calculation.
create table weeks (
calendar_date date primary key,
week_num integer not null
check (week_num between 1 and 53)
);
Populate it with the dates for 2016 and 2017, and with calculated week numbers, to give us a starting point. (2016 was a leap year.)
insert into weeks
select
('2016-01-01'::date + (n || ' days')::interval)::date as calendar_date
, extract(doy from ('2016-01-01'::date + (n || ' days')::interval)::date)::integer / 7 + 1 as calendar_week
from generate_series (0, 730) n;
Let's look at week 9.
select *
from weeks
where week_num = 9
order by calendar_date;
calendar_date week_num
--
2016-02-25 9
2016-02-26 9
2016-02-27 9
2016-02-28 9
2016-02-29 9
2016-03-01 9
2016-03-02 9
2017-02-25 9
2017-02-26 9
2017-02-27 9
2017-02-28 9
2017-03-01 9
2017-03-02 9
2017-03-03 9
In 2016, the calculated week 9 ran from 2016-02-25 to 2016-03-02. In 2017, it ran from 2017-02-25 to 2017-03-03. But now that all these week numbers are in a table, you can adjust them any way you like. You can even change the adjustments from year to year if it makes sense to do that.

Use doy (the day of the year) like this:
create or replace function week_no(date)
returns int language sql as $$
select ((extract(doy from $1)::int)- 1) / 7+ 1
$$;
with the_table(a_date) as (
values
('2017-01-01'::date),
('2017-01-07'),
('2017-01-08'),
('2017-01-14'),
('2017-01-15'),
('2017-01-22')
)
select extract(doy from a_date)::int as doy, week_no(a_date)
from the_table;
doy | week_no
-----+---------
1 | 1
7 | 1
8 | 2
14 | 2
15 | 3
22 | 4
(6 rows)
If you want to correct the week number so that March 4th is always in the 9th week (even in a leap year), use this helper function:
create or replace function is_leap_year(int)
returns boolean language sql as $$
select $1 % 4 = 0 and ($1 % 100 <> 0 or $1 % 400 = 0)
$$;
Your function may look like this (I've used the plpgsql language for better readability though this also can be coded as an sql function):
create or replace function week_no_corrected(date)
returns int language plpgsql as $$
declare
y int = extract (year from $1);
day_shift int = 1 + (is_leap_year(y) and $1 > make_date(y, 2, 28))::int;
begin
return ((extract(doy from $1)::int)- day_shift) / 7+ 1;
end;
$$;
with the_table(a_date) as (
values
('2016-03-03'::date),
('2016-03-04'),
('2016-03-05'),
('2017-03-03'),
('2017-03-04'),
('2017-03-05')
)
select a_date, week_no(a_date), week_no_corrected(a_date)
from the_table;
a_date | week_no | week_no_corrected
------------+---------+-------------------
2016-03-03 | 9 | 9
2016-03-04 | 10 | 9
2016-03-05 | 10 | 10
2017-03-03 | 9 | 9
2017-03-04 | 9 | 9
2017-03-05 | 10 | 10
(6 rows)
You cannot use variables in an SQL function, but the assignments can be replaced by derived tables:
create or replace function week_no_corrected(date)
returns int language sql as $$
select ((extract(doy from $1)::int)- day_shift) / 7 + 1
from (
select 1 + (is_leap_year(y) and $1 > make_date(y, 2, 28))::int as day_shift
from (
select extract (year from $1)::int as y
) s
) s
$$;
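To sanity-check the arithmetic of the two functions above without a database, the same formulas can be written as a small Python sketch. It assumes the same definitions (seven-day buckets counted from January 1, with days after February 28 shifted back by one in leap years). The Python function names mirror the SQL ones only for readability.

```python
import calendar
from datetime import date


def week_no(d: date) -> int:
    """Seven-day buckets counted from January 1 (week 1 = days 1-7)."""
    doy = d.timetuple().tm_yday
    return (doy - 1) // 7 + 1


def week_no_corrected(d: date) -> int:
    """Same buckets, but February 29 does not push later dates forward."""
    doy = d.timetuple().tm_yday
    # In a leap year, shift every day after February 28 back by one.
    shift = 1 + int(calendar.isleap(d.year) and d > date(d.year, 2, 28))
    return (doy - shift) // 7 + 1


for d in (date(2016, 3, 4), date(2017, 3, 4)):
    print(d, week_no(d), week_no_corrected(d))
```

Note that December 31 lands in week 53 with either function, because a year has at least 52 * 7 + 1 days.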

Breaking your problem down to month-day strings allows you to use the same logic across multiple years:
mysql> SELECT "01-07" < "01-08";
+-------------------+
| "01-07" < "01-08" |
+-------------------+
| 1 |
+-------------------+
1 row in set (0.08 sec)
A simple date format of %m-%d works for comparing the payment dates to the week buckets you want to assign.
To manually assign all 52 week ranges, you can use a case statement:
SET @md_format="%m-%d";
SELECT
CASE
WHEN (date_format(`input_date`, @md_format) < "01-08") THEN 1
WHEN (date_format(`input_date`, @md_format) < "01-15") THEN 2
WHEN (date_format(`input_date`, @md_format) < "01-22") THEN 3
-- ... All other cases
ELSE 52
END;
See the docs for the syntax to define a function
Functions will allow you to do operations like:
SELECT week_bucket(payment_date) `week`, SUM(revenue) `revenue`
FROM my_table
WHERE week_bucket(payment_date) > 13
AND week_bucket(payment_date) < 15
GROUP BY `week`;

Coalesce overlapping time ranges in PostgreSQL

I have a PostgreSQL (9.4) table that contains time stamp ranges and user IDs, and I need to collapse any overlapping ranges (with the same user ID) into a single record.
I've tried a complicated set of CTEs to accomplish this, but there are some edge cases in our (40,000+ rows) real table that complicate matters. I've come to the conclusion that I probably need a recursive CTE, but I haven't had any luck writing it.
Here's some code to create a test table and populate it with data. This isn't the exact layout of our table, but it's close enough for an example.
CREATE TABLE public.test
(
id serial,
sessionrange tstzrange,
fk_user_id integer
);
insert into test (sessionrange, fk_user_id)
values
('[2016-01-14 11:57:01-05,2016-01-14 12:06:59-05]', 1)
,('[2016-01-14 12:06:53-05,2016-01-14 12:17:28-05]', 1)
,('[2016-01-14 12:17:24-05,2016-01-14 12:21:56-05]', 1)
,('[2016-01-14 18:18:00-05,2016-01-14 18:42:09-05]', 2)
,('[2016-01-14 18:18:08-05,2016-01-14 18:18:15-05]', 1)
,('[2016-01-14 18:38:12-05,2016-01-14 18:48:20-05]', 1)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 1)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 1)
,('[2016-01-14 18:18:12-05,2016-01-14 18:18:20-05]', 3)
,('[2016-01-14 19:32:12-05,2016-01-14 23:18:20-05]', 3)
,('[2016-01-14 18:18:16-05,2016-01-14 18:18:26-05]', 4)
,('[2016-01-14 18:18:24-05,2016-01-14 18:18:31-05]', 2);
I have found that I can do this to get the sessions sorted by the time they started:
select * from test order by fk_user_id, sessionrange
I could use this to determine whether an individual record overlaps with the previous, using window functions:
SELECT *, sessionrange && lag(sessionrange) OVER (PARTITION BY fk_user_id ORDER BY sessionrange)
FROM test
ORDER BY fk_user_id, sessionrange
But this only detects whether the single previous record overlaps the current one (see the record where id = 6). I need to detect all the way back to the beginning of the partition.
After that, I'd need to group any records that overlap together, to find the beginning of the earliest session and the end of the last session to terminate.
I'm sure there's a way to do this that I'm overlooking. How can I collapse these overlapping records?

It is relatively easy to merge overlapping ranges as elements of an array. For simplicity the following function returns set of tstzrange:
create or replace function merge_ranges(tstzrange[])
returns setof tstzrange language plpgsql as $$
declare
t tstzrange;
r tstzrange;
begin
foreach t in array $1 loop
if r && t then r:= r + t;
else
if r notnull then return next r;
end if;
r:= t;
end if;
end loop;
if r notnull then return next r;
end if;
end $$;
Just aggregate the ranges for a user (in sorted order, because the function merges in a single pass) and use the function:
select fk_user_id, merge_ranges(array_agg(sessionrange order by sessionrange))
from test
group by 1
order by 1, 2
fk_user_id | merge_ranges
------------+-----------------------------------------------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"]
1 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"]
1 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"]
1 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"]
2 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"]
3 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"]
3 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"]
4 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"]
(8 rows)
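The single-pass merge that merge_ranges performs can be sketched in Python as follows. This is an illustration only: ranges are inclusive (start, end) pairs, plain integers stand in for timestamps, and the sample input is made up rather than taken from the test table.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end); ints stand in for timestamps


def merge_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Merge overlapping inclusive ranges in one pass over sorted input."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the running range: extend it if this one ends later.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_ranges([(1, 5), (4, 9), (20, 22), (8, 12), (21, 21)]))
# prints [(1, 12), (20, 22)]
```

Sorting by the lower bound first is what makes one pass sufficient: once a range fails to overlap the running range, no later range can overlap it either.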
Alternatively, the algorithm can be applied to the entire table in one function loop. I'm not sure, but for a large dataset this method should be faster.
create or replace function merge_ranges_in_test()
returns setof test language plpgsql as $$
declare
curr test;
prev test;
begin
for curr in
select *
from test
order by fk_user_id, sessionrange
loop
if prev notnull and prev.fk_user_id <> curr.fk_user_id then
return next prev;
prev:= null;
end if;
if prev.sessionrange && curr.sessionrange then
prev.sessionrange:= prev.sessionrange + curr.sessionrange;
else
if prev notnull then
return next prev;
end if;
prev:= curr;
end if;
end loop;
return next prev;
end $$;
Results:
select *
from merge_ranges_in_test();
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)
The problem is very interesting. I've tried to find a recursive solution, but the procedural approach seems the most natural and efficient.
I have finally found a recursive solution. The query deletes overlapping rows and inserts their compacted equivalent:
with recursive cte (user_id, ids, range) as (
select t1.fk_user_id, array[t1.id, t2.id], t1.sessionrange + t2.sessionrange
from test t1
join test t2
on t1.fk_user_id = t2.fk_user_id
and t1.id < t2.id
and t1.sessionrange && t2.sessionrange
union all
select user_id, ids || t.id, range + sessionrange
from cte
join test t
on user_id = t.fk_user_id
and ids[cardinality(ids)] < t.id
and range && t.sessionrange
),
list as (
select distinct on(id) id, range, user_id
from cte, unnest(ids) id
order by id, upper(range)- lower(range) desc
),
deleted as (
delete from test
where id in (select id from list)
)
insert into test
select distinct on (range) id, range, user_id
from list
order by range, id;
Results:
select *
from test
order by 3, 2;
id | sessionrange | fk_user_id
----+-----------------------------------------------------+------------
1 | ["2016-01-14 17:57:01+01","2016-01-14 18:21:56+01"] | 1
5 | ["2016-01-15 00:18:08+01","2016-01-15 00:18:15+01"] | 1
7 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:31+01"] | 1
6 | ["2016-01-15 00:38:12+01","2016-01-15 00:48:20+01"] | 1
4 | ["2016-01-15 00:18:00+01","2016-01-15 00:42:09+01"] | 2
9 | ["2016-01-15 00:18:12+01","2016-01-15 00:18:20+01"] | 3
10 | ["2016-01-15 01:32:12+01","2016-01-15 05:18:20+01"] | 3
11 | ["2016-01-15 00:18:16+01","2016-01-15 00:18:26+01"] | 4
(8 rows)

Postgresql: Select value difference between values in Integer column

My question is simple. Say I have the following column:
order_in_group
integer
------
1
2
3
5
6
9
I would like the query result to list each pair of consecutive values whose difference is greater than 1:
value1 value2 difference
integer integer integer
------- ------- -------
3 5 2
6 9 3
Any help will be great.

Try this:
with q(i) as (
select unnest(array[1,2,3,5,6,9])
)
select prev, curr, curr- prev diff
from (
select i curr, lag(i) over (order by i) prev
from q
) s
where curr > prev+ 1;
prev | curr | diff
------+------+------
3 | 5 | 2
6 | 9 | 3
(2 rows)

You should be able to just use LAG to get the previous row to compare with:
WITH cte AS (
SELECT order_in_group value2,
LAG(order_in_group) OVER (ORDER BY order_in_group) value1
FROM mytable
)
SELECT value1, value2, value2-value1 difference
FROM cte
WHERE value2-value1 > 1;
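As a cross-check of the LAG approach, the same pairing can be sketched in Python: sort the values, pair each one with its successor, and keep the pairs that differ by more than 1. The function name gaps is made up for this illustration.

```python
def gaps(values):
    """Return (previous, current, difference) for each jump larger than 1."""
    ordered = sorted(values)
    # zip pairs each value with its successor, like LAG over an ordered window.
    return [(prev, curr, curr - prev)
            for prev, curr in zip(ordered, ordered[1:])
            if curr - prev > 1]


print(gaps([1, 2, 3, 5, 6, 9]))  # prints [(3, 5, 2), (6, 9, 3)]
```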

Find total number in a specific period of time SQL

I am trying to find the total number of members in a given period. Say I have the following data:
member_id start_date end_date
1 9/1/2013 12/31/2013
2 10/1/2013 11/12/2013
3 12/1/2013 12/31/2013
4 5/1/2012 8/5/2013
5 9/1/2013 12/31/2013
6 7/1/2013 12/31/2013
7 6/6/2012 12/5/2013
8 10/1/2013 12/31/2013
9 7/8/2013 12/31/2013
10 1/1/2012 11/5/2013
In SQL I need to create a report that will list out the number of members in each month of the year. In this case something like the following:
Date Members Per Month
Jan-12 1
Feb-12 1
Mar-12 1
Apr-12 1
May-12 2
Jun-12 3
Jul-12 3
Aug-12 3
Sep-12 3
Oct-12 3
Nov-12 3
Dec-12 3
Jan-13 3
Feb-13 3
Mar-13 3
Apr-13 3
May-13 3
Jun-13 3
Jul-13 5
Aug-13 4
Sep-13 6
Oct-13 8
Nov-13 6
Dec-13 6
So there is only 1 member from Jan-12 (member id 10) until May-12 when member id 4 joins making the count 2 and so on.
The date range can be all over so I can't specify the specific dates but it is by month, meaning that even if someone ends 12-1 it is considered active for the month for Dec.

I was able to create the following stored procedure, which accomplishes what I needed:
USE [ValueBasedSandbox]
GO
/****** Object: StoredProcedure [dbo].[sp_member_count_per_month] Script Date: 01/08/2015 12:02:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Create date: 2015-08-01
-- Description: Find the counts per a given date passed in
-- =============================================
CREATE PROCEDURE [dbo].[sp_member_count_per_month]
-- Add the parameters for the stored procedure here
@YEAR int
, @ENDYEAR int
AS
DECLARE @FIRSTDAYMONTH DATETIME
DECLARE @LASTDAYMONTH DATETIME
DECLARE @MONTH INT = 1;
--Drop the temporary holding table if exists
IF OBJECT_ID('tempdb.dbo.##TEMPCOUNTERTABLE', 'U') IS NOT NULL
DROP TABLE dbo.##TEMPCOUNTERTABLE
CREATE TABLE dbo.##TEMPCOUNTERTABLE (
counter INT
, start_date DATETIME2
, end_date DATETIME2
)
--Perform this loop for each year desired
WHILE @YEAR <= @ENDYEAR
BEGIN
--Perform for each month of the year
WHILE (@MONTH <= 12)
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SET @FIRSTDAYMONTH = DATEADD(MONTH, @MONTH - 1, DATEADD(YEAR, @YEAR-1900, 0))
SET @LASTDAYMONTH = DATEADD(MONTH, @MONTH, DATEADD(YEAR, @YEAR-1900, 0)-1)
INSERT INTO dbo.##TEMPCOUNTERTABLE(counter, start_date, end_date)
SELECT COUNT(*) AS counter
, @FIRSTDAYMONTH AS start_date
, @LASTDAYMONTH AS end_date
FROM dbo.member_table
WHERE start_date <= @LASTDAYMONTH
AND end_date >= @FIRSTDAYMONTH
--Increment through all the months of the year
SET @MONTH = @MONTH + 1
END -- End Monthly Loop
--Reset Month counter
SET @MONTH = 1
--Increment the desired years
SET @YEAR = @YEAR + 1
END -- End Yearly Loop
--Display the results
SELECT *
FROM dbo.##TEMPCOUNTERTABLE
-- Drop the temp table
IF OBJECT_ID('tempdb.dbo.##TEMPCOUNTERTABLE', 'U') IS NOT NULL
DROP TABLE dbo.##TEMPCOUNTERTABLE
GO

This should do the trick
with datesCte(monthStart,monthEnd) as
(
select cast('20120101' as date) as monthStart, cast('20120131' as date) as monthEnd
union all
select DATEADD(MONTH, 1, d.monthStart), dateadd(day, -1, dateadd(month, 2, d.monthStart))
from datesCte as d
where d.monthStart < '20131201'
)
select *
from datesCte as d
cross apply
(
select count(*) as cnt
from dbo.MemberDates as m
where m.startDate <= d.monthEnd and m.endDate >= d.monthStart
) as x
order by d.monthStart
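The overlap test at the heart of both answers (a member counts for a month when the membership starts on or before the month's last day and ends on or after its first day) can be sketched in Python. The rows are copied from the question's sample data; the names MEMBERS and members_in_month are made up for this illustration.

```python
import calendar
from datetime import date

# (start_date, end_date) per member, copied from the question's sample data.
MEMBERS = [
    (date(2013, 9, 1), date(2013, 12, 31)),
    (date(2013, 10, 1), date(2013, 11, 12)),
    (date(2013, 12, 1), date(2013, 12, 31)),
    (date(2012, 5, 1), date(2013, 8, 5)),
    (date(2013, 9, 1), date(2013, 12, 31)),
    (date(2013, 7, 1), date(2013, 12, 31)),
    (date(2012, 6, 6), date(2013, 12, 5)),
    (date(2013, 10, 1), date(2013, 12, 31)),
    (date(2013, 7, 8), date(2013, 12, 31)),
    (date(2012, 1, 1), date(2013, 11, 5)),
]


def members_in_month(members, year, month):
    """Count members whose membership overlaps any day of the given month."""
    month_start = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    # Two inclusive ranges overlap when each starts before the other ends.
    return sum(1 for start, end in members
               if start <= month_end and end >= month_start)


for year in (2012, 2013):
    for month in range(1, 13):
        print(f"{year}-{month:02d}", members_in_month(MEMBERS, year, month))
```

Under this rule a membership that ends on 8/5/2013 still counts for August 2013, which follows the question's statement that a member ending on the first of a month is considered active for that month.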
