Group by day of the month (series) as well as a location - postgresql

I have come across the following link How to get list day of month data per month in postgresql and am building on this for my own query. Which shows a simple use of generate series for a listing of dates. I have a table that has dates, number of users and a location, which I would like to report on monthly, and on the days which have no data, simply show zero. I think the issue I am having is with the grouping of the location, as that is where my results go astray currently from what is expected.
My data (table = reserve)
Date | Users | Location
-----------------------
2021-05-02 | 3 | 1100<br>
2021-05-24 | 4 | 1000<br>
2021-05-26 | 6 | 1000<br>
2021-05-28 | 7 | 1100<br>
2021-05-29 | 4 | 1100<br>
2021-05-27 | 3 | 1000<br>
etc.
If I use the generate_series for the entire month (generate_series('2021-06-01', '2021-10-31', '1 day'::interval) and then join to the reserve table for each of the locations, the issue is that the group by will exclude the blank days on the join.
I am hoping to achieve:
Date
2021-05-01 | 0 | 1000<br>
2021-05-02 0 1000<br>
....<br>
2021-05-24 4 1000<br>
Until end of month<br>
2021-05-01 0 1100<br>
2021-05-02 3 1100<br>
....
Until end of month
Thank you in advance.

It's hard to tell exactly what you're after based on the example data from your tables but if you don't want to eliminate rows where there is no match this is exactly what a LEFT JOIN is for.
For example:
SELECT * FROM generate_series('2021-06-01', '2021-10-31', '1 day'::interval) as d
LEFT JOIN reserve r ON r.the_date = d;
This will keep all the days in the sequence and just returns null for the columns where there was no match for that day. You can see this in action with this SQL fiddle example

Related

How can I find the status in each month using a start and end date?

[ Title was: "Find out the facts: How to find the month wise active members in an healthcare organization per each year and also find the growth percentage" ]
i have 5 years of history data and would like to do some analytics on it. the data will contain active and inactive members data. the ask is for finding the active members per each month per each year.
what i am doing is am extracting month and year from effective data and grouping by month and year based on active status i.e. Status ='Active'
But in this manner I am losing the history records.
for example, if a person had membership from 01-01-2015 to 31-12-2016. this member will be shown as an inactive member now but the same person was an active member in that duration. So if I filter on the status, I will lose these old records.
i need to go to that month, Jan 2015 and check all whoever were active by that time. So I thought of doing another way.
I have extracted the month of expiry date and filtered like exp_month equal to or greater than extracted month of effective date as shown below. Here, I am not relying on the incoming source field containing member status. I am creating a field with logic to identify the status of the member during the period we are finding. This is just to identify active members per each month of year But i am not sure if this is giving me the perfect solution. Please suggest me the better approach.
SELECT extract(YEAR FROM member_effective_date) AS year
, extract(MONTH FROM member_expiry_date) AS month
, CASE WHEN extract(MONTH FROM member_expiry_date)
= extract(MONTH FROM member_effective_date)
OR extract(MONTH FROM member_expiry_date)
> extract(MONTH FROM member_effective_date)
THEN 'Yes'
ELSE 'No' END AS active_status
FROM table_name
You need to use a cross join with table of dates to get the status in each period. The cross join "inflates" the status table so you can evaluate the status for each period.
Here is an example:
CREATE TEMP TABLE table_name AS
SELECT 'member1' AS member
, '2020-01-01'::DATE AS member_effective_date
, '2020-04-27'::DATE AS member_expiry_date
;
WITH month_list
-- Month start and end for previous 12 months
AS (SELECT DATE_TRUNC('month',dt) AS month_start
, MAX(dt) AS month_end
FROM
-- List of the previous 365 dates
(SELECT DATE_TRUNC('day',SYSDATE) - (n * INTERVAL '1 day') AS dt
FROM
-- List of numbers from 1 to 365
(SELECT ROW_NUMBER() OVER () AS n FROM stl_scan LIMIT 365) )
GROUP BY month_start
)
SELECT extract(YEAR FROM b.month_start) AS year
, extract(MONTH FROM b.month_start) AS month
, CASE WHEN -- Effective before the month ended and
(a.member_effective_date <= b.month_end
AND a.member_expiry_date > b.month_start)
THEN 'Yes'
ELSE 'No' END AS active
FROM table_name a
CROSS JOIN month_list b -- Explicit cartesian product
ORDER BY 1,2
;
| year | month | active|
|------|-------|-------|
| 2019 | 8 | No |
| 2019 | 9 | No |
| 2019 | 10 | No |
| 2019 | 11 | No |
| 2019 | 12 | No |
| 2020 | 1 | Yes |
| 2020 | 2 | Yes |
| 2020 | 3 | Yes |
| 2020 | 4 | Yes |
| 2020 | 5 | No |
| 2020 | 6 | No |
| 2020 | 7 | No |
| 2020 | 8 | No |

Aggregate data to 30 minute intervals

A have a table where each row is 300 seconds (or 5 minutes) apart. I need to aggregate the data on every hour and half hour, aggregating everything before and including the hour or half hour.
I've tried this code:
SELECT
to_timestamp(floor(a / 1800 )) *
1800)
AT TIME ZONE 'UTC' as interval_alias, SUM(b) as b_sum
FROM TABLE_NAME GROUP BY interval_alias
...and it aggregates the data on every hour and half hour, but it sum the values post the hour and half hour.
The table looks something like this:
a | b
-------------------------
1533045600 | 3
1533045900 | 5
1533046200 | 6
1533046500 | 3
1533046800 | 5
1533047100 | 2
1533047400 | 3
1533047700 | 8
1533048000 | 5
1533048300 | 5
1533048600 | 6
The actual result with the above code is:
a | b
-------------------------
1533045600 | 24
1533047400 | 27
The desired output is:
a | b
-------------------------
1533045600 | 3
1533047400 | 24
I used a simpler calculation for the interval_alias and with a GROUP BY you can only select aggregations or the columns that are part of the GROUP BY. (The SELECT * you posted in the question didn't look correct...)
SELECT
FLOOR(a/1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
See example code on SQL Fiddle
update:
This is close to your desired output but will include a third result as your test data spans over more than a half hour.
SELECT
FLOOR(a/1800)*1800 + SIGN(a%1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
ORDER BY interval_alias

selecting records without value

I have a problem when I'm trying to reach the desired result. The task looks simple — make a daily count of occurrences of the event for top countries.
The main table looks like this:
id | date | country | col1 | col2 | ...
1 | 2018-01-01 21:21:21 | US | value 1 | value 2 | ...
2 | 2018-01-01 22:32:54 | UK | value 1 | value 2 | ...
From this table, I want to get daily event counts by the country, which is achieved by
SELECT date::DATE AT TIME ZONE 'UTC', country, COALESCE(count(id),0) FROM tab1
GROUP BY 1, 2
The problem comes when there is no event was made by an UK user on 2 January 2018
country_events
date | country | count
2018-01-01 | US | 23
2018-01-01 | UK | 5
2018-01-02 | US | 30
2018-01-02 | UK | 0 -> is desired result, but row is missing
I've tried to generate date series and series of countries which I'm looking for, then CROSS JOIN these two tables. This helper with columns date and country I've left joined with my result table like
SELECT * FROM helper h
LEFT JOIN country_events c ON c.date::DATE = h.date::DATE AND c.country = h.country
I'm using PostgreSQL.
You need an outer join, not a cross join:
SELECT tab1.date::date, tab1.country, coalesce(count(*), 0)
FROM generate_series(TIMESTAMP '2018-01-01 00:00:00',
TIMESTAMP '2018-01-31 00:00:00',
INTERVAL '1 day') AS ts(d)
LEFT JOIN tab1 ON tab1.date >= ts.d AND tab1.date < ts.d + INTERVAL '1 day'
GROUP BY tab1.date::date, tab1.country
ORDER BY tab1.date::date, tab1.country;
This will give the desired list for January 2018.

Fill in missing rows when aggregating over multiple fields in Postgres

I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not for further processing.
SELECT
sd.date,
COUNT(sd.sale_id) AS sales,
sd.product
FROM sales_data sd
-- sales per product, per day
GROUP BY sd.product, sd.date
ORDER BY sd.product, sd.date
This produces the following:
date | sales | product
------------+-------+-------------------
2017-08-17 | 10 | soap
2017-08-19 | 2 | soap
2017-08-20 | 5 | soap
2017-08-17 | 2 | shower gel
2017-08-21 | 1 | shower gel
As you can see - the date ranges per product are not continuous as sales_data just didn't contain any info for these products on some days.
What I'm aiming to do is to add a sales = 0 row for each product that is not sold on any day in a range - for example here, between 2017-08-17 and 2017-08-21 to give something like the the following:
date | sales | product
------------+-------+-------------------
2017-08-17 | 10 | soap
2017-08-18 | 0 | soap
2017-08-19 | 2 | soap
2017-08-20 | 5 | soap
2017-08-21 | 0 | soap
2017-08-17 | 2 | shower gel
2017-08-18 | 0 | shower gel
2017-08-19 | 0 | shower gel
2017-08-20 | 0 | shower gel
2017-08-21 | 1 | shower gel
In a simpler case where there was only a single product, it seems like the solution would be to use generate_series() i.e.:
create a full range of dates using generate_series
LEFT JOIN the already aggregated sales data onto the date series
COALESCE any NULL counts to 0 in the missing rows
The problem I have is that this approach does not seem to work dates repeat in the aggregated data as I'm grouping over not just multiple dates, but multiple products also.
It feels like I should be able to do something cunning with window functions here to solve this e.g. joining onto the full date range over partitions defined by the product name - but I can't see a way of actually getting this to work.
You could use:
WITH cte AS (
SELECT date, s.product
FROM ... -- some way to generate date series
CROSS JOIN (SELECT DISTINCT product FROM sales_data) s
)
SELECT
c.date,
c.product,
COUNT(sd.sale_id) AS sales
FROM cte c
LEFT JOIN sales_data sd
ON c.date = sd.date AND c.product= sd.product
GROUP BY c.date, c.product
ORDER BY c.date, c.product;
First create Cartesian product of dates and products, then LEFT JOIN to actual data and do calculations.
Oracle has great feature for this scenarios called Partitioned Outer Joins:
SELECT times.time_id, product, quantity
FROM inventory PARTITION BY (product)
RIGHT OUTER JOIN times ON (times.time_id = inventory.time_id)
WHERE times.time_id BETWEEN TO_DATE('01/04/01', 'DD/MM/YY')
AND TO_DATE('06/04/01', 'DD/MM/YY')
ORDER BY 2,1;
select
date,
count(sale_id) as sales,
product
from
sales_data
right join (
(
select d::date as date
from generate_series (
(select min(date) from sales_data),
(select max(date) from sales_data),
'1 day'
) gs (d)
) gs
cross join
(select distinct product from sales_data) p
) cj using (product, date)
group by product, date
order by product, date

getting matching data from two table where date is compared with date range postgres

I have 'order_history' and 'current_order' tables both has similar columns as mentioned below:
order_name, Consumer_name, order_date , Order_ amount
every time I get new orders and I have to find out overlapping orders and delete those duplicate from history table and then append current_table data but now this has been worst because my date is getting changed in current table so I have to consider +/- 14 days while comparing order_date and then delete it from history table
Please see sample table data
---Current Order
order_name| Consumer_name| order_date|order_amount
Wool | Jhon | 6/15/2017| 500
Wool | Jhon | 7/14/2017| 1000
Wool | Jhon | 8/13/2017| 800
---Order history
order_name| Consumer_name| order_date | order_amount
Wool | Jhon | 5/15/2017 | 1200
Wool | Jhon | 6/10/2017 | 500
Wool | Jhon | 7/8/2017 | 800
in above scenario I need to delete order rows for month of june and july ( 6/10/2017 and 7/8/2017) from history table and then need to appen current order table entries into history table.
I am using below query
select a.order_name, a.consumer_name, a.order_date , a.order_amount
from order_history a inner join current_order b
on ( upper(a.order_name) = upper(b.order_name) and
upper(a.consumer_name) = upper(b.consumer_name))
where
upper(a.order_name) = upper(b.order_name) and
upper(a.consumer_name) = upper(b.consumer_name)
and a.order_date -b.order_date <= 14 and a.order_date -b.order_date >= -14
but I am getting duplicate rows
can you please help me with this query. one solution can be a Cartesian product where each row will be compared with all rows of another table but this query will take huge amount of time due to data size is million rows
Use ABS to get the difference between the dates as a positive number. If you simply want to delete from the history table, use EXISTS to look up the current orders table.
delete
from order_history h
where exists
(
select *
from current_order c
where c.order_name = h.order_name
and c.consumer_name = h.consumer_name
and abs(c.order_date - h.order_date) <= 14
);