Problem with gaps in data with generate_series - PostgreSQL

I need a query that returns the cumulative sum of all paid bills per day in the current month.
I've tried a few queries, including this one:
SELECT DISTINCT
    month.day,
    sum(bills.value) OVER (ORDER BY month.day)
FROM generate_series(1,31) month(day)
LEFT JOIN bills ON date_part('day', bills.payment_date) = month.day
WHERE (
    (date_part('year', bills.payment_date) = date_part('year', CURRENT_DATE)) AND
    (date_part('month', bills.payment_date) = date_part('month', CURRENT_DATE))
)
GROUP BY month.day, bills.value, bills.payment_date
ORDER BY month.day
I'm getting:
day | value
1 | 1000
4 | 3000
5 | 5000
The sum is correct, but I'm not getting all the 31 days from the generate_series function. Also, when I remove the DISTINCT keyword, the query just repeats the days, like:
day | value
1 | 1000
4 | 3000
4 | 3000
4 | 3000
4 | 3000
5 | 5000
5 | 5000
What I want is:
day | value
1 | 1000
2 | 1000
3 | 1000
4 | 3000
5 | 5000
6 | 5000
... | 5000
31 | 5000
Any ideas?

Remove bills.value and bills.payment_date from the GROUP BY; then you don't need the DISTINCT any more either. You can also simplify the WHERE clause, but you need to move that condition into the JOIN condition, otherwise it will turn the outer join back into an inner join. The nested sum(sum(bills.value)) below is a window function applied on top of the per-day aggregate sum(bills.value), which yields the running total:
SELECT month.day,
       sum(sum(bills.value)) OVER (ORDER BY month.day) AS total_value
FROM generate_series(1,31) month(day)
LEFT JOIN bills
       ON date_part('day', bills.payment_date) = month.day
      AND to_char(bills.payment_date, 'yyyymm') = to_char(current_date, 'yyyymm')
GROUP BY month.day
ORDER BY month.day
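As an aside, generate_series(1,31) always emits 31 rows, so months with fewer days get trailing empty days. An untested variant sketch that generates the actual dates of the current month instead (reusing the bills table and its payment_date/value columns from the question):

SELECT date_part('day', month.day)::int AS day,
       sum(sum(bills.value)) OVER (ORDER BY month.day) AS total_value
FROM generate_series(date_trunc('month', current_date),
                     date_trunc('month', current_date) + interval '1 month' - interval '1 day',
                     interval '1 day') month(day)
-- joining on the full date makes the separate year/month filter unnecessary
LEFT JOIN bills ON bills.payment_date::date = month.day::date
GROUP BY month.day
ORDER BY month.day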

Related

Monthly counting in PostgreSQL, giving months even if the count is 0

I have a table for customers like this
cust_id | date_signed_up | location_id
-----------------------------------------
1 | 2019/01/01 | 1
2 | 2019/03/05 | 1
3 | 2019/06/17 | 1
What I need is a monthly count that includes the months even if the count is 0. Ex:
monthly_count | count
-------------------------
Jan | 1
Feb | 0
Mar | 1
Apr | 0
(months can be in numbers)
Right now I made this query:
SELECT date_trunc('MONTH', (date_signed_up::date)) AS monthly, count(cust_id) AS count
FROM customer
WHERE location_id = 1
GROUP BY monthly
ORDER BY monthly asc
but it only gives me the months for which there is information, skipping the ones where the count is zero. How can I get all the months, whether they have information or not?
You need a list of months.
How to generate Month list in PostgreSQL?
SELECT a.month, count(y.cust_id)
FROM allMonths a
LEFT JOIN yourTable y
       ON a.month = date_trunc('MONTH', (date_signed_up::date))
GROUP BY a.month
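A concrete way to build that month list is generate_series, as in the linked question. A sketch (untested, assuming the customer table from above, the year 2019, and the location filter moved into the join so that empty months survive the LEFT JOIN):

SELECT m.month, count(c.cust_id) AS count
FROM generate_series('2019-01-01'::date,
                     '2019-12-01'::date,
                     interval '1 month') AS m(month)
LEFT JOIN customer c
       ON date_trunc('MONTH', c.date_signed_up::date) = m.month
      AND c.location_id = 1   -- filtering in WHERE would drop the empty months
GROUP BY m.month
ORDER BY m.month;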

Fixed range of timestamps for every uuid in SQL

I would like to generate a table with the timestamps of the last n weeks of data (in this case, n = 3) for every uuid, including rows where the data is null.
I am using the following piece of code:
with raw_weekly_data as (
    SELECT distinct d.uuid,
           date_trunc('week', a.start_timestamp) as tstamp,
           avg(price) as price
    FROM a
    join d on a.uuid = d.uuid
    where start_timestamp between date_trunc('week', now()) - interval '3 week'
                              and date_trunc('week', now())
    group by 1, 2
    order by 1
),
tstamp as (
    SELECT distinct tstamp
    FROM raw_weekly_data
)
SELECT t.tstamp,
       r.*
from raw_weekly_data r
right join tstamp t on r.tstamp = t.tstamp
order by uuid
I would like to have something like this:
week | uuid | price
w1 | 1 | 10
w2 | 1 | 2
w3 | 1 |
w1 | 2 | 20
w2 | 2 |
w3 | 2 |
w1 | 3 | 10
w2 | 3 | 10
w3 | 3 | 20
But instead, none of the null results are shown. What is the best approach here? This is what I get:
week | uuid | price
w1 | 1 | 10
w2 | 1 | 2
w1 | 2 | 20
w1 | 3 | 10
w2 | 3 | 10
w3 | 3 | 20
Form a Cartesian product of all weeks and UUIDs, then LEFT JOIN the actual average prices per (week, uuid). Like:
SELECT *
FROM generate_series(date_trunc('week', now() - interval '3 week')
                   , now() - interval '1 week'
                   , interval '1 week') tstamp
CROSS JOIN (SELECT DISTINCT uuid FROM a) a
LEFT JOIN (
    SELECT d.uuid
         , date_trunc('week', a.start_timestamp) AS tstamp
         , avg(price) AS price  -- d.price?
    FROM a
    JOIN d USING (uuid)
    WHERE a.start_timestamp >= date_trunc('week', now()) - interval '3 week'
      AND a.start_timestamp <  date_trunc('week', now())
    GROUP BY 1, 2               -- aggregate inside the subquery
    ) ad USING (uuid, tstamp)
ORDER BY 1, 2;
This way you get all combinations of the last three weeks and UUIDs, extended by the average price if one exists for the combination.
Based on some educated guesses to fill in missing information ...
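The sketch assumes roughly this shape for the two tables (hypothetical, inferred only from the column references in the question):

CREATE TABLE d (
    uuid  int PRIMARY KEY              -- the entity being reported on
);

CREATE TABLE a (
    uuid            int REFERENCES d (uuid),
    start_timestamp timestamptz,
    price           numeric            -- the "d.price?" comment above suggests
                                       -- this column may live in d instead
);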

Aggregate data to 30 minute intervals

I have a table where each row is 300 seconds (or 5 minutes) apart. I need to aggregate the data on every hour and half hour, aggregating everything before and including the hour or half hour.
I've tried this code:
SELECT
    to_timestamp(floor(a / 1800) * 1800)
        AT TIME ZONE 'UTC' as interval_alias,
    SUM(b) as b_sum
FROM TABLE_NAME
GROUP BY interval_alias
...and it aggregates the data on every hour and half hour, but it sums the values at and after each boundary instead of everything before and including it.
The table looks something like this:
a | b
-------------------------
1533045600 | 3
1533045900 | 5
1533046200 | 6
1533046500 | 3
1533046800 | 5
1533047100 | 2
1533047400 | 3
1533047700 | 8
1533048000 | 5
1533048300 | 5
1533048600 | 6
The actual result with the above code is:
a | b
-------------------------
1533045600 | 24
1533047400 | 27
The desired output is:
a | b
-------------------------
1533045600 | 3
1533047400 | 24
I used a simpler calculation for the interval_alias. Note that with GROUP BY you can only select aggregates or the columns that are part of the GROUP BY.
SELECT
FLOOR(a/1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
See example code on SQL Fiddle
Update:
This is close to your desired output, but it will include a third row, because your test data spans more than one half-hour interval past the first boundary.
SELECT
FLOOR(a/1800)*1800 + SIGN(a%1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
ORDER BY interval_alias
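To see what the SIGN term does: rows exactly on a boundary (a % 1800 = 0) stay in their own bucket, everything else moves up to the next boundary. Worked through for two rows of the sample data:

-- a = 1533045600 sits exactly on a boundary: 1533045600 % 1800 = 0,
-- so SIGN(0) = 0 adds nothing and the row stays in bucket 1533045600.
-- a = 1533045900 is 300 s past that boundary: FLOOR(a/1800)*1800 = 1533045600,
-- and SIGN(300) = 1 adds 1800, moving the row up to bucket 1533047400.
SELECT FLOOR(1533045900 / 1800) * 1800 + SIGN(1533045900 % 1800) * 1800;  -- 1533047400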

I am computing a percentage in PostgreSQL and I get the following unexpected behavior when dividing a number by the same number

I am new to PostgreSQL and am having trouble wrapping my mind around why I am getting the results that I see.
I perform the following query
SELECT
name AS region_name,
COUNT(tripsq1.id) AS trips,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(case when consumed_at = start_at then tripsq1.id end) AS first_day,
(SUM(case when consumed_at = start_at then tripsq1.id end)::NUMERIC(6,4))/COUNT(tripsq1.id)::NUMERIC(6,4) AS percent_on_first_day
FROM promotionsq1
INNER JOIN couponsq1
ON promotion_id = promotionsq1.id
INNER JOIN tripsq1
ON couponsq1.id = coupon_id
INNER JOIN regionsq1
ON regionsq1.id = region_id
WHERE promotion_name = 'TestPromo'
GROUP BY region_name;
and get the following result
region_name | trips | unique_users | first_day | percent_on_first_day
-------------------+-------+--------------+-----------+-----------------------
A | 3 | 2 | 1 | 33.3333333333333333
B | 1 | 1 | 0 |
C | 1 | 1 | 1 | 2000.0000000000000000
The first row's percentage gets calculated correctly, while the third row's percentage is 20 times what it should be. The percent_on_first_day should be 100.00, since it is 100.0 * 1/1.
Any help would be greatly appreciated
I suspect the issue is this code:
SUM(case when consumed_at = start_at then tripsq1.id end)
This tells me you are summing the ids, which is meaningless. You probably want:
SUM(case when consumed_at = start_at then 1 end)
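With that change the numerator counts first-day trips instead of summing their ids. A hedged, untested rewrite of the percentage column from the query above (also multiplying by 100, since the asker expects 100.00 rather than 1.00):

-- drop-in replacement for the percent_on_first_day line
100.0 * COUNT(case when consumed_at = start_at then tripsq1.id end)
      / COUNT(tripsq1.id) AS percent_on_first_day

The 100.0 literal is numeric, so the division is carried out in numeric rather than integer arithmetic.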

Grouped LIMIT 10 in PostgreSQL

I have a query:
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
order by partner_id, count desc;
which brings back counts for certain terms per partner_id. I want to be able to show the top 10 for each of the ~40 partner_ids, ordered by the count descending.
the query results look like
db=# SELECT * FROM xxx;
pid | term_desc | count
----+------------+------
4 | termdesc1 | 3434
4 | termdesc2 | 235
4 | termdesc3 | 367
4 | termdesc4 | 4533
5 | termdesc1 | 235
5 | termdesc2 | 567
5 | termdesc3 | 344
5 | termdesc4 | 56
(10k+ rows)
You could add a rank column and then filter the result by the rank. A window function cannot be referenced in the WHERE or HAVING clause of the same query level, so the ranking has to happen in a subquery and the filter in the outer query:
select *
from (
    select
        a.kli,
        b.term_desc,
        count(distinct a.adic) as count,
        a.partner_id,
        RANK() OVER (PARTITION BY a.partner_id
                     ORDER BY count(distinct a.adic) DESC) AS r
    from
        ad_delivery.sgmt_kli_adic a
        join wand.wandterms b on a.kli = b.term_code
        join wand.wandterms c on b.term_desc = c.term_desc
        join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
        join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
    where
        a.partner_id::integer in (f.partner_id)
        and c.class_code = 969
    group by a.partner_id, b.term_desc, a.kli
) ranked
where r < 11
order by partner_id, count desc;
I have not tested the code, however the trick is ranking each row of the GROUP BY result within its partner_id group (note the window's ORDER BY uses the count; ordering by the partition column itself would give every row rank 1) and then filtering the result set in the outer query, keeping only items with a rank lower than 11 (you will get 10 items per group).
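One caveat: RANK() assigns equal ranks to ties, so a partner with tied counts can return more than 10 rows; ROW_NUMBER() returns exactly 10. A minimal, self-contained illustration of the rank-in-a-subquery pattern, on made-up rows shaped like the sample output above:

WITH t(pid, term_desc, cnt) AS (
    VALUES (4, 'termdesc1', 3434), (4, 'termdesc2', 235),
           (4, 'termdesc3', 367),  (4, 'termdesc4', 4533),
           (5, 'termdesc1', 235),  (5, 'termdesc2', 567)
)
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY cnt DESC) AS r
    FROM t
) ranked
WHERE r < 3            -- top 2 per pid here; use r < 11 for the top 10
ORDER BY pid, cnt DESC;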