MySQL SELECT MIN and MAX RIGHT JOIN numeric value of the last 30 days

I need a query to return the initial and final numeric value of the number of listeners of some artists of the last 30 days ordered from the highest increase of listeners to the lowest.
To better understand what I mean, here are the tables involved.
artist table saves the information of a Spotify artist.
id  name      Spotify_id
1   Shakira   0EmeFodog0BfCgMzAIvKQp
2   Bizarrap  716NhGYqD1jl2wI1Qkgq36
The platform_information table stores which information I want to collect about the artists, and on which platform.
id  platform  information
1   spotify   monthly_listeners
2   spotify   followers
The platform_information_artist table stores the value of each piece of platform information for each artist on a specific date.
id  platform_information_id  artist_id  date        value
1   1                        1          2022-11-01  100000
2   1                        1          2022-11-15  101000
3   1                        1          2022-11-30  102000
4   1                        2          2022-11-02  85000
5   1                        2          2022-11-06  90000
6   1                        2          2022-11-26  100000
Right now I have this query:
SELECT (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id FROM platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date ASC
        LIMIT 1) AS month_start,
       (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id FROM platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date DESC
        LIMIT 1) AS month_end,
       (SELECT month_end - month_start) AS difference
ORDER BY month_start;
Which returns the following:
month_start  month_end  difference
100000       102000     2000
The problem is that this query only returns the artist I specify.
And I need the information like this:
artist_id  name      platform_information_id  month_start_value  month_end_value  difference
2          Bizarrap  1                        85000              100000           15000
1          Shakira   1                        100000             102000           2000
The query should return the 5 artists that have grown the most in number of monthly listeners over the last 30 days, along with the starting value 30 days ago, and the current value.
Thanks for the help.
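One possible direction, as a rough sketch rather than a tested MySQL answer: rank each artist's rows inside the window from both ends with ROW_NUMBER(), then pick the first and last value per artist. The sketch below runs the idea on SQLite through Python so it can be tried as-is. The `:cutoff` parameter is a stand-in of mine for the 30-day boundary (on MySQL 8+ it would presumably be `CURDATE() - INTERVAL 30 DAY`), and it is fixed to 2022-11-01 so that the sample rows fall inside the window.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL 8+, and the 30-day cutoff is passed
# in as a parameter (an assumption) so the sample data stays inside the window.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, spotify_id TEXT);
CREATE TABLE platform_information (id INTEGER PRIMARY KEY, platform TEXT, information TEXT);
CREATE TABLE platform_information_artist (
    id INTEGER PRIMARY KEY, platform_information_id INTEGER,
    artist_id INTEGER, date TEXT, value INTEGER);
INSERT INTO artist VALUES
    (1, 'Shakira', '0EmeFodog0BfCgMzAIvKQp'),
    (2, 'Bizarrap', '716NhGYqD1jl2wI1Qkgq36');
INSERT INTO platform_information VALUES
    (1, 'spotify', 'monthly_listeners'),
    (2, 'spotify', 'followers');
INSERT INTO platform_information_artist VALUES
    (1, 1, 1, '2022-11-01', 100000),
    (2, 1, 1, '2022-11-15', 101000),
    (3, 1, 1, '2022-11-30', 102000),
    (4, 1, 2, '2022-11-02', 85000),
    (5, 1, 2, '2022-11-06', 90000),
    (6, 1, 2, '2022-11-26', 100000);
""")

QUERY = """
WITH ranked AS (
    -- Number each artist's rows from the oldest and from the newest date.
    SELECT pia.artist_id, pia.platform_information_id, pia.value,
           ROW_NUMBER() OVER (PARTITION BY pia.artist_id ORDER BY pia.date ASC)  AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY pia.artist_id ORDER BY pia.date DESC) AS rn_last
    FROM platform_information_artist pia
    JOIN platform_information p ON p.id = pia.platform_information_id
    WHERE p.platform = 'spotify'
      AND p.information = 'monthly_listeners'
      AND pia.date >= :cutoff
)
SELECT a.id AS artist_id, a.name, r.platform_information_id,
       MAX(CASE WHEN r.rn_first = 1 THEN r.value END) AS month_start_value,
       MAX(CASE WHEN r.rn_last  = 1 THEN r.value END) AS month_end_value,
       MAX(CASE WHEN r.rn_last  = 1 THEN r.value END)
         - MAX(CASE WHEN r.rn_first = 1 THEN r.value END) AS difference
FROM ranked r
JOIN artist a ON a.id = r.artist_id
GROUP BY a.id, a.name, r.platform_information_id
ORDER BY difference DESC
LIMIT 5
"""

rows = conn.execute(QUERY, {"cutoff": "2022-11-01"}).fetchall()
for row in rows:
    print(row)
```

With the sample data this lists Bizarrap (85000 to 100000) ahead of Shakira (100000 to 102000), which matches the layout asked for above.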

Related

Cumulative Sum When Order Was Placed in postgresql

I have an orders table with datetime when an order was placed, and when it was completed:
orderid  userid  price  status     createdat         doneat
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23
I would like to have a new column that is the cumulative total (sum price) per user when the order was created:
orderid  userid  price  status     createdat         doneat            cumulative total when placed (per user)
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46  0
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23  0
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35  250
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35  0
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17  100
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20  100
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23  100
The logic is to sum the price, for each user, of all orders that were completed before the current row's createdat date. For orderid=2, although it's the user's 2nd order, there are no orders that were completed before its createdat datetime of 2/21/21 05:27:29, so the cumulative total when placed is 0.
The same for orderid in [5,6,7]. For those orders and that userid, the only order that was completed before their createdat dates is order 4, so their cumulative total when placed is 100.
In PowerBI the logic is like this:
SUMX (
    filter(
        orders,
        earlier orders.userid = orders.userid && orders.doneat < orders.createdat && orders.status = 'completed'),
    orders.price)
Would anyone have any hints on how to achieve this in postgresql?
I tried something like this and it didn't work.
select (case when o.doneat < o.createdat over (partition by o.userid, o.status order by o.createdat)
then sum(o.price) over (partition by o.userid, o.status ORDER BY o.doneat asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
end) as cumulativetotal_whenplaced
from order o
Thank you
You can duplicate each row into:
an "original" (which we'll decorate with a flag keep = true), that has an accounting value val = 0 (so far), and a time t = createdat;
a "duplicate" (keep = false), that has the price to account for (if status is 'completed') as val and a time t = doneat.
Then it's just a matter of accounting for the right bits:
select orderid, userid, price, status, createdat, doneat, cumtot
from (
    select *, sum(val) over (partition by userid order by t, keep desc) as cumtot
    from (
        select *, createdat as t, 0 as val, true as keep from foo
        union all
        select *, doneat as t,
               case when status = 'completed' then price else 0 end as val,
               false as keep
        from foo
    ) as a
) as a
where keep
order by orderid;
Example: DB Fiddle.
Note for RedShift: the window expression above needs to be replaced by:
...
select *, sum(val) over (
partition by userid order by t, keep desc
rows unbounded preceding) as cumtot
...
Result for your data:
orderid  userid  price  status     createdat                 doneat                    cumtot
1        128     100    completed  2021-02-16T18:40:45.000Z  2021-02-21T07:59:46.000Z  0
2        128     150    completed  2021-02-21T05:27:29.000Z  2021-02-23T11:58:23.000Z  0
3        128     100    completed  2021-09-03T08:38:14.000Z  2021-09-10T14:24:35.000Z  250
4        5       100    completed  2022-05-28T23:28:07.000Z  2022-06-26T06:10:35.000Z  0
5        5       100    canceled   2022-07-08T22:28:57.000Z  2022-08-10T06:55:17.000Z  100
6        5       100    completed  2022-07-25T13:46:38.000Z  2022-08-10T06:57:20.000Z  100
7        5       5      completed  2022-08-07T18:07:07.000Z  2022-08-12T06:56:23.000Z  100
Note: this type of accounting across time is actually robust to many corner cases (various orders overlapping, some starting and finishing while others are still in process, etc.). It is the basis for a fast interval compaction algorithm that I should describe someday on SO.
Bonus: try to figure out why the partitioning window is ordered by t (fairly obvious) and also by keep desc (less obvious).
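For anyone who wants to try the row-duplication idea without a Postgres instance, here is a minimal sketch that runs the same shape of query on SQLite through Python. Assumptions on my side: the timestamps are rewritten as ISO strings so that text ordering matches time ordering, and `keep` is stored as 1/0 because SQLite has no boolean type. The table name `foo` follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (orderid INTEGER, userid INTEGER, price INTEGER, "
    "status TEXT, createdat TEXT, doneat TEXT)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 128, 100, "completed", "2021-02-16 18:40:45", "2021-02-21 07:59:46"),
        (2, 128, 150, "completed", "2021-02-21 05:27:29", "2021-02-23 11:58:23"),
        (3, 128, 100, "completed", "2021-09-03 08:38:14", "2021-09-10 14:24:35"),
        (4, 5, 100, "completed", "2022-05-28 23:28:07", "2022-06-26 06:10:35"),
        (5, 5, 100, "canceled", "2022-07-08 22:28:57", "2022-08-10 06:55:17"),
        (6, 5, 100, "completed", "2022-07-25 13:46:38", "2022-08-10 06:57:20"),
        (7, 5, 5, "completed", "2022-08-07 18:07:07", "2022-08-12 06:56:23"),
    ],
)

QUERY = """
SELECT orderid, cumtot
FROM (
    -- Running total over the merged timeline of "created" and "done" events.
    SELECT orderid, keep,
           SUM(val) OVER (PARTITION BY userid ORDER BY t, keep DESC) AS cumtot
    FROM (
        -- "Original" rows: placed at createdat, contribute nothing.
        SELECT orderid, userid, createdat AS t, 0 AS val, 1 AS keep FROM foo
        UNION ALL
        -- "Duplicate" rows: placed at doneat, contribute the completed price.
        SELECT orderid, userid, doneat AS t,
               CASE WHEN status = 'completed' THEN price ELSE 0 END AS val,
               0 AS keep
        FROM foo
    ) AS events
) AS totals
WHERE keep = 1
ORDER BY orderid
"""

cumtot = dict(conn.execute(QUERY).fetchall())
print(cumtot)
```

The resulting per-order totals agree with the cumtot column in the result table above.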

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month. This calculation is to be done for 12 months in a single query. Output should be as below.
Month Count
01/07/2019 50
01/08/2019 34
01/09/2019 23
01/10/2019 98
01/11/2019 10
01/12/2019 5
01/01/2020 32
01/02/2020 65
01/03/2020 23
01/04/2020 12
01/05/2020 64
01/06/2020 54
01/07/2020 78
I am able to get the value only for one month. I want to get it for all months in a single query.
This is my current query:
SELECT COUNT(DISTINCT TWO_MONTHS_AGO.USER_ID), TWO_MONTHS_AGO.MONTH AS INVOICE_MONTH
FROM (
    SELECT USER_ID, LAST_DAY(invoice_ct_dt) AS MONTH
    FROM table a AS ID
    WHERE invoice_amt > 0
      AND LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), -2)
    GROUP BY user_id
) AS TWO_MONTHS_AGO
LEFT JOIN (
    SELECT user_id, LAST_DAY(invoice_ct_dt) AS MONTH
    FROM table a AS ID
    WHERE LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), -1)
    GROUP BY USER_ID
) AS ONE_MONTH_AGO ON TWO_MONTHS_AGO.USER_ID = ONE_MONTH_AGO.USER_ID
WHERE ONE_MONTH_AGO.USER_ID IS NULL
GROUP BY INVOICE_MONTH;
Thank you in advance.
Lona
Probably lots of different approaches but the way I would do it is as follows:
Summarise data by user and month for the last 13 months (you need 12 months plus the month before the first of them)
Compare "this" month (that has data) to "next" month and select records where there is no "next" month data
Summarise this dataset by month and distinct userid
For example, assuming a table created as follows:
create table INVOICE_DATA (
USERID varchar(4),
INVOICE_DT date,
INVOICE_AMT NUMBER(10,2)
);
the following query should give you what you want - you may need to adjust it depending on whether you are including this month, or only up to the end of last month, in your calculation, etc.:
--Summarise data by user and month
WITH MONTH_SUMMARY AS
(
    SELECT USERID
          ,TO_CHAR(INVOICE_DT,'YYYY-MM') "INVOICE_MONTH"
          ,TO_CHAR(ADD_MONTHS(INVOICE_DT,1),'YYYY-MM') "NEXT_MONTH"
          ,SUM(INVOICE_AMT) "MONTHLY_TOTAL"
    FROM INVOICE_DATA
    WHERE INVOICE_DT >= TRUNC(ADD_MONTHS(current_date(),-13),'MONTH') -- Last 13 months of data
    GROUP BY 1,2,3
),
--Get data for users with invoices in this month but not the next month
USER_DATA AS
(
    SELECT USERID, INVOICE_MONTH, MONTHLY_TOTAL
    FROM MONTH_SUMMARY MS_THIS
    WHERE NOT EXISTS
    (
        SELECT USERID
        FROM MONTH_SUMMARY MS_NEXT
        WHERE
            MS_THIS.USERID = MS_NEXT.USERID AND
            MS_THIS.NEXT_MONTH = MS_NEXT.INVOICE_MONTH
    )
    AND MS_THIS.INVOICE_MONTH < TO_CHAR(current_date(),'YYYY-MM') -- Don't include this month as obviously no next month to compare to
)
SELECT INVOICE_MONTH, COUNT(DISTINCT USERID) "USER_COUNT"
FROM USER_DATA
GROUP BY INVOICE_MONTH
ORDER BY INVOICE_MONTH
;
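To see the "this month vs. next month" comparison on a small dataset, here is a hedged sketch of the same idea on SQLite through Python. Everything in it is made up for illustration: the three users, their invoices, and the `:current_month` parameter, which replaces the current_date() check so the result is repeatable. The 13-month filter is left out to keep it short, and no amount > 0 condition is applied, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_data (userid TEXT, invoice_dt TEXT, invoice_amt REAL)")
conn.executemany(
    "INSERT INTO invoice_data VALUES (?, ?, ?)",
    [
        ("u1", "2020-01-31", 10.0),  # u1: invoiced in January only
        ("u2", "2020-01-15", 20.0),  # u2: invoiced in January and February
        ("u2", "2020-02-10", 20.0),
        ("u3", "2020-02-05", 5.0),   # u3: invoiced in February and March
        ("u3", "2020-03-01", 5.0),
    ],
)

QUERY = """
WITH month_summary AS (
    -- One row per user and month, tagged with the label of the following month.
    -- 'start of month' avoids day overflow such as Jan 31 + 1 month.
    SELECT userid,
           strftime('%Y-%m', invoice_dt) AS invoice_month,
           strftime('%Y-%m', invoice_dt, 'start of month', '+1 month') AS next_month,
           SUM(invoice_amt) AS monthly_total
    FROM invoice_data
    GROUP BY userid, invoice_month, next_month
)
SELECT ms_this.invoice_month, COUNT(DISTINCT ms_this.userid) AS user_count
FROM month_summary ms_this
WHERE ms_this.invoice_month < :current_month   -- current month has no "next" yet
  AND NOT EXISTS (
        SELECT 1
        FROM month_summary ms_next
        WHERE ms_next.userid = ms_this.userid
          AND ms_next.invoice_month = ms_this.next_month
      )
GROUP BY ms_this.invoice_month
ORDER BY ms_this.invoice_month
"""

counts = conn.execute(QUERY, {"current_month": "2020-03"}).fetchall()
print(counts)
```

Here u1 drops out after January and u2 after February, so each of those months reports one lapsed user.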

Postgresql : Average over a limit of Date with group by

I have a table like this
item_id date number
1 2000-01-01 100
1 2003-03-08 50
1 2004-04-21 10
1 2004-12-11 10
1 2010-03-03 10
2 2000-06-29 1
2 2002-05-22 2
2 2002-07-06 3
2 2008-10-20 4
I'm trying to get the average for each unique item_id over its last 3 dates.
It's difficult because there are gaps between the dates, so a range of hardcoded dates doesn't always work.
I expect a result like :
item_id MyAverage
1 10
2 3
I don't really know how to do this. Currently I manage to do it for one item, but I have trouble extending it to multiple items:
SELECT AVG(MyAverage.number) FROM (
SELECT date,number
FROM item_list
where item_id = 1
ORDER BY date DESC limit 3
) as MyAverage;
My main problem is with generalising the "DESC limit 3" over a group by id.
attempt :
SELECT item_id,AVG(MyAverage.number)
FROM (
SELECT item_id,date,number
FROM item_list
ORDER BY date DESC limit 3) as MyAverage
GROUP BY item_id;
The limit is messing things up there.
I have made it "work" using BETWEEN date AND date, but that is not what I want, because I need a limit and not a hardcoded date.
Can anybody help?
You can use row_number() to assign 1 to 3 to the records with the latest dates for each ID and then filter on that.
SELECT x.item_id,
       avg(x.number)
FROM (SELECT il.item_id,
             il.number,
             row_number() OVER (PARTITION BY il.item_id
                                ORDER BY il.date DESC) rn
      FROM item_list il) x
WHERE x.rn BETWEEN 1 AND 3
GROUP BY x.item_id;
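As a quick check against the sample data from the question, the same row_number() filter can be run on SQLite through Python. This is only a sketch: the SQLite harness and the added ORDER BY are my assumptions, while the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_list (item_id INTEGER, date TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO item_list VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", 100),
        (1, "2003-03-08", 50),
        (1, "2004-04-21", 10),
        (1, "2004-12-11", 10),
        (1, "2010-03-03", 10),
        (2, "2000-06-29", 1),
        (2, "2002-05-22", 2),
        (2, "2002-07-06", 3),
        (2, "2008-10-20", 4),
    ],
)

QUERY = """
SELECT x.item_id, AVG(x.number) AS my_average
FROM (
    -- rn = 1 is the most recent row of each item, rn = 2 the one before, etc.
    SELECT il.item_id, il.number,
           ROW_NUMBER() OVER (PARTITION BY il.item_id ORDER BY il.date DESC) AS rn
    FROM item_list il
) x
WHERE x.rn <= 3
GROUP BY x.item_id
ORDER BY x.item_id
"""

averages = conn.execute(QUERY).fetchall()
print(averages)
```

Item 1 averages its three most recent values (10, 10, 10) and item 2 averages 4, 3 and 2, which gives the 10 and 3 expected in the question.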

How to subtract seconds from postgres datetime without having to add it in group by clause?

Say I have a column of type datetime with the value "2014-04-14 12:17:55.772" and I need to subtract 2 seconds from it to get output like "12:17:53".
userid EndDate seconds
--------------------------------------------------------
1 "2014-04-14 12:17:14.295" 512
1 "2014-04-14 12:31:14.295" 12
2 "2014-04-14 12:48:14.295" 2
2 "2014-04-14 13:22:14.295" 12
& the query is
select (enddate::timestamp - (seconds* interval '1 second')) seconds, userid
from user
group by userid
Now, I need to group by userid only, but because enddate and seconds appear in the select list, Postgres asks me to add them to the GROUP BY clause, which will not give me the correct output.
I am expecting data in this format, where I need to calculate start_time from end_time and the total seconds spent.
user : 1
start_time end_time total (seconds)
"12:17" "12:17" 1
"12:22" "12:31" 512
total: 513
user : 2
"12:43" "12:48" 288
"13:22" "13:22" 1
total 289
Is there some way I could avoid the GROUP BY clause in this?
Like #IMSoP says, you can use a window function to include a total for each user in your query output:
SELECT userid
     , (enddate - (seconds * interval '1 second')) as start_time
     , enddate as end_time
     , seconds
     , sum(seconds) OVER (PARTITION BY userid) as total
FROM so23063314.user;
Then you would only display the parts of the row you're interested in for each subtotal line, and display the total at the end of each block.
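A small sketch of the window-function approach on the sample rows from the question, run on SQLite through Python. The table name `user_sessions` is hypothetical, and SQLite's strftime() with a '-N seconds' modifier stands in for the Postgres interval arithmetic, so treat it as an illustration of the shape of the result rather than the Postgres query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# user_sessions is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE user_sessions (userid INTEGER, enddate TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        (1, "2014-04-14 12:17:14.295", 512),
        (1, "2014-04-14 12:31:14.295", 12),
        (2, "2014-04-14 12:48:14.295", 2),
        (2, "2014-04-14 13:22:14.295", 12),
    ],
)

QUERY = """
SELECT userid,
       -- start_time = end time minus the seconds spent
       strftime('%H:%M:%S', enddate, '-' || seconds || ' seconds') AS start_time,
       strftime('%H:%M:%S', enddate) AS end_time,
       seconds,
       -- per-user total repeated on every row, no GROUP BY needed
       SUM(seconds) OVER (PARTITION BY userid) AS total
FROM user_sessions
ORDER BY userid, enddate
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every row keeps its own start and end time, and the per-user total is repeated on each row of that user, so the report layer can print it once per block.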

SELECT record based upon dates

Assuming data such as the following:
ID EffDate Rate
1 12/12/2011 100
1 01/01/2012 110
1 02/01/2012 120
2 01/01/2012 40
2 02/01/2012 50
3 01/01/2012 25
3 03/01/2012 30
3 05/01/2012 35
How would I find the rate for ID 2 as of 1/15/2012?
Or, the rate for ID 1 for 1/15/2012?
In other words, how do I do a query that finds the correct rate when the date falls between the EffDate for two records? (Rate should be for the date prior to the selected date).
Thanks,
John
How about this:
SELECT Rate
FROM Table1
WHERE ID = 1 AND EffDate = (
    SELECT MAX(EffDate)
    FROM Table1
    WHERE ID = 1 AND EffDate <= '2012-01-15');
Here's an SQL Fiddle to play with. I assume here that the 'ID/EffDate' pair is unique across the whole table (the opposite wouldn't make sense).
SELECT TOP 1 Rate FROM the_table
WHERE ID=whatever AND EffDate <='whatever'
ORDER BY EffDate DESC
if I read you right.
(edited to suit my idea of ms-sql which I have no idea about).
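Both answers boil down to "take the latest EffDate that is not after the requested date". Here is a minimal sketch of that lookup on SQLite through Python. The table name `rates`, the helper `rate_as_of`, and the ISO date strings (used so that text comparison orders correctly) are assumptions for the example, and LIMIT 1 plays the role of TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rates" is a hypothetical table name; dates are stored as ISO strings.
conn.execute("CREATE TABLE rates (id INTEGER, effdate TEXT, rate INTEGER)")
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?)",
    [
        (1, "2011-12-12", 100),
        (1, "2012-01-01", 110),
        (1, "2012-02-01", 120),
        (2, "2012-01-01", 40),
        (2, "2012-02-01", 50),
        (3, "2012-01-01", 25),
        (3, "2012-03-01", 30),
        (3, "2012-05-01", 35),
    ],
)


def rate_as_of(conn, item_id, as_of):
    """Return the rate whose effdate is the latest one on or before as_of."""
    row = conn.execute(
        """
        SELECT rate
        FROM rates
        WHERE id = ? AND effdate <= ?
        ORDER BY effdate DESC
        LIMIT 1
        """,
        (item_id, as_of),
    ).fetchone()
    return row[0] if row else None  # None when no rate is in effect yet


print(rate_as_of(conn, 1, "2012-01-15"))
print(rate_as_of(conn, 2, "2012-01-15"))
```

For 2012-01-15 this picks the 2012-01-01 row for both IDs, and it returns None when the requested date is earlier than the first EffDate of that ID.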