I need a query that returns the initial and final number of listeners for a set of artists over the last 30 days, ordered from the largest increase in listeners to the smallest.
To better understand what I mean, here are the tables involved.
artist table saves the information of a Spotify artist.
id  name      Spotify_id
1   Shakira   0EmeFodog0BfCgMzAIvKQp
2   Bizarrap  716NhGYqD1jl2wI1Qkgq36
platform_information table stores which information I want to collect about the artists, and on which platform.
id  platform  information
1   spotify   monthly_listeners
2   spotify   followers
platform_information_artist table stores the value of each piece of platform information for each artist on a specific date.
id  platform_information_id  artist_id  date        value
1   1                        1          2022-11-01  100000
2   1                        1          2022-11-15  101000
3   1                        1          2022-11-30  102000
4   1                        2          2022-11-02  85000
5   1                        2          2022-11-06  90000
6   1                        2          2022-11-26  100000
Right now I have this query:
SELECT (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id FROM platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date ASC
        LIMIT 1) AS month_start,
       (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id FROM platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date DESC
        LIMIT 1) AS month_end,
       (SELECT month_end - month_start) AS difference
ORDER BY month_start;
Which returns the following:
month_start  month_end  difference
100000       102000     2000
The problem is that this query only returns the artist I specify.
And I need the information like this:
artist_id  name      platform_information_id  month_start_value  month_end_value  difference
2          Bizarrap  1                        85000              100000           15000
1          Shakira   1                        100000             102000           2000
The query should return the 5 artists that have grown the most in number of monthly listeners over the last 30 days, along with the starting value 30 days ago, and the current value.
Thanks for the help.
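One possible way to get this for every artist at once is sketched below with Python's built-in sqlite3 module, so it can be run without a database server. The SQL is SQLite dialect rather than MySQL: `date(:today, '-30 days')` stands in for `DATE(NOW()) - INTERVAL 30 DAY`, and `:today` is a hypothetical parameter pinned to 2022-12-01 so that the sample rows fall inside the window. The sketch assumes at most one row per artist, metric and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, spotify_id TEXT);
CREATE TABLE platform_information (id INTEGER PRIMARY KEY, platform TEXT, information TEXT);
CREATE TABLE platform_information_artist (
    id INTEGER PRIMARY KEY, platform_information_id INTEGER,
    artist_id INTEGER, date TEXT, value INTEGER);
INSERT INTO artist VALUES
    (1, 'Shakira', '0EmeFodog0BfCgMzAIvKQp'), (2, 'Bizarrap', '716NhGYqD1jl2wI1Qkgq36');
INSERT INTO platform_information VALUES
    (1, 'spotify', 'monthly_listeners'), (2, 'spotify', 'followers');
INSERT INTO platform_information_artist VALUES
    (1, 1, 1, '2022-11-01', 100000), (2, 1, 1, '2022-11-15', 101000),
    (3, 1, 1, '2022-11-30', 102000), (4, 1, 2, '2022-11-02', 85000),
    (5, 1, 2, '2022-11-06', 90000),  (6, 1, 2, '2022-11-26', 100000);
""")

QUERY = """
WITH windowed AS (            -- monthly-listener rows from the last 30 days
    SELECT pia.artist_id, pia.platform_information_id, pia.date, pia.value
    FROM platform_information_artist AS pia
    JOIN platform_information AS pi ON pi.id = pia.platform_information_id
    WHERE pi.platform = 'spotify'
      AND pi.information = 'monthly_listeners'
      AND pia.date >= date(:today, '-30 days')
),
bounds AS (                   -- earliest and latest date per artist in the window
    SELECT artist_id, platform_information_id,
           MIN(date) AS first_date, MAX(date) AS last_date
    FROM windowed
    GROUP BY artist_id, platform_information_id
)
SELECT b.artist_id, a.name, b.platform_information_id,
       s.value AS month_start_value,
       e.value AS month_end_value,
       e.value - s.value AS difference
FROM bounds AS b
JOIN windowed AS s ON s.artist_id = b.artist_id
                  AND s.platform_information_id = b.platform_information_id
                  AND s.date = b.first_date
JOIN windowed AS e ON e.artist_id = b.artist_id
                  AND e.platform_information_id = b.platform_information_id
                  AND e.date = b.last_date
JOIN artist AS a ON a.id = b.artist_id
ORDER BY difference DESC
LIMIT 5
"""

# "today" is fixed only to make the sample data reproducible.
rows = conn.execute(QUERY, {"today": "2022-12-01"}).fetchall()
for row in rows:
    print(row)
```

In MySQL the same shape should carry over with the date filter from the original query; the join on the minimum and maximum date replaces the two `ORDER BY ... LIMIT 1` subqueries that were tied to a single artist.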
Original data:
subject medgrp stdt endt
1 A 7/1/2014 7/31/2014
1 A 7/29/2014 8/30/2014
1 B 7/1/2014 8/15/2014
1 C 8/1/2014 9/1/2014
2 A 4/15/2014 5/15/2014
2 A 5/10/2014 6/10/2014
2 A 6/5/2014 6/15/2014
2 A 7/1/2014 8/1/2014
3 A 6/5/2014 6/15/2014
3 A 6/16/2014 8/1/2014
Re-structured data:
subject med_pattern stdt_new endt_new
1 A*B 7/1/2014 7/31/2014
1 A*B*C 8/1/2014 8/15/2014
1 A*C 8/16/2014 8/30/2014
1 C 8/31/2014 9/1/2014
2 A 4/15/2014 6/15/2014
2 A 7/1/2014 8/1/2014
3 A 6/5/2014 8/1/2014
I was able to transform original data to re-structured data by outputting stdt to endt for all records, then keep one date for each subject/medgrp, reform date periods and create the variable med_pattern.
However, this method takes a long time to run, especially for big data (>3m records).
Any suggestions to make this more efficient would be greatly appreciated!
For each subject, you can use a date-keyed multidata hash to track the medgrp activity on every date in the range defined by stdt and endt. Iterating over the hash then lets you compute the medgrp crossing value for each date.
data have;
  input subject medgrp $ stdt: mmddyy8. endt: mmddyy8.;
  format stdt endt mmddyy10.;
datalines;
1 A 7/1/2014 7/31/2014
1 A 7/29/2014 8/30/2014
1 B 7/1/2014 8/15/2014
1 A 7/15/2014 7/15/2014
1 C 8/1/2014 9/1/2014
2 A 4/15/2014 5/15/2014
2 A 5/10/2014 6/10/2014
2 A 6/5/2014 6/15/2014
2 A 7/1/2014 8/1/2014
3 A 6/5/2014 6/15/2014
3 A 6/16/2014 8/1/2014
;
data crossings_by_date / view=crossings_by_date;
  if 0 then set have; * prep PDV;
  if _n_ = 1 then do; * instantiate the hashes and iterators once only;
    declare hash dg(multidata:'yes', ordered:'a'); %* 1st hash for subject dates;
    dg.defineKey('date');
    dg.defineData('date', 'medgrp');
    dg.defineDone();
    call missing (date); format date adate cdate mmddyy10.;
    declare hash crossing(ordered:'a'); %* 2nd hash for deduping a list of medgrps ;
    crossing.defineKey('medgrp');
    crossing.defineData('medgrp');
    crossing.defineDone();
    declare hiter dgi('dg');
    declare hiter xi('crossing');
  end;
  dg.clear();
  do _n_ = 1 by 1 until (last.subject); * process subjects one by one;
    set have;
    by subject;
    do date = stdt to endt; * load multidata hash with medgrp over date range;
      dg.add();
    end;
  end;
  * examine each date in which subject had activity;
  adate = .;
  cdate = -1e9;
  do _i_ = 1 by 1 while (dgi.next() = 0);
    if date eq adate
      then continue; * hiter over multi-data will return each node;
      else adate = date; * track activity date;
    * load hash to dedupe tracking of medgrp on date;
    crossing.clear();
    do _i_ = 1 by 1 while (dg.do_over() = 0);
      crossing.replace();
    end;
    * compute crossing representation on date, A*B*... by traversing 2nd hash;
    xi.first(); length cross $20;
    cross = medgrp;
    do while(0 = xi.next());
      cross = catx('*',cross,medgrp);
    end;
    if date - cdate > 1 then cluster + 1; %* track cluster based on date continuities;
    cdate = date;
    output; * <------------ view OUTPUT;
  end;
  keep subject date cross cluster;
run;
* 2nd data step processes view (1st data step);
* determine when date continuity ends or medgrp changes;
data want;
  length subject 8 medgrps $20;
  format stdt endt mmddyy10.;
  do _n_ = 1 by 1 until (last.medgrps);
    set crossings_by_date (rename=cross=medgrps);
    by cluster medgrps notsorted;
    if stdt = . then
      stdt = date;
  end;
  endt = date;
  keep subject medgrps stdt endt;
run;
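For readers who want to check the logic outside SAS, the same per-date bookkeeping can be sketched in Python. This is an illustration of the idea rather than a port of the hash code: `restructure` and `SAMPLE` are hypothetical names, the sample uses the rows from the question's "Original data", and the sketch still expands every range day by day, so it makes no claim about speed.

```python
from collections import defaultdict
from datetime import date, timedelta


def restructure(records):
    """records: iterable of (subject, medgrp, start, end), date ranges inclusive."""
    # subject -> day -> set of medgrps active on that day (the set dedupes overlaps)
    active = defaultdict(lambda: defaultdict(set))
    for subject, medgrp, start, end in records:
        day = start
        while day <= end:
            active[subject][day].add(medgrp)
            day += timedelta(days=1)

    result = []
    for subject in sorted(active):
        run = None  # [pattern, first_day, last_day] of the period being built
        for day in sorted(active[subject]):
            pattern = "*".join(sorted(active[subject][day]))
            contiguous = run is not None and day - run[2] == timedelta(days=1)
            if contiguous and run[0] == pattern:
                run[2] = day  # same pattern on the next day: extend the period
            else:
                if run is not None:
                    result.append((subject, *run))
                run = [pattern, day, day]  # pattern changed or dates broke: new period
        if run is not None:
            result.append((subject, *run))
    return result


SAMPLE = [
    (1, "A", date(2014, 7, 1), date(2014, 7, 31)),
    (1, "A", date(2014, 7, 29), date(2014, 8, 30)),
    (1, "B", date(2014, 7, 1), date(2014, 8, 15)),
    (1, "C", date(2014, 8, 1), date(2014, 9, 1)),
    (2, "A", date(2014, 4, 15), date(2014, 5, 15)),
    (2, "A", date(2014, 5, 10), date(2014, 6, 10)),
    (2, "A", date(2014, 6, 5), date(2014, 6, 15)),
    (2, "A", date(2014, 7, 1), date(2014, 8, 1)),
    (3, "A", date(2014, 6, 5), date(2014, 6, 15)),
    (3, "A", date(2014, 6, 16), date(2014, 8, 1)),
]

result = restructure(SAMPLE)
for row in result:
    print(row)
```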
I have a table as shown below. I need to fetch the records, within the same code, where (Value2-Value1)*2 of one row is >= (Value2-Value1) of the row with the consecutive date. (The dates are uniform across all codes.)
---------------------------------------
code Date Value1 Value2
---------------------------------------
1 1-1-2018 13 14
1 2-1-2018 14 16
1 4-1-2018 15 18
2 1-1-2019 1 3
2 2-1-2018 2 3
2 4-1-2018 3 7
Example: the output needs to be
1 1-1-2018 13 14
As I am a beginner at SQL, I tried my best but could not work out how to compare only rows with consecutive dates.
Use a self join.
You can specify all the conditions you've listed in the ON clause:
SELECT T0.code, T0.Date, T0.Value1, T0.Value2
FROM [Table] AS T0
JOIN [Table] AS T1
  ON T0.code = T1.code
 AND T1.Date = DATEADD(DAY, 1, T0.Date)   -- T1 is the row for the following day
 AND (T0.Value2 - T0.Value1) * 2 >= T1.Value2 - T1.Value1
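A small runnable check of the self join, using Python's built-in sqlite3 module. The table name `readings` is hypothetical, the dates are assumed to be day-month-year and are stored as ISO strings, and SQLite's `date(x, '+1 day')` takes the place of `DATEADD`. Here t0 is the earlier row and t1 is the row for the following day, which is the reading that reproduces the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (code INTEGER, date TEXT, value1 INTEGER, value2 INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01-01", 13, 14),
        (1, "2018-01-02", 14, 16),
        (1, "2018-01-04", 15, 18),
        (2, "2019-01-01", 1, 3),   # year kept as given in the question
        (2, "2018-01-02", 2, 3),
        (2, "2018-01-04", 3, 7),
    ],
)

matches = conn.execute(
    """
    SELECT t0.code, t0.date, t0.value1, t0.value2
    FROM readings AS t0
    JOIN readings AS t1
      ON t1.code = t0.code
     AND t1.date = date(t0.date, '+1 day')   -- only the directly following day
     AND (t0.value2 - t0.value1) * 2 >= t1.value2 - t1.value1
    ORDER BY t0.code, t0.date
    """
).fetchall()
print(matches)
```

Only the first row of code 1 has a neighbour exactly one day later, so it is the only candidate, and its doubled difference (2) is not smaller than the next day's difference (2).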
I want to show forecast dates in DB2, starting from the current date and stepping by a frequency (in months) for up to one year.
Date: current date
Frequency: 2
Up to: 2020-01-01
The output should look like:
2019-05-22,
2019-07-22,
2019-09-22,
2019-11-22
Try the following RCTE:
with t(dt) as (
  values current date
  union all
  select dt + 2 month
  from t
  where year(dt + 2 month) = year(current date)
)
select dt
from t;
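The recursion can be tried out locally with Python's built-in sqlite3 module. This is a sketch in SQLite dialect, not DB2: `forecast_dates` is a hypothetical helper, the start date is passed in instead of using `current date`, and the stop condition is an explicit upper bound rather than the `year(...)` comparison above.

```python
import sqlite3


def forecast_dates(start, step_months, upto):
    """Return ISO dates from `start`, stepping `step_months` months, strictly before `upto`."""
    step = f"+{int(step_months)} months"  # SQLite date modifier, e.g. '+2 months'
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(
            """
            WITH RECURSIVE t(dt) AS (
                SELECT date(:start)            -- anchor row: the starting date
                UNION ALL
                SELECT date(dt, :step)         -- recursive step: add the frequency
                FROM t
                WHERE date(dt, :step) < :upto  -- stop before the upper bound
            )
            SELECT dt FROM t ORDER BY dt
            """,
            {"start": start, "step": step, "upto": upto},
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


dates = forecast_dates("2019-05-22", 2, "2020-01-01")
print(dates)
```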
I have a table of events, called tbl_events that looks something like this:
PersonID Date
1 30/03/2015
1 22/04/2015
1 30/06/2015
2 18/07/2016
2 09/12/2016
2 28/04/2017
3 01/10/2014
3 28/11/2016
3 28/11/2016
3 16/01/2017
4 13/04/2017
4 09/05/2017
I want to be able to group these events up by the start date of each 'sequence', with a sequence being defined as a run of events from the first identified to the last identified for each PersonID. The last event in a sequence is defined as the event where thereafter there are no subsequent events for that PersonID for a year.
I would expect the result to look like this:
PersonID FirstDate Sequence Events
1 30/03/2015 1 3
2 18/07/2016 1 3
3 01/10/2014 1 1
3 28/11/2016 2 3
4 13/04/2017 1 2
I am able to identify the sequences in Excel and pivot the data, but I need to be able to do this in SQL.
Here is the formula I have used in Excel to generate the sequence number (I am populating cell C3, with column A being PersonID and B being Date):
=+IF(A2<>A3,1,IF((B3-B2)<365,C2,C2+1))
I have joined the table back on itself using ROW_NUMBER to get the difference between the Date and the previous event date for that ID, but I'm not really sure where to go from there.
Any help is much appreciated.
My solution is based on the sample data you've provided along with your Excel formula.
-- easily consumable sample data
DECLARE @tbl_events TABLE (PersonId int, [date] date);
INSERT @tbl_events VALUES
(1,'20150330'),(1,'20150422'),(1,'20150630'),(2,'20160718'),(2,'20161209'),(2,'20170428'),
(3,'20141001'),(3,'20161128'),(3,'20161128'),(3,'20170116'),(4,'20170413'),(4,'20170509');
-- Solution
WITH groupings AS
(
  SELECT
    PersonId,
    FirstDate = MIN([date]) OVER (PARTITION BY personId ORDER BY [date]),
    NextDate  = LAG([date],1,[date]) OVER (PARTITION BY personId ORDER BY [date]),
    [date],
    grouper   =
      DATEDIFF(DAY, MIN([date]) OVER (PARTITION BY personId ORDER BY [date]), [date]) / 365
  FROM @tbl_events
),
Prep AS
(
  SELECT
    PersonId,
    firstDate = IIF(grouper = 0, FirstDate, IIF(FirstDate = NextDate, [date],NextDate))
  FROM groupings
)
SELECT
  PersonId,
  FirstDate,
  [Sequence] = ROW_NUMBER() OVER (PARTITION BY personId ORDER BY FirstDate),
  [Events]   = COUNT(*)
FROM prep
GROUP BY personId, FirstDate;
Results
PersonId FirstDate Sequence Events
----------- ---------- -------------------- -----------
1 2015-03-30 1 3
2 2016-07-18 1 3
3 2014-10-01 1 1
3 2016-11-28 2 3
4 2017-04-13 1 2
First, note that not all years have 365 days; nonetheless, I'm using 365 to emulate your Excel logic. This would need to be updated to account for leap years. Next, like your Excel formula, this will only be correct when there are at most two sequences;
it would not work when, say, a PersonId has a date of Jan 1 2015, then Jan 10 2016, then Feb 1 2017. Let us know if we need logic to accommodate the aforementioned scenarios.
Lastly, this solution uses LAG, which requires SQL Server 2012+; if you're working with an earlier version of SQL Server, the query will have to be updated accordingly.
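If more than two sequences per person are possible, one common alternative is a gaps-and-islands query: flag each event whose gap from the previous event is 365 days or more (the same test as the Excel formula), then take a running total of the flags as the sequence number. The sketch below is not the query above; it runs the idea in SQLite through Python's built-in sqlite3 module, with a hypothetical `sequences` helper and column names, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

GAPS_AND_ISLANDS = """
WITH flagged AS (
    SELECT person_id, event_date,
           -- 1 when there is no previous event or the gap is 365 days or more
           CASE WHEN julianday(event_date)
                     - julianday(LAG(event_date) OVER (
                           PARTITION BY person_id ORDER BY event_date)) < 365
                THEN 0 ELSE 1 END AS starts_sequence
    FROM tbl_events
),
numbered AS (
    SELECT person_id, event_date,
           -- running total of the flags gives the sequence number
           SUM(starts_sequence) OVER (
               PARTITION BY person_id ORDER BY event_date
               ROWS UNBOUNDED PRECEDING) AS seq
    FROM flagged
)
SELECT person_id, MIN(event_date) AS first_date, seq, COUNT(*) AS events
FROM numbered
GROUP BY person_id, seq
ORDER BY person_id, seq
"""


def sequences(events):
    """events: list of (person_id, 'YYYY-MM-DD'); returns (person, first_date, seq, count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl_events (person_id INTEGER, event_date TEXT)")
        conn.executemany("INSERT INTO tbl_events VALUES (?, ?)", events)
        return conn.execute(GAPS_AND_ISLANDS).fetchall()
    finally:
        conn.close()


sample = [
    (1, "2015-03-30"), (1, "2015-04-22"), (1, "2015-06-30"),
    (2, "2016-07-18"), (2, "2016-12-09"), (2, "2017-04-28"),
    (3, "2014-10-01"), (3, "2016-11-28"), (3, "2016-11-28"), (3, "2017-01-16"),
    (4, "2017-04-13"), (4, "2017-05-09"),
]
grouped = sequences(sample)
for row in grouped:
    print(row)
```

In SQL Server the same structure should be expressible with `LAG`, `DATEDIFF` and `SUM(...) OVER (...)`, with the same leap-year caveat about the fixed 365-day threshold.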