MySQL select to get the consecutive-day count per user where the value is less than the previous day's value

MySQL 8.0
Question: How can I write a MySQL SELECT that returns, per user, the count of consecutive days on which the weight is less than the previous day's weight? The streak should break when the days are no longer consecutive, or when the weight is the same as or greater than the same user's previous-day weight.
create table userData (recordDate date, userName varchar(100), weight FLOAT);
insert into userData (recordDate, userName, weight)
values
('2020/8/1','Chris', 78),
('2021/8/2','Chris', 77),
('2021/8/3','Chris', 76),
('2021/8/1','Aamir', 78),
('2021/8/2','Aamir', 77),
('2021/8/1','Alex', 78),
('2021/8/2','Alex', 77),
('2021/8/3','Alex', 76),
('2021/8/5','Chris', 78),
('2021/8/6','Chris', 77),
('2021/8/7','Chris', 76),
('2021/8/8','Chris', 75),
('2021/8/8','Aamir', 78),
('2021/8/8','Alex', 78),
('2021/8/9','John', 78),
('2021/8/1','Ali', 78),
('2021/8/10','Chris', 78);
The expected output is
| userName | streakDays | startingDate | endingDate |
| -------- | ---------- | ------------ | ---------- |
| Alex | 3 | 2021-08-01 | 2021-08-03 |
| Chris | 3 | 2021-08-06 | 2021-08-08 |
| Aamir | 2 | 2021-08-01 | 2021-08-02 |
| Ali | 1 | 2021-08-01 | 2021-08-01 |
| John | 1 | 2021-08-09 | 2021-08-09 |
Any help would be appreciated.

Based on the data inserted in your table, this SELECT query works fine:
select userName as un,
    count((select recordDate WHERE userName = un)) as streakDays,
    (select recordDate FROM userData WHERE userName = un limit 1) as startingDate,
    (select recordDate FROM userData WHERE userName = un order by recordDate DESC limit 1) as endingDate
from userData
group by userName
It gives output like this:
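| userName | streakDays | startingDate | endingDate |
| -------- | ---------- | ------------ | ---------- |
| Aamir | 3 | 2021-08-01 | 2021-08-08 |
| Alex | 4 | 2021-08-01 | 2021-08-08 |
| Ali | 1 | 2021-08-01 | 2021-08-01 |
| Chris | 8 | 2021-08-01 | 2021-08-10 |
| John | 1 | 2021-08-09 | 2021-08-09 |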
Let me know whether this works for you!

I resolved the problem with the following query:
select
    streakBreakersRemoved.userName,
    streakBreakersRemoved.streakDays,
    streakBreakersRemoved.startingDate,
    streakBreakersRemoved.endingDate
from
(
    select
        userName,
        count(*) as streakDays,
        min(recordDate) as startingDate,
        max(recordDate) as endingDate,
        row_number() over (partition by userName
                           order by count(*) desc) as seqNum
    from
    (
        select
            initialRecords.*,
            row_number() over (partition by userName
                               order by recordDate) as initialSeqNum
        from
        (
            select
                userData.*,
                lag(weight) over (partition by userName
                                  order by recordDate) as previousWeight
            from
                userData
        )
        initialRecords
        -- keep each user's first row and every row lighter than the previous one
        where
            previousWeight is null or previousWeight > weight
    )
    recordsWithSeqNum
    -- consecutive dates share the same (day number - row number) value
    group by
        userName,
        to_days(recordDate) - initialSeqNum
)
streakBreakersRemoved
where
    seqNum = 1
order by
    streakDays desc;
I would appreciate it if anyone could suggest how to optimize the above query.
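The grouping key to_days(recordDate) - initialSeqNum is the classic gaps-and-islands trick: within one user's kept rows, a run of consecutive dates and the row number both grow by one per row, so their difference stays constant until a gap appears. Below is a minimal sketch that runs the same steps from Python against an in-memory SQLite database. It assumes SQLite 3.25 or later (window functions), ISO 'YYYY-MM-DD' strings instead of '2021/8/1', and julianday() standing in for MySQL's to_days(); the helper name longest_streaks and the tie-break on startingDate are my own choices, not part of the MySQL query.

```python
import sqlite3

# Rows from the question; dates rewritten as ISO strings (assumption: julianday() needs them).
ROWS = [
    ("2020-08-01", "Chris", 78), ("2021-08-02", "Chris", 77), ("2021-08-03", "Chris", 76),
    ("2021-08-01", "Aamir", 78), ("2021-08-02", "Aamir", 77),
    ("2021-08-01", "Alex", 78), ("2021-08-02", "Alex", 77), ("2021-08-03", "Alex", 76),
    ("2021-08-05", "Chris", 78), ("2021-08-06", "Chris", 77), ("2021-08-07", "Chris", 76),
    ("2021-08-08", "Chris", 75), ("2021-08-08", "Aamir", 78), ("2021-08-08", "Alex", 78),
    ("2021-08-09", "John", 78), ("2021-08-01", "Ali", 78), ("2021-08-10", "Chris", 78),
]

QUERY = """
WITH kept AS (      -- each user's first row plus every row lighter than the row before it
    SELECT userName, recordDate
    FROM (SELECT u.*,
                 LAG(weight) OVER (PARTITION BY userName ORDER BY recordDate) AS prevWeight
          FROM userData u)
    WHERE prevWeight IS NULL OR prevWeight > weight
),
islands AS (        -- consecutive dates share the same (day number - row number)
    SELECT userName, recordDate,
           CAST(julianday(recordDate) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY userName ORDER BY recordDate) AS grp
    FROM kept
),
streaks AS (
    SELECT userName, COUNT(*) AS streakDays,
           MIN(recordDate) AS startingDate, MAX(recordDate) AS endingDate
    FROM islands
    GROUP BY userName, grp
),
ranked AS (         -- longest streak per user; the earliest one wins a tie
    SELECT *, ROW_NUMBER() OVER (PARTITION BY userName
                                 ORDER BY streakDays DESC, startingDate) AS seqNum
    FROM streaks
)
SELECT userName, streakDays, startingDate, endingDate
FROM ranked
WHERE seqNum = 1
ORDER BY streakDays DESC, userName
"""


def longest_streaks(rows):
    """Load the rows into a throwaway SQLite table and return the longest streak per user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE userData (recordDate TEXT, userName TEXT, weight REAL)")
    con.executemany("INSERT INTO userData VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in longest_streaks(ROWS):
    print(row)
```

Run as a script, it prints one tuple per user in the same order as the expected output in the question.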

Related

Calculate duration of time ranges without overlap in PostgreSQL

I'm on Postgres 13 and have a table like this
| key | from | to
-------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00
| B | 2022-11-27T09:00 | 2022-11-27T10:00
| C | 2022-11-27T08:30 | 2022-11-27T10:30
I want to calculate the duration of each record, but without overlaps. So the desired result would be
| key | from | to | duration
----------------------------------------------------------
| A | 2022-11-27T08:00 | 2022-11-27T09:00 | '1 hour'
| B | 2022-11-27T09:00 | 2022-11-27T09:45 | '45 minutes'
| C | 2022-11-27T08:30 | 2022-11-27T10:00 | '15 minutes'
I guess I need a subquery that subtracts the overlap somehow, but how would I factor in multiple overlaps? In the example above, C overlaps both A and B, so I must subtract 30 minutes from A and then 45 minutes from B... But I'm stuck here:
SELECT key, (("to" - "from")::interval - s.overlap) as duration
FROM time_entries, (
SELECT (???) as overlap
) s
select
    key,
    fromDT,
    toDT,
    (toDT - fromDT)::interval -
    COALESCE((SELECT SUM(LEAST(te2.toDT, te1.toDT) - GREATEST(te2.fromDT, te1.fromDT))::interval
              FROM time_entries te2
              -- te2 must really overlap te1 (AND, not OR); otherwise a disjoint
              -- earlier range contributes a negative "overlap"
              WHERE (te2.fromDT < te1.toDT AND te2.toDT > te1.fromDT)
                AND te2.key < te1.key), '0 minutes') as duration
from time_entries te1;
output:
| key | fromdt | todt | duration |
| --- | ------ | ---- | -------- |
| A | 2022-11-27 08:00:00 | 2022-11-27 09:00:00 | 01:00:00 |
| B | 2022-11-27 09:00:00 | 2022-11-27 10:00:00 | 01:00:00 |
| C | 2022-11-27 08:30:00 | 2022-11-27 10:30:00 | 00:30:00 |
I renamed the columns "from" and "to" to fromDT and toDT to avoid using reserved words.
A step-by-step explanation is in the DBFIDDLE.
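The correlated subquery subtracts, from each entry, its overlap with every entry that has a smaller key; the overlap of two ranges is LEAST(end) - GREATEST(start), which is only meaningful when they actually intersect. A small Python sketch of that arithmetic on the three sample rows from the question, where clamping at zero plays the role of the WHERE overlap test (overlap, durations and at are hypothetical helper names):

```python
from datetime import datetime, timedelta


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time ranges; zero if they are disjoint or only touch."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def durations(entries):
    """entries: list of (key, start, end) tuples.
    Mirrors the correlated subquery: each entry's length minus its overlap with
    every entry whose key sorts before it."""
    result = {}
    for key, start, end in entries:
        earlier = (overlap(start, end, s, e) for k, s, e in entries if k < key)
        result[key] = (end - start) - sum(earlier, timedelta(0))
    return result


def at(hour, minute):
    """Timestamp on 2022-11-27, the date used in the question."""
    return datetime(2022, 11, 27, hour, minute)


ENTRIES = [
    ("A", at(8, 0), at(9, 0)),
    ("B", at(9, 0), at(10, 0)),
    ("C", at(8, 30), at(10, 30)),
]

for key, duration in durations(ENTRIES).items():
    print(key, duration)
```

This gives 1 hour for A, 1 hour for B (B only touches A at 09:00) and 30 minutes for C, the same as the query output. Because overlaps are subtracted pairwise, a stretch covered by two earlier entries would be subtracted twice, so the result can go negative when earlier entries overlap each other.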
Another approach.
WITH DATA AS
  (SELECT KEY,
          FROMDT,
          TODT,
          MIN(FROMDT) OVER(PARTITION BY FROMDT::DATE
                           ORDER BY KEY) AS START_DATE,
          MAX(TODT) OVER(PARTITION BY FROMDT::DATE
                         ORDER BY KEY) AS END_DATE
   FROM TIME_ENTRIES
   ORDER BY KEY),
STAGING_DATA AS
  (SELECT KEY,
          FROMDT,
          TODT,
          COALESCE(LAG(START_DATE) OVER (PARTITION BY FROMDT::DATE
                                         ORDER BY KEY), FROMDT) AS T1_DATE,
          COALESCE(LAG(END_DATE) OVER (PARTITION BY FROMDT::DATE
                                       ORDER BY KEY), TODT) AS T2_DATE
   FROM DATA)
SELECT KEY,
       FROMDT,
       TODT,
       CASE
           WHEN FROMDT = T1_DATE
                AND TODT = T2_DATE THEN (TODT - FROMDT)::INTERVAL
           WHEN T2_DATE < TODT THEN (TODT - T2_DATE)::INTERVAL
           ELSE (T2_DATE - TODT)::INTERVAL
       END
FROM STAGING_DATA;

Historical aggregation of a column up to a specified time in each row of another column

I have two tables, login_attempts and checkouts, in Amazon Redshift. A user can have multiple (un)successful login attempts and multiple (un)successful checkouts, as shown in this example:
login_attempts
login_id | user_id | login | success
-------------------------------------------------------
1 | 1 | 2021-07-01 14:00:00 | 0
2 | 1 | 2021-07-01 16:00:00 | 1
3 | 2 | 2021-07-02 05:01:01 | 1
4 | 1 | 2021-07-04 03:25:34 | 0
5 | 2 | 2021-07-05 11:20:50 | 0
6 | 2 | 2021-07-07 12:34:56 | 1
and
checkouts
checkout_id | checkout_time | user_id | success
------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 0
2 | 2021-07-02 06:54:32 | 2 | 1
3 | 2021-07-04 13:00:01 | 1 | 1
4 | 2021-07-08 09:05:00 | 2 | 1
Given this information, how can I get the following table with historical performance included for each checkout AS OF THAT TIME?
checkout_id | checkout | user_id | lastGoodLogin | lastFailedLogin | lastGoodCheckout | lastFailedCheckout |
---------------------------------------------------------------------------------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 2021-07-01 16:00:00 | 2021-07-01 14:00:00 | NULL | NULL
2 | 2021-07-02 06:54:32 | 2 | 2021-07-02 05:01:01 | NULL | NULL | NULL
3 | 2021-07-04 13:00:01 | 1 | 2021-07-01 16:00:00 | 2021-07-04 03:25:34 | NULL | 2021-07-01 18:00:00
4 | 2021-07-08 09:05:00 | 2 | 2021-07-07 12:34:56 | 2021-07-05 11:20:50 | 2021-07-02 06:54:32 | NULL
Update: I was able to get lastFailedCheckout and lastGoodCheckout, because that is a window operation on the same table (checkouts), but I don't understand how best to join it with the login_attempts table to get the last[Good|Failed]Login fields. (sqlfiddle)
P.S.: I am open to PostgreSQL suggestions as well.
Good start! A couple of things in your SQL: 1) You should really try to avoid inequality joins, as these can lead to data explosions and aren't needed in this case. Just put a CASE expression inside your window function so it uses only the type of checkout (or login) you want. 2) You can use the frame clause so that the current row is not included when finding previous checkouts.
Once you have this pattern, you can use it to find the other two columns of data you are looking for. The first step is to UNION the tables together, not JOIN them. This means adding a few more columns so the data can live together, but that is easy. Now you have the user ID and the time the "thing" happened all in the same data. You just need two more window functions to pull the info you want. Lastly, strip out the non-checkout rows with an outer SELECT and a WHERE clause.
Like this:
create table login_attempts(
loginid smallint,
userid smallint,
login timestamp,
success smallint
);
create table checkouts(
checkoutid smallint,
userid smallint,
checkout_time timestamp,
success smallint
);
insert into login_attempts values
(1, 1, '2021-07-01 14:00:00', 0),
(2, 1, '2021-07-01 16:00:00', 1),
(3, 2, '2021-07-02 05:01:01', 1),
(4, 1, '2021-07-04 03:25:34', 0),
(5, 2, '2021-07-05 11:20:50', 0),
(6, 2, '2021-07-07 12:34:56', 1)
;
insert into checkouts values
(1, 1, '2021-07-01 18:00:00', 0),
(2, 2, '2021-07-02 06:54:32', 1),
(3, 1, '2021-07-04 13:00:01', 1),
(4, 2, '2021-07-08 09:05:00', 1)
;
SQL:
select *
from (
    select
        c.checkoutid,
        c.userid,
        c.checkout_time,
        max(case success when 0 then checkout_time end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastFailedCheckout,
        max(case success when 1 then checkout_time end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastGoodCheckout,
        max(case lsuccess when 0 then login end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastFailedLogin,
        max(case lsuccess when 1 then login end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastGoodLogin
    from (
        select checkout_time as event_time, checkoutid, userid,
               checkout_time, success,
               NULL as login, NULL as lsuccess
        from checkouts
        UNION ALL
        select login as event_time, NULL as checkoutid, userid,
               NULL as checkout_time, NULL as success,
               login, success as lsuccess
        from login_attempts
    ) c
) o
where o.checkoutid is not null
order by o.checkoutid
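To see the UNION ALL plus framed-window pattern end to end, here is a sketch that runs it against an in-memory SQLite database from Python rather than Redshift or PostgreSQL. It assumes SQLite 3.25 or later (window functions and ROWS frames) and stores timestamps as ISO text, so MAX() and ORDER BY compare them chronologically; checkout_history and the FRAME string are hypothetical names.

```python
import sqlite3

SETUP = """
CREATE TABLE login_attempts (loginid INTEGER, userid INTEGER, login TEXT, success INTEGER);
CREATE TABLE checkouts (checkoutid INTEGER, userid INTEGER, checkout_time TEXT, success INTEGER);
INSERT INTO login_attempts VALUES
  (1, 1, '2021-07-01 14:00:00', 0), (2, 1, '2021-07-01 16:00:00', 1),
  (3, 2, '2021-07-02 05:01:01', 1), (4, 1, '2021-07-04 03:25:34', 0),
  (5, 2, '2021-07-05 11:20:50', 0), (6, 2, '2021-07-07 12:34:56', 1);
INSERT INTO checkouts VALUES
  (1, 1, '2021-07-01 18:00:00', 0), (2, 2, '2021-07-02 06:54:32', 1),
  (3, 1, '2021-07-04 13:00:01', 1), (4, 2, '2021-07-08 09:05:00', 1);
"""

# Each window sees only earlier events of the same user (the frame stops at 1 PRECEDING).
FRAME = ("OVER (PARTITION BY userid ORDER BY event_time "
         "ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)")

QUERY = f"""
SELECT * FROM (
    SELECT checkoutid, userid, checkout_time,
           MAX(CASE lsuccess WHEN 1 THEN login END) {FRAME} AS lastGoodLogin,
           MAX(CASE lsuccess WHEN 0 THEN login END) {FRAME} AS lastFailedLogin,
           MAX(CASE success WHEN 1 THEN checkout_time END) {FRAME} AS lastGoodCheckout,
           MAX(CASE success WHEN 0 THEN checkout_time END) {FRAME} AS lastFailedCheckout
    FROM (  -- one row per event; the other kind's columns are NULL
        SELECT checkout_time AS event_time, checkoutid, userid, checkout_time, success,
               NULL AS login, NULL AS lsuccess
        FROM checkouts
        UNION ALL
        SELECT login, NULL, userid, NULL, NULL, login, success
        FROM login_attempts
    )
)
WHERE checkoutid IS NOT NULL  -- keep only the checkout rows
ORDER BY checkoutid
"""


def checkout_history():
    """Build the sample tables and return one row per checkout with its history."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in checkout_history():
    print(row)
```

The four rows it prints carry the same values as the expected table in the question.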

Data for each of the last 12 months, each covering a trailing 12 months

This is T-SQL, and I'm trying to calculate the repeat purchase rate for the last 12 months. This is achieved by looking at the sum of customers who have bought more than once in the last 12 months and the total number of customers in the last 12 months.
The SQL code below gives me just that, but I would like to do this dynamically for each of the last 12 months. This is the part where I'm stuck and not sure how best to achieve it.
Each month should include data going back 12 months, i.e. June should hold data between June 2018 and June 2019, and May should hold data from May 2018 till May 2019.
[Order Date] is a normal datetime field (yyyy-mm-dd hh:mm:ss).
DECLARE @startdate1 DATETIME
DECLARE @enddate1 DATETIME
SET @enddate1 = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0) -- Ending June 2019
SET @startdate1 = DATEADD(mm, DATEDIFF(mm, 0, GETDATE())-13, 0) -- Starting June 2018
;
with dataset as (
    select [Phone No_] as who_identifier,
           count(distinct([Order No_])) as mycount
    from [MyCompany$Sales Invoice Header]
    where [Order Date] between @startdate1 and @enddate1
    group by [Phone No_]
),
frequentbuyers as (
    select who_identifier, sum(mycount) as frequentbuyerscount
    from dataset
    where mycount > 1
    group by who_identifier
),
allpurchases as (
    select who_identifier, sum(mycount) as allpurchasescount
    from dataset
    group by who_identifier
)
select sum(frequentbuyerscount) as frequentbuyercount,
       (select sum(allpurchasescount) from allpurchases) as allpurchasecount
from frequentbuyers
I'm hoping to achieve an end result that looks something like this:
...Dec, Jan, Feb, March, April, May, June each month holding both values for frequentbuyercount and allpurchasescount.
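The two numbers produced by the query above can be mirrored in plain Python to make their definitions concrete: COUNT(DISTINCT [Order No_]) per phone number inside one window, then a SUMIF-style sum over customers with more than one order, and a plain sum over everyone. This is only a sketch; the tuple layout, the helper name repeat_purchase_counts and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def repeat_purchase_counts(orders, start, end):
    """orders: iterable of (phone_no, order_no, order_date) tuples (hypothetical layout).
    Mirrors the question's CTEs for one window [start, end]."""
    per_customer = defaultdict(set)
    for phone_no, order_no, order_date in orders:
        if start <= order_date <= end:            # BETWEEN is inclusive on both ends
            per_customer[phone_no].add(order_no)  # COUNT(DISTINCT [Order No_])
    counts = [len(order_nos) for order_nos in per_customer.values()]
    frequent = sum(c for c in counts if c > 1)    # only customers with more than one order
    total = sum(counts)
    return frequent, total


SAMPLE = [  # hypothetical invoice rows
    ("555-0101", "SO-1", date(2018, 9, 3)),
    ("555-0101", "SO-2", date(2019, 2, 14)),
    ("555-0101", "SO-2", date(2019, 2, 14)),  # second line of the same order
    ("555-0102", "SO-3", date(2019, 1, 20)),
    ("555-0103", "SO-4", date(2017, 5, 1)),   # outside the window
]

print(repeat_purchase_counts(SAMPLE, date(2018, 6, 1), date(2019, 6, 1)))
```

Note that both figures are sums of order counts per customer, not head counts of customers.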
Here is the code. I made a small modification to frequentbuyerscount and allpurchasescount: if you use a SUMIF-like expression, you don't need a second CTE.
if object_id('tempdb.dbo.#tmpMonths') is not null drop table #tmpMonths
create table #tmpMonths (MonthID datetime, StartDate datetime, EndDate datetime)
declare @MonthCount int = 12
declare @Month datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) -- first day of the current month
-- one row per month: MonthID and the 12-month window ending at it
while @MonthCount > 0 begin
    insert into #tmpMonths (MonthID, StartDate, EndDate)
    select @Month, dateadd(month, -12, @Month), @Month
    set @Month = dateadd(month, -1, @Month)
    set @MonthCount = @MonthCount - 1
end
;with dataset as (
    select m.MonthID as MonthID, [Phone No_] as who_identifier,
           count(distinct([Order No_])) as mycount
    from [MyCompany$Sales Invoice Header]
    inner join #tmpMonths m on [Order Date] between m.StartDate and m.EndDate
    group by m.MonthID, [Phone No_]
),
buyers as (
    select MonthID, who_identifier
         , sum(iif(mycount > 1, mycount, 0)) as frequentbuyerscount --sum only if count > 1
         , sum(mycount) as allpurchasescount
    from dataset
    group by MonthID, who_identifier
)
select
    b.MonthID
    , max(tm.StartDate) StartDate, max(tm.EndDate) EndDate
    , sum(b.frequentbuyerscount) as frequentbuyercount
    , sum(b.allpurchasescount) as allpurchasecount
from buyers b inner join #tmpMonths tm on tm.MonthID = b.MonthID
group by b.MonthID
Be aware that the code was only tested for syntax.
With the test data, this is the result:
MonthID | StartDate | EndDate | frequentbuyercount | allpurchasecount
-----------------------------------------------------------------------------
2018-08-01 | 2017-08-01 | 2018-08-01 | 340 | 3702
2018-09-01 | 2017-09-01 | 2018-09-01 | 340 | 3702
2018-10-01 | 2017-10-01 | 2018-10-01 | 340 | 3702
2018-11-01 | 2017-11-01 | 2018-11-01 | 340 | 3702
2018-12-01 | 2017-12-01 | 2018-12-01 | 340 | 3703
2019-01-01 | 2018-01-01 | 2019-01-01 | 340 | 3703
2019-02-01 | 2018-02-01 | 2019-02-01 | 2 | 8
2019-03-01 | 2018-03-01 | 2019-03-01 | 2 | 3
2019-04-01 | 2018-04-01 | 2019-04-01 | 2 | 3
2019-05-01 | 2018-05-01 | 2019-05-01 | 2 | 3
2019-06-01 | 2018-06-01 | 2019-06-01 | 2 | 3
2019-07-01 | 2018-07-01 | 2019-07-01 | 2 | 3
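The WHILE loop builds one (MonthID, StartDate, EndDate) row per month, starting from the first day of the current month and stepping back one month at a time. A Python sketch of the same calendar arithmetic, assuming the query was run on 2019-07-15 (any July 2019 date reproduces the MonthID range in the result table); add_months and trailing_windows are hypothetical helpers, not SQL Server functions.

```python
from datetime import date


def add_months(first_of_month, n):
    """First day of the month n months after first_of_month (n may be negative)."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def trailing_windows(today, count=12):
    """(MonthID, StartDate, EndDate) rows in the order the WHILE loop inserts them."""
    month = date(today.year, today.month, 1)  # like DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)
    rows = []
    for _ in range(count):
        rows.append((month, add_months(month, -12), month))
        month = add_months(month, -1)
    return rows


for row in trailing_windows(date(2019, 7, 15)):
    print(row)
```

Computing a single month index (year * 12 + month) avoids special-casing the December/January boundary when stepping backwards.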

Selecting on a condition in a window function in PostgreSQL

I am using PostgreSQL and applying a window function. Previously I had to find the first g_id with the same last name and address (street_address and city), so I simply put the last name in the PARTITION BY clause of the window function.
But now I need to find the first g_id whose last name is not the same while the address is the same. How can I do it?
This is what i was doing previously.
SELECT g_id AS g_id,
       first_value(g_id) OVER (PARTITION BY l_name, street_address, city
                               ORDER BY last_date DESC NULLS LAST) AS c_id,
       street_address AS street_address
FROM mytable;
Let's say this is my table:
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
x5 | bar | xyz road | khi | 11-6-19
_________________________________________________
In the previous scenario:
if I run it for the first row, c_id should return 'x2', as it considers these rows:
_________________________________________________
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
_________________________________________________
and returns the row with the latest last_date.
What I want now is to select these rows (rows with the same street_address and city but a different l_name):
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
_________________________________________________
The output will be x3.
Somehow I want to compare the last-name column with the current row's last name (not equal) and then partition by the address fields. If no rows satisfy the condition, c_id should be equal to the current g_id.
Looking at your expected output, it's not clear whether you want the earliest or the latest row for each group. You may change the ORDER BY on last_date accordingly in this query, which uses DISTINCT ON:
SELECT DISTINCT ON ( street_address, city, l_name) *
FROM mytable
ORDER BY street_address,
city,
l_name,
last_date --change this to last_date desc if you want latest
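DISTINCT ON keeps only the first row of each (street_address, city, l_name) group, where "first" is decided by the remaining ORDER BY column. A sketch of the same semantics in Python, assuming the question's dates such as 11-6-19 mean 11 June 2019; distinct_on is a hypothetical helper, not a PostgreSQL API.

```python
from datetime import date
from itertools import groupby

MYTABLE = [
    {"g_id": "x1", "l_name": "bar", "street_address": "abc road", "city": "khi", "last_date": date(2019, 6, 11)},
    {"g_id": "x2", "l_name": "bar", "street_address": "abc road", "city": "khi", "last_date": date(2019, 6, 12)},
    {"g_id": "x3", "l_name": "foo", "street_address": "abc road", "city": "khi", "last_date": date(2019, 6, 19)},
    {"g_id": "x4", "l_name": "harry", "street_address": "abc road", "city": "khi", "last_date": date(2019, 6, 17)},
    {"g_id": "x5", "l_name": "bar", "street_address": "xyz road", "city": "khi", "last_date": date(2019, 6, 11)},
]


def distinct_on(rows, key, order):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order:
    sort by the DISTINCT ON columns and then the tie-breaker, keep the first row per group."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


def address_and_name(row):
    return (row["street_address"], row["city"], row["l_name"])


earliest = distinct_on(MYTABLE, address_and_name, lambda r: r["last_date"])
latest = distinct_on(MYTABLE, address_and_name, lambda r: -r["last_date"].toordinal())  # last_date DESC
print([r["g_id"] for r in earliest], [r["g_id"] for r in latest])
```

Sorting by last_date ascending picks x1 for the bar / abc road group; sorting descending (like last_date desc) picks x2.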
After discussing the details in this chat:
SELECT DISTINCT ON (t1.g_id)
    t1.*,
    COALESCE(t2.g_id, t1.g_id) AS c_id -- the row's own g_id when no other last name shares the address
FROM
    mytable t1
LEFT JOIN mytable t2
    ON t1.street_address = t2.street_address
   AND t1.city = t2.city
   AND t1.l_name != t2.l_name
ORDER BY t1.g_id, t2.last_date DESC
Here is how I solved it, using a subquery.
Creating the example table:
CREATE TABLE mytable
("g_id" varchar(2), "l_name" varchar(5), "street_address" varchar(8), "city" varchar(3), "last_date" date)
;
INSERT INTO mytable
("g_id", "l_name", "street_address", "city", "last_date")
VALUES
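-- ISO 8601 dates (11-6-19 read as 11 June 2019), so the values do not depend on the DateStyle setting
('x1', 'bar', 'abc road', 'khi', '2019-06-11'),
('x2', 'bar', 'abc road', 'khi', '2019-06-12'),
('x3', 'foo', 'abc road', 'khi', '2019-06-19'),
('x4', 'harry', 'abc road', 'khi', '2019-06-17'),
('x5', 'bar', 'xyz road', 'khi', '2019-06-11')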
;
Query to get the g_ids:
SELECT *,
       (SELECT b.g_id
        FROM mytable b
        WHERE (base.g_id = b.g_id)
           OR (base.l_name <> b.l_name
               AND base.street_address = b.street_address
               AND base.city = b.city)
        ORDER BY b.last_date DESC
        LIMIT 1) AS c_id
FROM mytable base
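The correlated subquery can be tried without PostgreSQL: the sketch below runs essentially the same subquery against an in-memory SQLite database from Python. It assumes the question's dates like 11-6-19 mean 11 June 2019, stored as ISO text so ORDER BY last_date DESC sorts chronologically; c_ids is a hypothetical helper name.

```python
import sqlite3

MYTABLE = [
    ("x1", "bar", "abc road", "khi", "2019-06-11"),
    ("x2", "bar", "abc road", "khi", "2019-06-12"),
    ("x3", "foo", "abc road", "khi", "2019-06-19"),
    ("x4", "harry", "abc road", "khi", "2019-06-17"),
    ("x5", "bar", "xyz road", "khi", "2019-06-11"),
]

QUERY = """
SELECT base.g_id,
       (SELECT b.g_id
        FROM mytable b
        WHERE b.g_id = base.g_id                  -- the row itself, so c_id can fall back to g_id
           OR (b.l_name <> base.l_name            -- or a different last name at the same address
               AND b.street_address = base.street_address
               AND b.city = base.city)
        ORDER BY b.last_date DESC
        LIMIT 1) AS c_id
FROM mytable base
ORDER BY base.g_id
"""


def c_ids(rows):
    """Return {g_id: c_id} for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (g_id TEXT, l_name TEXT, street_address TEXT, city TEXT, last_date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


print(c_ids(MYTABLE))
```

For x1 it returns x3, and for x5, whose address no other last name shares, it falls back to x5 itself.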

Postgresql Time Series for each Record

I'm having issues trying to wrap my head around how to extract some time series stats from my Postgres DB.
For example, I have several stores. I record how many sales each store made each day in a table that looks like:
+------------+----------+-------+
| Date | Store ID | Count |
+------------+----------+-------+
| 2017-02-01 | 1 | 10 |
| 2017-02-01 | 2 | 20 |
| 2017-02-03 | 1 | 11 |
| 2017-02-03 | 2 | 21 |
| 2017-02-04 | 3 | 30 |
+------------+----------+-------+
I'm trying to display this data on a bar/line graph with different lines per Store and the blank dates filled in with 0.
I have been successful getting it to show the sum per day (combining all the stores into one sum) using generate_series, but I can't figure out how to separate it out so each store has a value for each day... the result being something like:
["Store ID 1", 10, 0, 11, 0]
["Store ID 2", 20, 0, 21, 0]
["Store ID 3", 0, 0, 0, 30]
You need to build a cross join of all dates with all stores:
select store_id, array_agg(total order by date) as total
from (
    select store_id, date, coalesce(sum(total), 0) as total
    from
        t
        right join (
            generate_series(
                (select min(date) from t),
                (select max(date) from t),
                '1 day'
            ) gs (date)
            cross join
            (select distinct store_id from t) s
        ) using (date, store_id)
    group by 1, 2
) s
group by 1
order by 1
;
store_id | total
----------+-------------
1 | {10,0,11,0}
2 | {20,0,21,0}
3 | {0,0,0,30}
Sample data:
create table t (date date, store_id int, total int);
insert into t (date, store_id, total) values
('2017-02-01',1,10),
('2017-02-01',2,20),
('2017-02-03',1,11),
('2017-02-03',2,21),
('2017-02-04',3,30);
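The cross join of every date with every store guarantees a slot for each (date, store) pair, and coalesce() turns the empty slots into 0. A plain-Python sketch of that dense fill on the sample data above; dense_series is a hypothetical helper name.

```python
from datetime import date, timedelta


def dense_series(rows):
    """rows: (day, store_id, total). Returns {store_id: [total per day]} covering every
    day from the first to the last date in the data, with 0 where a store has no row."""
    totals = {}
    for day, store, total in rows:
        totals[(day, store)] = totals.get((day, store), 0) + total  # sum(total) per (date, store)
    first = min(day for day, _, _ in rows)
    last = max(day for day, _, _ in rows)
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]  # generate_series
    stores = sorted({store for _, store, _ in rows})                             # distinct store_id
    return {s: [totals.get((d, s), 0) for d in days] for s in stores}           # coalesce(..., 0)


SAMPLE = [
    (date(2017, 2, 1), 1, 10),
    (date(2017, 2, 1), 2, 20),
    (date(2017, 2, 3), 1, 11),
    (date(2017, 2, 3), 2, 21),
    (date(2017, 2, 4), 3, 30),
]

for store, series in dense_series(SAMPLE).items():
    print(store, series)
```

The resulting lists line up with the store_id | total output of the SQL answer.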