T-SQL: compute the difference between dates for each row

Can you show how this can be done in T-SQL?
Sample records:
accountnumber trandate
-------------------------
1000 02-11-2010
1000 02-12-2010
1000 02-13-2010
2000 02-10-2010
2000 02-15-2010
How can I compute the number of days between consecutive transactions for each accountnumber?
Like this:
accountnumber trandate # of days
----------------------------------------
1000 02-11-2010 0
1000 02-12-2010 1
1000 02-13-2010 1
2000 02-10-2010 0
2000 02-15-2010 5
Thanks a lot!

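SELECT a.accountnumber,
       a.trandate,
       -- Days since the previous transaction of the same account; 0 for the first one
       ISNULL(DATEDIFF(DAY, (SELECT TOP 1 b.trandate
                             FROM mytable b
                             WHERE b.accountnumber = a.accountnumber
                               AND b.trandate < a.trandate
                             ORDER BY b.trandate DESC), a.trandate), 0) AS days_since_previous
FROM mytable a
ORDER BY a.accountnumber, a.trandate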

You can use BETWEEN ... AND to restrict the rows to a date range:
select * from table1 where trandate between 'date1' and 'date2'

Hope this helps.
Select A.AccountNo, A.TranDate, B.TranDate as PreviousTranDate,
       IsNull(DateDiff(day, B.TranDate, A.TranDate), 0) as NoOfDays
from
(Select AccountNo, TranDate, Row_Number() over (Partition by AccountNo order by TranDate) as RNO from mytable) as A  -- mytable: placeholder table name
left join
(Select AccountNo, TranDate, Row_Number() over (Partition by AccountNo order by TranDate) as RNO from mytable) as B
on A.AccountNo = B.AccountNo and A.RNO - 1 = B.RNO
You can also move the numbered subquery into a CTE so that it is written only once; that mainly helps readability.
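The idea behind these queries, pairing each row with the previous row of the same account, can be checked outside the database. The following is only a minimal Python sketch of that logic using the sample records from the question; the function name `days_since_previous` is made up for illustration and is not part of any answer.

```python
from datetime import date
from itertools import groupby


def days_since_previous(rows):
    """rows: list of (accountnumber, trandate) tuples.

    Returns (accountnumber, trandate, days) tuples, where days is the gap to
    the previous transaction of the same account and 0 for the first one.
    """
    result = []
    ordered = sorted(rows)  # order by accountnumber, then trandate
    for account, group in groupby(ordered, key=lambda r: r[0]):
        previous = None
        for _, trandate in group:
            days = 0 if previous is None else (trandate - previous).days
            result.append((account, trandate, days))
            previous = trandate
    return result


rows = [
    (1000, date(2010, 2, 11)),
    (1000, date(2010, 2, 12)),
    (1000, date(2010, 2, 13)),
    (2000, date(2010, 2, 10)),
    (2000, date(2010, 2, 15)),
]

for account, trandate, days in days_since_previous(rows):
    print(account, trandate.strftime("%m-%d-%Y"), days)
```

The per-account grouping is what the partition (or the accountnumber filter in a correlated subquery) provides in SQL; without it, the first row of account 2000 would be compared with the last row of account 1000.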

Related

A query to get per month data for all months and calculate percentage per month per type

From the DB (PostgreSQL) I want to get, for every month, the percentage of stock items per condition. The total of a whole month is 100%, and each condition gets its percentage of that total. I have been trying all kinds of 'partition by' queries, but I can't quite get it right.
In the example below there would be an extra column holding, on each row, the percentage of that month. So the value of the new column for the first row would be 25/506*100.
What I have right now, and what works, is:
select to_char(created_at, 'YYYY-MM') as maand, count(si.id) as aantal,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end
from stock_items si
group by maand, condition_id
order by maand desc, condition_id asc
maand    aantal  case       new column
----------------------------------------
2022-01  25      Nieuw      25/506*100
2022-01  234     Als nieuw  234/506*100
2022-01  127     Goed       127/506*100
2022-01  16      Redelijk   16/506*100
2022-01  104     Matig      104/506*100
2021-12  456     Nieuw      other month
I hope it's all clear. Thanks!
I got what I wanted, only to realise that I want it a little different, but this is the answer to my question.
select
to_char(created_at, 'YYYY-MM') as maand,
count(id) as aantal,
round((count(id) / (sum(count(id)) over (partition by to_char(created_at, 'YYYY-MM'))) * 100), 2) as percentage,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end
from stock_items
group by maand, condition_id
order by maand desc, condition_id asc
Just wrap it in a CTE.
with a as (
select to_char(created_at, 'YYYY-MM') as maand, count(si.id) as aantal,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end as case
from stock_items si
group by maand, condition_id
order by maand desc, condition_id asc)
select a.*, aantal * 100 / sum(aantal) over (PARTITION BY maand) as anntal_rate from a;
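To see what the window sum contributes, here is a small Python sketch of the same arithmetic on the 2022-01 rows from the question. It is only an illustration under the assumption that the counts are already aggregated per month and condition; the function name `add_month_percentage` is hypothetical.

```python
from collections import defaultdict


def add_month_percentage(rows):
    """rows: list of (maand, label, aantal) tuples.

    Returns the rows with a fourth element: aantal as a percentage of the
    total aantal of the same maand, rounded to two decimals.
    """
    totals = defaultdict(int)
    for maand, _, aantal in rows:
        totals[maand] += aantal  # plays the role of sum(aantal) over (partition by maand)
    return [
        (maand, label, aantal, round(aantal * 100 / totals[maand], 2))
        for maand, label, aantal in rows
    ]


rows = [
    ("2022-01", "Nieuw", 25),
    ("2022-01", "Als nieuw", 234),
    ("2022-01", "Goed", 127),
    ("2022-01", "Redelijk", 16),
    ("2022-01", "Matig", 104),
]

for row in add_month_percentage(rows):
    print(row)
```

The month total is computed once per month and attached to every row of that month, which is the effect of the `partition by` clause in the queries above.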

Quartiles calculation in Postgresql Query

I am having a hard time trying to get this done. I have the following table:
cod_prod seller price date
A Andres 10 anydate
A Paul 5 anydate
A Mike 2.5 anydate
A Josh 1.75 anydate
A Karen 7.5 anydate
.... ..... ... .......
I am trying to calculate quartiles of the price for each product and classify each seller's price into 4 quartiles.
The output I am expecting is:
Cod_Prod Seller Price Quartile 1stQ 2ndQ 3rdQ 4thQ
A Andres 10 4 2.5 5 7.5 10
A Karen 7.5 3 2.5 5 7.5 10
A Paul 5 2 2.5 5 7.5 10
A Mike 2.5 1 2.5 5 7.5 10
A Josh 1.75 1 2.5 5 7.5 10
.. ..... .... .... .... .. ... ...
This table has thousands of distinct cod_prod and thousands of sellers.
I am trying this query:
with cte as (
select seller, cod_prod, sum(price) as sum_price
from tablename
group by 2,1
)
select seller,
cod_prod,
sum_price,
ntile(4) over (partition by seller order by sum_price asc) quartile
from cte
But this is not doing what I expect, and it is still missing the 1stQ to 4thQ bin boundaries.
I tried many different things but this is the closest I got from what I want.
Can someone help me to solve it?
I am not sure if this query is exactly what you want, but I think it can help you.
I calculated quartiles grouping by cod_prod.
WITH cte AS (SELECT seller, cod_prod, sum(price) as sum_price
FROM t
GROUP BY seller, cod_prod),
quartiles AS (SELECT
cod_prod,
percentile_cont(0.25) within group (order by sum_price asc) as "1stQ",
percentile_cont(0.50) within group (order by sum_price asc) as "2ndQ",
percentile_cont(0.75) within group (order by sum_price asc) as "3rdQ",
percentile_cont(1) within group (order by sum_price asc) as "4thQ"
FROM cte
GROUP BY cod_prod)
SELECT cte.*,
ntile(4) over (PARTITION BY cte.cod_prod ORDER BY sum_price ASC) quartile,
quartiles.*
FROM cte
INNER JOIN quartiles ON cte.cod_prod = quartiles.cod_prod;
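As a cross-check of what the two functions compute, the following Python sketch reimplements them for product A from the question. It assumes the usual definitions: `percentile_cont` interpolates linearly at position fraction * (N - 1) in the sorted values, and `ntile` hands out buckets in order, giving the first buckets one extra row when the rows do not divide evenly. The helper names mirror the SQL functions but are plain illustrative Python.

```python
import math


def percentile_cont(sorted_values, fraction):
    """Continuous percentile with linear interpolation between neighbours."""
    position = fraction * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * weight


def ntile(n_rows, buckets):
    """1-based bucket number for each row position; earlier buckets get the extra rows."""
    base, extra = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


prices = {"Andres": 10, "Paul": 5, "Mike": 2.5, "Josh": 1.75, "Karen": 7.5}

sellers = sorted(prices, key=prices.get)          # order by price asc
values = [prices[s] for s in sellers]
quartile = dict(zip(sellers, ntile(len(sellers), 4)))
cuts = [percentile_cont(values, f) for f in (0.25, 0.50, 0.75, 1)]

for seller in reversed(sellers):
    print(seller, prices[seller], quartile[seller], *cuts)
```

For these five prices the sketch reproduces the table the question expects: Josh and Mike fall into quartile 1, and the boundaries are 2.5, 5, 7.5 and 10.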

Cohort Analysis with RedShift by Month

I am trying to build a cohort analysis for monthly retention, but I am having trouble getting the month number column right. The month number is supposed to be the month in which the user transacted, counted from registration: 0 for the registration month, 1 for the first month after it, 2 for the second, and so on until the last month. Currently the query returns negative month numbers in some cells.
It should be like this table:
cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think the RANK window function is the right solution. The idea is to assign a rank to the months of activity of each user, ordered by year and month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date) , DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
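RANK numbers start at 1, so you may want to subtract 1. Note that RANK leaves gaps after ties: if a user has several events in the same month, DENSE_RANK gives consecutive month numbers instead.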
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
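The difference between ranking variants matters as soon as a user has more than one event in a month. The Python sketch below models `RANK() - 1` and `DENSE_RANK() - 1` over (year, month) for a single user; the event dates are invented for illustration and `month_numbers` is a hypothetical helper, not a Redshift function.

```python
from datetime import date


def month_numbers(event_dates):
    """Return {event_date: (rank_minus_1, dense_rank_minus_1)} for one user,
    ranking events by (year, month)."""
    months = sorted((d.year, d.month) for d in event_dates)
    distinct = sorted(set(months))
    result = {}
    for d in event_dates:
        key = (d.year, d.month)
        rank = months.index(key)      # rows ordered before this month (RANK - 1)
        dense = distinct.index(key)   # distinct months before this one (DENSE_RANK - 1)
        result[d] = (rank, dense)
    return result


# Invented sample: two events in January, one in February, one in April.
events = [date(2020, 1, 5), date(2020, 1, 20), date(2020, 2, 3), date(2020, 4, 9)]

for event, (rank, dense) in month_numbers(events).items():
    print(event, rank, dense)
```

With two January events, the February event gets 2 from the rank variant but 1 from the dense variant. Both variants count active months only: the April event gets 2 from the dense variant although April is three calendar months after January, so a ranking approach measures "nth active month" rather than "months since registration".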

Getting fortnight from timestamp in Postgres

I'm doing some cohort analysis and want to see for a group of customers in November, how many transact weekly, fortnightly, and monthly; and for how long
I have this for the week and month (weekly example):
WITH weekly_users AS (
SELECT user_fk
, DATE_TRUNC('week',created_at) AS week
, (DATE_PART('year', created_at) - 2016) * 52 + DATE_PART('week', created_at) - 45 AS weeks_between
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY user_fk, week, weeks_between
),
t2 AS (
SELECT weekly_users.*
, COUNT(*) OVER (PARTITION BY user_fk
ORDER BY week ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING) AS prev_rec_cnt
FROM weekly_users
)
SELECT week
, COUNT(*)
FROM t2
WHERE weeks_between = prev_rec_cnt
GROUP BY week
ORDER BY week;
But a week is too short an interval and a month too long, so I want fortnights. Has anyone done this before? From Googling, it seems like a challenge.
Thanks in advance
Just worked it out, this is how you'd do it:
WITH fortnightly_users AS (
SELECT user_fk
, EXTRACT(YEAR FROM created_at) * 100 + CEIL(EXTRACT(WEEK FROM created_at)/2) AS fortnight
, (EXTRACT(YEAR FROM created_at) - 2016) * 26 + CEIL(EXTRACT(WEEK FROM created_at)/2) - 23 AS fortnights_between
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY user_fk, fortnight, fortnights_between
),
t2 AS (
SELECT fortnightly_users.*
, COUNT(*) OVER (PARTITION BY user_fk
ORDER BY fortnight ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING) AS prev_rec_cnt
FROM fortnightly_users
)
SELECT fortnight
, COUNT(*)
FROM t2
WHERE fortnights_between = prev_rec_cnt
GROUP BY fortnight
ORDER BY fortnight;
So you take the week number and divide it by 2, rounding up to avoid fractional fortnight numbers.
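The week-to-fortnight mapping can be tried out in Python, whose `isocalendar()` uses the same ISO 8601 week numbering as PostgreSQL's `EXTRACT(WEEK ...)`. This is a sketch with one deliberate deviation from the query above: it pairs the week with the ISO year rather than the calendar year (the PostgreSQL counterpart would be `EXTRACT(ISOYEAR FROM created_at)`), because the first days of January can belong to week 52 or 53 of the previous ISO year. The function name `fortnight_key` is made up.

```python
import math
from datetime import date


def fortnight_key(d):
    """Year * 100 + fortnight number, where fortnight = ceil(ISO week / 2)."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + math.ceil(iso_week / 2)


for d in (date(2016, 11, 1), date(2016, 11, 7), date(2016, 11, 14), date(2017, 1, 1)):
    print(d, d.isocalendar()[1], fortnight_key(d))
```

2017-01-01 falls in ISO week 52 of 2016, so it lands in fortnight 201626 here; combining that week number with the calendar year 2017 would place it at the end of 2017 instead. Years with an ISO week 53 also produce a 27th fortnight, which the `* 26` offset in the query does not account for.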

Checking missing hours for every id in a table

I have a table with a column for ids (id_code) and a column for the transaction time (time). I need to find, for each id, the hours between two dates in which no transaction took place. Let's say I need to check the missing hours for id 1 and id 2 in the table below between 2014-06-13 12:00:00 and 2014-06-13 14:59:59. The desired result would be that id 1 has no transactions in the hour starting 2014-06-13 13:00:00 and id 2 has none in the hour starting 2014-06-13 14:00:00.
id_code | time
1 | 2014-06-13 12:23:12
2 | 2014-06-13 12:27:23
1 | 2014-06-13 12:56:21
2 | 2014-06-13 13:34:12
1 | 2014-06-13 14:23:56
I am using PostgreSQL 9.3
SQL Fiddle
select c.id, d.time
from
(
select distinct id
from t
) c
cross join
generate_series (
(select date_trunc('hour', min(t.time)) from t),
(select date_trunc('hour', max(t.time)) from t),
interval '1 hour'
) d(time)
left join
(
select id, date_trunc('hour', t.time) as time
from t
group by id, 2
) t on t.time = d.time and c.id = t.id
where t.time is null
order by c.id, d.time
generate_series builds the set of all hours in the range. The cross join turns that into a matrix of every id combined with every hour. The left join together with the t.time is null condition then keeps only the (id, hour) pairs for which no transaction exists.
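The same three steps (generate the hours, combine them with every id, subtract the pairs that exist) can be sketched in Python on the sample data from the question. This is an illustration only; `missing_hours` and `truncate_to_hour` are hypothetical helper names, and the range is passed in explicitly rather than taken from the minimum and maximum of the table.

```python
from datetime import datetime, timedelta


def truncate_to_hour(ts):
    """Equivalent of date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def missing_hours(rows, start, end):
    """rows: list of (id_code, timestamp) tuples.

    Returns the sorted (id_code, hour) pairs between start and end for which
    no transaction exists.
    """
    hours = []
    current = truncate_to_hour(start)
    while current <= end:                      # generate_series(start, end, '1 hour')
        hours.append(current)
        current += timedelta(hours=1)
    ids = {id_code for id_code, _ in rows}     # select distinct id
    seen = {(id_code, truncate_to_hour(ts)) for id_code, ts in rows}
    # every id x every hour, minus the combinations that have a transaction
    return sorted((i, h) for i in ids for h in hours if (i, h) not in seen)


rows = [
    (1, datetime(2014, 6, 13, 12, 23, 12)),
    (2, datetime(2014, 6, 13, 12, 27, 23)),
    (1, datetime(2014, 6, 13, 12, 56, 21)),
    (2, datetime(2014, 6, 13, 13, 34, 12)),
    (1, datetime(2014, 6, 13, 14, 23, 56)),
]

gaps = missing_hours(rows, datetime(2014, 6, 13, 12, 0, 0), datetime(2014, 6, 13, 14, 59, 59))
for id_code, hour in gaps:
    print(id_code, hour)
```

For the sample rows this yields 13:00 for id 1 and 14:00 for id 2, matching the result the question asks for.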
SELECT DISTINCT id, h FROM t, generate_series('2014-06-13 12:00:00'::timestamp, '2014-06-13 14:59:59'::timestamp, '1 hour') h
EXCEPT
SELECT id, date_trunc('hour', time) FROM t
Thanks to Clodoaldo Neto for providing a useful SQL Fiddle page for testing!