Is there a redshift equivalent to extract year from age?
WITH orders_shipped_at AS (
SELECT DISTINCT user_id,
date_trunc('month',shipped_at_et) as time_pd
from orders
),
lag_lead AS (
SELECT user_id,
time_pd,
lag(time_pd,1) OVER (PARTITION BY user_id ORDER BY user_id,
time_pd),
lead(time_pd,2) OVER (PARTITION BY user_id ORDER BY user_id,
time_pd)
FROM orders_shipped_at
)
SELECT user_id,
time_pd,
lag,
lead,
extract(year from age(time_pd, lag))*12 + extract(month from
(age(time_pd, lag))) lag_size,
extract(year from age(lead, time_pd))*12 + extract(month
from (age(lead, time_pd))) lead_size
FROM lag_lead
My error is this : Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift
AGE() is not supported. You can use DATEDIFF(). https://docs.aws.amazon.com/redshift/latest/dg/r_DATEDIFF_function.html
SELECT DATEDIFF(YEAR, '2000-01-01', '2009-12-31') AS years;
-- years
-- -------
-- 9
SELECT DATEDIFF(YEAR, '1999-12-31', '2009-12-31') AS years;
-- years
-- -------
-- 10
Related
I want to get the number of consecutive days from the current date using Postgres SQL.
enter image description here
Above is the scenario in which I have highlighted consecutive days count should be like this.
Below is the SQL query which I have created but it's not returning the expected result
with grouped_dates as (
select user_id, created_at::timestamp::date,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as grouping_date
from watch_history
)
select * , dense_rank() over (partition by grouping_date order by created_at::timestamp::date) as in_streak
from grouped_dates where user_id = 702
order by created_at::timestamp::date
Can anyone please help me to resolve this issue?
If anyhow we can able to apply distinct for created_at field to below query then I will get solutions for my issue.
WITH list AS
(
SELECT user_id,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as next_day
FROM watch_history
)
SELECT user_id, count(*) AS number_of_consecutive_days
FROM list
WHERE next_day IS NOT NULL
GROUP BY user_id
Does anyone have an idea how to apply distinct to created_at for the above mentioned query ?
To get the "number of consecutive days" for the same user_id :
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id, count(DISTINCT d.day) AS number_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
To get the list of "consecutive days" for the same user_id :
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id
, array_agg(DISTINCT d.day ORDER BY d.day) AS list_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
full example & result in dbfiddle
I have a table with:
ID (id client), date_start (subscription of SaaS), date_end (could be a date value or be NULL).
So I need a cumulative count of active clients month by month.
any idea on how to write that in Postgres and achieve this result?
Starting from this, but I don't know how to proceed
select
date_trunc('month', c.date_start)::date,
count(*)
from customer
Please check next solution:
select
subscrubed_date,
subscrubed_customers,
unsubscrubed_customers,
coalesce(subscrubed_customers, 0) - coalesce(unsubscrubed_customers, 0) cumulative
from (
select distinct
date_trunc('month', c.date_start)::date subscrubed_date,
sum(1) over (order by date_trunc('month', c.date_start)) subscrubed_customers
from customer c
order by subscrubed_date
) subscribed
left join (
select distinct
date_trunc('month', c.date_end)::date unsubscrubed_date,
sum(1) over (order by date_trunc('month', c.date_end)) unsubscrubed_customers
from customer c
where date_end is not null
order by unsubscrubed_date
) unsubscribed on subscribed.subscrubed_date = unsubscribed.unsubscrubed_date;
share SQL query
You have a table of customers. With a start date and sometimes an end date. As you want to group by date, but there are two dates in the table, you need to split these first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(came.cnt - went.cnt) over (order by month) as cust_active
from came full outer join went using (month)
order by month;
I have this situation where people asked to group on bad addresses. And I need to work on the tools/env I have, I don't have choice for Google API or 3rd party Data Science tools. I also did my HW, see posts several years old, so still want to check all if any updates available.
In my scenario people want to group IDs 1-6 into single, rest I added for neg test.
SELECT * INTO #t FROM ( --test data: select * from #t drop table #t
SELECT 1 Id, '1 CROLANA HEIGHTS' Adr UNION -- A vs O
SELECT 2 Id, '1 CROLONA HEIGHTS' Adr union
SELECT 3 Id, '1 CROLONA HEIGHT DRIVE' Adr union
SELECT 4 Id,'1 CROLONA HEIGHTS DR' Adr union
SELECT 5 Id, '1 CROLONA HGHTS DR' Adr union
SELECT 6 Id, '1 CROLONA HTS DR' Adr UNION
---------------------------------------- rest should not match
SELECT 7 Id, '1 CORWING DR' Adr UNION
SELECT 8 Id, '1 SUNNYHILL DRIVE' Adr UNION
SELECT 9 Id, '1 CROWN HILL DR' Adr UNION
SELECT 10 Id, '1 ADDISON DRv' Adr ) a
------------------- and below is my fuzzy working script which can be improved)
SELECT id, adr, LEAD(adr,1) OVER ( ORDER BY adr ) adr_lead,
SOUNDEX(adr) Sdx, DIFFERENCE(adr, LEAD(adr,1) OVER ( ORDER BY adr )) diff
--- SOUNDEX(adr), COUNT(*) c
FROM #t
--GROUP BY SOUNDEX(adr)
WHERE SOUNDEX(adr) = SOUNDEX('1 CROLANA HEIGHTS')
There is suggestions which I gladly take. I'm using intell replace at the end of string and standalone words to improve data.
DECLARE #st VARCHAR(100) = 'La_Beg_10 La_midleMacy La' --replace et the end of string
SELECT 'ryba', #st, '-->' f, CASE WHEN #st LIKE '%' + ' La'
THEN SUBSTRING(#st,1,LEN(#st) - LEN('La')) + 'Lane' ELSE #st END N
Have a table with 3 columns: ID, Signature, and Datetime, and it's grouped by Signature Having Count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
)
WHERE [Row] IN (1,10)
When I run the following :
DELETE FROM db2Name.MyTable
WHERE (NBR, CATEGORY, SALES_DATE) IN
(
SELECT NBR, CATEGORY, SALES_DATE, RN
FROM
(
SELECT
NBR, CATEGORY, SALES_DATE,
rownumber() over(PARTITION BY NBR, CATEGORY, SALES_DATE) RN
FROM db2Name.DELI_DATA
WHERE NBR = 10 AND SALES_DATE = CURRENT_DATE - 7 DAYS
) A
WHERE RN > 1
GROUP BY NBR, CATEGORY, SALES_DATE, RN
)
I get error SQL0216N
I'm trying to delete duplicate records leaving one in the table when I experiment with similar queries, they're removing all records matching and leaving none
the table def looks like this
NBR int
addr string
market int
category string
sales decimal
Sales_last_month decimal
sales_date date
I can see that mostly likely the issue is in this clause not having the RN (that's why it is not matching the group by)
WHERE (NBR, CATEGORY, SALES_DATE) IN
but if I add it, it does not recognize it .... and it fails
This could be simply
DELETE FROM (
SELECT * FROM
(
SELECT
rownumber() over (PARTITION BY NBR, CATEGORY, SALES_DATE) RN
FROM db2Name.DELI_DATA
WHERE NBR = 10 AND SALES_DATE = CURRENT_DATE - 7 DAYS
) A
WHERE RN > 1
)