| Date | Price |
| 2022-05-11 04:00:00.0000000 +00:00 | 1 |
| 2022-05-12 04:00:00.0000000 +00:00 | 2 |
| 2022-05-13 04:00:00.0000000 +00:00 | 3 |
I have a long table that looks like the one above, with various timestamps. I would like to select the highest price for every N days. How should I do the grouping?
Thanks @EdmCoff. In my case the answer looks like:
select MAX(Price)
from MyTable
group by DATEADD(DAY, 0, 3 * FLOOR(DATEDIFF(DAY, 0, Date) / 3))
order by MIN(Date) asc
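For reference, the bucketing works by counting whole days since SQL Server's day zero (1900-01-01) with DATEDIFF, integer-dividing by the bucket size, and multiplying back, so every date inside the same N-day bucket maps to the same bucket-start value. A minimal sketch for an arbitrary N (here 7), assuming SQL Server and the table above:

```sql
-- Snap each Date to the start of its 7-day bucket (anchored at 1900-01-01)
-- and take the highest price per bucket. Integer division already truncates
-- here, so the FLOOR in the answer above is effectively a no-op.
SELECT
    DATEADD(DAY, 7 * (DATEDIFF(DAY, 0, [Date]) / 7), 0) AS bucket_start,
    MAX(Price) AS max_price
FROM MyTable
GROUP BY DATEADD(DAY, 7 * (DATEDIFF(DAY, 0, [Date]) / 7), 0)
ORDER BY bucket_start;
```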
Related
I am trying to find the daily count of frequent visitors from a very large data-set. Frequent visitors in this case are visitor IDs used on 2 distinct days in a rolling 3 day period.
My data set looks like this:
ID | Date | Location | State | Brand |
1 | 2020-01-02 | A | CA | XYZ |
1 | 2020-01-03 | A | CA | BCA |
1 | 2020-01-04 | A | CA | XYZ |
1 | 2020-01-06 | A | CA | YQR |
1 | 2020-01-06 | A | WA | XYZ |
2 | 2020-01-02 | A | CA | XYZ |
2 | 2020-01-05 | A | CA | XYZ |
This is the result I am going for. The count in the Visits column is the number of distinct days from the Date column on which that ID appears, looking at the current day and the 2 days before it. So for ID 1 on 2020-01-05, there was a visit on the 3rd and 4th, so the count is 2.
Date | ID | Visits | Frequent Prior 3 Days
2020-01-01 |Null| Null | Null
2020-01-02 | 1 | 1 | No
2020-01-02 | 2 | 1 | No
2020-01-03 | 1 | 2 | Yes
2020-01-03 | 2 | 1 | No
2020-01-04 | 1 | 3 | Yes
2020-01-04 | 2 | 1 | No
2020-01-05 | 1 | 2 | Yes
2020-01-05 | 2 | 1 | No
2020-01-06 | 1 | 2 | Yes
2020-01-06 | 2 | 1 | No
2020-01-07 | 1 | 1 | No
2020-01-07 | 2 | 1 | No
2020-01-08 | 1 | 1 | No
2020-01-09 | 1 | null | Null
I originally tried to use the following line to get the result for the visits column, but end up with 3 in every successive row at whichever date it first got to 3 for that ID.
,
count(ID) over (Partition by ID order by Date ASC rows between 3 preceding and current row) as visits
I've scoured the forum, but every somewhat similar question seems to involve counting the values rather than the dates and haven't been able to figure out how to tweak to get what I need. Any help is much appreciated.
You can aggregate the dataset by user and date, then use window functions with a range frame to look back over the preceding days.
You did not tell which database you are running; not all databases support range window frames, and the syntax for interval literals varies. In standard SQL, you would go:
select
id,
date,
count(*) cnt_visits,
case
-- count the grouped rows (one per distinct visit day) in the window;
-- the 3-day rolling window is the current day plus the 2 preceding days
when count(*) over(
partition by id
order by date
range between interval '2' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from mytable
group by id, date
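Note that a range frame is defined over the date values themselves, so calendar gaps (days with no visits, which produce no grouped rows) are still measured in days rather than rows.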
On the other hand, if you want a record for every user and every day (even when there is no visit), then it is a bit different. You can generate the full user/day grid first, then bring in the table with a left join:
select
i.id,
d.date,
count(t.id) cnt_visits,
case
-- count the days with at least one visit in the 3-day rolling window
when sum(case when count(t.id) > 0 then 1 else 0 end) over(
partition by i.id
order by d.date
range between interval '2' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from (select distinct id from mytable) i
cross join (select distinct date from mytable) d
left join mytable t
on t.date = d.date
and t.id = i.id
group by i.id, d.date
I would be inclined to approach this by expanding out the days and visitors using a cross join and then just using window functions. Assuming every date appears somewhere in the data:
select i.id, d.date,
count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) as cnt_visits,
(case when count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) >= 2
then 'Yes' else 'No'
end) as is_frequent_visitor
from (select distinct id from t) i cross join
(select distinct date from t) d left join
(select distinct id, date from t) t
on t.date = d.date and
t.id = i.id;
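Because the cross join supplies exactly one row per visitor per date, a plain rows frame of 2 preceding is enough here, and the distinct in the last derived table collapses repeat visits on the same day, so the window count is a count of distinct visit days.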
Scenario:
I have a table, events_table, that consists of records that are inserted by a webhook based on messages I send to my users:
"column_name" (type)
- "time_stamp" (timestamp with time zone)
- "username" (varchar)
- "delivered" (int)
- "action" (int)
Sample Data:
| time_stamp | username | delivered | action |
|:----------------|:---------|:----------|:-------|
|1349733421.460000| user1 | 1 | null |
|1549345346.460000| user3 | 1 | 1 |
|1524544421.460000| user1 | 1 | 1 |
|1345444421.570000| user7 | 1 | null |
|1756756761.980000| user9 | 1 | null |
|1234343421.460000| user171 | 1 | 1 |
|1843455621.460000| user5 | 1 | 1 |
| ... | ... | ... | ... |
The "delivered" column is null by default and 1 when delivered. The "action" column is null by default and is 1 when opened.
Problem:
Using PostgreSQL, how can I count the number of individuals who opened an email in the previous 30 days from the Monday of each week?
Ideal query results:
| date | count |
|:----------------|:----------|
| 02/24/2020 | 1,234,123 |
| 02/17/2020 | 234,123 |
| 02/10/2020 | 1,234,123 |
| 02/03/2020 |12,341,213 |
| ... | ... |
My attempt:
This is the extent of what I've tried, which gives me the count for the previous week:
SELECT
date_trunc('week', to_timestamp("time_stamp")) as date,
count("username") as count,
-- note: the window order by cannot reference the output alias "date"
lag(count(1), 1) over (order by date_trunc('week', to_timestamp("time_stamp"))) as "count_previous_week"
FROM events_table
WHERE "delivered" = 1
and "action" = 1
GROUP BY 1 order by 1 desc
This is my attempt at writing this query.
First I get the lowest and highest dates from the data set. I add 7 days to the highest date to make sure I include data up to today.
I then run generate_series against these two values with an interval of 7 days, which gives me every single Monday between the two points (we can't rely on just the Mondays within your data set, in case there is an empty week).
Then I simply subquery and aggregate the data based on the generate_series output.
select
__weeks.week_begins,
(
select
count(distinct "username")
from
events_table
where
to_timestamp("time_stamp")::date between week_begins - '30 days'::interval and week_begins
and "delivered" = 1
and "action" = 1
) as count
from
(
select
generate_series(_.min_date, _.max_date, '7 days'::interval)::date as week_begins
from
(
select
min(date_trunc('week', to_timestamp("time_stamp"))::date) as min_date,
-- add 7 days so the series reaches the current (possibly partial) week
max(date_trunc('week', to_timestamp("time_stamp"))::date) + 7 as max_date
from
events_table
where
"delivered" = 1
and "action" = 1
) as _
) as __weeks
order by
__weeks.week_begins
I'm not particularly keen on this query because the query planner visits the same table twice, but I can't think of another way to structure it.
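If the double visit bothers you, one possible restructuring (a sketch under the same assumptions, not tested against your data) is to left join the events to the Monday series on the 30-day window and aggregate once, so the counting step reads events_table a single time:

```sql
-- Sketch: attach each qualifying event to every Monday whose trailing
-- 30-day window contains it, then aggregate once. The min/max lookup for
-- the series still needs its own (index-friendly) scan.
select
    w.week_begins as date,
    count(distinct e."username") as count
from (
    select generate_series(
        (select min(date_trunc('week', to_timestamp("time_stamp"))::date) from events_table),
        (select max(date_trunc('week', to_timestamp("time_stamp"))::date) + 7 from events_table),
        '7 days'::interval
    )::date as week_begins
) as w
left join events_table e
    on to_timestamp(e."time_stamp")::date
        between w.week_begins - '30 days'::interval and w.week_begins
    and e."delivered" = 1
    and e."action" = 1
group by w.week_begins
order by w.week_begins;
```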
I'm using PostgreSQL and this is my table measurement_archive:
+-----------+------------------------+------+-------+
| sensor_id | time | type | value |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 01:53:11+00 | PM25 | 34.32 |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 02:15:11+00 | PM25 | 32.1 |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 04:32:11+00 | PM25 | 75.3 |
+-----------+------------------------+------+-------+
I need a query that will take records from a specified timeframe (e.g. from 2017-01-01 00:00:00 to 2017-12-01 23:59:59) and then, for every hour that contains at least 1 record, add 1 to the result.
So, if I run that query from 2017-11-26 01:00:00 to 2017-11-26 04:59:59+00 for sensor_id = 123 on the above table, then the result should be 3.
select count(*)
from (
select date_trunc('hour', time) as time
from measurement_archive
where
time >= '2017-11-26 01:00:00' and time < '2017-11-26 05:00:00'
and
sensor_id = 123
group by 1
) s
An alternative solution would be to use distinct:
select count(*)
from (
select distinct sensor_id, extract(hour from time)
from measurement_archive
where time > '2017-11-26 01:00:00' and time < '2017-11-26 05:00:00'
and sensor_id = 123
) t;
SQL Fiddle: http://sqlfiddle.com/#!15/1da00/5
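One caveat with the distinct variant: it keys on the hour number alone, so the same hour on two different days would collapse into one row. For a timeframe longer than a single day, group on date_trunc('hour', time) as in the first query.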
I have a table that looks something like this:
products
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| chair | 50 | 10/12/2016 | 1/4/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| toaster | 20 | 1/9/2017 | 1/9/2017 |
+-----------+-------+--------------+--------------+
I want to order this table in a way where products created less than 30 days ago show first (ordered by updated date), and products created 30 or more days ago show after that (also ordered by updated date within that group).
This is what the result should look like:
products - desired results
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| toaster | 20 | 1/9/2017 | 1/9/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| chair | 50 | 10/12/2016 | 1/4/2017 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
+-----------+-------+--------------+--------------+
I've started writing this query:
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
ORDER BY order_index, created_date DESC
but that only brings the rows with created_date less than 30 days old to the top, ordered by created_date. I want to also sort the rows where order_index = 1 by updated_date.
Unfortunately, in version 9.3 only positional column numbers or expressions involving table columns can be used in ORDER BY expressions, so order_index cannot be referenced inside a CASE at all, and its ordinal position is not well defined because it comes after * in the column list.
This will work.
order by
created_date <= (current_date - 30),
case when created_date > (current_date - 30) then created_date
else updated_date
end desc
Alternatively, a common table expression can be used to wrap the result, which can then be ordered by any of its columns.
WITH q AS(
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
)
SELECT * FROM q
ORDER BY
order_index ,
CASE order_index
WHEN 0 THEN created_date
WHEN 1 THEN updated_date
END DESC;
A third approach is to exploit nulls.
order by
case
when created_date > ( current_date - 30 ) then created_date
end desc nulls last,
updated_date desc;
This approach can be useful when the ordering columns are of different types, since each sort key is evaluated independently and the two columns never need to be cast to a common type inside a single CASE.
I have a query that returns something like this:
registered_at - date of user registration;
action_at - date of some kind of action.
| registered_at | user_id | action_at |
-------------------------------------------------------
| 2015-05-01 12:00:00 | 1 | 2015-05-04 12:00:00 |
| 2015-05-01 12:00:00 | 1 | 2015-05-10 12:00:00 |
| 2015-05-01 12:00:00 | 1 | 2015-05-16 12:00:00 |
| 2015-04-01 12:00:00 | 2 | 2015-04-04 12:00:00 |
| 2015-04-01 12:00:00 | 2 | 2015-04-05 12:00:00 |
| 2015-04-01 12:00:00 | 2 | 2015-04-10 12:00:00 |
| 2015-04-01 12:00:00 | 2 | 2015-04-30 12:00:00 |
I'm trying to implement a query that will return something like this (weeks_after_registration is limited to 3 in this example; in the real task it will be limited to 6):
| user_id | weeks_after_registration | action_counts |
-------------------------------------------------------
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 2 |
| 2 | 2 | 1 |
| 2 | 3 | 0 |
You can use extract(days from (action_at - registered_at) / 7)+1 to get the number of weeks. Then count the number of actions grouped by the number of weeks.
select user_id, wk, count(*) actions
from (select user_id, extract(days from (action_at - registered_at) / 7)+1 wk from Table1) a
where wk <= 3
group by user_id, wk
If you must display rows where action_counts = 0 in the result, then you need to join with all possible week numbers (1, 2, 3) and all possible user_ids (1, 2), like:
select b.user_id, a.wk, coalesce(c.actions, 0) actions
from (select * from generate_series(1, 3) wk) a
join (select distinct user_id from Table1) b on true
left join (
select user_id, wk, count(*) actions
from (select user_id, extract(days from (action_at - registered_at) / 7)+1 wk from Table1) a
where wk <= 3
group by user_id, wk
) c on a.wk = c.wk and b.user_id = c.user_id
order by b.user_id, a.wk;
fiddle