I am trying to find the max value of a column, within a date range (by day), within the context of a third column. I'm an SQL newbie, so go easy.
Basically - max value, by day, by cell_id.
Each "Weather_Cell_ID" has data for each hour, for each day. Sample below.
Notice: The are multiple dates
My current PostgreSQL table looks roughly like:
+-----------------+------------------+------------+
| Weather_Cell_ID | Dates            | Wind_Speed |
+-----------------+------------------+------------+
| 0001            | 2019-01-21 01:00 | 4.6        |
| 0001            | 2019-01-21 02:00 | 2.4        |
| 0001            | 2019-01-21 04:00 | 8.5        |
| 0001            | 2019-01-22 10:00 | 6.2        |
| 0001            | 2019-01-21 14:00 | 14.8       |
| 0002            | 2019-01-21 01:00 | 3.5        |
| 0002            | 2019-01-21 05:00 | 9.6        |
| 0002            | 2019-01-22 06:00 | 4.8        |
| 0002            | 2019-01-21 16:00 | 12.2       |
| 0002            | 2019-01-21 08:00 | 4.6        |
| 0003            | 2019-01-21 03:00 | 4.9        |
+-----------------+------------------+------------+
My current code looks like:
select weather_cell_id,
date_trunc('day', dates) as dates,
max(windspeed) as maxwindspeed
from view_6day_mat
GROUP BY weather_cell_id, view_6day_mat.dates
order by weather_cell_id
This, however, produces essentially the same table, just with the HH:MI portion set to 00:00.
What I'm hoping to see as an output is:
+-----------------+------------+------------+
| Weather_Cell_ID | Dates      | Wind_Speed |
+-----------------+------------+------------+
| 0001            | 2019-01-21 | 14.8       |
| 0001            | 2019-01-22 | 6.2        |
| 0002            | 2019-01-21 | 12.2       |
| 0002            | 2019-01-22 | 4.8        |
| 0003            | 2019-01-21 | 4.9        |
+-----------------+------------+------------+
Cast the timestamp to a date and group by that cast rather than by the raw timestamp column:
select
weather_cell_id,
dates::date as dates_date,
max(windspeed) as maxwindspeed
from view_6day_mat
GROUP BY weather_cell_id, dates_date
order by weather_cell_id
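If you also need to know at which hour the daily maximum occurred, a DISTINCT ON variant can return the whole row per cell and day (a sketch against the same view_6day_mat; column names assumed as above):
select distinct on (weather_cell_id, dates::date)
    weather_cell_id,
    dates::date as dates_date,
    dates as max_at,           -- timestamp of the strongest reading that day
    windspeed as maxwindspeed
from view_6day_mat
order by weather_cell_id, dates::date, windspeed desc;
DISTINCT ON is PostgreSQL-specific; the ORDER BY makes it keep the highest-windspeed row per (cell, day).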
Related
I need help finding and counting orders in the 'sales_track' table, over a certain consecutive period of time, from users that have a minimum of two or more transactions (rephrased: how many users have 2 or more transactions in a period of n days without skipping even a day?).
sales_track
sales_tx_id | u_id | create_date | item_id | price
------------|------|-------------|---------|---------
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
For simplicity's sake the sample covers only two consecutive days (the period of time is between 2019-08-01 and 2019-08-02); in real operation I would have to search e.g. 10 consecutive days of transactions.
So far I'm able to find the users with a minimum of two or more transactions:
SELECT u_id, COUNT(u_id) FROM sales_track WHERE create_date BETWEEN
('2019-08-01') AND ('2019-08-02')
GROUP BY u_id HAVING COUNT(sales_tx_id) >= 2;
The output I'm looking for is like:
u_id | tx_count | tx_amount
------|----------|------------
0001 | 5 | 18.50
Thank you in advance for your help.
step-by-step demo: db<>fiddle
First: My extended data set:
sales_tx_id | user_id | created_at | item_id | price
:---------- | :------ | :--------- | :------ | ----:
ffff-0291 | 0001 | 2019-08-01 | 0300 | 5.00
ffff-0292 | 0001 | 2019-08-01 | 0301 | 2.50
ffff-0293 | 0002 | 2019-08-01 | 0209 | 3.50
ffff-0294 | 0003 | 2019-08-01 | 0020 | 1.00
ffff-0295 | 0001 | 2019-08-02 | 0301 | 2.50
ffff-0296 | 0001 | 2019-08-02 | 0300 | 5.00
ffff-0297 | 0001 | 2019-08-02 | 0209 | 3.50
ffff-0298 | 0002 | 2019-08-02 | 0300 | 5.00
ffff-0299 | 0001 | 2019-08-05 | 0209 | 3.50
ffff-0300 | 0001 | 2019-08-05 | 0020 | 1.00
ffff-0301 | 0001 | 2019-08-06 | 0209 | 3.50
ffff-0302 | 0001 | 2019-08-06 | 0020 | 1.00
ffff-0303 | 0001 | 2019-08-07 | 0209 | 3.50
ffff-0304 | 0001 | 2019-08-07 | 0020 | 1.00
ffff-0305 | 0002 | 2019-08-08 | 0300 | 5.00
ffff-0306 | 0002 | 2019-08-08 | 0301 | 2.50
ffff-0307 | 0001 | 2019-08-09 | 0209 | 3.50
ffff-0308 | 0001 | 2019-08-09 | 0020 | 1.00
ffff-0309 | 0002 | 2019-08-09 | 0300 | 5.00
ffff-0310 | 0002 | 2019-08-09 | 0301 | 2.50
ffff-0311 | 0001 | 2019-08-10 | 0209 | 3.50
ffff-0312 | 0001 | 2019-08-10 | 0020 | 1.00
ffff-0313 | 0002 | 2019-08-10 | 0300 | 5.00
User 1 has 3 streaks:
2019-08-01, 2019-08-02
2019-08-05, 2019-08-06, 2019-08-07
2019-08-09, 2019-08-10
User 2:
Has transactions on 2019-08-01 and 2019-08-02, but only one on each date, so that does not count
Has a streak on 2019-08-08, 2019-08-09 (2019-08-10 has only one transaction, so it does not extend the streak)
So we expect 4 rows: 3 for user 1's streaks and 1 for user 2's
SELECT -- 4
    user_id,
    SUM(count) AS tx_count,
    SUM(price) AS tx_amount,
    MIN(created_at) AS consecutive_start
FROM (
    SELECT *, -- 3
        SUM(is_in_same_group) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
    FROM (
        SELECT -- 2
            *,
            (lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) + 1 <> created_at)::int as is_in_same_group
        FROM (
            SELECT -- 1
                created_at,
                user_id,
                COUNT(*),
                SUM(price) AS price
            FROM sales_track
            WHERE created_at BETWEEN '2018-02-01' AND '2019-08-11'
            GROUP BY created_at, user_id
            HAVING COUNT(*) >= 2
        ) s
    ) s
) s
GROUP BY user_id, group_id
1. Group all (created_at, user_id) pairs and remove those with COUNT(*) < 2.
2. The lag() window function fetches the value of the previous record within an ordered group; the group here is the user_id. The check: if the current created_at is exactly one day after the previous one (previous + 1 = current), the flag is 0, otherwise it is 1.
3. Now the cumulative SUM() window function adds up these flags: the value increases whenever the gap is too big (flag = 1), otherwise it stays the same as for the previous date. This yields a group_id for every run of dates that differ by exactly +1 day.
4. Finally these groups can be aggregated with SUM() and COUNT().
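The question ultimately asks for streaks of at least n consecutive days (10 in the real case). Each row of the final grouping represents one qualifying day, so a HAVING clause on that grouping filters by streak length. Here is a CTE-based sketch of the same logic, assuming the column names from the extended data set (user_id, created_at, price); the threshold 10 is just the example figure from the question:
WITH daily AS (        -- days with at least 2 transactions per user
    SELECT user_id, created_at, COUNT(*) AS tx, SUM(price) AS amount
    FROM sales_track
    GROUP BY user_id, created_at
    HAVING COUNT(*) >= 2
), flagged AS (        -- 1 = this day starts a new streak
    SELECT *,
           (lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) + 1 <> created_at)::int AS new_streak
    FROM daily
), grouped AS (        -- running sum of the flags = streak id
    SELECT *,
           SUM(new_streak) OVER (PARTITION BY user_id ORDER BY created_at) AS streak_id
    FROM flagged
)
SELECT user_id,
       SUM(tx)         AS tx_count,
       SUM(amount)     AS tx_amount,
       MIN(created_at) AS streak_start,
       COUNT(*)        AS streak_days
FROM grouped
GROUP BY user_id, streak_id
HAVING COUNT(*) >= 10;  -- keep only streaks of at least 10 qualifying days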
I am trying to build one of those per-day stock charts (min/max/avg/close) using PostgreSQL.
My data would look something like this:
stock_data
stock_price | trade_datetime
5.1 | 1/1/2000 1:00 PM
6.2 | 1/1/2000 2:00 PM
5.0 | 1/2/2000 1:00 PM
3.4 | 1/2/2000 2:00 PM
4.8 | 1/2/2000 3:00 PM
7.0 | 1/3/2000 2:30 PM
5.9 | 1/3/2000 5:55 PM
Desired result
MIN | MAX | AVG | close | date
5.1 | 6.2 | 5.65| 6.2 | 1/1/2000
3.4 | 5.0 | 4.4 | 4.8 | 1/2/2000
5.9 | 7.0 | 6.45| 5.9 | 1/3/2000
I am thinking I probably need to use window functions, but I just can't seem to get this one right.
You can do this by using the expected aggregate functions and then joining to a derived table that uses the LAST_VALUE window function:
SELECT
    MIN(stock_price) AS "MIN"
  , MAX(stock_price) AS "MAX"
  , AVG(stock_price) AS "AVG"
  , MAX(closing.closing_price) AS "close"
  , trade_datetime::date AS "date"
FROM
    stock_data
    INNER JOIN LATERAL (
        SELECT
            LAST_VALUE(stock_price) OVER (
                PARTITION BY trade_datetime::date
                ORDER BY trade_datetime
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING  -- explicit ORDER BY and frame so "last" really is the day's final trade
            ) AS closing_price
        FROM
            stock_data AS closing_data
        WHERE closing_data.trade_datetime::date = stock_data.trade_datetime::date
    ) AS closing ON true
GROUP BY
    trade_datetime::date
ORDER BY
    trade_datetime::date ASC
Yields:
| MIN | MAX | AVG | close | date |
| --- | --- | ------------------ | ----- | ------------------------ |
| 5.1 | 6.2 | 5.6500000000000000 | 6.2 | 2000-01-01T00:00:00.000Z |
| 3.4 | 5.0 | 4.4000000000000000 | 4.8 | 2000-01-02T00:00:00.000Z |
| 5.9 | 7.0 | 6.4500000000000000 | 5.9 | 2000-01-03T00:00:00.000Z |
DB Fiddle
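The lateral join is not strictly required, though. Here is a sketch of an alternative over the same stock_data table that takes the close from an ordered array_agg inside the same aggregation:
SELECT
    MIN(stock_price) AS "MIN"
  , MAX(stock_price) AS "MAX"
  , AVG(stock_price) AS "AVG"
  , (array_agg(stock_price ORDER BY trade_datetime DESC))[1] AS "close"  -- last trade of the day
  , trade_datetime::date AS "date"
FROM stock_data
GROUP BY trade_datetime::date
ORDER BY trade_datetime::date;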
I have a Spark DataFrame with the following entries:
| order id | time | amt |
| 1 | 2017-10-01 12:00 | 100 |
| 2 | 2017-10-01 15:00 | 100 |
| 3 | 2017-10-01 17:00 | 100 |
| 4 | 2017-10-02 16:00 | 100 |
| 5 | 2017-10-02 23:00 | 100 |
I want to add a column amt_24h that has, for each order id, the sum of amt for all orders placed in the previous 24 hours.
| order id | time | amt | amt_24h
| 1 | 2017-10-01 12:00 | 100 | 0
| 2 | 2017-10-01 15:00 | 100 | 100
| 3 | 2017-10-01 17:00 | 100 | 200
| 4 | 2017-10-02 16:00 | 100 | 100
| 5 | 2017-10-02 23:00 | 100 | 100
How would I go about doing it?
This is PySpark code; the Scala API is similar.
from pyspark.sql import Window, functions as F

df = df.withColumn('time_uts', F.unix_timestamp('time', format='yyyy-MM-dd HH:mm'))
df = df.withColumn('amt_24h', F.sum('amt').over(Window.orderBy('time_uts').rangeBetween(-24 * 3600, -1))).fillna(0, subset='amt_24h')
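If the data is easier to reach from SQL, the same rolling sum can be expressed in Spark SQL against a temp view (a sketch; the view name orders is an assumption, and the 1 PRECEDING upper bound excludes the current order, matching the expected output):
-- assumes: df.createOrReplaceTempView("orders")
SELECT *,
       COALESCE(SUM(amt) OVER (
           ORDER BY unix_timestamp(time, 'yyyy-MM-dd HH:mm')
           RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING
       ), 0) AS amt_24h
FROM orders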
I hope this may help you.
From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id | user_id | start_time | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6 | 2018-01-01 03:05:00 UTC | 1.0 |
| 996 | 6 | 2017-12-01 05:05:00 UTC | 1.0 |
| 998 | 6 | 2017-12-01 05:05:00 UTC | 1.5 |
| 999 | 20 | 2017-11-15 19:00:00 UTC | 1.0 |
| 995 | 6 | 2017-11-11 20:47:42 UTC | 0.04 |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked),0) AS total,
time_entries.user_id,
week::date
--Using generate_series here to account for weeks with no time entries when
--doing the join
FROM generate_series( (DATE_TRUNC('week', '2017-11-01 00:00:00'::date)),
(DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date)),
interval '7 day') as week LEFT JOIN time_entries
ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week |
+-------+---------+------------+
| 14.08 | 5 | 2017-10-30 |
| 21.92 | 6 | 2017-10-30 |
| 10.92 | 7 | 2017-10-30 |
| 14.26 | 8 | 2017-10-30 |
| 14.78 | 10 | 2017-10-30 |
| 14.08 | 13 | 2017-10-30 |
| 15.83 | 15 | 2017-10-30 |
| 8.75 | 5 | 2017-11-06 |
| 10.53 | 6 | 2017-11-06 |
| 13.73 | 7 | 2017-11-06 |
| 14.26 | 8 | 2017-11-06 |
| 19.45 | 10 | 2017-11-06 |
| 15.95 | 13 | 2017-11-06 |
| 14.16 | 15 | 2017-11-06 |
| 1.00 | 20 | 2017-11-13 |
| 0 | | 2017-11-20 |
| 2.50 | 6 | 2017-11-27 |
| 0 | | 2017-12-04 |
| 0 | | 2017-12-11 |
| 0 | | 2017-12-18 |
| 0 | | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse, particularly when there is no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users, and I want it to include the empty cases from both sides (for instance a user with no entries in a given week, or a week without entries from any user).
Something like this
+---------+---------------+--------------+--------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+---------------+--------------+--------------+
| 6 | 4.0 | 1.0 | 0 |
| 7 | 4.0 | 1.0 | 0 |
| 8 | 4.0 | 0 | 0 |
| 9 | 0 | 1.0 | 0 |
| 10 | 4.0 | 0.04 | 0 |
+---------+---------------+--------------+--------------+
I've been looking around online, and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard-code them, which seems weird to do anyway for dates, or use something like a CASE per week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user including all nulls I think that would be good enough. It just seems that right now my join strategy isn't returning that.
Personally I would use a Date Dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations, as it leads to SQL that's easier to read and maintain. There's a great article on creating a Date Dimension table in PostgreSQL at https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately you would use the Date table as the base of the SELECT cols FROM table section and then join against it, or use Common Table Expressions, to create the calculations.
I'll write up a solution demonstrating how you could create such a query if you would like.
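In the meantime, here is a minimal sketch of that idea using a generated week spine instead of a full date dimension, assuming the time_entries table from the question. It returns the series of weeks for every user with zero-filled totals (the shape the question says would be good enough), which can then be pivoted client-side or with crosstab:
WITH weeks AS (
    SELECT week::date
    FROM generate_series(DATE_TRUNC('week', '2017-11-01'::date),
                         DATE_TRUNC('week', '2017-12-31'::date),
                         interval '7 day') AS week
),
users AS (
    SELECT DISTINCT user_id FROM time_entries
)
SELECT u.user_id,
       w.week,
       COALESCE(SUM(te.hours_worked), 0) AS total
FROM weeks w
CROSS JOIN users u                    -- every (user, week) pair, even with no entries
LEFT JOIN time_entries te
       ON te.user_id = u.user_id
      AND DATE_TRUNC('week', te.start_time)::date = w.week
GROUP BY u.user_id, w.week
ORDER BY u.user_id, w.week;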
I have a calendar entity in my project which manages the open and close times of business days for the whole year.
Below are the records for a specific month:
id | today_date | year | month_of_year | day_of_month | is_business_day
-------+---------------------+------+---------------+-------------+---------------+
10103 | 2016-02-01 00:00:00 | 2016 | 2 | 1 | t
10104 | 2016-02-02 00:00:00 | 2016 | 2 | 2 | t
10105 | 2016-02-03 00:00:00 | 2016 | 2 | 3 | t
10106 | 2016-02-04 00:00:00 | 2016 | 2 | 4 | t
10107 | 2016-02-05 00:00:00 | 2016 | 2 | 5 | t
10108 | 2016-02-06 00:00:00 | 2016 | 2 | 6 | f
10109 | 2016-02-07 00:00:00 | 2016 | 2 | 7 | f
10110 | 2016-02-08 00:00:00 | 2016 | 2 | 8 | t
10111 | 2016-02-09 00:00:00 | 2016 | 2 | 9 | t
10112 | 2016-02-10 00:00:00 | 2016 | 2 | 10 | t
10113 | 2016-02-11 00:00:00 | 2016 | 2 | 11 | t
10114 | 2016-02-12 00:00:00 | 2016 | 2 | 12 | t
10115 | 2016-02-13 00:00:00 | 2016 | 2 | 13 | f
10116 | 2016-02-14 00:00:00 | 2016 | 2 | 14 | f
10117 | 2016-02-15 00:00:00 | 2016 | 2 | 15 | t
10118 | 2016-02-16 00:00:00 | 2016 | 2 | 16 | t
10119 | 2016-02-17 00:00:00 | 2016 | 2 | 17 | t
10120 | 2016-02-18 00:00:00 | 2016 | 2 | 18 | t
I want to get the today_date from 7 working days back. Suppose today_date is 2016-02-18; the date 7 working days back would be 2016-02-09.
You can use row_number() for this, like so:
SELECT * FROM
  (SELECT t.*, row_number() OVER (ORDER BY today_date DESC) AS rnk
   FROM Calender t
   WHERE today_date <= current_date
   AND is_business_day = 't') AS ranked  -- a derived table needs an alias in PostgreSQL
WHERE rnk = 7
This will give you the row of the 7th business day counting back from today's date (today is included when it is a business day; use today_date < current_date if today should not be counted).
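For what it's worth, the same row can be fetched without a window function, using ORDER BY with OFFSET/LIMIT (a sketch against the same table, with the same "today counts as the first business day" semantics as above):
SELECT today_date
FROM Calender
WHERE today_date <= current_date
  AND is_business_day = 't'
ORDER BY today_date DESC
OFFSET 6 LIMIT 1;   -- skip the 6 most recent business days, return the 7th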
I see that you tagged your question with Doctrine, ORM and Datetime. Were you after a QueryBuilder solution? Maybe this is closer to what you want:
$qb->select('c.today_date')
->from(Calendar::class, 'c')
->where("c.today_date <= :today")
->andWhere("c.is_business_day = 't'")
->setMaxResults(7)
->orderBy("c.today_date", "DESC")
->setParameter('today', new \DateTime('now'), \Doctrine\DBAL\Types\Type::DATETIME);