Get monthly and weekly average working hours in PostgreSQL

I have a table with the following columns: local_id | time_in | time_out | date | employee_id
I have to calculate the average working hours (time_out minus time_in) on a monthly basis in PostgreSQL. I have no clue how to do that; I was thinking about using the date_part function...
Here are the table details:
local_id | time_in | time_out | date | employee_id
---------+----------+----------+------------+-------------
7 | 08:00:00 | 17:00:00 | 2020-02-12 | 2
6 | 08:00:00 | 17:00:00 | 2020-02-12 | 4
8 | 09:00:00 | 17:00:00 | 2020-02-12 | 3
13 | 08:05:00 | 17:00:00 | 2020-02-17 | 3
12 | 08:00:00 | 18:09:00 | 2020-02-13 | 2

Demo: db<>fiddle (extended example covering two months)
SELECT
employee_id,
date_trunc('month', the_date) AS month, -- 1
AVG(time_out - time_in) -- 2, 3
FROM
mytable
GROUP BY employee_id, month -- 3
1. date_trunc() "shortens" the date to a certain date part. Here, all dates are truncated to the month, which makes it possible to group by month (your "monthly basis"). the_date stands for the table's date column.
2. Calculate the working time as the difference between the two times.
3. Group by employee_id and the calculated month, and take the average of the time differences.
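The title also asks for a weekly average. A minimal sketch, assuming the same approach carries over: truncate to 'week' instead of 'month' (date_trunc weeks start on Monday). The inline sample rows, the week_start and avg_working_time aliases, and the ::date cast are illustrative additions; the_date is used for the date column as in the query above.

```sql
-- Sample rows copied from the question, inlined so the query runs on its own.
WITH mytable (local_id, time_in, time_out, the_date, employee_id) AS (
    VALUES
        (7,  TIME '08:00:00', TIME '17:00:00', DATE '2020-02-12', 2),
        (6,  TIME '08:00:00', TIME '17:00:00', DATE '2020-02-12', 4),
        (8,  TIME '09:00:00', TIME '17:00:00', DATE '2020-02-12', 3),
        (13, TIME '08:05:00', TIME '17:00:00', DATE '2020-02-17', 3),
        (12, TIME '08:00:00', TIME '18:09:00', DATE '2020-02-13', 2)
)
SELECT
    employee_id,
    date_trunc('week', the_date)::date AS week_start,       -- Monday of that week
    AVG(time_out - time_in)            AS avg_working_time  -- time - time yields an interval
FROM mytable
GROUP BY employee_id, week_start
ORDER BY employee_id, week_start;
```

Swapping 'week' back to 'month' gives the monthly figures from the same query shape.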

Related

Postgres max value per hour with time it occurred

Given a Postgres table with columns highwater_datetime::timestamp and highwater::integer, I am trying to construct a select statement for a given highwater_datetime range that generates one row per hour, with a column for the max highwater in that hour (the first occurrence when there are duplicates) and another column showing the highwater_datetime when it occurred (truncated to the minute, ordered by highwater_datetime asc), e.g.
| highwater_datetime | max_highwater |
+--------------------+---------------+
| 2021-01-27 20:05 | 8 |
| 2021-01-27 21:00 | 7 |
| 2021-01-27 22:00 | 7 |
| 2021-01-27 23:00 | 7 |
| 2021-01-28 00:00 | 7 |
| 2021-01-28 01:32 | 7 |
| 2021-01-28 02:00 | 7 |
| 2021-01-28 03:00 | 7 |
| 2021-01-28 04:22 | 9 |
DISTINCT ON should do the trick:
SELECT DISTINCT ON (date_trunc('hour', highwater_datetime))
highwater_datetime,
highwater
FROM mytable
ORDER BY date_trunc('hour', highwater_datetime),
highwater DESC,
highwater_datetime;
DISTINCT ON outputs the first row of each group of rows that share the same hour, where "first" is determined by the ORDER BY clause: highest highwater first, then earliest highwater_datetime.
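The question also asks for a datetime range and for the timestamp truncated to the minute. A sketch of how the query above might be extended, under stated assumptions: the inline rows are made-up sample data, the range bounds are placeholders, and the alias t and the output name occurred_at are introduced here only to keep the ORDER BY pointing at the raw column rather than the truncated one.

```sql
-- Hypothetical sample data; two rows tie for the max in the 20:00 hour.
WITH mytable (highwater_datetime, highwater) AS (
    VALUES
        (TIMESTAMP '2021-01-27 20:05:10', 8),
        (TIMESTAMP '2021-01-27 20:40:00', 8),
        (TIMESTAMP '2021-01-27 20:50:00', 3),
        (TIMESTAMP '2021-01-27 21:00:30', 7),
        (TIMESTAMP '2021-01-27 21:15:00', 5)
)
SELECT DISTINCT ON (date_trunc('hour', t.highwater_datetime))
       date_trunc('minute', t.highwater_datetime) AS occurred_at,  -- truncated to the minute
       t.highwater                                AS max_highwater
FROM mytable AS t
WHERE t.highwater_datetime >= TIMESTAMP '2021-01-27 20:00:00'      -- placeholder range start
  AND t.highwater_datetime <  TIMESTAMP '2021-01-28 05:00:00'      -- placeholder range end
ORDER BY date_trunc('hour', t.highwater_datetime),  -- must lead with the DISTINCT ON expression
         t.highwater DESC,                          -- highest value first
         t.highwater_datetime;                      -- earliest occurrence breaks ties
```

Because the rows are ordered by hour first, the result is already in ascending time order, one row per hour.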

KDB select only rows with max value on a column elegantly

I have this table for stock prices (simplified version here):
+----------+--------+-------+
| Time | Ticker | Price |
+----------+--------+-------+
| 10:00:00 | A | 5 |
| 10:00:01 | A | 6 |
| 10:00:00 | B | 3 |
+----------+--------+-------+
I want to select, for each Ticker, the row with the maximum Time, e.g.
+----------+--------+-------+
| Time | Ticker | Price |
+----------+--------+-------+
| 10:00:01 | A | 6 |
| 10:00:00 | B | 3 |
+----------+--------+-------+
I know how to do it in SQL (a similar question can be found here), but I have no idea how to do it elegantly in KDB.
I have a solution that does the selection twice:
select first Time, first Ticker, first Price by Ticker from (`Time xdesc select Time, Ticker, Price from table where date=2018.06.21)
Is there a cleaner solution?
Whenever you're doing a double select involving a by, it's a good sign that you can use fby instead:
q)t:([]time:10:00:00 10:00:01 10:00:00;ticker:`A`A`B;price:5 6 3)
q)
q)select from t where time=(max;time) fby ticker
time ticker price
---------------------
10:00:01 A 6
10:00:00 B 3
Kdb also offers a shortcut that takes the last record per group whenever you do a select by with no specified columns, but this approach isn't as general or customizable:
q)select by ticker from t
ticker| time price
------| --------------
A | 10:00:01 6
B | 10:00:00 3
One additional thing to note: select by can give wrong results if the data is not sorted correctly, e.g.
select by ticker from reverse[t]
ticker| time price
------| --------------
A | 10:00:00 5 //wrong result
B | 10:00:00 3
The fby can get the correct results regardless of the order:
select from (reverse t) where time=(max;time) fby ticker
time ticker price
---------------------
10:00:00 B 3
10:00:01 A 6

Extracting real dates from days of the week

I know how to extract a DOW from a date, e.g. SELECT EXTRACT(DOW FROM '2018-04-23'::date)
But how can I do the inverse? How can I take a series of DOW values and convert them into the matching dates of a given week (relative to the current week)?
+-----+---------+
| id | the_dow |
+-----+---------+
| 358 | 1 |
| 359 | 2 |
| 360 | 5 |
| 361 | 2 |
| 362 | 3 |
+-----+---------+
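Just add that number to the start of the week:
date_trunc('week', current_date)::date + the_dow - 1
date_trunc() uses the ISO definition of the week, so the first day is Monday; hence the - 1, since a value of 1 (Monday) must map to the week start itself. Note that EXTRACT(DOW ...) numbers Sunday as 0, which this expression maps to the Sunday before that Monday. Using isodow for the extract (Monday = 1 through Sunday = 7) and subtracting 1 from that value is easier, because all seven days then land inside the same ISO week.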
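As a rough illustration of the expression applied to the question's rows, here is a self-contained sketch. It assumes the_dow holds ISO-style day numbers (Monday = 1), so 1 is subtracted to turn the day number into an offset from the Monday returned by date_trunc('week', ...); the CTE name dows and the output name the_date are made up for the example.

```sql
-- Rows copied from the question; the_dow is assumed to use Monday = 1.
WITH dows (id, the_dow) AS (
    VALUES (358, 1), (359, 2), (360, 5), (361, 2), (362, 3)
)
SELECT
    id,
    the_dow,
    -- Monday of the current week, plus a zero-based day offset (date + integer = date)
    date_trunc('week', current_date)::date + (the_dow - 1) AS the_date
FROM dows
ORDER BY id;
```

Replacing current_date with any other date anchors the result to that date's week instead.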

PostgreSQL - aggregate series interval 2 years

I have some data:
id_merchant | data | sell
11 | 2009-07-20 | 1100.00
22 | 2009-07-27 | 1100.00
11 | 2005-07-27 | 620.00
31 | 2009-08-07 | 2403.20
33 | 2009-08-12 | 4822.00
52 | 2009-08-14 | 4066.00
52 | 2009-08-15 | 295.00
82 | 2009-08-15 | 0.00
23 | 2011-06-11 | 340.00
23 | 2012-03-22 | 1000.00
23 | 2012-04-08 | 1000.00
23 | 2012-07-13 | 36.00
23 | 2013-07-17 | 2480.00
23 | 2014-04-09 | 1000.00
23 | 2014-06-10 | 1500.00
23 | 2014-07-20 | 700.50
I want to create a table as select with an interval of 2 years. The first date for a merchant is min(date), so I generate the series with generate_series(min(date)::date, current_date::date, '2 years').
I want to get a table like this:
id_merchant | data | sum(sell)
23 | 2011-06-11 | 12382.71
23 | 2013-06-11 | 12382.71
23 | 2015-06-11 | 12382.71
But there is some mistake in my query, because sum(sell) is the same for every row of the series and the sum is wrong. Even if I sum all the sales, the total is about 6000, not 12382.71.
My query:
select m.id_gos_pla,
generate_series(m.min::date,dath()::date,'2 years')::date,
sum(rch.suma)
from rch, minmax m
where rch.id_gos_pla=m.id_gos_pla
group by m.id_gos_pla,m.min,m.max
order by 1,2;
Please help.
I would do it this way:
select
periods.id_merchant,
periods.date as period_start,
(periods.date + interval '2' year - interval '1' day)::date as period_end,
coalesce(sum(merchants.amount), 0) as sum
from
(
select
id_merchant,
generate_series(min(date), max(date), '2 year'::interval)::date as date
from merchants
group by id_merchant
) periods
left join merchants on
periods.id_merchant = merchants.id_merchant and
merchants.date >= periods.date and
merchants.date < periods.date + interval '2' year
group by periods.id_merchant, periods.date
order by periods.id_merchant, periods.date
We use a sub-query to generate the period start dates for each id_merchant, beginning at that merchant's first date and stepping by the required interval. We then join it with the merchants table on the condition that the sale date falls within the period, and group by id_merchant and period (periods.date, the period's start date, is enough to identify it). Finally we select everything we need: merchant, start date, end date, and sum.
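To try the idea without creating tables, the sketch below inlines the question's rows for merchant 23. It is a variation on the query above, not a replacement: the column names sale_date and amount, the CTE names bounds and periods, and the use of CROSS JOIN LATERAL with explicit timestamp casts are choices made for this example.

```sql
-- Rows for merchant 23 copied from the question; column names are illustrative.
WITH merchants (id_merchant, sale_date, amount) AS (
    VALUES
        (23, DATE '2011-06-11',  340.00),
        (23, DATE '2012-03-22', 1000.00),
        (23, DATE '2012-04-08', 1000.00),
        (23, DATE '2012-07-13',   36.00),
        (23, DATE '2013-07-17', 2480.00),
        (23, DATE '2014-04-09', 1000.00),
        (23, DATE '2014-06-10', 1500.00),
        (23, DATE '2014-07-20',  700.50)
),
bounds AS (
    -- first and last sale per merchant
    SELECT id_merchant,
           min(sale_date) AS first_sale,
           max(sale_date) AS last_sale
    FROM merchants
    GROUP BY id_merchant
),
periods AS (
    -- one row per 2-year period start, beginning at the merchant's first sale
    SELECT b.id_merchant, gs::date AS period_start
    FROM bounds AS b
    CROSS JOIN LATERAL generate_series(
        b.first_sale::timestamp, b.last_sale::timestamp, interval '2 years'
    ) AS gs
)
SELECT
    p.id_merchant,
    p.period_start,
    (p.period_start + interval '2 years' - interval '1 day')::date AS period_end,
    coalesce(sum(m.amount), 0) AS total
FROM periods AS p
LEFT JOIN merchants AS m
       ON m.id_merchant = p.id_merchant
      AND m.sale_date >= p.period_start
      AND m.sale_date <  p.period_start + interval '2 years'  -- half-open period
GROUP BY p.id_merchant, p.period_start
ORDER BY p.id_merchant, p.period_start;
```

For these rows the query yields two periods, starting 2011-06-11 and 2013-06-11, with totals 2376.00 and 5680.50. Using current_date instead of last_sale as the upper bound would add trailing periods with a total of 0, which is closer to the series the question describes.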

PostgreSQL: get rows based on a date, between 2 date columns.

I have the following table:
| id | duty_id | date_start | date_end |
| 1 | 1 | 2015-07-16 07:00:00 | 2015-07-16 14:30:00 |
| 2 | 3 | 2015-07-17 03:30:00 | 2015-07-17 11:00:00 |
| 3 | 5 | 2015-07-17 12:00:00 | 2015-07-17 19:30:00 |
and I have a date: 2015-07-17.
I need to select the rows that happen on that date, i.e. these lines:
| 2 | 3 | 2015-07-17 03:30:00 | 2015-07-17 11:00:00 |
| 3 | 5 | 2015-07-17 12:00:00 | 2015-07-17 19:30:00 |
Sadly, BETWEEN doesn't work:
SELECT * FROM table WHERE ('2015-07-17'::DATE BETWEEN date_start AND date_end)
gives back an empty result.
How can I get those lines?
The problem is that when a DATE is coerced to a TIMESTAMP (as must be done here to compare the DATE of '2015-07-17' to the TIMESTAMPs in the data) the time portion of the coerced TIMESTAMP is set to 00:00:00, and thus since the test data doesn't have a time period which is valid at midnight on 2015-07-17 no rows are returned.
If you add an INTERVAL literal of 211 minutes (three hours and 31 minutes) to the converted date you'll get results returned because the test data DOES have a row which is valid at 2015-07-17 at 03:31 AM:
SELECT * FROM my_table
WHERE '2015-07-17'::DATE + INTERVAL '211' MINUTE BETWEEN date_start
AND date_end;
SQLFiddle here
Best of luck.
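The coercion described in this answer can be checked in isolation. A small sketch, using row 2 of the question's data as literals; the output column names coerced_value and matches_row_2 are made up here.

```sql
SELECT
    -- a DATE converted to TIMESTAMP gets a time portion of 00:00:00
    CAST(DATE '2015-07-17' AS timestamp) AS coerced_value,
    -- midnight lies before 03:30, so the BETWEEN test is false for row 2
    DATE '2015-07-17'
        BETWEEN TIMESTAMP '2015-07-17 03:30:00'
            AND TIMESTAMP '2015-07-17 11:00:00' AS matches_row_2;
```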
Please try something like this:
SELECT * FROM my_table
WHERE '2015-07-17'::DATE BETWEEN date_start::DATE AND date_end::DATE;
SQLFiddle example
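Casting both columns to DATE works, but it applies a function to every row, which may keep a plain index on date_start or date_end from being used. One possible alternative, offered as a sketch rather than a tuned solution, compares the raw timestamps against the day's boundaries; the table name my_table and the inlined rows follow the question's data.

```sql
-- Rows copied from the question, inlined so the query runs on its own.
WITH my_table (id, duty_id, date_start, date_end) AS (
    VALUES
        (1, 1, TIMESTAMP '2015-07-16 07:00:00', TIMESTAMP '2015-07-16 14:30:00'),
        (2, 3, TIMESTAMP '2015-07-17 03:30:00', TIMESTAMP '2015-07-17 11:00:00'),
        (3, 5, TIMESTAMP '2015-07-17 12:00:00', TIMESTAMP '2015-07-17 19:30:00')
)
SELECT *
FROM my_table
WHERE date_start <  DATE '2015-07-17' + 1  -- starts before the following midnight
  AND date_end   >= DATE '2015-07-17'      -- ends at or after this day's midnight
ORDER BY id;
```

This selects the rows with id 2 and 3, the same rows as the DATE-cast version, because a duty overlaps the day exactly when it starts before the day ends and ends no earlier than the day begins.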