Month Not Printing When No Transaction In Particular Month - postgresql

I have written a query to get employee attrition details: the opening, joining, leaving and closing employee counts, month-wise.
The issue is that if a month has no value in any of the above four columns, the system does not generate that month at all.
Please suggest a solution.
OUTPUT:
 yyear | mmonth | charmonth | opening | incoming | relived | closing
-------+--------+-----------+---------+----------+---------+---------
  2018 |      4 | Apr-18    |      14 |        2 |       0 |      16
  2018 |      5 | May-18    |      16 |        1 |       0 |      17
  2018 |      8 | Aug-18    |      17 |        3 |       0 |      20
  2018 |      9 | Sep-18    |      20 |        1 |       0 |      21
  2018 |     10 | Oct-18    |      21 |       23 |       4 |      40
  2018 |     11 | Nov-18    |      40 |        5 |       1 |      44
  2018 |     12 | Dec-18    |      44 |        2 |       0 |      46
  2019 |      1 | Jan-19    |      46 |        1 |       0 |      47
  2019 |      2 | Feb-19    |      47 |        1 |       0 |      48
  2019 |      3 | Mar-19    |      48 |        6 |       1 |      53
  2019 |      4 | Apr-19    |      53 |        1 |       0 |      54
  2019 |      5 | May-19    |      54 |        3 |       1 |      56
  2019 |      6 | Jun-19    |      56 |        2 |       0 |      58
(13 rows)
If you look at the month sequence, Jun-18 and Jul-18 are missing.
Code:
WITH table_1 AS (
    SELECT
        startdate AS ddate,
        enddate AS lastday,
        extract('month' from startdate) AS mmonth,
        extract('year' from startdate) AS yyear,
        to_char(to_timestamp(startdate), 'Mon-YY') AS months
    FROM shr_period
    WHERE startdate >= DATE('2018-01-01')
      AND enddate <= DATE('2019-07-01')
      AND ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
)
SELECT
    table_1.yyear,
    table_1.mmonth,
    table_1.months AS charmonth,
    (SELECT COUNT(*)
     FROM shr_emp_job OPENING
     WHERE OPENING.dateofjoining < table_1.ddate
       AND OPENING.relieveddate IS NULL
       AND OPENING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
    ) AS OPENING,
    count(*) AS incoming,
    (SELECT COUNT(*)
     FROM shr_emp_job rel
     WHERE rel.relieveddate IS NOT NULL
       AND rel.dateofjoining <= table_1.lastday
       AND rel.dateofjoining >= table_1.ddate
       AND rel.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
    ) AS relived,
    (SELECT COUNT(*)
     FROM shr_emp_job CLOSING
     WHERE CLOSING.dateofjoining <= table_1.lastday
       AND relieveddate IS NULL
       AND CLOSING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
    ) AS CLOSING
FROM
    shr_emp_job
    JOIN table_1 ON table_1.mmonth = extract('month' from shr_emp_job.dateofjoining)
                AND table_1.yyear = extract('year' from shr_emp_job.dateofjoining)
WHERE shr_emp_job.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
GROUP BY table_1.mmonth, table_1.yyear, table_1.ddate, table_1.lastday, charmonth
ORDER BY table_1.yyear, table_1.mmonth;

As a quick fix, try changing your JOIN from an inner join to an outer join. So instead of
FROM
shr_emp_job
JOIN table_1 ON
do
FROM
shr_emp_job
RIGHT OUTER JOIN table_1 ON
This tells Postgres to keep the selected columns from the right-hand table (table_1) even when there are no matching rows in the left-hand table (shr_emp_job). For those rows, NULL is supplied for the missing values.
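For context, here is a sketch of how the tail of your query might look after that change (same tables and filter as in the question). Two extra caveats that go beyond the quick fix above, and which you should verify against your data: with an outer join you probably want to count a specific column rather than count(*) so that an empty month yields incoming = 0 instead of 1, and the ad_org_id filter should move from the WHERE clause into the ON clause, otherwise the WHERE condition discards the NULL-padded months again.
SELECT
    table_1.yyear,
    table_1.mmonth,
    table_1.months AS charmonth,
    -- (the OPENING / relived / CLOSING correlated sub-selects stay exactly as above)
    COUNT(shr_emp_job.dateofjoining) AS incoming   -- counts 0 for a month with no joiners
FROM
    shr_emp_job
    RIGHT OUTER JOIN table_1
        ON  table_1.mmonth = extract('month' from shr_emp_job.dateofjoining)
        AND table_1.yyear  = extract('year'  from shr_emp_job.dateofjoining)
        AND shr_emp_job.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'   -- moved out of WHERE
GROUP BY table_1.mmonth, table_1.yyear, table_1.ddate, table_1.lastday, charmonth
ORDER BY table_1.yyear, table_1.mmonth;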

Related

How to sum for previous n number of days for a number of dates in MySQL

I have a list of dates, each with a value, in MySQL.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values from the start of that month to the present date. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output should look like:
Any help would be appreciated.
Thanks
In MySQL 8.0 you can use analytic/window functions.
SELECT
*,
SUM(value) OVER (
ORDER BY date
ROWS BETWEEN 4 PRECEDING
AND CURRENT ROW
) AS five_day_period,
SUM(value) OVER (
PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
ORDER BY date
) AS month_to_date
FROM
your_table
In the first case, it's just saying sum up the value column, in date order, starting from 4 rows before the current row, and ending on the current row.
In the second case, there's no ROWS BETWEEN, so the frame defaults to everything from the start of the partition up to the current row. Instead, we add a PARTITION BY, which says to treat all rows in the same calendar month separately from rows in any other calendar month. Thus, each row only looks back as far as the first row in its partition, which is the first row of the current month.
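If you prefer to make that default frame explicit (this is what MySQL 8.0 uses when an ORDER BY is present and no frame clause is given), the month-to-date window can be written equivalently as:
SELECT
    *,
    SUM(value) OVER (
        PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
        ORDER BY date
        RANGE BETWEEN UNBOUNDED PRECEDING
                  AND CURRENT ROW        -- the implicit default frame, spelled out
    ) AS month_to_date
FROM
    your_table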
In MySQL 5.x there are no such functions. As such I would resort to correlated sub-queries.
SELECT
*,
(
SELECT SUM(value)
FROM your_table AS five_day_lookup
WHERE date >= DATE_SUB(your_table.date, INTERVAL 4 DAY)
AND date <= your_table.date
)
AS five_day_period,
(
SELECT SUM(value)
FROM your_table AS monthly_lookup
WHERE date >= DATE(DATE_FORMAT(your_table.date, '%Y-%m-01'))
AND date <= your_table.date
)
AS month_to_date
FROM
your_table
Here is another way to do it:
Select
t1.`mydate` AS 'Date'
, t1.`val` AS 'Value'
, SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
, SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
FROM tab t1
LEFT JOIN tab t2 ON t2.`mydate`
BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
t1.`mydate` - INTERVAL 4 DAY)
AND t1.`mydate`
GROUP BY t1.`mydate`
ORDER BY t1.`mydate` desc;
sample
MariaDB [bkvie]> SELECT * FROM tab;
+----+------------+------+
| id | mydate | val |
+----+------------+------+
| 1 | 2021-02-07 | 10 |
| 2 | 2021-02-06 | 30 |
| 3 | 2021-02-05 | 40 |
| 4 | 2021-02-04 | 50 |
| 5 | 2021-02-03 | 10 |
| 6 | 2021-02-02 | 20 |
| 7 | 2021-01-31 | 20 |
| 8 | 2021-01-30 | 10 |
| 9 | 2021-01-29 | 30 |
| 10 | 2021-01-28 | 40 |
| 11 | 2021-01-27 | 20 |
| 12 | 2021-01-26 | 30 |
| 13 | 2021-01-25 | 10 |
| 14 | 2021-01-24 | 40 |
| 15 | 2021-02-01 | 10 |
+----+------------+------+
15 rows in set (0.00 sec)
result
+------------+-------+--------------+---------------+
| Date | Value | 5 Day Period | Month of Date |
+------------+-------+--------------+---------------+
| 2021-02-07 | 10 | 140 | 170 |
| 2021-02-06 | 30 | 150 | 160 |
| 2021-02-05 | 40 | 130 | 130 |
| 2021-02-04 | 50 | 110 | 90 |
| 2021-02-03 | 10 | 70 | 40 |
| 2021-02-02 | 20 | 90 | 30 |
| 2021-02-01 | 10 | 110 | 10 |
| 2021-01-31 | 20 | 120 | 200 |
| 2021-01-30 | 10 | 130 | 180 |
| 2021-01-29 | 30 | 130 | 170 |
| 2021-01-28 | 40 | 140 | 140 |
| 2021-01-27 | 20 | 100 | 100 |
| 2021-01-26 | 30 | 80 | 80 |
| 2021-01-25 | 10 | 50 | 50 |
| 2021-01-24 | 40 | 40 | 40 |
+------------+-------+--------------+---------------+
15 rows in set (0.00 sec)

postgres tablefunc, sales data grouped by product, with crosstab of months

TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 20 | 0
First I would group by month, then invert and group by product, but I cannot figure out how to do this.
After enabling the tablefunc extension,
SELECT product_id, coalesce("2018-1-1", 0) as "2018-1-1"
, coalesce("2018-2-1", 0) as "2018-2-1"
FROM crosstab(
$$SELECT product_id, date_trunc('month', date)::date as month, sum(units) as units
FROM test
GROUP BY product_id, month
ORDER BY 1$$
, $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
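As a side note, if crosstab() is not available yet, the extension only needs to be installed once per database (this requires appropriate privileges):
CREATE EXTENSION IF NOT EXISTS tablefunc;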

Create trip number on tracking data

I am currently working on a Postgres database with data for car tracking which looks similar to this:
+----+--------+------------+----------+
| id | car_id | date | time |
+----+--------+------------+----------+
| 11 | 1 | 2014-12-20 | 12:12:12 |
| 12 | 1 | 2014-12-20 | 12:12:13 |
| 13 | 1 | 2014-12-20 | 12:12:14 |
| 23 | 1 | 2015-12-20 | 23:42:10 |
| 24 | 1 | 2015-12-20 | 23:42:11 |
| 31 | 2 | 2014-12-20 | 15:12:12 |
| 32 | 2 | 2014-12-20 | 15:12:14 |
+----+--------+------------+----------+
Here is the setup:
CREATE TABLE test (
id int
, car_id int
, date text
, time text
);
INSERT INTO test VALUES
(11, 1, '2014-12-20', '12:12:12'),
(12, 1, '2014-12-20', '12:12:13'),
(13, 1, '2014-12-20', '12:12:14'),
(23, 1, '2015-12-20', '23:42:10'),
(24, 1, '2015-12-20', '23:42:11'),
(31, 2, '2014-12-20', '15:12:12'),
(32, 2, '2014-12-20', '15:12:14');
I want to create a column in which each trace is assigned a trip number, ordered by id:
id car_id date time (trip)
11 1 2014-12-20 12:12:12 1
12 1 2014-12-20 12:12:13 1
13 1 2014-12-20 12:12:14 1
23 1 2015-12-20 23:42:10 2 (trip +1 because the time difference is bigger than 5 sec)
24 1 2015-12-20 23:42:11 2
31 2 2014-12-20 15:12:12 3 (trip +1 because car id is different)
32 2 2014-12-20 15:12:14 3
I have set up the following rules:
first row (lowest id) gets the value trip = 1
for the following rows: if car_id is equal to the row above and the time
difference between the row and the row above is smaller than 5 seconds, then trip is
the same as the row above; otherwise trip is the row above + 1
I have tried with the following
Create table test as select
"id", "date", "time", car_id,
extract(epoch from "date" + "time") - lag(extract(epoch from "date" + "time")) over (order by "id") as diff,
Case
when t_diff < 5 and car_id - lag(car_id) over (order by "id") = 0
then lag(trip) over (order by "id")
else lag(trip) over (order by "id") + 1
end as trip
From road_1 order by "id"
but it does not work :( How can I compute the trip column?
First, use (date || ' ' || time)::timestamp AS datetime to form a timestamp out of date and time
SELECT id, test.car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test
which yields
| id | car_id | datetime |
|----+--------+---------------------|
| 11 | 1 | 2014-12-20 12:12:12 |
| 12 | 1 | 2014-12-20 12:12:13 |
| 13 | 1 | 2014-12-20 12:12:14 |
| 23 | 1 | 2015-12-20 23:42:10 |
| 24 | 1 | 2015-12-20 23:42:11 |
| 31 | 2 | 2014-12-20 15:12:12 |
| 32 | 2 | 2014-12-20 15:12:14 |
It is helpful to do this since we'll be using datetime - prev_date > '5 seconds'::interval
to identify rows which are more than 5 seconds apart. Notice that
2014-12-20 23:59:59 and 2014-12-21 00:00:00 are only 1 second apart,
but it would be difficult/tedious to determine this if all we had were separate date and time columns.
Now we can express the rule that the trip is increased by 1 when
NOT ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
(More on why the condition is expressed in this seemingly backwards way, below).
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
yields
| id | car_id | prev_car_id | datetime | prev_date | new_trip |
|----+--------+-------------+---------------------+---------------------+----------|
| 11 | 1 | | 2014-12-20 12:12:12 | | 1 |
| 12 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 | 0 |
| 13 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 | 0 |
| 23 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 | 1 |
| 24 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 | 0 |
| 31 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 | 1 |
| 32 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 | 0 |
Now trip can be expressed as the cumulative sum over the new_trip column:
SELECT id, car_id, datetime, sum(new_trip) OVER (ORDER BY datetime) AS trip
FROM (
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
) t3
yields
| id | car_id | datetime | trip |
|----+--------+---------------------+------|
| 11 | 1 | 2014-12-20 12:12:12 | 1 |
| 12 | 1 | 2014-12-20 12:12:13 | 1 |
| 13 | 1 | 2014-12-20 12:12:14 | 1 |
| 31 | 2 | 2014-12-20 15:12:12 | 2 |
| 32 | 2 | 2014-12-20 15:12:14 | 2 |
| 23 | 1 | 2015-12-20 23:42:10 | 3 |
| 24 | 1 | 2015-12-20 23:42:11 | 3 |
I used
(CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END)
instead of
(CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END)
because prev_car_id and prev_date may be NULL. Thus, on the first row, (car_id != prev_car_id) returns NULL when instead we want TRUE.
By expressing the condition in the opposite way, we can identify the uninteresting rows correctly:
((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
and use the ELSE clause to return 1 when that condition is FALSE or NULL. You can see the difference here:
SELECT id
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
, (CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END) AS new_trip_wrong
, car_id, prev_car_id, datetime, prev_date
FROM (
SELECT id, car_id, datetime
, lag(datetime) OVER () AS prev_date
, lag(car_id) OVER () AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
yields
| id | new_trip | new_trip_wrong | car_id | prev_car_id | datetime | prev_date |
|----+----------+----------------+--------+-------------+---------------------+---------------------|
| 11 | 1 | 0 | 1 | | 2014-12-20 12:12:12 | |
| 12 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |
| 13 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |
| 23 | 1 | 1 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |
| 24 | 0 | 0 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |
| 31 | 1 | 1 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |
| 32 | 0 | 0 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |
Note the difference in the new_trip versus new_trip_wrong columns.
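As an aside (not part of the original reasoning above), PostgreSQL also offers IS DISTINCT FROM, which treats NULL as an ordinary comparable value, so the "forward" form of the test can be made NULL-safe as well. A sketch against the same test table:
-- Alternative new_trip flag using a NULL-safe comparison (sketch, not the answer's original code)
SELECT id
     , (CASE WHEN car_id IS DISTINCT FROM prev_car_id            -- TRUE even when prev_car_id IS NULL
               OR datetime - prev_date > '5 seconds'::interval    -- gap of more than 5 seconds
             THEN 1 ELSE 0 END) AS new_trip
FROM (
    SELECT id, car_id
         , (date || ' ' || time)::timestamp AS datetime
         , lag((date || ' ' || time)::timestamp) OVER (ORDER BY id) AS prev_date
         , lag(car_id) OVER (ORDER BY id) AS prev_car_id
    FROM test ) t;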

Rank based on row number SQL Server 2008 R2

I want to group-rank my table data by row count. The first 12 rows, ordered by date for each ProductID, would get value = 1; the next 12 rows would get value = 2, and so on.
How table structure looks:
For ProductID = 1267 are below associated dates:
02-01-2016
03-01-2016
.
. (skipping months..table has one date per month)
.
12-01-2016
02-01-2017
.
.
.
02-01-2018
Use row_number() over() with some arithmetic to calculate groups of 12 ordered by date (per productid). Change the sort to ASCending or DESCending to suit your need.
select *
, (11 + row_number() over(partition by productid order by somedate DESC)) / 12 as rnk
from mytable
GO
myTableID | productid | somedate | rnk
--------: | :------------- | :------------------ | :--
9 | 123456 | 2018-11-12 08:24:25 | 1
8 | 123456 | 2018-10-02 12:29:04 | 1
7 | 123456 | 2018-09-09 02:39:30 | 1
2 | 123456 | 2018-09-02 08:49:37 | 1
1 | 123456 | 2018-07-04 12:25:06 | 1
5 | 123456 | 2018-06-06 11:38:50 | 1
12 | 123456 | 2018-05-23 21:12:03 | 1
18 | 123456 | 2018-04-02 03:59:16 | 1
3 | 123456 | 2018-01-02 03:42:24 | 1
17 | 123456 | 2017-11-29 03:19:32 | 1
10 | 123456 | 2017-11-10 00:45:41 | 1
13 | 123456 | 2017-11-05 09:53:38 | 1
16 | 123456 | 2017-10-20 15:39:42 | 2
4 | 123456 | 2017-10-14 19:25:30 | 2
20 | 123456 | 2017-09-21 21:31:06 | 2
6 | 123456 | 2017-04-06 22:10:58 | 2
14 | 123456 | 2017-03-24 23:35:52 | 2
19 | 123456 | 2017-01-22 05:07:23 | 2
11 | 123456 | 2016-12-13 19:17:08 | 2
15 | 123456 | 2016-12-02 03:22:32 | 2
dbfiddle here
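For what it's worth, the bucketing works because row_number() is 1-based and the division here is integer division, so rows 1-12 map to group 1, rows 13-24 to group 2, and so on. A quick sanity check:
SELECT (11 + 1)  / 12 AS row_1     -- 1
     , (11 + 12) / 12 AS row_12    -- 1
     , (11 + 13) / 12 AS row_13;   -- 2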

Linear regression with postgres

I use Postgres and I have a large number of rows with values and dates per station.
(Dates can be separated by several days.)
id | value | idstation | udate
--------+-------+-----------+-----
1 | 5 | 12 | 1984-02-11 00:00:00
2 | 7 | 12 | 1984-02-17 00:00:00
3 | 8 | 12 | 1984-02-21 00:00:00
4 | 9 | 12 | 1984-02-23 00:00:00
5 | 4 | 12 | 1984-02-24 00:00:00
6 | 8 | 12 | 1984-02-28 00:00:00
7 | 9 | 14 | 1984-02-21 00:00:00
8 | 15 | 15 | 1984-02-21 00:00:00
9 | 14 | 18 | 1984-02-21 00:00:00
10 | 200 | 19 | 1984-02-21 00:00:00
Forgive what may be a silly question, but I'm not much of a database guru.
Is it possible to directly enter a SQL query that will calculate a linear regression per station for each date, knowing that the regression must be calculated only with the current id's date, the previous id's date and the next id's date?
For example, the linear regression for id 2 must be calculated with the values 7 (current), 5 (previous) and 8 (next), for the dates 1984-02-17, 1984-02-11 and 1984-02-21.
Edit: I have to use regr_intercept(value, udate), but I really don't know how to do this if I have to use only the current, previous and next value/date for each row.
Edit 2: 3 rows added to idstation(12); id and date numbers are changed.
Hope you can help me, thank you!
This is the combination of Joop's statistics and Denis's window functions:
WITH num AS (
SELECT id, idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
-- id + the ids of the {prev,next} records
-- within the same idstation group
, drag AS (
SELECT id AS center
, LAG(id) OVER www AS prev
, LEAD(id) OVER www AS next
FROM thedata
WINDOW www AS (partition by idstation ORDER BY id)
)
-- junction CTE between ID and its three feeders
, tri AS (
SELECT center AS this, center AS that FROM drag
UNION ALL SELECT center AS this , prev AS that FROM drag
UNION ALL SELECT center AS this , next AS that FROM drag
)
SELECT t.this, n.idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM num n
JOIN tri t ON t.that = n.id
GROUP BY t.this, n.idstation
;
Results:
this | idstation | intercept | slope | rsq | avgx | avgy
------+-----------+-------------------+-------------------+-------------------+------------------+------------------
1 | 12 | -46 | 1 | 1 | 52 | 6
2 | 12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
3 | 12 | -10.6666666666667 | 0.333333333333333 | 1 | 54.5 | 7.5
4 | 14 | | | | 51 | 9
5 | 15 | | | | 51 | 15
6 | 18 | | | | 51 | 14
7 | 19 | | | | 51 | 200
(7 rows)
The clustering of the group-of-three can probably be done more elegantly using a rank() or row_number() function, which would also allow larger sliding windows to be used.
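A sketch of that idea (mine, not part of the original answer, and assuming the same thedata table as below): number the rows per station with row_number(), then join each row to the neighbours whose row number is at most one away; widening the BETWEEN range widens the sliding window.
WITH num AS (
    SELECT id, idstation, value
         , (udate - '1984-01-01'::date) AS idate   -- days since Jan 1984, as above
         , row_number() OVER (PARTITION BY idstation ORDER BY udate, id) AS rn
    FROM thedata
)
SELECT c.id AS this, c.idstation
     , regr_intercept(n.value, n.idate) AS intercept
     , regr_slope(n.value, n.idate)     AS slope
     , regr_r2(n.value, n.idate)        AS rsq
FROM num c
JOIN num n ON n.idstation = c.idstation
          AND n.rn BETWEEN c.rn - 1 AND c.rn + 1   -- widen this range for a larger window
GROUP BY c.id, c.idstation
ORDER BY c.id;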
DROP SCHEMA zzz CASCADE;
CREATE SCHEMA zzz ;
SET search_path=zzz;
CREATE TABLE thedata
( id INTEGER NOT NULL PRIMARY KEY
, value INTEGER NOT NULL
, idstation INTEGER NOT NULL
, udate DATE NOT NULL
);
INSERT INTO thedata(id,value,idstation,udate) VALUES
(1 ,5 ,12 ,'1984-02-21' )
,(2 ,7 ,12 ,'1984-02-23' )
,(3 ,8 ,12 ,'1984-02-26' )
,(4 ,9 ,14 ,'1984-02-21' )
,(5 ,15 ,15 ,'1984-02-21' )
,(6 ,14 ,18 ,'1984-02-21' )
,(7 ,200 ,19 ,'1984-02-21' )
;
WITH a AS (
SELECT idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
SELECT idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM a
GROUP BY idstation
;
output:
idstation | intercept | slope | rsq | avgx | avgy
-----------+-------------------+-------------------+-------------------+------------------+------------------
15 | | | | 51 | 15
14 | | | | 51 | 9
19 | | | | 51 | 200
12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
18 | | | | 51 | 14
(5 rows)
Note: if you want a spline-like regression you should also use the lag() and lead() window functions, like in Denis's answer.
If the average is OK for you, you could use the built-in avg(). Something like
SELECT avg("value") FROM "my_table" WHERE "idstation" = 3;
should do. For more complicated things you will need to write a PL/pgSQL function, I'm afraid, or look for a PostgreSQL add-on.
Look into window functions. If I get your question correctly, lead() and lag() will likely give you precisely what you want. Example usage:
select idstation as idstation,
id as curr_id,
udate as curr_date,
lag(id) over w as prev_id,
lag(udate) over w as prev_date,
lead(id) over w as next_id,
lead(udate) over w as next_date
from dates
window w as (
partition by idstation order by udate, id
)
order by idstation, udate, id
http://www.postgresql.org/docs/current/static/tutorial-window.html