I have a query that returns data in rows, and I need to change the results so they come back in columns instead of rows. I have done some research and found this article, which uses a dynamic query, but it doesn't seem like my situation can use that solution. That solution seemed to rely on each row having a unique name that could be used as a column name, which I don't have.
My data contains customer records of the visits a customer has had to our facility. Some customers will only see us once and some will see us many times in the same time span, so we have no way of predicting how many visits per customer we will have in a given time period. Note that ID 10219 has only one visit, 5180 has 3, and there are many for 5199.
| ID    | Task            | Visit Date | RF Score | PF Score |
|-------|-----------------|------------|----------|----------|
| 10219 | Follow Up Visit | 12/26/2013 | 1        | 6        |
| 5180  | Initial Visit   | 6/9/2011   | 3        | 9        |
| 5180  | Follow Up Visit | 7/8/2011   | 3        | 10       |
| 5180  | Follow Up Visit | 9/2/2011   | 1        | 10       |
| 5199  | Follow Up Visit | 9/15/2011  | 2        | 7        |
| 5199  | Follow Up Visit | 9/8/2011   | 5        | 6        |
| 5199  | Follow Up Visit | 10/27/2011 | 4        | 7        |
| 5199  | Follow Up Visit | 10/20/2011 | 2        | 4        |
| 5199  | Follow Up Visit | 10/13/2011 | 4        | 8        |
| 5199  | Follow Up Visit | 11/17/2011 | 3        | 4        |
| 5199  | Follow Up Visit | 11/10/2011 | 2        | 5        |
| 5199  | Follow Up Visit | 11/3/2011  | 3        | 3        |
With data that is structured like this, does anyone know how to convert these rows to columns dynamically, even though I don't know how many columns will be needed?
EDIT: the final result should look like this:
| ID   | Task1         | Visit Date1 | RF Score1 | PF Score1 | Task2           | Visit Date2 | RF Score2 | PF Score2 | Task3           | Visit Date3 | RF Score3 | PF Score3 |
|------|---------------|-------------|-----------|-----------|-----------------|-------------|-----------|-----------|-----------------|-------------|-----------|-----------|
| 5180 | Initial Visit | 6/9/2011    | 3         | 9         | Follow Up Visit | 7/8/2011    | 3         | 10        | Follow Up Visit | 9/2/2011    | 1         | 10        |
The solution that you link to would work for your case, but you have to adjust it slightly because you want to pivot multiple columns of data. Since you need to pivot multiple columns, you will first want to unpivot your Visit Date, Task, Rf Score and Pf Score columns into multiple rows, then apply the pivot function. Besides unpivoting, I would also suggest using a windowing function like row_number to generate a unique sequence for each id ordered by date.
You will start your query by using the following:
select id, task, [visit date], [rf score], [pf score],
row_number() over(partition by id
order by [visit date]) seq
from yourtable
See SQL Fiddle with Demo. This creates a number that will be used to associate each value in Visit Date, Task, Rf Score and Pf Score with the actual number of the visit.
Once you have this row number, you will want to unpivot your multiple columns into multiple rows of data. There are several ways to do this, including the UNPIVOT function, but since you are using SQL Server 2008 R2, you can use CROSS APPLY with VALUES:
select id,
col = col + cast(seq as varchar(10)),
value
from
(
select id, task, [visit date], [rf score], [pf score],
row_number() over(partition by id
order by [visit date]) seq
from yourtable
) d
cross apply
(
values
('VisitDate', convert(varchar(10), [visit date], 120)),
('Task', [task]),
('RfScore', cast([rf score] as varchar(10))),
('PfScore', cast([pf score] as varchar(10)))
) c (col, value)
See SQL Fiddle with Demo. Your data is now in a format that can easily be pivoted:
| ID | COL | VALUE |
|-------|------------|-----------------|
| 5180 | VisitDate1 | 2011-06-09 |
| 5180 | Task1 | Initial Visit |
| 5180 | RfScore1 | 3 |
| 5180 | PfScore1 | 9 |
| 5180 | VisitDate2 | 2011-07-08 |
| 5180 | Task2 | Follow Up Visit |
| 5180 | RfScore2 | 3 |
| 5180 | PfScore2 | 10 |
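As an aside on the "several ways to do this" mentioned above: if you were on a version of SQL Server older than 2008, where the VALUES constructor cannot be used inside APPLY, the same unpivot step could be written with SELECT ... UNION ALL instead. A rough sketch, assuming the same yourtable columns:
select id,
    col = col + cast(seq as varchar(10)),
    value
from
(
    select id, task, [visit date], [rf score], [pf score],
        row_number() over(partition by id
                          order by [visit date]) seq
    from yourtable
) d
cross apply
(
    -- each SELECT produces one unpivoted row per source row
    select 'VisitDate', convert(varchar(10), [visit date], 120) union all
    select 'Task', [task] union all
    select 'RfScore', cast([rf score] as varchar(10)) union all
    select 'PfScore', cast([pf score] as varchar(10))
) c (col, value)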
The code when the PIVOT is added will be:
select id,
VisitDate1, Task1, RfScore1, PfScore1,
VisitDate2, Task2, RfScore2, PfScore2,
VisitDate3, Task3, RfScore3, PfScore3,
VisitDate4, Task4, RfScore4, PfScore4,
VisitDate5, Task5, RfScore5, PfScore5,
VisitDate6, Task6, RfScore6, PfScore6
from
(
select id,
col = col + cast(seq as varchar(10)),
value
from
(
select id, task, [visit date], [rf score], [pf score],
row_number() over(partition by id
order by [visit date]) seq
from yourtable
) d
cross apply
(
values
('VisitDate', convert(varchar(10), [visit date], 120)),
('Task', [task]),
('RfScore', cast([rf score] as varchar(10))),
('PfScore', cast([pf score] as varchar(10)))
) c (col, value)
) d
pivot
(
max(value)
for col in (VisitDate1, Task1, RfScore1, PfScore1,
VisitDate2, Task2, RfScore2, PfScore2,
VisitDate3, Task3, RfScore3, PfScore3,
VisitDate4, Task4, RfScore4, PfScore4,
VisitDate5, Task5, RfScore5, PfScore5,
VisitDate6, Task6, RfScore6, PfScore6)
) piv;
See SQL Fiddle with Demo.
The above works great if you have a limited number of values, but if they are unknown then you will need to use dynamic SQL, and the code above will be converted to:
DECLARE @cols AS NVARCHAR(MAX),
    @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(col + cast(seq as varchar(10)))
from
(
select row_number() over(partition by id
order by [visit date]) seq
from yourtable
) d
cross apply
(
select 'VisitDate', 1 union all
select 'Task', 2 union all
select 'RfScore', 3 union all
select 'PfScore', 4
) c (col, so)
group by seq, col, so
order by seq, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'select id, ' + @cols + '
from
(
select id,
col = col + cast(seq as varchar(10)),
value
from
(
select id, task, [visit date], [rf score], [pf score],
row_number() over(partition by id
order by [visit date]) seq
from yourtable
) d
cross apply
(
values
(''VisitDate'', convert(varchar(10), [visit date], 120)),
(''Task'', task),
(''RfScore'', cast([rf score] as varchar(10))),
(''PfScore'', cast([pf score] as varchar(10)))
) c (col, value)
) s
pivot
(
max(value)
    for col in (' + @cols + ')
) p '
execute sp_executesql @query;
See SQL Fiddle with Demo. Both versions give a result:
| ID | VISITDATE1 | TASK1 | RFSCORE1 | PFSCORE1 | VISITDATE2 | TASK2 | RFSCORE2 | PFSCORE2 | VISITDATE3 | TASK3 | RFSCORE3 | PFSCORE3 | VISITDATE4 | TASK4 | RFSCORE4 | PFSCORE4 | VISITDATE5 | TASK5 | RFSCORE5 | PFSCORE5 | VISITDATE6 | TASK6 | RFSCORE6 | PFSCORE6 | VISITDATE7 | TASK7 | RFSCORE7 | PFSCORE7 | VISITDATE8 | TASK8 | RFSCORE8 | PFSCORE8 |
|-------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|------------|-----------------|----------|----------|
| 5180 | 2011-06-09 | Initial Visit | 3 | 9 | 2011-07-08 | Follow Up Visit | 3 | 10 | 2011-09-02 | Follow Up Visit | 1 | 10 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| 5199 | 2011-09-08 | Follow Up Visit | 5 | 6 | 2011-09-15 | Follow Up Visit | 2 | 7 | 2011-10-13 | Follow Up Visit | 4 | 8 | 2011-10-20 | Follow Up Visit | 2 | 4 | 2011-10-27 | Follow Up Visit | 4 | 7 | 2011-11-03 | Follow Up Visit | 3 | 3 | 2011-11-10 | Follow Up Visit | 2 | 5 | 2011-11-17 | Follow Up Visit | 3 | 4 |
| 10219 | 2013-12-26 | Follow Up Visit | 1 | 6 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
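Side note, not needed for the SQL Server 2008 R2 target of this question: on SQL Server 2017 and later, the STUFF/FOR XML PATH trick that builds @cols could be replaced with STRING_AGG. A rough sketch:
select @cols = STRING_AGG(QUOTENAME(col + cast(seq as varchar(10))), ',')
                 WITHIN GROUP (ORDER BY seq, so)
from
(
    select seq, col, so
    from
    (
        select row_number() over(partition by id
                                 order by [visit date]) seq
        from yourtable
    ) d
    cross apply
    (
        select 'VisitDate', 1 union all
        select 'Task', 2 union all
        select 'RfScore', 3 union all
        select 'PfScore', 4
    ) c (col, so)
    group by seq, col, so
) cols;
-- if the column list can get very long, cast the pieces to nvarchar(max)
-- so the aggregated string is not truncated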
Related
I am trying to find the daily count of frequent visitors from a very large data-set. Frequent visitors in this case are visitor IDs used on 2 distinct days in a rolling 3 day period.
My data set looks like the below:
ID | Date | Location | State | Brand |
1 | 2020-01-02 | A | CA | XYZ |
1 | 2020-01-03 | A | CA | BCA |
1 | 2020-01-04 | A | CA | XYZ |
1 | 2020-01-06 | A | CA | YQR |
1 | 2020-01-06 | A | WA | XYZ |
2 | 2020-01-02 | A | CA | XYZ |
2 | 2020-01-05 | A | CA | XYZ |
This is the result I am going for. The count in the Visits column is the count of distinct days from the Date column on which that ID visited, within the window of the current day and the 2 days before it. So for ID 1 on 2020-01-05, there were visits on the 3rd and the 4th, so the count is 2.
Date | ID | Visits | Frequent Prior 3 Days
2020-01-01 |Null| Null | Null
2020-01-02 | 1 | 1 | No
2020-01-02 | 2 | 1 | No
2020-01-03 | 1 | 2 | Yes
2020-01-03 | 2 | 1 | No
2020-01-04 | 1 | 3 | Yes
2020-01-04 | 2 | 1 | No
2020-01-05 | 1 | 2 | Yes
2020-01-05 | 2 | 1 | No
2020-01-06 | 1 | 2 | Yes
2020-01-06 | 2 | 1 | No
2020-01-07 | 1 | 1 | No
2020-01-07 | 2 | 1 | No
2020-01-08 | 1 | 1 | No
2020-01-09 | 1 | null | Null
I originally tried to use the following line to get the result for the visits column, but I ended up with 3 in every successive row once the count first reached 3 for that ID.
,
count(ID) over (Partition by ID order by Date ASC rows between 3 preceding and current row) as visits
I've scoured the forum, but every somewhat similar question seems to involve counting the values rather than the dates and haven't been able to figure out how to tweak to get what I need. Any help is much appreciated.
You can aggregate the dataset by user and date, then use window functions with a range frame to look at the three preceding rows.
You did not say which database you are running - and not all databases support window range frames, nor do they share the same syntax for interval literals. In standard SQL, you would go:
select
id,
date,
    count(*) cnt_visits,
case
when sum(count(*)) over(
partition by id
order by date
range between interval '3' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from mytable
group by id, date
On the other hand, if you want a record for every user and every day (even when there is no visit), then it is a bit different. You can generate that dataset first, then bring in the table with a left join:
select
i.id,
d.date,
count(t.id) cnt_visits,
case
when sum(count(t.id)) over(
partition by i.id
order by d.date
        range between interval '3' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from (select distinct id from mytable) i
cross join (select distinct date from mytable) d
left join mytable t
on t.date = d.date
and t.id = i.id
group by i.id, d.date
I would be inclined to approach this by expanding out the days and visitors using a cross join and then just using window functions. Assuming you have all dates in the data (a sketch for when that is not the case follows the query):
select i.id, d.date,
count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) as cnt_visits,
(case when count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) >= 2
then 'Yes' else 'No'
end) as is_frequent_visitor
from (select distinct id from t) i cross join
(select distinct date from t) d left join
(select distinct id, date from t) t
on t.date = d.date and
t.id = i.id;
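If the data does not actually contain every calendar date, the d derived table can be swapped for a generated calendar. How you generate it depends on the database, which the question does not state; in PostgreSQL, for example, a sketch of the same query (with the date range taken from the expected output above) would be:
select i.id, d.date,
       count(t.id) over (partition by i.id
                         order by d.date
                         rows between 2 preceding and current row
                        ) as cnt_visits,
       (case when count(t.id) over (partition by i.id
                                    order by d.date
                                    rows between 2 preceding and current row
                                   ) >= 2
             then 'Yes' else 'No'
        end) as is_frequent_visitor
from (select distinct id from t) i cross join
     -- generate one row per calendar day instead of relying on the dates present in t
     (select generate_series(date '2020-01-01', date '2020-01-09',
                             interval '1 day')::date as date) d left join
     (select distinct id, date from t) t
     on t.date = d.date and
        t.id = i.id;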
I have a chapters table like this:
id | title | sort_number | book_id
1 | 'Chap 1' | 3 | 1
5 | 'Chap 2' | 6 | 1
8 | 'About ' | 1 | 1
9 | 'Chap 3' | 9 | 1
10 | 'Attack' | 1 | 2
The id is unique; sort_number is unique within the same book (book_id).
1) How can I load all data (3 rows) for 3 chapters (current, next and previous) sorted by sort_number if I have only the current chapter id?
2) How can I load the current chapter data (1 row) and only the ids of the next and previous chapters, if they exist?
This can be done using window functions:
select id, title, sort_number, book_id,
lag(id) over w as prev_chapter,
lead(id) over w as next_chapter
from chapters
window w as (partition by book_id order by sort_number);
With your sample data that returns:
id | title | sort_number | book_id | prev_chapter | next_chapter
---+--------+-------------+---------+--------------+-------------
8 | About | 1 | 1 | | 1
1 | Chap 1 | 3 | 1 | 8 | 5
5 | Chap 2 | 6 | 1 | 1 | 9
9 | Chap 3 | 9 | 1 | 5 |
10 | Attack | 1 | 2 | |
The above query can now be used to answer both your questions; the literal 1 in the queries below stands for the current chapter's id:
1)
select id, title, sort_number, book_id
from (
select id, title, sort_number, book_id,
--first_value(id) over w as first_chapter,
lag(id) over w as prev_chapter_id,
lead(id) over w as next_chapter_id
from chapters
window w as (partition by book_id order by sort_number)
) t
where 1 in (id, prev_chapter_id, next_chapter_id)
2)
select *
from (
select id, title, sort_number, book_id,
lag(id) over w as prev_chapter_id,
lead(id) over w as next_chapter_id
from chapters
window w as (partition by book_id order by sort_number)
) t
where id = 1
Supposing I have a set of transactions (purchases) with dates for a set of customers, I want to calculate a rolling x day sum of purchase amount and number of purchases by customer in that same window. I've gotten it to work using a window function, but I have to fill in for dates where the customer did not make any purchases. In so doing, I'm using a Cartesian product. Is there a more efficient approach so that it's more scalable as the number of customers – and time window – increases?
Edit: As noted in the comments, I'm on PostgreSQL v9.3.
Here's sample data (note that some customers may have 0, 1, or multiple purchases on a given date):
| id | cust_id | txn_date | amount |
|----|---------|------------|--------|
| 1 | 123 | 2017-08-17 | 10 |
| 2 | 123 | 2017-08-17 | 5 |
| 3 | 123 | 2017-08-18 | 5 |
| 4 | 123 | 2017-08-20 | 50 |
| 5 | 123 | 2017-08-21 | 100 |
| 6 | 456 | 2017-08-01 | 5 |
| 7 | 456 | 2017-08-01 | 5 |
| 8 | 456 | 2017-08-01 | 5 |
| 9 | 456 | 2017-08-30 | 5 |
| 10 | 456 | 2017-08-01 | 1000 |
| 11 | 789 | 2017-08-15 | 1000 |
| 12 | 789 | 2017-08-30 | 1000 |
And here's the desired output:
| cust_id | txn_date | sum_dly_txns | tot_txns_7d | cnt_txns_7d |
|---------|------------|--------------|-------------|-------------|
| 123 | 2017-08-17 | 15 | 15 | 2 |
| 123 | 2017-08-18 | 5 | 20 | 3 |
| 123 | 2017-08-20 | 50 | 70 | 4 |
| 123 | 2017-08-21 | 100 | 170 | 5 |
| 456 | 2017-08-01 | 1015 | 1015 | 4 |
| 456 | 2017-08-30 | 5 | 5 | 1 |
| 789 | 2017-08-15 | 1000 | 1000 | 1 |
| 789 | 2017-08-30 | 1000 | 1000 | 1 |
Here's SQL that produces the totals as desired:
SELECT *
FROM (
-- One row per day per user
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
),
-- Every possible transaction date for every user
dummydates AS (
SELECT txn_date, uids.cust_id
FROM (
SELECT generate_series(
timestamp '2017-08-01'
,timestamp '2017-08-30'
,interval '1 day')::date
) d(txn_date)
CROSS JOIN (SELECT DISTINCT cust_id FROM daily_txns) uids
),
txns_dummied AS (
SELECT
d.cust_id
,d.txn_date
,COALESCE(sum_dly_txns,0) AS sum_dly_txns
,COALESCE(cnt_dly_txns,0) AS cnt_dly_txns
FROM dummydates d
LEFT JOIN daily_txns dx
ON d.txn_date = dx.txn_date
AND d.cust_id = dx.cust_id
ORDER BY d.txn_date, d.cust_id
)
SELECT
cust_id
,txn_date
,sum_dly_txns
,SUM(COALESCE(sum_dly_txns,0)) OVER w AS tot_txns_7d
,SUM(cnt_dly_txns) OVER w AS cnt_txns_7d
FROM txns_dummied
WINDOW w AS (
PARTITION BY cust_id
ORDER BY txn_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7d moving window
)
ORDER BY cust_id, txn_date
) xfers
WHERE sum_dly_txns > 0 -- Omit dates with no transactions
;
SQL Fiddle
Instead of ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, did you want to write RANGE '6 days' PRECEDING?
This must be what you are looking for:
SELECT DISTINCT
cust_id
,txn_date
,SUM(amount) OVER (PARTITION BY cust_id, txn_date) sum_dly_txns
,SUM(amount) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
,COUNT(*) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
from transactions
ORDER BY cust_id, txn_date
Edit: Since you are using an old version (I tested the one above on my PostgreSQL 11), the above will not make much sense, so you will need to use old-fashioned SQL (that is, without window functions).
It is a bit less efficient but does a fair job.
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
)
SELECT t1.cust_id, t1.txn_date, t1.sum_dly_txns, SUM(t2.sum_dly_txns), SUM(t2.cnt_dly_txns)
from daily_txns t1
join daily_txns t2 ON t1.cust_id = t2.cust_id and t2.txn_date BETWEEN t1.txn_date - 7 and t1.txn_date
group by t1.cust_id, t1.txn_date, t1.sum_dly_txns
order by t1.cust_id, t1.txn_date
I have an event table and a transaction log, and I want to count each event's total revenue with one SQL query.
Could anyone tell me how to do this?
Please be aware there will be more than 100,000 rows in the transaction log table.
event_table:
Event_id | start_date | end_date
------------------------
11111 | 2013-01-04 | 2013-01-05
11112 | 2013-01-08 | 2013-01-10
11113 | 2013-01-11 | 2013-01-12
11114 | 2013-01-15 | 2013-01-18
11115 | 2013-01-19 | 2013-01-21
11116 | 2013-01-22 | 2013-01-24
11117 | 2013-01-26 | 2013-01-29
transaction_log:
id | name | time_created | Cost
------------------------
1 | michael | 2013-01-04 | 1
2 | michael | 2013-01-08 | 4
3 | mary | 2013-01-11 | 5
4 | john | 2013-01-15 | 2
5 | michael | 2013-01-19 | 3
6 | mary | 2013-01-22 | 2
7 | john | 2013-01-26 | 4
I tried to use SQL like the following, but it does not work.
select
event_table.id,
( select sum(Cost)
from transaction_log
where date(time_created) between transaction_log.start_date and transaction_log.end_date ) as revenue
from event_table
It is failing because the fields start_date and end_date come from event_table, but you are referencing them as transaction_log.start_date and transaction_log.end_date. This will work:
select
event_table.id,
( select sum(Cost)
from transaction_log
where date(time_created) between event_table.start_date and event_table.end_date ) as revenue
from event_table
There is no need to cast time_created to a date (date(time_created)) if it is already of the date data type. Otherwise, if time_created is a timestamp or timestamptz, then for performance you may want to consider doing:
select
event_table.id,
( select sum(Cost)
from transaction_log
where time_created >= event_table.start_date::timestamptz and time_created < (event_table.end_date+1)::timestamptz ) as revenue
from event_table
Also for performance, when executing a query like the one above, PostgreSQL is executing a subquery for each row of the main query (in this case the event_table table). Joining and using GROUP BY will generally provide you with better results:
select e.id, sum(l.Cost) as revenue
from event_table e
join transaction_log l ON (l.time_created BETWEEN e.start_date AND e.end_date)
group by e.id
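One behavioural difference from the correlated subquery versions: an inner join drops events that have no transactions at all, whereas the subquery returns NULL revenue for them. If those events should still appear, a left join variant keeps them; a sketch, using the same column names as the queries above:
select e.id, sum(l.Cost) as revenue  -- NULL for events with no transactions, like the subquery version
from event_table e
left join transaction_log l ON (l.time_created BETWEEN e.start_date AND e.end_date)
group by e.id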
I use Postgres and I have a large number of rows with values and dates per station.
(Dates can be separated by several days.)
id | value | idstation | udate
--------+-------+-----------+-----
1 | 5 | 12 | 1984-02-11 00:00:00
2 | 7 | 12 | 1984-02-17 00:00:00
3 | 8 | 12 | 1984-02-21 00:00:00
4 | 9 | 12 | 1984-02-23 00:00:00
5 | 4 | 12 | 1984-02-24 00:00:00
6 | 8 | 12 | 1984-02-28 00:00:00
7 | 9 | 14 | 1984-02-21 00:00:00
8 | 15 | 15 | 1984-02-21 00:00:00
9 | 14 | 18 | 1984-02-21 00:00:00
10 | 200 | 19 | 1984-02-21 00:00:00
Forgive what may be a silly question, but I'm not much of a database guru.
Is it possible to directly write a SQL query that will calculate a linear regression per station for each date, knowing that the regression must be calculated using only the current id's date, the previous id's date and the next id's date?
For example, the linear regression for id 2 must be calculated with the values 7 (current), 5 (previous) and 8 (next) for the dates 1984-02-17, 1984-02-11 and 1984-02-21.
Edit: I have to use regr_intercept(value, udate), but I really don't know how to do this if I have to use only the current, previous and next value/date for each line.
Edit 2: 3 rows were added to idstation 12; the ids and dates have changed.
Hope you can help me, thank you!
This is the combination of Joop's statistics and Denis's window functions:
WITH num AS (
SELECT id, idstation
       , (udate - '1984-01-01'::date) as idate -- count in days since jan 1984
, value AS value
FROM thedata
)
-- id + the ids of the {prev,next} records
-- within the same idstation group
, drag AS (
SELECT id AS center
, LAG(id) OVER www AS prev
, LEAD(id) OVER www AS next
FROM thedata
WINDOW www AS (partition by idstation ORDER BY id)
)
-- junction CTE between ID and its three feeders
, tri AS (
SELECT center AS this, center AS that FROM drag
UNION ALL SELECT center AS this , prev AS that FROM drag
UNION ALL SELECT center AS this , next AS that FROM drag
)
SELECT t.this, n.idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM num n
JOIN tri t ON t.that = n.id
GROUP BY t.this, n.idstation
;
Results:
this | idstation | intercept | slope | rsq | avgx | avgy
------+-----------+-------------------+-------------------+-------------------+------------------+------------------
1 | 12 | -46 | 1 | 1 | 52 | 6
2 | 12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
3 | 12 | -10.6666666666667 | 0.333333333333333 | 1 | 54.5 | 7.5
4 | 14 | | | | 51 | 9
5 | 15 | | | | 51 | 15
6 | 18 | | | | 51 | 14
7 | 19 | | | | 51 | 200
(7 rows)
The clustering of the group-of-three can probably be done more elegantly using a rank() or row_number() function, which would also allow larger sliding windows to be used.
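For illustration, a sketch of that idea (assuming the same thedata table): number the rows per station with row_number, then join each row to its neighbours within a configurable distance instead of building the tri junction CTE:
WITH num AS (
  SELECT id, idstation
       , (udate - '1984-01-01'::date) AS idate   -- days since 1 Jan 1984
       , value
       , row_number() OVER (PARTITION BY idstation ORDER BY udate, id) AS rn
  FROM thedata
)
SELECT c.id AS this, c.idstation
     , regr_intercept(n.value, n.idate) AS intercept
     , regr_slope(n.value, n.idate)     AS slope
     , regr_r2(n.value, n.idate)        AS rsq
FROM num c
JOIN num n
  ON n.idstation = c.idstation
 AND n.rn BETWEEN c.rn - 1 AND c.rn + 1   -- widen to c.rn - 2 .. c.rn + 2 for a 5-point window
GROUP BY c.id, c.idstation
;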
DROP SCHEMA zzz CASCADE;
CREATE SCHEMA zzz ;
SET search_path=zzz;
CREATE TABLE thedata
( id INTEGER NOT NULL PRIMARY KEY
, value INTEGER NOT NULL
, idstation INTEGER NOT NULL
, udate DATE NOT NULL
);
INSERT INTO thedata(id,value,idstation,udate) VALUES
(1 ,5 ,12 ,'1984-02-21' )
,(2 ,7 ,12 ,'1984-02-23' )
,(3 ,8 ,12 ,'1984-02-26' )
,(4 ,9 ,14 ,'1984-02-21' )
,(5 ,15 ,15 ,'1984-02-21' )
,(6 ,14 ,18 ,'1984-02-21' )
,(7 ,200 ,19 ,'1984-02-21' )
;
WITH a AS (
SELECT idstation
     , (udate - '1984-01-01'::date) as idate -- count in days since jan 1984
, value AS value
FROM thedata
)
SELECT idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM a
GROUP BY idstation
;
output:
idstation | intercept | slope | rsq | avgx | avgy
-----------+-------------------+-------------------+-------------------+------------------+------------------
15 | | | | 51 | 15
14 | | | | 51 | 9
19 | | | | 51 | 200
12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
18 | | | | 51 | 14
(5 rows)
Note: if you want a spline-like regression you should also use the lag() and lead() window functions, like in Denis's answer.
If the average is OK for you, you could use the built-in avg... Something like:
SELECT avg("value") FROM "my_table" WHERE "idstation" = 3;
Should do. For more complicated things you will need to write some PL/pgSQL function, I'm afraid, or check for an add-on for PostgreSQL.
Look into window functions. If I get your question correctly, lead() and lag() will likely give you precisely what you want. Example usage:
select idstation as idstation,
id as curr_id,
udate as curr_date,
lag(id) over w as prev_id,
lag(udate) over w as prev_date,
lead(id) over w as next_id,
lead(udate) over w as next_date
from dates
window w as (
partition by idstation order by udate, id
)
order by idstation, udate, id
http://www.postgresql.org/docs/current/static/tutorial-window.html