Sum up graphs from different streams with slightly different timestamps - PostgreSQL

Right now I'm trying to solve a problem that, as far as I understand it, is somehow related to "Finding gaps in huge event streams".
I have several streams of data in my table. I want to sum them up over time, but they don't always share the same timestamps. The table looks like this:
Schema:
CREATE TABLE Table1
("id" int, "stream_id" int, "timestamp" timestamp, "value" real)
;
INSERT INTO Table1
("id", "stream_id", "timestamp", "value")
VALUES
(1, 7, '2015-06-01 15:20:30', 0.1),
(2, 7, '2015-06-01 15:20:31', 0.2),
(3, 7, '2015-06-01 15:20:32', 0.3),
(4, 7, '2015-06-01 15:25:30', 0.5),
(5, 7, '2015-06-01 15:25:31', 1.0),
(6, 6, '2015-06-01 15:20:31', 1.1),
(7, 6, '2015-06-01 15:20:32', 1.2),
(8, 6, '2015-06-01 15:20:33', 1.3),
(9, 6, '2015-06-01 15:25:31', 1.5),
(10, 6, '2015-06-01 15:25:32', 2.0)
;
My attempt to solve it:
with ts as (
    select "timestamp"
    from Table1
    order by "timestamp"
),
data as (
    select "timestamp", "value"
    from Table1
    order by "timestamp"
),
streams as (
    select "stream_id"
    from Table1
    group by "stream_id"
    order by "stream_id"
)
select * .... (question)
I want a single graph line for the summed-up data of all streams. When a stream has no data point at a given time, the sum should use that stream's row whose timestamp is less than, but nearest to, the current timestamp. If a stream has no earlier value at all, assume 0.
I thought about recursive queries, but I can't quite see the solution...
EDIT: Here I tried to explain it graphically:
EDIT 2:
I'm thinking about something like this, but I can't get the last "thingy" to finish it.
with RECURSIVE data as (
    select * from rawdata
    where date(date_time) = '2014-05-01'
),
streams as (
    select stream_id from data
    group by stream_id
),
t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n + 1 FROM t WHERE n < (select count(*) from streams)
)
SELECT n FROM t;

I apologize, there was an error in the previous query. Here is a new, corrected one:
WITH times AS (
    SELECT DISTINCT "timestamp" AS tm
    FROM Table1
)
SELECT tm, SUM(val) AS s_u_m
FROM (
    SELECT tm, "stream_id",
        ( SELECT "value" FROM Table1 t2
          WHERE t2."timestamp" = max(t1."timestamp")
            AND t2."stream_id" = t1."stream_id"
          ORDER BY "id" DESC LIMIT 1
        ) AS val
    FROM times t
    JOIN Table1 t1
        ON t.tm >= t1."timestamp"
    GROUP BY tm, "stream_id"
    ORDER BY tm
) you_must_have_an_alias_here_in_order_to_avoid_the_syntax_error
GROUP BY tm
ORDER BY tm;
and a demo with 3 streams in the source data: http://sqlfiddle.com/#!15/30eb8/5
This is a source table with a layout that mimics the layout of your graph:
| x | id | timestamp | stream6 | stream7 | stream8 |
|----|----|------------------------|---------|---------|---------|
| 1 | 1 | June, 01 2015 15:20:30 | (null) | 0.1 | (null) |
| 2 | 2 | June, 01 2015 15:20:31 | (null) | 0.2 | (null) |
| 3 | 3 | June, 01 2015 15:20:31 | 1.1 | (null) | (null) |
| 4 | 4 | June, 01 2015 15:20:32 | (null) | 0.3 | (null) |
| 5 | 5 | June, 01 2015 15:20:32 | 1.2 | (null) | (null) |
| 6 | 11 | June, 01 2015 15:20:32 | (null) | (null) | 2.3 |
| 7 | 12 | June, 01 2015 15:20:32 | (null) | (null) | 1.1 |
| 8 | 10 | June, 01 2015 15:20:33 | 1.3 | (null) | (null) |
| 9 | 13 | June, 01 2015 15:20:33 | (null) | (null) | 1.7 |
| 10 | 6 | June, 01 2015 15:25:30 | (null) | 0.5 | (null) |
| 11 | 7 | June, 01 2015 15:25:31 | 1.5 | (null) | (null) |
| 12 | 8 | June, 01 2015 15:25:31 | (null) | 1 | (null) |
| 13 | 9 | June, 01 2015 15:25:32 | 2 | (null) | (null) |
And the result is (v(3) means: the value from the record with x=3):
| tm | s_u_m |
|------------------------|-----------|
| June, 01 2015 15:20:30 | 0.1 | 0 + v(1) + 0
| June, 01 2015 15:20:31 | 1.3000001 | v(3) + v(2) + 0
| June, 01 2015 15:20:32 | 2.6 | v(5) + v(4) + v(7) => see note below !!!
| June, 01 2015 15:20:33 | 3.3 | v(8) + v(4) + v(9)
| June, 01 2015 15:25:30 | 3.5 | v(8) + v(10)+ v(9)
| June, 01 2015 15:25:31 | 4.2 | v(11)+ v(12)+ v(9)
| June, 01 2015 15:25:32 | 4.7 | v(13)+ v(12)+ v(9)
A note on the record | June, 01 2015 15:20:32 | 2.6 |:
The source table in the demo contains two records with the same timestamp and the same stream_id:
| 6 | 11 | June, 01 2015 15:20:32 | (null) | (null) | 2.3 |
| 7 | 12 | June, 01 2015 15:20:32 | (null) | (null) | 1.1 |
The query picks up only the latest record (x=7) due to ORDER BY "id" DESC in this code fragment:
( SELECT "value" FROM Table1 t2
WHERE t2."timestamp" = max( t1."timestamp" )
AND t2."stream_id" = t1."stream_id"
ORDER BY "id" DESC LIMIT 1
) As val
If you want to pick up the first record x=6 instead of the latest, then remove DESC from the order by clause.
If you want to sum all records with the same date and stream_id (in the above example - records 6 + 7), then change the above query to:
( SELECT SUM("value") FROM Table1 t2
WHERE t2."timestamp" = max( t1."timestamp" )
AND t2."stream_id" = t1."stream_id"
) As val
And if you want to pick up a random record, then use ORDER BY random().
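For reference, on PostgreSQL 9.3+ the same fill-forward sum can also be sketched with a LATERAL join instead of the correlated subquery. This is a hedged alternative against the same Table1, not part of the original answer:
SELECT t.tm, SUM(COALESCE(v."value", 0)) AS s_u_m
FROM (SELECT DISTINCT "timestamp" AS tm FROM Table1) t
CROSS JOIN (SELECT DISTINCT "stream_id" FROM Table1) s
LEFT JOIN LATERAL (
    -- latest value of this stream at or before tm; "id" breaks ties
    SELECT "value"
    FROM Table1 t1
    WHERE t1."stream_id" = s."stream_id"
      AND t1."timestamp" <= t.tm
    ORDER BY t1."timestamp" DESC, t1."id" DESC
    LIMIT 1
) v ON true
GROUP BY t.tm
ORDER BY t.tm;
The COALESCE implements the "if there is no value, assume 0" rule from the question.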

Related

PostgreSQL build working range from one date column

I'm using PostgreSQL v. 11.2
I have a table:
| id | bot_id | date                |
|----|--------|---------------------|
| 1  | 1      | 2020-04-20 16:00:00 |
| 2  | 2      | 2020-04-22 12:00:00 |
| 3  | 3      | 2020-04-24 04:00:00 |
| 4  | 1      | 2020-04-27 09:00:00 |
And, for example, I have the DateTime range 2020-03-30 00:00:00 to 2020-04-30 00:00:00. I need to get the working ranges in order to count the total working hours of each bot, like this:
| bot_id | start_date          | end_date            |
|--------|---------------------|---------------------|
| 1      | 2020-03-30 00:00:00 | 2020-04-20 16:00:00 |
| 2      | 2020-04-20 16:00:00 | 2020-04-22 12:00:00 |
| 3      | 2020-04-22 12:00:00 | 2020-04-24 04:00:00 |
| 1      | 2020-04-24 04:00:00 | 2020-04-27 09:00:00 |
| 1      | 2020-04-27 09:00:00 | 2020-04-30 00:00:00 |
I've tried to use LAG(date), but I'm not getting the first and last dates of the range.
You could use a UNION ALL, with one part building the start_date/end_date pairs from your values and the other part filling in the last period (from the last date to 2020-04-30 00:00:00):
WITH vals (id, bot_id, date) AS (   -- named vals because VALUES is a reserved word in PostgreSQL
    VALUES (1, 1, '2020-04-20 16:00:00'::TIMESTAMP)
         , (2, 2, '2020-04-22 12:00:00')
         , (3, 3, '2020-04-24 04:00:00')
         , (4, 1, '2020-04-27 09:00:00')
)
(
    SELECT bot_id
         , LAG(date, 1, '2020-03-30 00:00:00') OVER (ORDER BY id) AS start_date
         , date AS end_date
    FROM vals
)
UNION ALL
(
    SELECT bot_id
         , date AS start_date
         , '2020-04-30 00:00:00' AS end_date
    FROM vals
    ORDER BY id DESC
    LIMIT 1
);
+------+--------------------------+--------------------------+
|bot_id|start_date |end_date |
+------+--------------------------+--------------------------+
|1 |2020-03-30 00:00:00.000000|2020-04-20 16:00:00.000000|
|2 |2020-04-20 16:00:00.000000|2020-04-22 12:00:00.000000|
|3 |2020-04-22 12:00:00.000000|2020-04-24 04:00:00.000000|
|1 |2020-04-24 04:00:00.000000|2020-04-27 09:00:00.000000|
|1 |2020-04-27 09:00:00.000000|2020-04-30 00:00:00.000000|
+------+--------------------------+--------------------------+
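Since the stated goal was counting total working hours per bot, a hedged follow-up sketch: sum the length of each range per bot_id. The ranges CTE below is just seeded with the result rows above; in practice you would wrap the UNION ALL query in the CTE instead:
WITH ranges (bot_id, start_date, end_date) AS (
    VALUES (1, '2020-03-30 00:00:00'::timestamp, '2020-04-20 16:00:00'::timestamp)
         , (2, '2020-04-20 16:00:00', '2020-04-22 12:00:00')
         , (3, '2020-04-22 12:00:00', '2020-04-24 04:00:00')
         , (1, '2020-04-24 04:00:00', '2020-04-27 09:00:00')
         , (1, '2020-04-27 09:00:00', '2020-04-30 00:00:00')
)
SELECT bot_id,
       SUM(end_date - start_date) AS total_working_time   -- interval sum per bot
FROM ranges
GROUP BY bot_id
ORDER BY bot_id;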

Creating a query to subtract different values from a single column into a new column

Poorly worded title, but I can't think of a succinct way to describe my problem.
I have a table with the following columns:
year | month | basin_id | value
I need to take the values for all basin_ids of one year/month, subtract from them the corresponding values for all basin_ids of another year/month, and store the resulting values so they are still associated with their respective basin_ids.
This seems like it should be a rather simple query/subquery, and I can calculate the difference in values just fine with:
SELECT (val1.value - val2.value)
FROM value_table_1 AS val1,
     value_table_2 AS val2
WHERE val1.basin_id = val2.basin_id
where value_table_1 and value_table_2 are temporary tables I've made by segregating all values associated with year1/month1 and year2/month2, for the sake of simplifying my query.
My problem from here is that I get a column with all of the new values, but not their associated basins. How can I achieve this? I am writing this within a plpgsql stored procedure, if that helps.
Say my table is as follows:
year | month | basin_id | value
-----+-------+----------+-------
2017 |    04 |      123 |    10
2017 |    04 |      456 |     6
2017 |    05 |      123 |    12
2017 |    05 |      456 |     4
and I'm given the inputs:
year1 := 2017
month1 := 04
year2 := 2017
month2 := 05
I want to get the following table as a result:
basin_id | value
---------+-------
     123 |    -2
     456 |     2
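For what it's worth, the immediate fix for the attempt above seems to be simply carrying basin_id through the select list; a minimal sketch, assuming the same two temporary tables:
SELECT val1.basin_id,
       val1.value - val2.value AS value
FROM value_table_1 AS val1
JOIN value_table_2 AS val2
    ON val1.basin_id = val2.basin_id;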
I think you want something like this:
CREATE TABLE foo AS
SELECT *
FROM ( VALUES
    ( 2010, 02, 5, 8 ),
    ( 2013, 05, 5, 3 )
) AS t( year, month, basinid, value );

CREATE TEMPORARY TABLE bar AS
SELECT basinid,
       f1.year AS f1y, f1.month AS f1m,
       f2.year AS f2y, f2.month AS f2m,
       f1.value - f2.value AS value
FROM foo AS f1
INNER JOIN foo AS f2
    USING (basinid);
 basinid | f1y  | f1m | f2y  | f2m | value
---------+------+-----+------+-----+-------
       5 | 2010 |   2 | 2010 |   2 |     0
       5 | 2010 |   2 | 2013 |   5 |     5
       5 | 2013 |   5 | 2010 |   2 |    -5
       5 | 2013 |   5 | 2013 |   5 |     0
(4 rows)
SELECT *
FROM bar
WHERE f1y = 2013
AND f1m = 5
AND f2y = 2010
AND f2m = 2;
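Applied to the original single table and the plpgsql inputs from the question, the same self-join idea can be sketched as one parameterized query. The table name your_table is a placeholder, and year1/month1/year2/month2 are the procedure's variables:
SELECT v1.basin_id,
       v1.value - v2.value AS value   -- value at year1/month1 minus value at year2/month2
FROM your_table v1
JOIN your_table v2 USING (basin_id)
WHERE v1.year = year1 AND v1.month = month1
  AND v2.year = year2 AND v2.month = month2;
With the sample data and inputs above, this should return basin 123 with -2 and basin 456 with 2.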

T-SQL: filter data 6 months from today, date field in table is YYYYMM

I need some help.
It is currently March 2017.
How do I extract all records from 6 months ago (September 2016) through the end of this year? The dates in my table are stored in YYYYMM format.
Here is my sql statement
select columns from budget
where month_number >= DATEADD(MONTH, -6, CURRENT_TIMESTAMP);
the output I am getting is as below:
+------------+-------+--------------+
| month_name | month | month_number |
+------------+-------+--------------+
| January | 1 | 201601 |
| February | 2 | 201602 |
| March | 3 | 201603 |
| April | 4 | 201604 |
| May | 5 | 201605 |
| June | 6 | 201606 |
| July | 7 | 201607 |
| August | 8 | 201608 |
| September | 9 | 201609 |
| October | 10 | 201610 |
| November | 11 | 201611 |
| December | 12 | 201612 |
| January | 1 | 201701 |
| February | 2 | 201702 |
| March | 3 | 201703 |
| April | 4 | 201704 |
| July | 7 | 201707 |
| December | 12 | 201712 |
+------------+-------+--------------+
I am not getting the right output; I am still getting data from January 2016 onwards. Please help.
Thanks
Alternatively:
declare @budget table (month_number int)

insert @budget (month_number)
select 201601 union all
select 201602 union all
select 201702 union all
select 201705 union all
select 201709

select * from @budget
where month_number >= (YEAR(DATEADD(MONTH, -6, CURRENT_TIMESTAMP)) * 100)
                    + MONTH(DATEADD(MONTH, -6, CURRENT_TIMESTAMP));
Select *
From Budget
Where month_number>= convert(varchar(6),DATEADD(MONTH, -6, CURRENT_TIMESTAMP),112)
Order By month_number
If SQL Server 2012+:
Select *
From Budget
Where month_number>= format(DATEADD(MONTH, -6, CURRENT_TIMESTAMP),'yyyyMM')
Order By month_number
Returns:
+------------+-------+--------------+
| month_name | month | month_number |
+------------+-------+--------------+
| September  | 9     | 201609       |
| October    | 10    | 201610       |
| November   | 11    | 201611       |
| December   | 12    | 201612       |
| January    | 1     | 201701       |
| February   | 2     | 201702       |
| March      | 3     | 201703       |
| April      | 4     | 201704       |
| July       | 7     | 201707       |
| December   | 12    | 201712       |
+------------+-------+--------------+
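A hedged variant of the same idea: compute the six-months-ago boundary once, as a real date, and keep the comparison as plain integer arithmetic so an index on month_number can still be used:
declare @boundary date = DATEADD(MONTH, -6, GETDATE());

select *
from Budget
where month_number >= YEAR(@boundary) * 100 + MONTH(@boundary)
order by month_number;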

Calculating the forecasts by month for the last 3 months in Postgres

I have a table called forecasts where we store the forecasts for all products for the next 6 months. For example, when we are in November, we create forecasts for December, January, February, March, April and May. The forecasts table looks something like the one below:
+----------------+---------------+--------------+----------+
| product_number | forecasted_on | forecast_for | quantity |
+----------------+---------------+--------------+----------+
| Prod 1 | 2016-11-01 | 2016-12-01 | 100 |
| Prod 1 | 2016-11-01 | 2017-01-01 | 200 |
| Prod 1 | 2016-11-01 | 2017-02-01 | 300 |
| Prod 1 | 2016-11-01 | 2017-03-01 | 400 |
| Prod 1 | 2016-11-01 | 2017-04-01 | 500 |
| Prod 1 | 2016-11-01 | 2017-05-01 | 600 |
+----------------+---------------+--------------+----------+
The table contains a list of product numbers, the date on which the forecast was created (forecasted_on), the month the forecast was created for (forecast_for), and the forecasted quantity.
Each month, data gets added for the next 6 months. So when forecasted_on is 1 December 2016, forecasts are created for January through June.
I am trying to create a report that shows how the total forecasts have varied for the last 3 months. Something like this
+------------+----------------+---------------+----------------+
| | 0 months prior | 1 month prior | 2 months prior |
+------------+----------------+---------------+----------------+
| 2016-12-01 | 200 | 150 | 250 |
| 2017-01-01 | 300 | 250 | 150 |
| 2017-02-01 | 100 | 150 | 100 |
+------------+----------------+---------------+----------------+
Currently I am using a lot of repetitive code in Rails to generate this table. I wanted to see if there is an easier way to do it directly with a SQL query.
Any help would be greatly appreciated.
Use a pivot query (conditional aggregation):
select forecast_for,
sum( case when forecasted_on + interval '1' month = forecast_for
then quantity end ) q_0,
sum( case when forecasted_on + interval '2' month = forecast_for
then quantity end ) q_1,
sum( case when forecasted_on + interval '3' month = forecast_for
then quantity end ) q_2,
sum( case when forecasted_on + interval '4' month = forecast_for
then quantity end ) q_3,
sum( case when forecasted_on + interval '5' month = forecast_for
then quantity end ) q_4,
sum( case when forecasted_on + interval '6' month = forecast_for
then quantity end ) q_5
from Table1
group by forecast_for
order by 1
;
Demo: http://sqlfiddle.com/#!15/30e5e/1
| forecast_for | q_0 | q_1 | q_2 | q_3 | q_4 | q_5 |
|----------------------------|--------|--------|--------|--------|--------|--------|
| December, 01 2016 00:00:00 | 100 | (null) | (null) | (null) | (null) | (null) |
| January, 01 2017 00:00:00 | (null) | 200 | (null) | (null) | (null) | (null) |
| February, 01 2017 00:00:00 | (null) | (null) | 300 | (null) | (null) | (null) |
| March, 01 2017 00:00:00 | (null) | (null) | (null) | 400 | (null) | (null) |
| April, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | 500 | (null) |
| May, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | (null) | 600 |
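For reference, the same conditional aggregation can be written with the FILTER clause on PostgreSQL 9.4+. A sketch limited to the three columns of the desired report, assuming "0 months prior" means the forecast made one month before the target month:
SELECT forecast_for,
       SUM(quantity) FILTER (WHERE forecasted_on + interval '1 month' = forecast_for) AS "0 months prior",
       SUM(quantity) FILTER (WHERE forecasted_on + interval '2 months' = forecast_for) AS "1 month prior",
       SUM(quantity) FILTER (WHERE forecasted_on + interval '3 months' = forecast_for) AS "2 months prior"
FROM forecasts
GROUP BY forecast_for
ORDER BY forecast_for;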
Assuming that (product_number, forecasted_on, forecast_for) is unique (so no aggregation is required), this should do the job:
WITH forecast_dates AS (
    SELECT DISTINCT product_number, forecast_for
    FROM forecasts
)
SELECT
    fd.forecast_for AS "forecast for",
    m1.quantity AS "one month prior",
    m2.quantity AS "two months prior",
    m3.quantity AS "three months prior"
FROM forecast_dates fd
LEFT JOIN forecasts m1 ON m1.product_number = fd.product_number
    AND fd.forecast_for = m1.forecast_for AND fd.forecast_for = m1.forecasted_on + INTERVAL '1 month'
LEFT JOIN forecasts m2 ON m2.product_number = fd.product_number
    AND fd.forecast_for = m2.forecast_for AND fd.forecast_for = m2.forecasted_on + INTERVAL '2 months'
LEFT JOIN forecasts m3 ON m3.product_number = fd.product_number
    AND fd.forecast_for = m3.forecast_for AND fd.forecast_for = m3.forecasted_on + INTERVAL '3 months'
WHERE fd.product_number = 'Prod 1'
ORDER BY fd.forecast_for;

Why does this crosstab() query return duplicate keys?

I have the following table, called sample_events:
Column | Type
--------+-----
title | text
date | date
with values:
title | date
-------+------------
ev1 | 2017-01-01
ev2 | 2017-01-03
ev3 | 2017-01-02
ev4 | 2017-12-10
ev5 | 2017-12-11
ev6 | 2017-07-28
In order to create a pivot table with the number of events per month in each unique year I used the crosstab function in the form crosstab(text source_sql, text category_sql):
SELECT * FROM crosstab (
'SELECT extract(year from date) AS year,
extract(month from date) AS month, count(*)
FROM sample_events
GROUP BY year, month'
,
'SELECT * FROM generate_series(1, 12)'
) AS (
year int, jan int, feb int, mar int,
apr int, may int, jun int, jul int,
aug int, sep int, oct int, nov int, dec int
) ORDER BY year;
Result is as follows and as expected:
year | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
2017 | 3 | | | | | | 1 | | | | | 2
Now, I would like to create a pivot table with the number of events per day of week in each unique week of the year. I tried following query:
SELECT * FROM crosstab (
'SELECT extract(week from date) AS week,
extract(dow from date) AS day_of_week, count(*)
FROM sample_events
GROUP BY week, day_of_week'
,
'SELECT * FROM generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
) ORDER BY week;
Result is not as expected:
week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
1 | | | 1 | | | |
1 | | 1 | | | | |
30 | | | | | | 1 |
49 | 1 | | | | | |
50 | | 1 | | | | |
52 | 1 | | | | | |
All six events are there, but for whatever reason there are duplicate week values. I expected the result to be something like:
week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
1 | | 1 | 1 | | | |
30 | | | | | | 1 |
49 | 1 | | | | | |
50 | | 1 | | | | |
52 | 1 | | | | | |
Questions
1) Why do results from the latter query contain duplicate key values but the former does not?
2) How to create a pivot table with unique week values?
crosstab() expects ordered input. You need to add ORDER BY in the input:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int AS week
, extract(dow from date)::int AS day_of_week
, count(*)::int
FROM sample_events
GROUP BY week, day_of_week
ORDER BY week, day_of_week'
, 'SELECT generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
);
Or just ORDER BY week.
Strictly speaking, values of the same key (week in the example) need to be grouped (come in sequence). Keys don't have to be ordered. But the simplest and cheapest way to achieve this is ORDER BY (which sorts keys additionally).
Or short:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int
, extract(dow from date)::int
, count(*)::int
FROM sample_events
GROUP BY 1, 2
ORDER BY 1, 2' -- or just ORDER BY 1
, 'SELECT generate_series(0, 6)'
) AS ...
Your first example with months happens to work because the input data has months in sequence. But this can break at any time if the physical order of rows in your table changes (VACUUM, UPDATE, ...). You can never rely on the physical order of rows in a relational table.
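As a side note, if the tablefunc extension is not available, a similar pivot can be sketched with conditional aggregation (FILTER, PostgreSQL 9.4+), which needs no input ordering at all:
SELECT extract(week FROM date)::int AS week,
       count(*) FILTER (WHERE extract(dow FROM date) = 0) AS sun,
       count(*) FILTER (WHERE extract(dow FROM date) = 1) AS mon,
       count(*) FILTER (WHERE extract(dow FROM date) = 2) AS tue,
       count(*) FILTER (WHERE extract(dow FROM date) = 3) AS wed,
       count(*) FILTER (WHERE extract(dow FROM date) = 4) AS thu,
       count(*) FILTER (WHERE extract(dow FROM date) = 5) AS fri,
       count(*) FILTER (WHERE extract(dow FROM date) = 6) AS sat
FROM sample_events
GROUP BY week
ORDER BY week;
Unlike crosstab(), empty cells come back as 0 instead of NULL.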
More explanation:
PostgreSQL Crosstab Query