How to retrieve information from three tables under the conditions below in PostgreSQL

I have three tables.
TABLE_1:
T2_ID ver date boolean
---------------------------------------------------------
1 | X-20-50 | 2019-01-01 16:20:51.722336+00 | TRUE
2 | X-50-30 | 2019-02-26 16:20:51.722336+00 | TRUE
3 | X-20-32 | 2019-03-20 16:20:51.722336+00 | FALSE
1 | X-20-50 | 2019-01-09 16:20:51.722336+00 | FALSE
2 | X-20-50 | 2019-12-02 16:20:51.722336+00 | TRUE
3 | X-20-50 | 2019-01-24 16:20:51.722336+00 | TRUE
TABLE_2:
id | type | scheduler
--------------------------------------------------
1 | ABC | w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12
2 | PQR | w5,w9
3 | TRC | w1,w4,w8
TABLE_3
start_date_of_ver | end_date_of_ver | ver_name
-----------------------------------------------------------
2019-01-01 00:00:00+00 | 2019-04-01 00:00:00+00 | X-20-50
2019-02-25 00:00:00+00 | 2019-05-26 00:00:00+00 | X-50-30
2019-03-15 00:00:00+00 | 2019-06-06 00:00:00+00 | X-20-32
Table 4 should fulfill the conditions below.
It takes a version name (ver_name) as input.
From this ver_name, it takes the start date and end date of the version (from table_3). If the version period is 3 months, it creates a 12-week table with the type from table_2 as the first column and fills in the twelve weeks according to the scheduler column of table_2.
Table 4 is updated whenever table_1 has entries for that particular week that are TRUE.
Note: entries in table_1 are generated on a daily basis.
Desired table: it takes only ver_name as input and computes the table below.
When table_1 doesn't have any entries, table_4 should look like this:
Table_4: X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | w1 | | | w4 | | | | w8 | | | | |
When table_1 has entries, table_4 should look like this:
X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | Done | Done | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | Done | | | w4 | | | | w8 | | | | |

You can create a function that takes the starting date of a week as input.
Example:
create function a(start_date date)
RETURNS json
LANGUAGE plpgsql
COST 100
VOLATILE
AS $BODY$
DECLARE
outputjson json;
BEGIN
-- Collect the rows of the 7-day window starting at start_date into one JSON array.
-- table_name and its date column are placeholders for your own table.
SELECT json_agg(t) INTO outputjson
FROM table_name t
WHERE t.date >= start_date AND t.date < start_date + 7;
RETURN outputjson;
END;
$BODY$;
Hope this helps.

Your requirement needs a little refinement. You ask to retrieve weekly data but do not define your week. On what day does it begin? Are all weeks 7 days long? When Dec 31 falls on a Tuesday, is Friday Jan 3 in the same week (see the current year's calendar)? Then there is the issue of the user input and what it represents: is it the desired start date, so that the week is that date and the next 6 days, or is it any date within the weekly period?
The following assumes the ISO 8601 definition of a week: every week begins on Monday and all weeks are 7 days long (thus the week containing 31-Dec-2019 also includes 3-Jan-2020). The routine extracts the ISO year and ISO week from the date the user entered and returns the rows whose date1 falls in the same ISO week.
--setup
create table weekly_something( c1 text, c2 text, date1 timestamptz, someem boolean);
insert into weekly_something( c1, c2, date1, someem )
values ('ABC','AB-20-50','2019-11-25 16:20:51.722336+00',TRUE)
, ('PQR','AB-50-30','2019-11-26 16:20:51.722336+00',TRUE)
, ('TRC','CD-20-32','2019-11-27 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',TRUE)
, ('JFF','yy-45-89','2019-12-31 16:20:51.722336+00',TRUE)
, ('JFF','yy-89-30','2020-01-03 16:20:51.722336+00',TRUE) ;
-- JFF Just For Fun
-- SQL Function
create function week_of(week_date date)
returns setof weekly_something
language sql stable strict
as $$
select *
from weekly_something
where (extract('isoyear' from week_date), extract('week' from week_date)) =
(extract('isoyear' from date1), extract('week' from date1));
$$;
-- test
select * from week_of('2019-11-26');
select * from week_of('2019-12-30');
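The ISO week rule used above can be checked outside the database. The following is only a minimal sketch in Python (not part of the original answer); the helper names iso_year_week and same_iso_week are made up for illustration. It shows that 31-Dec-2019 and 3-Jan-2020 share one ISO year and week, which is the comparison the week_of function makes.

```python
from datetime import date


def iso_year_week(d):
    """Return (ISO year, ISO week number) for a date (hypothetical helper)."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def same_iso_week(a, b):
    """True when both dates fall in the same ISO 8601 week."""
    return iso_year_week(a) == iso_year_week(b)


# 31-Dec-2019 is a Tuesday and 3-Jan-2020 a Friday of the same ISO week.
print(iso_year_week(date(2019, 12, 31)))  # (2020, 1)
print(iso_year_week(date(2020, 1, 3)))    # (2020, 1)
print(same_iso_week(date(2019, 11, 27), date(2019, 12, 2)))  # False
```

Note that the ISO year of 31-Dec-2019 is 2020, which is why the SQL compares isoyear rather than the calendar year.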

Related

Condition lead results in postgres query

I have a table person_updates in postgresql with rows like:
| id | status | person_id | modified_at |
|----|--------|-----------|------------------|
| 1 | INFO | 2 | 2019-11-01 10:00 |
| 1 | UPDATE | 2 | 2019-11-02 15:00 |
| 1 | DEBUG | 2 | 2019-11-03 12:00 |
| 3 | INFO | 4 | 2019-11-04 14:00 |
| 3 | UPDATE | 4 | 2019-11-05 16:00 |
| 5 | INFO | 6 | 2019-11-06 08:00 |
| 5 | DEBUG | 6 | 2019-11-07 07:00 |
I want to get the INFO rows that are followed by an UPDATE row:
| id | status | person_id | modified_at |
|----|--------|-----------|------------------|
| 1 | INFO | 2 | 2019-11-01 10:00 |
| 3 | INFO | 4 | 2019-11-04 14:00 |
I've attempted this by doing a lead query
select d2.id, d2.status, d2.modified_at, d2.person_id,
lead(d2.status) over (partition by d2.id order by d2.modified_at) as next_status
from person_updates d2
where d2.status = 'INFO'
This returns more rows than I want. Adding a and d2.next_status = 'UPDATE' throws an error. How do I do this query?
Like this:
select t.id, t.status, t.modified_at, t.person_id
from (
select *,
lead(status) over (partition by id order by modified_at) as next_status
from person_updates
) t
where t.status = 'INFO' and t.next_status = 'UPDATE'
Results:
| id | status | modified_at | person_id |
| --- | ------ | ------------------------ | --------- |
| 1 | INFO | 2019-11-01T10:00:00.000Z | 2 |
| 3 | INFO | 2019-11-04T14:00:00.000Z | 4 |
You can use window function lead() to get the status of the next record. Since window functions are not allowed in the where clause, you need to turn the query to a subquery, and then filter in the outer query, like so:
select *
from (
select
t.*,
lead(status) over(partition by id order by modified_at) lead_status
from person_updates t
) t
where status = 'INFO' and lead_status = 'UPDATE'
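To make the two-step logic concrete (first compute the next status per id, then filter), here is a rough Python sketch over the sample rows from the question. It is an illustration of what lead() with partition by id order by modified_at computes, not PostgreSQL code, and the function name info_followed_by_update is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the question's person_updates table.
rows = [
    {"id": 1, "status": "INFO", "person_id": 2, "modified_at": "2019-11-01 10:00"},
    {"id": 1, "status": "UPDATE", "person_id": 2, "modified_at": "2019-11-02 15:00"},
    {"id": 1, "status": "DEBUG", "person_id": 2, "modified_at": "2019-11-03 12:00"},
    {"id": 3, "status": "INFO", "person_id": 4, "modified_at": "2019-11-04 14:00"},
    {"id": 3, "status": "UPDATE", "person_id": 4, "modified_at": "2019-11-05 16:00"},
    {"id": 5, "status": "INFO", "person_id": 6, "modified_at": "2019-11-06 08:00"},
    {"id": 5, "status": "DEBUG", "person_id": 6, "modified_at": "2019-11-07 07:00"},
]


def info_followed_by_update(rows):
    """Return INFO rows whose next row (same id, by modified_at) is an UPDATE."""
    result = []
    # Equivalent of "partition by id order by modified_at".
    ordered = sorted(rows, key=itemgetter("id", "modified_at"))
    for _, group in groupby(ordered, key=itemgetter("id")):
        group = list(group)
        # Pair each row with its successor; the last row of a partition has none.
        for current, following in zip(group, group[1:]):
            if current["status"] == "INFO" and following["status"] == "UPDATE":
                result.append(current)
    return result


for row in info_followed_by_update(rows):
    print(row["id"], row["status"], row["modified_at"])
```

The filter runs only after every row has been paired with its successor, which mirrors why the SQL needs the subquery: filtering on status = 'INFO' first would remove the UPDATE rows before lead() could see them.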

SQL Insert fails on i.name does not exist, when it seemingly does, during insert

I'm using PostgreSQL and pgAdmin. I'm attempting to copy data from a staging table to a production table using INSERT INTO with a SELECT FROM statement, with a to_char conversion along the way. This may or may not be the wrong approach. The SELECT fails because apparently "column i.dates does not exist".
The question is: Why am I getting 'column i.dates does not exist'?
The schema for both tables is identical except for a date conversion.
I've tried matching the schema of the tables with the exception of the to_char conversion. I've checked and double checked the column exists.
This is the code I'm trying:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
i.location AS location,
i.dates as dates,
i.temperature as temperature,
i.rh as rh,
i.winddir as winddir,
i.windspeed as windspeed,
i.droughtfactor as droughtfactor,
i.curing as curing,
i.cloudcover as cloudcover
FROM (
SELECT location,
to_char(to_timestamp(dates, 'YYYY-DD-MM HH24:MI'), 'HH24:MI YYYY-MM-DD HH24:MI'),
temperature, rh, wd, ws, df, cu, cc
FROM wosweathergrids
) i;
The error I'm receiving is:
ERROR: column i.dates does not exist
LINE 4: i.dates as dates,
^
SQL state: 42703
Character: 151
My data schema is like:
+-----------------+-----+-------------+-----------------------------+-----+
| TABLE | NUM | COLNAME | DATATYPE | LEN |
+-----------------+-----+-------------+-----------------------------+-----+
| weathergrids | 1 | id | integer | 32 |
| weathergrids | 2 | location | numeric | 6 |
| weathergrids | 3 | dates | timestamp without time zone | |
| weathergrids | 4 | temperature | numeric | 3 |
| weathergrids | 5 | rh | numeric | 4 |
| weathergrids | 6 | wd | numeric | 4 |
| weathergrids | 7 | wsd | numeric | 4 |
| weathergrids | 8 | df | numeric | 4 |
| weathergrids | 9 | cu | numeric | 4 |
| weathergrids | 10 | cc | numeric | 4 |
| wosweathergrids | 1 | id | integer | 32 |
| wosweathergrids | 2 | location | numeric | 6 |
| wosweathergrids | 3 | dates | character varying | 16 |
| wosweathergrids | 4 | temperature | numeric | 3 |
| wosweathergrids | 5 | rh | numeric | 4 |
| wosweathergrids | 6 | wd | numeric | 4 |
| wosweathergrids | 7 | ws | numeric | 4 |
| wosweathergrids | 8 | df | numeric | 4 |
| wosweathergrids | 9 | cu | numeric | 4 |
| wosweathergrids | 10 | cc | numeric | 4 |
+-----------------+-----+-------------+-----------------------------+-----+
Your derived table (sub-query) named i has no column named dates because the column dates is "hidden" in the to_char() function and as it does not define an alias for that expression, no column dates is available "outside" of the derived table.
But I don't see the reason for a derived table to begin with. Also, aliasing a column with its own name is unnecessary: i.location as location is exactly the same thing as i.location.
So your query can be simplified to:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
location,
to_timestamp(dates, 'YYYY-DD-MM HH24:MI'),
temperature,
rh,
wd,
ws,
df,
cu,
cc
FROM wosweathergrids;
You don't need to give an alias to the to_timestamp() expression, as columns are matched by position, not by name, in an insert ... select statement.
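The positional matching can be seen in a tiny experiment. This sketch uses Python's built-in sqlite3 module as a stand-in, so it is SQLite rather than PostgreSQL, and the tables source and target are invented for the illustration. The select list deliberately carries "wrong" aliases to show that they play no role in deciding which target column receives which value.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (x TEXT, y TEXT);
    CREATE TABLE target (a TEXT, b TEXT);
    INSERT INTO source VALUES ('first', 'second');
    -- The aliases name the opposite target columns on purpose.
    INSERT INTO target (a, b) SELECT x AS b, y AS a FROM source;
""")

# Column a received x (first position) and column b received y (second position).
row = conn.execute("SELECT a, b FROM target").fetchone()
print(row)  # ('first', 'second')
conn.close()
```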

SUM OVER PARTITION ON Date range

I'm trying to do a cumulative sum over specific periods of time for every row in Postgres, example:
|---------------------|------------------|------------------|
| Date | Value | Employee |
|---------------------|------------------|------------------|
| 25-01-1990 | 34 | Aaron |
|---------------------|------------------|------------------|
| 15-02-1990 | 4 | Aaron |
|---------------------|------------------|------------------|
| 02-03-1990 | 3 | Aaron |
|---------------------|------------------|------------------|
| 22-05-1990 | 7 | Aaron |
|---------------------|------------------|------------------|
Expected result, taking a range of 60 days:
|---------------------|------------------|------------------|
| Date | Value | Employee |
|---------------------|------------------|------------------|
| 25-01-1990 | 34 | Aaron |
|---------------------|------------------|------------------|
| 15-02-1990 | 38 | Aaron |
|---------------------|------------------|------------------|
| 02-03-1990 | 41 | Aaron |
|---------------------|------------------|------------------|
| 01-05-1990 | 10 | Aaron |
|---------------------|------------------|------------------|
I tried with the following but the results are not correct:
WITH tab AS (SELECT * FROM table_with_values)
SELECT tab.Date, SUM(tab.Value)
FILTER (WHERE tab.Date<=tab.Date AND tab.Date >=t.Date - INTERVAL '60 DAY')
OVER(PARTITION BY tab.Employee ORDER BY tab.Date ROWS BETWEEN UNBOUND PRECEDENT AND CURRENT ROW)
AS values_cumulative, tab.Employee
FROM tab
Try this:
SELECT date, employee, sum(bvalue)
FROM (
SELECT a.*, b.date as bdate, b.value as bvalue
FROM testtable a
LEFT JOIN testtable b ON
a.employee = b.employee AND
b.date <= a.date AND
b.date >= a.date - integer '60') c
GROUP BY employee, date
ORDER BY date ASC;
date | employee | sum
------------+----------+-----
1990-01-25 | Aaron | 34
1990-02-15 | Aaron | 38
1990-03-02 | Aaron | 41
1990-05-01 | Aaron | 10
(4 rows)
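The self-join computes, for each row, the sum of all values of the same employee dated within the 60 days up to and including that row's date. As a cross-check of that definition, here is a small Python sketch; it uses the four input rows from the question's first table (so the last row is 22-05-1990 with value 7), assumes one row per employee and date, and the function name trailing_sums is hypothetical.

```python
from datetime import date, timedelta

# Input rows from the question's first table: (date, value, employee).
entries = [
    (date(1990, 1, 25), 34, "Aaron"),
    (date(1990, 2, 15), 4, "Aaron"),
    (date(1990, 3, 2), 3, "Aaron"),
    (date(1990, 5, 22), 7, "Aaron"),
]


def trailing_sums(entries, days=60):
    """For each row, sum the values of the same employee within the last `days` days."""
    window = timedelta(days=days)
    out = []
    for row_date, _, employee in entries:
        total = sum(
            value
            for other_date, value, other_employee in entries
            # Same join condition as the SQL: b.date <= a.date AND b.date >= a.date - 60.
            if other_employee == employee and row_date - window <= other_date <= row_date
        )
        out.append((row_date, employee, total))
    return sorted(out)


for row_date, employee, total in trailing_sums(entries):
    print(row_date, employee, total)
```

Both bounds are inclusive, exactly as in the join condition, so a row dated precisely 60 days earlier is still counted.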

PostgreSQL Crosstab generate_series of weeks for columns

From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id | user_id | start_time | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6 | 2018-01-01 03:05:00 UTC | 1.0 |
| 996 | 6 | 2017-12-01 05:05:00 UTC | 1.0 |
| 998 | 6 | 2017-12-01 05:05:00 UTC | 1.5 |
| 999 | 20 | 2017-11-15 19:00:00 UTC | 1.0 |
| 995 | 6 | 2017-11-11 20:47:42 UTC | 0.04 |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked),0) AS total,
time_entries.user_id,
week::date
--Using generate_series here to account for weeks with no time entries when
--doing the join
FROM generate_series( (DATE_TRUNC('week', '2017-11-01 00:00:00'::date)),
(DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date)),
interval '7 day') as week LEFT JOIN time_entries
ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week |
+-------+---------+------------+
| 14.08 | 5 | 2017-10-30 |
| 21.92 | 6 | 2017-10-30 |
| 10.92 | 7 | 2017-10-30 |
| 14.26 | 8 | 2017-10-30 |
| 14.78 | 10 | 2017-10-30 |
| 14.08 | 13 | 2017-10-30 |
| 15.83 | 15 | 2017-10-30 |
| 8.75 | 5 | 2017-11-06 |
| 10.53 | 6 | 2017-11-06 |
| 13.73 | 7 | 2017-11-06 |
| 14.26 | 8 | 2017-11-06 |
| 19.45 | 10 | 2017-11-06 |
| 15.95 | 13 | 2017-11-06 |
| 14.16 | 15 | 2017-11-06 |
| 1.00 | 20 | 2017-11-13 |
| 0 | | 2017-11-20 |
| 2.50 | 6 | 2017-11-27 |
| 0 | | 2017-12-04 |
| 0 | | 2017-12-11 |
| 0 | | 2017-12-18 |
| 0 | | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse particularly when there's no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users. And to include nulls from each (for instance if a user had no entries in that week or week without entries from any user).
Something like this
+---------+---------------+--------------+--------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+---------------+--------------+--------------+
| 6 | 4.0 | 1.0 | 0 |
| 7 | 4.0 | 1.0 | 0 |
| 8 | 4.0 | 0 | 0 |
| 9 | 0 | 1.0 | 0 |
| 10 | 4.0 | 0.04 | 0 |
+---------+---------------+--------------+--------------+
I've been looking around online and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard code them, which seems weird to do anyway for dates. Or use something like this case with week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user including all nulls I think that would be good enough. It just seems that right now my join strategy isn't returning that.
Personally I would use a Date Dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations as it leads to SQL that's easier to read and maintain. There's a great article on creating a Date Dimension table in PostgreSQL at https://medium.com/#duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately, you would use the Date table as the base of the query (the FROM clause) and then join the time entries against it, probably with Common Table Expressions, to compute the totals.
If you would like, I'll write up a solution demonstrating how you could create such a query.
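As a rough illustration of the result shape the question asks for (every user crossed with every week, zero-filled), and not of the Date Dimension approach itself, here is a Python sketch over the sample rows from the question. The function names week_start and weekly_pivot are hypothetical, and weeks are assumed to start on Monday, as date_trunc('week', ...) does in PostgreSQL.

```python
from datetime import date, timedelta

# Sample rows from the question: (user_id, start date, hours_worked).
time_entries = [
    (6, date(2018, 1, 1), 1.0),
    (6, date(2017, 12, 1), 1.0),
    (6, date(2017, 12, 1), 1.5),
    (20, date(2017, 11, 15), 1.0),
    (6, date(2017, 11, 11), 0.04),
]


def week_start(d):
    """Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_pivot(entries, first_day, last_day):
    """Return (weeks, table) where table[user][week] is the total hours, 0.0 if none."""
    weeks = []
    week = week_start(first_day)
    while week <= week_start(last_day):
        weeks.append(week)
        week += timedelta(days=7)
    users = sorted({user for user, _, _ in entries})
    # Start from the full users x weeks grid so that empty cells exist.
    table = {user: {w: 0.0 for w in weeks} for user in users}
    for user, day, hours in entries:
        w = week_start(day)
        if w in table[user]:  # entries outside the requested range are ignored
            table[user][w] += hours
    return weeks, table


weeks, table = weekly_pivot(time_entries, date(2017, 11, 6), date(2017, 12, 3))
print("user_id", *weeks)
for user, cells in table.items():
    print(user, *(cells[w] for w in weeks))
```

Building the complete grid first and filling it afterwards is the same idea as cross joining users with the generated weeks and then left joining the time entries.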

Crosstab function and Dates PostgreSQL

I have to create a crosstab table from a query in which the dates become column names. The number of order dates can increase or decrease depending on the date range passed to the query. The order date is stored as a Unix timestamp, which the query converts to a regular date.
Query is following:
Select cd.cust_id
, od.order_id
, od.order_size
, (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::Date As order_date
From consumer_details cd,
consumer_order od
Where cd.cust_id = od.cust_id
And od.order_date Between 1469212200 And 1469212600
Order By od.order_id, od.order_date
Table as follows:
cust_id | order_id | order_size | order_date
-----------|----------------|---------------|--------------
210721008 | 0437756 | 4323 | 2016-07-22
210721008 | 0437756 | 4586 | 2016-09-24
210721019 | 10749881 | 0 | 2016-07-28
210721019 | 10749881 | 0 | 2016-07-28
210721033 | 13639 | 2286145 | 2016-09-06
210721033 | 13639 | 2300040 | 2016-10-03
Result will be:
cust_id | order_id | 2016-07-22 | 2016-09-24 | 2016-07-28 | 2016-09-06 | 2016-10-03
-----------|----------------|---------------|---------------|---------------|---------------|---------------
210721008 | 0437756 | 4323 | 4586 | | |
210721019 | 10749881 | | | 0 | |
210721033 | 13639 | | | | 2286145 | 2300040