Postgres - create a column (alter table) as a calculation of two other columns

I have a table in Postgres that contains task start and task end dates. Is it possible to generate a column in this table as the ratio (current day - start day) / (end day - start day)? The column is the percentage of time elapsed. I tried it this way, but it does not work.
ALTER TABLE public.gantt_task
ADD COLUMN
percentage_progress
GENERATED ALWAYS AS (
(DATEDIFF("day",
CURRENT_DATE,public.gantt_Tasks.start_date)) / DATEDIFF("day", public.gantt_Tasks.end_date ,public.gantt_Tasks.start_date))
STORED

The manual says Postgres only supports materialized (i.e., stored) generated columns, which means the value is generated when the row is inserted or updated, so at best it would reflect the insert/update date, not the CURRENT_DATE you want. In addition, a generation expression may only use immutable functions, and CURRENT_DATE is not immutable, so the ALTER TABLE is rejected anyway.
So, you need to create a view instead. A view evaluates CURRENT_DATE at the time of the SELECT, not the INSERT/UPDATE, when it computes the column.
CREATE VIEW foo AS SELECT *,
(CURRENT_DATE - start_date)
/ (end_date - start_date)::float
AS percentage_progress
FROM public.gantt_task;
Note that DATEDIFF is not Postgres syntax (it comes from MySQL and SQL Server); in Postgres, subtracting one date from another yields the number of days as an integer. Division by zero is not allowed, so if start_date and end_date can be identical you'll have to adjust the expression depending on what you want. Also, your expression will go over 100% when CURRENT_DATE is later than end_date. Perhaps something like:
least( 1.0, (CURRENT_DATE-start_date)/greatest( 1, end_date-start_date)::FLOAT )
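To sanity-check the clamped expression outside the database, here is a minimal Python sketch of the same arithmetic. The function name and the sample dates are made up for illustration; only the formula comes from the expression above.

```python
from datetime import date

def percentage_progress(start_date: date, end_date: date, today: date) -> float:
    """Mirror least(1.0, (CURRENT_DATE - start_date) / greatest(1, end_date - start_date)::float)."""
    elapsed_days = (today - start_date).days           # date - date is a whole number of days
    total_days = max(1, (end_date - start_date).days)  # greatest(1, ...) avoids division by zero
    return min(1.0, elapsed_days / total_days)         # least(1.0, ...) caps the result at 100%

# Hypothetical task running from 1 to 11 March 2024
start, end = date(2024, 3, 1), date(2024, 3, 11)
print(percentage_progress(start, end, date(2024, 3, 6)))  # halfway through: 0.5
print(percentage_progress(start, end, date(2024, 4, 1)))  # past the end date, capped: 1.0
print(percentage_progress(start, start, start))           # zero-length task, no error: 0.0
```

Like the SQL expression, this returns a negative value when today is before start_date; if that matters, an additional lower bound (greatest(0, ...) in SQL, max(0.0, ...) here) would be needed.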

I won't write proper SQL code. But you might/should split it into two or three tasks:
Add new column that allows null (that should be default)
Update table
Add constraints (if required)

Related

Postgres: Storing output of moving average query to a column

I have a table in Postgres 14.2
Table name is test
There are 3 columns: date, high, and five_day_mavg (date is PK if it matters)
I have a select statement which properly calculates a 5 day moving average based on the data in high.
select date,
avg(high) over (order by date rows between 4 preceding and current row) as mavg_calc
from test
It produces the expected output.
I have 2 goals:
First, to store the output of the query in five_day_mavg.
Second, to store this in such a way that when I add a new row with data in high, it automatically calculates that value.
The closest I got was:
update test set five_day_mavg = a.mav_calc
from (
select date,
avg(high) over (order by date rows between 4 preceding and current row) as mav_calc
from test
) a;
but all that does is set the value of every row in five_day_mavg to the average of the entire high column.
Thanks to @a_horse_with_no_name, I played around with the WHERE clause:
update test l
set five_day_mavg = b.five_day_mavg
from (
select date,
avg(high) over (order by date rows between 4 preceding and current row) as five_day_mavg
from test
) b
where l.date = b.date;
A couple of things: I aliased each table. The original table is aliased as l, and the derived table produced by the window function (the select statement in parentheses) is aliased as b. I joined the two in the WHERE clause on date, which is the primary key.
Also, I was using 'a' as the alias, and I think that may have contributed to the issue.
Either way, solved now.
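For reference, the frame "rows between 4 preceding and current row" gives every row its own average over itself and up to four earlier rows, which is why the update has to match each row to its own result by date. A small Python sketch of the same calculation, using made-up values for high:

```python
def trailing_average(values, window=5):
    """Average of each value and up to window - 1 values before it, in order."""
    result = []
    for i in range(len(values)):
        # Same frame as: rows between 4 preceding and current row (for window=5)
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result

highs = [10, 20, 30, 40, 50, 60]  # made-up values, already ordered by date
print(trailing_average(highs))    # [10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
```

The first four results average fewer than five values because the frame is cut off at the start of the data, just as in the SQL window.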

Convert date column in source to date +time format in target (YYYY-MM-DD.HH.MM.SS.nnnnnn) and for each duplicate entry need to increment by 1 nano sec

I have a requirement to convert a date column in the source table to a date-time in the target. This date-time column is part of a composite primary key in the target, so if there are any duplicate entries we have to increase the nanosecond by 1. This has to be done in a Postgres CTE for a DBT query. There can also be duplicates in the source, so to get unique values we need to add nanoseconds during conversion for duplicate rows.
For example: 2021-07-30 00:00:00.000000
If there is more than one row for the same effective date, increment the nanosecond by 1.
Update: Postgres version 11.9
Postgres 9.4 doesn't have ON CONFLICT. And Postgres doesn't support "nanosecond" as an interval unit, since timestamps have microsecond resolution. But if you won't get conflicts after incrementing, you can try:
insert into target (dt)
select (case when t.dt is null then s.dt
else s.dt + interval '1 microsecond'
end)
from source s left join
target t
on s.dt = t.dt;
This problem gets a bit trickier if you have duplicates in the source or if there are conflicts after incrementing. You haven't provided sample data and desired results, so this answers the simplest interpretation of your question.
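For the case of duplicates within the source, one possible approach is to number the repeats of each date and add that many microseconds. The Python sketch below only illustrates the idea; the function name and sample dates are hypothetical, and it ignores rows that already exist in the target. In SQL the same numbering is usually expressed with row_number() over a partition by the date.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

def to_unique_timestamps(dates):
    """Turn dates into midnight timestamps, adding 1 microsecond per repeat of the same date."""
    seen = defaultdict(int)  # how many times each date has appeared so far
    result = []
    for d in dates:
        base = datetime.combine(d, time.min)  # midnight on that date
        result.append(base + timedelta(microseconds=seen[d]))
        seen[d] += 1
    return result

# Made-up source rows: 30 July appears three times
source = [date(2021, 7, 30), date(2021, 7, 30), date(2021, 7, 31), date(2021, 7, 30)]
for ts in to_unique_timestamps(source):
    print(ts.isoformat(sep=" ", timespec="microseconds"))
```

With this input, the three rows for 2021-07-30 end in .000000, .000001 and .000002, while 2021-07-31 stays at .000000.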

No value is added into the column

I am trying to compute the difference between two Unix-seconds columns and store it in an existing column that is currently null, but the results are not added to the column. As I am new to this, I can't figure it out.
INSERT INTO "Operation"(minutes)
select (departure_unix_seconds - arrival_unix_seconds)/60 As Difference
from public."Operation";
Assuming you have a column in your table called "minutes" and you want to update that column, here is the syntax:
update public."Operation"
set minutes = (departure_unix_seconds - arrival_unix_seconds)/60;
However, when a column's value depends on other columns, it's usually better to implement it as a generated column (Postgres 12 or later):
alter table public."Operation"
add column minutes bigint -- type is an assumption; the existing minutes column must be dropped first
generated always as ((departure_unix_seconds - arrival_unix_seconds)/60) stored;

Proper table to track employee changes over time?

I have been using Python to do this in memory, but I would like to know the proper way to set up an employee mapping table in Postgres.
row_id | employee_id | other_id | other_dimensions | effective_date | expiration_date | is_current
Unique constraint on (employee_id, other_id), so a new row would be inserted whenever there is a change
I would want the expiration date from the previous row to be updated to the new effective_date minus 1 day, and the is_current should be updated to False
Ultimate purpose is to be able to map each employee back accurately on a given date
Would love to hear some best practices so I can move away from my file-based method where I read the whole roster into memory and use pandas to make changes, then truncate the original table and insert the new one.
Here's a general example built using the column names you provided that I think does more or less what you want. Don't treat it as a literal ready-to-run solution, but rather an example of how to make something like this work that you'll have to modify a bit for your own actual use case.
The rough idea is to make an underlying raw table that holds all your data, and establish a view on top of this that gets used for ordinary access. You can still use the raw table to do anything you need to do to or with the data, no matter how complicated, but the view provides more restrictive access for regular use. Rules are put in place on the view to enforce these restrictions and perform the special operations you want. While it doesn't sound like it's significant for your current application, it's important to note that these restrictions can be enforced via PostgreSQL's roles and privileges and the SQL GRANT command.
We start by making the raw table. Since the is_current column is likely to be used for reference a lot, we'll put an index on it. We'll take advantage of PostgreSQL's SERIAL type to manage our raw table's row_id for us. The view doesn't even need to reference the underlying row_id. We'll default the is_current to a True value as we expect most of the time we'll be adding current records, not past ones.
CREATE TABLE raw_employee (
row_id SERIAL PRIMARY KEY,
employee_id INTEGER,
other_id INTEGER,
other_dimensions VARCHAR,
effective_date DATE,
expiration_date DATE,
is_current BOOLEAN DEFAULT TRUE
);
CREATE INDEX employee_is_current_index ON raw_employee (is_current);
Now we define our view. To most of the world this will be the normal way to access employee data. Internally it's a special SELECT run on-demand against the underlying raw_employee table that we've already defined. If we had reason to, we could further refine this view to hide more data (it's already hiding the low-level row_id as mentioned earlier) or display additional data produced either via calculation or relations with other tables.
CREATE OR REPLACE VIEW employee AS
SELECT employee_id, other_id,
other_dimensions, effective_date, expiration_date,
is_current
FROM raw_employee;
Now our rules. We construct these so that whenever someone tries an operation against our view, internally it'll perform an operation against our raw table according to the restrictions we define. First INSERT; it mostly just passes the data through without change, but it has to account for the hidden row_id:
CREATE OR REPLACE RULE employee_insert AS ON INSERT TO employee DO INSTEAD
INSERT INTO raw_employee VALUES (
NEXTVAL('raw_employee_row_id_seq'),
NEW.employee_id, NEW.other_id,
NEW.other_dimensions,
NEW.effective_date, NEW.expiration_date,
NEW.is_current
);
The NEXTVAL part enables us to lean on PostgreSQL for row_id handling. Next is our most complicated one: UPDATE. Per your described intent, it has to match against employee_id, other_id pairs and perform two operations: updating the old record to be no longer current, and inserting a new record with updated dates. You didn't specify how you wanted to manage new expiration dates, so I took a guess. It's easy to change it.
CREATE OR REPLACE RULE employee_update AS ON UPDATE TO employee DO INSTEAD (
UPDATE raw_employee SET is_current = FALSE
WHERE raw_employee.employee_id = OLD.employee_id AND
raw_employee.other_id = OLD.other_id;
INSERT INTO raw_employee VALUES (
NEXTVAL('raw_employee_row_id_seq'),
COALESCE(NEW.employee_id, OLD.employee_id),
COALESCE(NEW.other_id, OLD.other_id),
COALESCE(NEW.other_dimensions, OLD.other_dimensions),
COALESCE(NEW.effective_date, OLD.expiration_date - '1 day'::INTERVAL),
COALESCE(NEW.expiration_date, OLD.expiration_date + '1 year'::INTERVAL),
TRUE
);
);
The use of COALESCE enables us to update columns that have explicit updates, but keep old values for ones that don't. Finally, we need to make a rule for DELETE. Since you said you want to ensure you can track employee histories, the best way to do this is also the simplest: we just disable it.
CREATE OR REPLACE RULE employee_delete_protect AS
ON DELETE TO employee DO INSTEAD NOTHING;
Now we ought to be able to insert data into our raw table by performing INSERT operations on our view. Here are two sample employees; the first has a few weeks left but the second is about to expire. Note that at this level we don't need to care about the row_id. It's an internal implementation detail of the lower level raw table.
INSERT INTO employee VALUES (
1, 1,
'test', CURRENT_DATE - INTERVAL '1 week', CURRENT_DATE + INTERVAL '3 weeks',
TRUE
);
INSERT INTO employee VALUES (
2, 2,
'another test', CURRENT_DATE - INTERVAL '1 month', CURRENT_DATE,
TRUE
);
The final example is deceptively simple after all the build-up that we've done. It performs an UPDATE operation on the view, and internally it results in an update to the existing employee #2 plus a new entry for employee #2.
UPDATE employee SET expiration_date = CURRENT_DATE + INTERVAL '1 year'
WHERE employee_id = 2 AND other_id = 2;
Again I'll stress that this isn't meant to just take and use without modification. There should be enough info here though for you to make something work for your specific case.
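The question asked for the previous row's expiration_date to become the new effective_date minus one day, whereas the UPDATE rule above only flips is_current. As a way to pin down the intended bookkeeping before adapting the rule, here is a small in-memory Python sketch. The function name, the open-ended (None) expiration date on the new row, and the sample data are assumptions made for illustration.

```python
from datetime import date, timedelta

def apply_change(rows, employee_id, other_id, other_dimensions, effective_date):
    """Expire the current row for the pair (if any) and append a new current row."""
    for row in rows:
        if (row["employee_id"], row["other_id"]) == (employee_id, other_id) and row["is_current"]:
            row["expiration_date"] = effective_date - timedelta(days=1)
            row["is_current"] = False
    rows.append({
        "employee_id": employee_id,
        "other_id": other_id,
        "other_dimensions": other_dimensions,
        "effective_date": effective_date,
        "expiration_date": None,  # open-ended until the next change (an assumption)
        "is_current": True,
    })
    return rows

history = []
apply_change(history, 2, 2, "another test", date(2024, 1, 1))
apply_change(history, 2, 2, "changed", date(2024, 6, 1))
for row in history:
    print(row["other_dimensions"], row["effective_date"], row["expiration_date"], row["is_current"])
```

After the second call, the first row expires on 2024-05-31 with is_current set to False, and the new row is the only current one for that pair.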

Add datetime constraint to a PostgreSQL multi-column partial index

I've got a PostgreSQL table called queries_query, which has many columns.
Two of these columns, created and user_sid, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.
Here is my question:
I've currently created my multi-column index on these two columns by running:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL
But this throws an exception stating that my function must be immutable.
I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.
You get an exception using now() because the function is not IMMUTABLE (obviously) and, quoting the manual:
All functions and operators used in an index definition must be "immutable" ...
I see two ways to utilize a (much more efficient) partial index:
1. Partial index with condition using constant date:
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;
Assuming created is actually defined as timestamp. It wouldn't work to provide a timestamp constant for a timestamptz column (timestamp with time zone). The cast from timestamp to timestamptz (or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:
Ignoring time zones altogether in Rails and PostgreSQL
Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:
CREATE OR REPLACE FUNCTION f_index_recreate()
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
DROP INDEX IF EXISTS queries_recent_idx;
EXECUTE format('
CREATE INDEX queries_recent_idx
ON queries_query (user_sid, created)
WHERE created > %L::timestamp'
, LOCALTIMESTAMP - interval '30 days'); -- timestamp constant
-- , now() - interval '30 days'); -- alternative for timestamptz
END
$func$;
Call:
SELECT f_index_recreate();
now() (like you had) is the equivalent of CURRENT_TIMESTAMP and returns timestamptz. Cast to timestamp with now()::timestamp or use LOCALTIMESTAMP instead.
Select today's (since midnight) timestamps only
If you have to deal with concurrent access to the table, use DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. But you can't wrap these commands into a function because, per documentation:
... a regular CREATE INDEX command can be performed within a
transaction block, but CREATE INDEX CONCURRENTLY cannot.
So, with two separate transactions:
CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp; -- your new condition
Then:
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
Optionally, rename to old name:
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
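Because the concurrent variant cannot live in a function, the statements have to be issued one by one from outside, for example by a scheduled script. The following Python sketch only builds the three statements for a given point in time; the function name and parameters are hypothetical, and actually running the statements (through psql or a driver connection in autocommit mode) is left out.

```python
from datetime import datetime, timedelta

def rotation_statements(now, days=30, index="queries_recent_idx"):
    """Return the three statements, to be run one by one outside a transaction block."""
    # The cutoff is computed here and embedded as a constant, since the index
    # predicate itself may not call now().
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    tmp = index + "2"  # temporary name for the replacement index
    return [
        f"CREATE INDEX CONCURRENTLY {tmp} ON queries_query (user_sid, created) "
        f"WHERE created > '{cutoff}'::timestamp",
        f"DROP INDEX CONCURRENTLY IF EXISTS {index}",
        f"ALTER INDEX {tmp} RENAME TO {index}",
    ]

for statement in rotation_statements(datetime(2013, 2, 6)):
    print(statement + ";")
```

Called with 6 February 2013, the cutoff works out to 2013-01-07 00:00:00, matching the constant used in the examples above. The values interpolated into the SQL are generated by the script itself, not taken from user input.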
2. Partial index with condition on "archived" tag
Add an archived tag to your table:
ALTER TABLE queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;
UPDATE the column at intervals of your choosing to "retire" older rows and create an index like:
CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;
Add a matching condition to your queries (even if it seems redundant) to allow them to use the index. Check with EXPLAIN ANALYZE whether the query planner catches on - it should be able to use the index for queries on a newer date. But it won't understand more complex conditions that don't match exactly.
You don't have to drop and recreate the index, but the UPDATE on the table may be more expensive than index recreation and the table gets slightly bigger.
I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.
Both solutions retain their usefulness over time; performance only deteriorates slowly as more outdated rows are included in the index.