postgresql - insert result of query SELECT EXTRACT into another table

I have the following table in postgresql (table1):
var1,
var2,
var3,
timestamp1 timestamp without time zone NOT NULL,
timestamp2 timestamp without time zone NOT NULL,
diff double precision
The column diff is empty.
I calculate the variable diff by the following code:
SELECT EXTRACT(EPOCH FROM ((timestamp1 - timestamp2)/1800))
I want to insert the result of this operation into the diff column of table1.
I wrote the following code, but it does not work:
CREATE TEMPORARY TABLE temptablename AS
SELECT EXTRACT(EPOCH FROM ((timestamp1 - timestamp2)/1800)) AS diff2 from table1;
INSERT INTO table1 (diff) SELECT diff2 FROM temptablename;
ERROR: null value in column "" violates not-null constraint
DETAIL: Failing row contains (null, null, null, null, null,83).

Assuming your arithmetic is right, it sounds like you just need an update statement.
update table1
set diff = extract(epoch from ((timestamp1 - timestamp2)/1800))
where diff is null;
The WHERE clause isn't strictly necessary, since you already know that column is empty. But it guards against overwriting values the second time you run that statement.
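As a sanity check of that arithmetic (a standalone example, not from the question): dividing an interval by 1800 and extracting the epoch expresses the difference in half-hour units.
SELECT EXTRACT(EPOCH FROM ((TIMESTAMP '2024-01-01 12:00:00'
                          - TIMESTAMP '2024-01-01 11:00:00') / 1800)) AS diff;
-- returns 2: one hour is two half-hour units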

Related

Weighted Random Selection

I have two tables with the most common first and last names. Each table has basically two fields:
Tables
CREATE TABLE "common_first_name" (
"first_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
CREATE TABLE "common_last_name" (
"last_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
P.S.: The TOP 1 name occurs only ~1.8% of the time. The tables have 1000 rows each.
Function (Pseudo, not READY)
CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
RETURNS VOID
AS $$
DECLARE
SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN
FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
--Get a random first and last name, taking their probability (ratio) into consideration,
--e.g. via round(random() * SUM_OF_WEIGHTS);
--create_person(random_first_name || ' ' || random_last_name);
END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;
P.S.: The ratios for the names in each table sum up to 100%.
I want to run a function N times to get a first name and a surname to create sample data. The sample size can be anywhere from 1,000 to 1,000,000 full names, so if there is a "fast" way of doing this weighted random selection, even better.
Any suggestion of how to do it in PL/PGSQL?
I am using PG 13.3 on SUPABASE.IO.
Thanks
Given the small input dataset, it's straightforward to do this in pure SQL. Use CTEs to build lower and upper bound columns for each row in each of the common_FOO_name tables, then use generate_series() to generate sets of random numbers. Join everything together, and filter in the WHERE clause to the rows whose bounds bracket the random value.
with first_names_weighted as (
select first_name,
sum(ratio) over (order by first_name) - ratio as lower_bound,
sum(ratio) over (order by first_name) as upper_bound
from common_first_name
),
last_names_weighted as (
select last_name,
sum(ratio) over (order by last_name) - ratio as lower_bound,
sum(ratio) over (order by last_name) as upper_bound
from common_last_name
),
randoms as (
select random() * (select sum(ratio) from common_first_name) as f_random,
random() * (select sum(ratio) from common_last_name) as l_random
from generate_series(1, 32)
)
select r, first_name, last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random <= f.upper_bound
and l.lower_bound <= r.l_random and r.l_random <= l.upper_bound;
Change the value passed to generate_series() to control how many names to generate. If it's important that it be a function, you can just use a LANGUAGE SQL function definition to parameterize that number (sketched below the fiddle link):
https://www.db-fiddle.com/f/mmGQRhCP2W1yfhZTm1yXu5/3
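Wrapping the same query in a set-returning SQL function might look like the following sketch (the function name generate_weighted_names is illustrative, not part of the answer above):
CREATE OR REPLACE FUNCTION generate_weighted_names(p_count int)
RETURNS TABLE (first_name text, last_name text) AS $$
with first_names_weighted as (
  select fn.first_name,
         sum(fn.ratio) over (order by fn.first_name) - fn.ratio as lower_bound,
         sum(fn.ratio) over (order by fn.first_name) as upper_bound
  from common_first_name fn
),
last_names_weighted as (
  select l.last_name,
         sum(l.ratio) over (order by l.last_name) - l.ratio as lower_bound,
         sum(l.ratio) over (order by l.last_name) as upper_bound
  from common_last_name l
),
randoms as (
  select random() * (select sum(ratio) from common_first_name) as f_random,
         random() * (select sum(ratio) from common_last_name) as l_random
  from generate_series(1, p_count)
)
select f.first_name, l.last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random <= f.upper_bound
  and l.lower_bound <= r.l_random and r.l_random <= l.upper_bound
$$ LANGUAGE sql VOLATILE;
-- usage: SELECT * FROM generate_weighted_names(1000);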

Setting an empty column as a timestamp with time zone when creating a table in Postgres

I am trying to create a table that has a column called decay_date that will have empty values (for the time being) but be formatted as a timestamp with time zone data type. If I simply do this:
CREATE TABLE my_schema.final_summed_table AS
SELECT
'Unstudied' AS table_name,
r.region,
r.state,
r.co_fips,
c.co_name AS county,
'TIER 0' AS tiermetric_lfd,
'UNASSESSED' AS val_combine_lfd,
'Unmapped' AS mod_unmod_lfd,
'' AS det_approx_lfd,
'' AS decay_date_lfd TYPE TIMESTAMP WITH TIME ZONE NULL USING decay_date_lfd::TIMESTAMP
FROM my_schema.unmapped r, cnms.counties c
WHERE r.co_fips = c.co_fips
I get an error: syntax error near 'TYPE'. This table will later be UNIONed with another table that has the same decay_date column with data type timestamp with time zone. How do I set the timestamp data type for my decay_date column while creating my table?
'' isn't a valid value for a timestamp to begin with. And you can't use USING like that in a column alias. That's only allowed (and needed) when you ALTER a table and change the type of a column.
Just select a null value and cast that to the desired type:
null::timestamptz AS decay_date_lfd
That is bad syntax. You'd have to cast the literal to the desired type.
So use a NULL value:
CREATE TABLE my_schema.final_summed_table AS
SELECT
CAST ('Unstudied' AS text) AS table_name,
r.region,
r.state,
r.co_fips,
c.co_name AS county,
CAST ('TIER 0' AS text) AS tiermetric_lfd,
CAST ('UNASSESSED' AS text) AS val_combine_lfd,
CAST ('Unmapped' AS text) AS mod_unmod_lfd,
CAST (NULL AS text) AS det_approx_lfd,
CAST (NULL AS timestamp with time zone) AS decay_date_lfd
FROM my_schema.unmapped r, cnms.counties c
WHERE r.co_fips = c.co_fips;
I'd avoid CREATE TABLE ... AS and use CREATE TABLE and INSERT INTO ... SELECT ... instead.
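One way to spell that out (a sketch; the types for region, state, and co_fips are assumptions, since the question doesn't show the source tables):
CREATE TABLE my_schema.final_summed_table (
    table_name       text,
    region           text,
    state            text,
    co_fips          integer,
    county           text,
    tiermetric_lfd   text,
    val_combine_lfd  text,
    mod_unmod_lfd    text,
    det_approx_lfd   text,
    decay_date_lfd   timestamp with time zone
);
INSERT INTO my_schema.final_summed_table
SELECT 'Unstudied', r.region, r.state, r.co_fips, c.co_name,
       'TIER 0', 'UNASSESSED', 'Unmapped', NULL, NULL
FROM my_schema.unmapped r
JOIN cnms.counties c ON r.co_fips = c.co_fips;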

Weird now() time difference with Postgres triggers

In a Postgres 10.10 database, I have a table table1 and an AFTER INSERT trigger on table1 that inserts into table2:
CREATE TABLE table1 (
id SERIAL PRIMARY KEY,
-- other cols
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL
);
CREATE UNIQUE INDEX table1_pkey ON table1(id int4_ops);
CREATE TABLE table2 (
id SERIAL PRIMARY KEY,
table1_id integer NOT NULL REFERENCES table1(id) ON UPDATE CASCADE,
-- other cols (not used in query)
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL
);
CREATE UNIQUE INDEX table2_pkey ON table2(id int4_ops);
This query is executed on application start:
CREATE OR REPLACE FUNCTION after_insert_table1()
RETURNS trigger AS
$$
BEGIN
INSERT INTO table2 (table1_id, ..., created_at, updated_at)
VALUES (NEW.id, ..., 'now', 'now');
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
DROP TRIGGER IF EXISTS after_insert_table1 ON "table1";
CREATE TRIGGER after_insert_table1
AFTER INSERT ON "table1"
FOR EACH ROW
EXECUTE PROCEDURE after_insert_table1();
I noticed some created_at and updated_at values on table2 are different from those on table1. In fact, table2 mostly has older values.
Here are 10 sequential entries, which show the difference jumping around a huge amount within a few minutes:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|2000 |2019-11-07 22:29:47.245+00|2019-11-07 19:51:09.727021+00|-02:38:37.517979|
|2001 |2019-11-07 22:30:02.256+00|2019-11-07 13:18:29.45962+00 |-09:11:32.79638 |
|2002 |2019-11-07 22:30:43.021+00|2019-11-07 13:44:12.099577+00|-08:46:30.921423|
|2003 |2019-11-07 22:31:00.794+00|2019-11-07 19:51:09.727021+00|-02:39:51.066979|
|2004 |2019-11-07 22:31:11.315+00|2019-11-07 13:18:29.45962+00 |-09:12:41.85538 |
|2005 |2019-11-07 22:31:27.234+00|2019-11-07 13:44:12.099577+00|-08:47:15.134423|
|2006 |2019-11-07 22:31:47.436+00|2019-11-07 13:18:29.45962+00 |-09:13:17.97638 |
|2007 |2019-11-07 22:33:19.484+00|2019-11-07 17:22:48.129063+00|-05:10:31.354937|
|2008 |2019-11-07 22:33:51.607+00|2019-11-07 19:51:09.727021+00|-02:42:41.879979|
|2009 |2019-11-07 22:34:28.786+00|2019-11-07 13:18:29.45962+00 |-09:15:59.32638 |
|2010 |2019-11-07 22:36:50.242+00|2019-11-07 13:18:29.45962+00 |-09:18:20.78238 |
Sequential entries have similar differences (mostly negative/mostly positive) and similar orders of magnitude (mostly minutes vs. mostly hours) within the sequence, though there are exceptions.
Here are the top 5 largest positive differences:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|1630 |2019-10-25 21:12:14.971+00|2019-10-26 00:52:09.376+00 |03:39:54.405 |
|950 |2019-09-16 12:36:07.185+00|2019-09-16 14:07:35.504+00 |01:31:28.319 |
|1677 |2019-10-26 22:19:12.087+00|2019-10-26 23:38:34.102+00 |01:19:22.015 |
|58 |2018-12-08 20:11:20.306+00|2018-12-08 21:06:42.246+00 |00:55:21.94 |
|171 |2018-12-17 22:24:57.691+00|2018-12-17 23:16:05.992+00 |00:51:08.301 |
Here are the top 5 largest negative differences:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|1427 |2019-10-15 16:03:43.641+00|2019-10-14 17:59:41.57749+00 |-22:04:02.06351 |
|1426 |2019-10-15 13:26:07.314+00|2019-10-14 18:00:50.930513+00|-19:25:16.383487|
|1424 |2019-10-15 13:13:44.092+00|2019-10-14 18:00:50.930513+00|-19:12:53.161487|
|4416 |2020-01-11 00:15:03.751+00|2020-01-10 08:43:19.668399+00|-15:31:44.082601|
|4420 |2020-01-11 01:58:32.541+00|2020-01-10 11:04:19.288023+00|-14:54:13.252977|
Negative differences outnumber positive differences 10x. The database timezone is UTC.
table2.table1_id is a foreign key, so it should be impossible to insert before insert on table1 completes.
table1.created_at is set by Sequelize, using option timestamps: true on the model.
When a row is inserted into table1, it's done inside a transaction. From the documentation I can find, triggers are executed inside the same transaction, so I can't think of a reason for this.
I can fix the issue by changing my trigger to use NEW.created_at instead of 'now', but I'm curious if anyone has any idea what the cause of this bug is?
Here is the query used to produce the above difference tables:
SELECT
table1.id AS table1_id,
table1.created_at AS table1_created,
table2.created_at AS table2_created,
(table2.created_at - table1.created_at) AS diff
FROM table1
INNER JOIN table2 ON
table2.table1_id = table1.id AND (
(table2.created_at - table1.created_at) > '2 min' OR
(table1.created_at - table2.created_at) > '2 min')
ORDER BY diff;
While 'now' is not a plain string, it is also not a function in this context, but a special date/time input. The manual:
... simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.)
The body of a PL/pgSQL function is stored as a string; each nested SQL command is parsed and prepared when control reaches it for the first time per session. The manual:
The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree the first time the function is called (within each session). The instruction tree fully translates the PL/pgSQL statement structure, but individual SQL expressions and SQL commands used in the function are not translated immediately.
As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter parses and analyzes the command to create a prepared statement, using the SPI manager's SPI_prepare function.
Subsequent visits to that expression or command reuse the prepared statement.
There is more. Read on. But that's enough for our case:
The first time the trigger is executed per session, 'now' is translated to the current timestamp (the transaction timestamp). While doing more inserts in that same transaction, there won't be any difference to transaction_timestamp() because that is stable within a transaction by design.
But every subsequent transaction in the same session will insert the same, constant timestamp in table2, while values for table1 may be anything (not sure what Sequelize does there). If new values in table1 are the then current timestamp, that results in a "negative" diff in your test. (Timestamps in table2 will be older.)
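You can watch the same effect with an explicit prepared statement (a minimal single-session illustration, not from the question):
PREPARE demo AS SELECT 'now'::timestamptz AS frozen, now() AS live;
EXECUTE demo;  -- both columns match on the first run
-- any time later in the same session:
EXECUTE demo;  -- "frozen" still shows the PREPARE time; "live" is current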
Solution
Situations where you actually want 'now' are few and far between. Typically, you want the function now() (without single quotes!) - which is equivalent to CURRENT_TIMESTAMP (standard SQL) and transaction_timestamp(). Related (recommended reading!):
Difference between now() and current_timestamp
In your particular case I suggest column defaults instead of doing additional work in triggers. If you set the same default now() in table1 and table2, you also eliminate any nonsense the INSERT to table1 might add. And you never have to even mention these columns in inserts any more:
CREATE TABLE table1 (
id SERIAL PRIMARY KEY,
-- other cols
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now() -- or leave this one NULL?
);
CREATE TABLE table2 (
id SERIAL PRIMARY KEY,
table1_id integer NOT NULL REFERENCES table1(id) ON UPDATE CASCADE,
-- other cols (not used in query)
created_at timestamptz NOT NULL DEFAULT now(), -- not 'now'!
updated_at timestamptz NOT NULL DEFAULT now() -- or leave this one NULL?
);
CREATE OR REPLACE FUNCTION after_insert_table1()
RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
INSERT INTO table2 (table1_id) -- more columns? but not: created_at, updated_at
VALUES (NEW.id); -- more columns?
RETURN NULL; -- can be NULL for AFTER trigger
END
$$;
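With those defaults in place, a bare insert is enough, and both tables get the same transaction timestamp (a usage sketch, assuming the trigger from the question is recreated on this function):
INSERT INTO table1 DEFAULT VALUES;
SELECT t1.created_at, t2.created_at
FROM table1 t1
JOIN table2 t2 ON t2.table1_id = t1.id;  -- identical timestamps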

Set the value of a column to its default value

I have a few existing tables in which I have to modify various columns to have a default value.
How can I apply the default value to old records which are NULL, so that the old records will be consistent with the new ones?
ALTER TABLE "mytable" ALTER COLUMN "my_column" SET DEFAULT NOW();
After modifying table looks something like this ...
Table "public.mytable"
Column | Type | Modifiers
-------------+-----------------------------+-----------------------------------------------
id | integer | not null default nextval('mytable_id_seq'::regclass)
....
my_column | timestamp(0) with time zone | default now()
Indexes:
"mytable_pkey" PRIMARY KEY, btree (id)
Is there a simple way to set all columns that are currently NULL, and that have a default value, to their default value?
Deriving from the documentation on INSERT:
For clarity, you can also request default values explicitly, for individual columns or for the entire row:
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
INSERT INTO products DEFAULT VALUES;
I just tried this, and it is as simple as
update mytable
set my_column = default
where my_column is null
See sqlfiddle
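A quick self-contained demonstration (the temp table is illustrative, not from the question); SET my_column = DEFAULT evaluates the column's actual default expression at update time:
CREATE TEMP TABLE t (x timestamptz DEFAULT now());
INSERT INTO t VALUES (NULL);               -- an explicit NULL bypasses the default
UPDATE t SET x = DEFAULT WHERE x IS NULL;
SELECT x FROM t;                           -- now filled with the evaluated default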
Edit: Olaf's answer is the easiest and correct way of doing this; however, the below is also a viable solution for most cases.
For each column, it is easy to use the information_schema to get the default value of the column and then use that in an UPDATE statement:
UPDATE mytable set my_column = (
SELECT column_default
FROM information_schema.columns
WHERE (table_schema, table_name, column_name) = ('public', 'mytable','my_column')
)::timestamp
WHERE my_column IS NULL;
Note that the sub-query must be typecast to the corresponding column data type.
Also, this statement will not evaluate expressions, since column_default is of type character varying: it will work for NOW() but not for expressions like, say, (NOW() + interval '7 days').
It is better to get the expression, validate it, and then apply it manually:
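For example (a sketch; the inspected default and the rewritten expression are filled in by hand):
SELECT column_default
FROM information_schema.columns
WHERE (table_schema, table_name, column_name) = ('public', 'mytable', 'my_column');
-- inspect the result, e.g. (now() + '7 days'::interval), then apply it literally:
UPDATE mytable
SET my_column = now() + interval '7 days'
WHERE my_column IS NULL;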

Extracting the number of days from a calculated interval

I am trying to get a query like the following one to work:
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
In the referenced table, to_date and from_date are of type timestamp without time zone. A regular query like
SELECT to_date - from_date FROM histories;
gives me interval results such as '65 days 04:58:09.99'. But using this expression inside the first query gives me an error: invalid input syntax for type interval. I've tried various quotations and even nesting the query, without luck. Can this be done?
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
This makes no sense. INTERVAL xxx is syntax for interval literals. So INTERVAL from_date is a syntax error, since from_date isn't a literal. If your code really looks more like INTERVAL '2012-02-01' then that's going to fail, because 2012-02-01 is not valid syntax for an INTERVAL.
The INTERVAL keyword here is just noise. I suspect you misunderstood an example from the documentation. Remove it and the expression will be fine.
I'm guessing you're trying to get the number of days between two dates represented as timestamp or timestamptz.
If so, either cast both to date:
SELECT to_date::date - from_date::date FROM histories;
or get the interval, then extract the day component:
SELECT extract(day from to_date - from_date) FROM histories;
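For instance, with a made-up pair of timestamps (not from the question):
SELECT extract(day from TIMESTAMP '2012-03-06 05:00'
                      - TIMESTAMP '2012-01-01 00:58');  -- 65
The subtraction yields the interval '65 days 04:02:00', and extract(day from ...) returns its whole-day field, 65.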
This example demonstrates the creation of a table with a trigger that maintains the difference between stop_time and start_time in DDD HH24:MI:SS format, where DDD stands for the number of days ...
DROP TABLE IF EXISTS benchmarks;
SELECT 'create the "benchmarks" table';
CREATE TABLE benchmarks (
guid UUID NOT NULL DEFAULT gen_random_uuid()
, id bigint UNIQUE NOT NULL DEFAULT cast (to_char(current_timestamp, 'YYMMDDHH12MISS') as bigint)
, git_hash char (8) NULL DEFAULT 'hash...'
, start_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, stop_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, diff_time varchar (20) NOT NULL DEFAULT 'HH:MI:SS'
, update_time timestamp DEFAULT DATE_TRUNC('second', NOW())
, CONSTRAINT pk_benchmarks_guid PRIMARY KEY (guid)
) WITH (
OIDS=FALSE
);
create unique index idx_uniq_benchmarks_id on benchmarks (id);
-- START trigger trg_benchmarks_upsrt_diff_time
-- hrt = human readable time
CREATE OR REPLACE FUNCTION fnc_benchmarks_upsrt_diff_time()
RETURNS TRIGGER
AS $$
BEGIN
-- NEW.diff_time = age(NEW.stop_time::timestamp-NEW.start_time::timestamp);
NEW.diff_time = to_char(NEW.stop_time-NEW.start_time, 'DDD HH24:MI:SS');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_benchmarks_upsrt_diff_time
BEFORE INSERT OR UPDATE ON benchmarks
FOR EACH ROW EXECUTE PROCEDURE fnc_benchmarks_upsrt_diff_time();
--
-- STOP trigger trg_benchmarks_upsrt_diff_time
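A possible usage run (the git_hash value is made up; under autocommit the diff can be off by the seconds elapsed between the two statements):
INSERT INTO benchmarks (git_hash) VALUES ('abc12345');
UPDATE benchmarks SET stop_time = DATE_TRUNC('second', NOW() + interval '1 day 2 hours');
SELECT diff_time FROM benchmarks;  -- e.g. 001 02:00:00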
Just remove the keyword INTERVAL:
SELECT EXTRACT(DAY FROM to_date - from_date) FROM histories;