I have a table in a PG 14 database having a column containing Infinity values in numeric[] arrays as follow:
SELECT id, factors_list FROM mytable ORDER BY id ASC LIMIT 2;
id | factors_list
---+-------------
1 | {Infinity,1,2.91825,2.2911174796669,1.58367915763394,1.96345397169765,1.41599564744287}
2 | {Infinity,1,1.0625,2.114,4.25,2.18021276595745}
The data type of this column is ARRAY (numeric[]) and the length of the array is variable (with some records being NULL):
SELECT column_name, data_type FROM information_schema.columns WHERE
table_name = 'mytable' AND column_name = 'factors_list';
column_name | data_type
----------------+-----------
factors_list | ARRAY
In order to restore this database table into an older (<14) PG database, I need to replace all Infinity values by any valid number, let's say 99999999.
How could I achieve that in an efficient way? (I have roughly 200'000 rows)
This simple update statement should do the job:
UPDATE mytable
SET factors_list = array_replace(factors_list, 'Infinity', 99999999)
WHERE TRUE;
Related
I am wondering if it is possible to extract the column name, data type, and one sample value from a database table with a single PostgreSQL query. I'm aiming to do this for all columns of one table.
Feels like a variable is needed for the column name so you can use it when querying the table for the sample value but Postgres doesn't support this in plain SQL statement (How to declare a variable in a PostgreSQL query).
I can achieve this through hard-coding a single value but any suggestions on whether this is possible to do for each column of the table (join each one using its name in a select statement to obtain the single sample value)?
column | data_type | sample_val
--------------------------------
foo_col1 | text | NULL
foo_col2 | text | 'foo_val2'
foo_col3 | text | NULL
select column_name as column, data_type, sample_val
from information_schema.columns t1
join pg_class t2 on (t1.table_name = t2.relname)
left outer join pg_description t3 on (t2.oid = t3.objoid and t3.objsubid = t1.ordinal_position)
left outer join (select CAST('foo_col2' AS text) as foo_col2, foo_col2 as sample_val from foo_schema.foo_table limit 1) n2
on (n2.foo_col2 = column_name)
where table_schema = 'foo_schema'
and table_name = 'foo_table'
order by ordinal_position
You can use a variation of row count for all tables for this:
select c.table_schema, c.table_name,
c.column_name,
c.data_type,
(xpath('/table/row/'||column_name||'/text()',
query_to_xml(format('select %I
from %I.%I limit 1', c.column_name, c.table_schema, c.table_name), true, false, '')))[1]::text as sample_value
from information_schema.columns c
where table_schema = 'foo_schema';
query_to_xml() will run the query and format the result as XML. The xpath() function then extracts that column value from the XML.
This is quite expensive as the query is run once for each column, not once for each table. Note that the sample values might not be from the same row.
You can optimize this by just running one query per table and then joining that result back to the columns:
with samples as (
select table_schema,
table_name,
query_to_xml(format('select * from %I.%I limit 1', table_schema, table_name), true, false, '') as sample_row
from information_schema.tables
where table_schema = 'foo_schema'
)
select c.table_schema, c.table_name,
c.column_name,
c.data_type,
(xpath('/table/row/'||column_name||'/text()', s.sample_row))[1]::text as sample_value
from information_schema.columns c
join samples s on (s.table_schema, s.table_name) = (c.table_schema, c.table_name);
With the above, all sample values are from the same row.
Assuming I have a parent table with child partitions that are created based on the value of a field.
If the value of that field changes, is there a way to have Postgres automatically move the row into the appropriate partition?
For example:
create table my_table(name text)
partition by list (left(name, 1));
create table my_table_a
partition of my_table
for values in ('a');
create table my_table_b
partition of my_table
for values in ('b');
In this case, if I change the value of name in a row from aaa to bbb, how can I get it to automatically move that row into my_table_b.
When I tried to do that, (i.e. update my_table set name = 'bbb' where name = 'aaa';), I get the following error:
ERROR: new row for relation "my_table_a" violates partition constraint
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63
it doesn't handle updates that cross partition boundaries.
thus you need to create one yourself... here's your set:
t=# insert into my_table select 'abc';
INSERT 0 1
t=# insert into my_table select 'bcd';
INSERT 0 1
t=# select tableoid::regclass,* from my_table;
tableoid | name
------------+------
my_table_a | abc
my_table_b | bcd
(2 rows)
here's rule and fn():
t=# create or replace function puf(_j json,_o text) returns void as $$
begin
raise info '%',': '||left(_j->>'name',1);
execute format('insert into %I select * from json_populate_record(null::my_table, %L)','my_table_'||left(_j->>'name',1), _j);
execute format('delete from %I where name = %L','my_table_'||left(_o,1), _o);
end;
$$language plpgsql;
CREATE FUNCTION
t=# create rule psr AS ON update to my_table do instead select puf(row_to_json(n),OLD.name) from (select NEW.*) n;
CREATE RULE
here's update:
t=# update my_table set name = 'bbb' where name = 'abc';
INFO: : b
puf
-----
(1 row)
UPDATE 0
checking result:
t=# select tableoid::regclass,* from my_table;
tableoid | name
------------+------
my_table_b | bcd
my_table_b | bbb
(2 rows)
once again:
t=# update my_table set name = 'a1' where name = 'bcd';
INFO: : a
puf
-----
(1 row)
UPDATE 0
t=# select tableoid::regclass,* from my_table;
tableoid | name
------------+------
my_table_a | a1
my_table_b | bbb
(2 rows)
Of course using json to pass NEW record looks ugly. And it is ugly indeed. But I did not have time to study the new PARTITION feature of 10, so don't know the elegant way to do this task. Hopefully I could give the generic idea of how you can possible solve the problem and you will produce a better neat code.
update
its probablygood idea to limit such rule to ON update to my_table where left(NEW.name,1) <> left(OLD.name,1) do instead, to release the heavy manipulations need
I would like to create a function that returns a column of a table as an integer array.
In other words, how can I transfrom the result of SELECT id FROM mytable to integer[] ?
You can do:
SELECT ARRAY(SELECT id FROM mytable)
Or:
SELECT array_agg(id) FROM mytable
In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row. Eg, as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random(),
)
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query with a single row where parent_id is 1 and random_id is a random number (or other function return value) but I don't want this record to persist in the table or impact on the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to the temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
And use this temp table to generate dummy rows. Such table will be dropped at the end of the session (or transaction) and will only be visible to current session.
You can query the catalog and build a dynamic query
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this sting as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
then execute this query and you will get the "phantom row".
We can build a function that build and excecutes the query and return our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their type, but this is reasonable. If you change the table schema (except default value), you would need to tweak the query.
See the above as a SQLFiddle.
Hello what is the easiest way to duplicate a DB record over the same table?
My problem is that the table where I am doing this has many column, like 100+, and I don't like how the solution looks like. Here is what I do (this is inside plpqsql function):
...
1. duplicate record
INSERT INTO history
(SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100)
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1)
RETURNING
history_id INTO new_history_id;
2. update some columns
UPDATE history
SET
col_5 = 'test_5',
col_23 = 'test_23',
datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve
Listing all these 100+ columns looks lame
When new column is added eventually the function should be updated too
On separate DB instances the column order might differ, which would cause the function fail
I am not sure if I can list them once more (solving issue 3) like insert into <table> (<columns_list>) values (<query>) but then the query looks even uglier.
I would like to achieve something like 'insert into ', but this seems impossible the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for you time.
This isn't pretty or particularly optimized but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
cols text;
insert_statement text;
BEGIN
-- build list of columns
SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
FROM information_schema.columns
WHERE (table_schema, table_name) = ('public', 'history')
AND column_name <> 'history_id';
-- build insert statement
insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
-- execute statement
RETURN QUERY EXECUTE insert_statement USING p_history_id;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
You don't need the update anyway, you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
col_1,
col_2,
col_3,
col_4,
'test_5',
...
'test_23',
...,
col_100
FROM history
WHERE history_sid = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_sid INTO new_history_sid;