expanding a varchar column is very slow, why? - postgresql

Hi,
We need to modify a column of a big product table. Usually such DDL statements execute fast,
but this one takes about 10 minutes, and I would like to know the reason.
I just want to expand a varchar column. The following are the details.
--table size
wapreader_log=> select pg_size_pretty(pg_relation_size('log_foot_mark'));
pg_size_pretty
----------------
5441 MB
(1 row)
--table ddl
wapreader_log=> \d log_foot_mark
Table "wapreader_log.log_foot_mark"
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer | not null
create_time | timestamp without time zone |
sky_id | integer |
url | character varying(1000) |
refer_url | character varying(1000) |
source | character varying(64) |
users | character varying(64) |
userm | character varying(64) |
usert | character varying(64) |
ip | character varying(32) |
module | character varying(64) |
resource_id | character varying(100) |
user_agent | character varying(128) |
Indexes:
"pk_log_footmark" PRIMARY KEY, btree (id)
--alter column
wapreader_log=> \timing
Timing is on.
wapreader_log=> ALTER TABLE wapreader_log.log_foot_mark ALTER column user_agent TYPE character varying(256);
ALTER TABLE
Time: 603504.835 ms

ALTER ... TYPE requires a complete table rewrite, that's why it might take some time to complete on large tables. If you don't need a length constraint, then don't use the constraint. Drop these constraints once and for all, and you will never run into new problems because of obsolete constraints. Just use TEXT or VARCHAR (without a length).

When you alter a table, PostgreSQL has to make sure the old version doesn't go away, so the change can be rolled back if the transaction aborts or the server crashes before it is committed and written to disk. For that reason, what it actually does here, even for what seems to be a trivial change, is write out a whole new copy of the table somewhere else first. When that's finished, it swaps over to the new one. Note that while this happens you'll need enough disk space to hold both copies.
There are some types of DDL changes that can be made without making a second copy of the table, but this is not one of them. For example, you can add a new column that defaults to NULL quickly. But adding a new column with a non-NULL default requires making a new copy instead.
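For example, against the table above (note_col and status_col are made-up column names; on PostgreSQL 11 and later a constant default no longer forces the rewrite):
-- fast: only the catalog changes, existing rows are untouched
ALTER TABLE wapreader_log.log_foot_mark ADD COLUMN note_col text;
-- rewrites every row on older versions, because each row has to store the default
ALTER TABLE wapreader_log.log_foot_mark
    ADD COLUMN status_col character varying(16) NOT NULL DEFAULT 'new';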

One way to avoid a table rewrite is to use SQL domains (see CREATE DOMAIN) instead of varchars in your table. You can then add and remove constraints on a domain.
Note that this does not work instantly either, since all tables using the domain are checked for constraint validity, but it is less expensive than a full table rewrite and it doesn't need the extra disk space.
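A minimal sketch of that approach (user_agent_t and log_foot_mark_demo are made-up names):
-- the length check lives on the domain, not on the column type
CREATE DOMAIN user_agent_t AS text
    CONSTRAINT user_agent_len CHECK (length(VALUE) <= 128);
CREATE TABLE log_foot_mark_demo (
    id         integer PRIMARY KEY,
    user_agent user_agent_t
);
-- relaxing the limit later: existing rows are re-checked, but the table is not rewritten
ALTER DOMAIN user_agent_t DROP CONSTRAINT user_agent_len;
ALTER DOMAIN user_agent_t ADD CONSTRAINT user_agent_len CHECK (length(VALUE) <= 256);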

Not sure if this is any faster; you will have to test it out.
Try this until PostgreSQL can handle the type of alter you want without re-writing the entire stinking table.
ALTER TABLE log_foot_mark RENAME refer_url TO refer_url_old;
ALTER TABLE log_foot_mark ADD COLUMN refer_url character varying(256);
Then using the indexed primary key or unique key of the table do a looping transaction. I think you will have to do this via Perl or some language that you can do a commit every loop iteration.
-- pseudocode; drive the loop from a client so that each batch commits separately
WHILE (end < MAX_RECORDS) LOOP
    BEGIN TRANSACTION;
    UPDATE log_foot_mark
       SET refer_url = refer_url_old
     WHERE id >= start AND id <= end;
    COMMIT TRANSACTION;
    -- advance start and end to the next id range here
END LOOP;
ALTER TABLE log_foot_mark DROP COLUMN refer_url_old;
Keep in mind that the loop logic will need to be in something other than PL/pgSQL to get it to commit every loop iteration. Test it with no loop at all, and then with transaction sizes of 10k, 20k, 30k etc. until you find the sweet spot.
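On a PostgreSQL recent enough to support procedures (11 or later), a procedure can commit inside the loop instead of an external driver. A rough sketch under that assumption (backfill_refer_url and the id-range batching are made up):
CREATE PROCEDURE backfill_refer_url(batch_size integer)
LANGUAGE plpgsql
AS $$
DECLARE
    last_id integer := 0;
    max_id  integer;
BEGIN
    SELECT max(id) INTO max_id FROM log_foot_mark;
    WHILE last_id < max_id LOOP
        UPDATE log_foot_mark
           SET refer_url = refer_url_old
         WHERE id > last_id AND id <= last_id + batch_size;
        last_id := last_id + batch_size;
        COMMIT;   -- each batch is committed on its own
    END LOOP;
END;
$$;
CALL backfill_refer_url(20000);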

Related

Declaring system trigger names in PG 12 or PG 13

I just noticed that constraints, such as FOREIGN KEY, automatically generate system triggers and name them RI_ConstraintTrigger_a or _c plus the trigger oid. I've looked at the docs but do not see a way to declare names for these triggers in FOREIGN KEY, etc. I care because I'm writing a bit of check code to compare objects in two different databases. The local names of system triggers vary, since the oids are naturally going to vary.
Is there a way to declare names for these triggers as they're created? If so, is there some harm in doing so? I think I read that after a restore or upgrade the trigger and related function names might be regenerated. If so, using RENAME TRIGGER on these items seems like swimming upstream... and I suspect is a Bad Idea.
I suppose that I could locate a trigger's local name by querying pg_trigger on a combination of other attributes... but I'm not seeing what makes a trigger unique, apart from its name. All I can think of is to search against pg_get_triggerdef(oid) and compare the outputs.
For those following along at home, here's a "hello world" example that creates a couple of system triggers.
-- parent table referenced by the foreign key
CREATE TABLE IF NOT EXISTS calendar
(
    id uuid NOT NULL DEFAULT extensions.gen_random_uuid() PRIMARY KEY
);

DROP TABLE IF EXISTS calendar_child CASCADE;
CREATE TABLE calendar_child
(
    id          uuid NOT NULL DEFAULT extensions.gen_random_uuid() PRIMARY KEY,
    calendar_id uuid NOT NULL
);

ALTER TABLE calendar_child
    ADD CONSTRAINT calendar_year_calendar_fk
    FOREIGN KEY (calendar_id) REFERENCES calendar(id)
    ON DELETE CASCADE;
select oid,tgrelid::regclass,tgname from pg_trigger where tgrelid::regclass::text = 'calendar_child';
+--------+----------------+-------------------------------+
| oid | tgrelid | tgname |
+--------+----------------+-------------------------------+
| 355281 | calendar_child | RI_ConstraintTrigger_c_355281 |
| 355282 | calendar_child | RI_ConstraintTrigger_c_355282 |
+--------+----------------+-------------------------------+
Here's a sample, lightly formatted, of what pg_get_triggerdef returns.
CREATE CONSTRAINT TRIGGER "RI_ConstraintTrigger_a_352380"
AFTER DELETE ON calendar FROM calendar_year
NOT DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE FUNCTION "RI_FKey_cascade_del"()
The linked functions aren't named dynamically; they seem to be calls to C routines for standard behaviors, found in https://doxygen.postgresql.org/ri__triggers_8c_source.html.
You cannot specify names for these triggers when they are created, and renaming them afterwards is not supported either, because they are system triggers.
There is no point in comparing these trigger names, because they are just implementation details of the foreign key constraint. Constraints can be renamed, and you can get the constraint definition with the pg_get_constraintdef function. That is what you should compare.
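For the comparison itself, something like this sketch lists the foreign-key constraint definitions so the output from both databases can be diffed (adjust the contype filter for other constraint types):
SELECT conrelid::regclass::text    AS table_name,
       conname                     AS constraint_name,
       pg_get_constraintdef(oid)   AS definition
FROM   pg_constraint
WHERE  contype = 'f'               -- foreign keys only
ORDER  BY 1, 2;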

convert incoming text timestamp from rsyslog to timestamp for postgresql

I have logs from various linux servers being fed by rsyslog to a PostgreSQL database. The incoming timestamp is an rsyslog'd RFC3339 formatted time like so: 2020-10-12T12:01:18.162329+02:00.
In the original test setup of the database logging table, I created that timestamp field as 'text'. Most things I need parsed are working right, so I was hoping to convert that timestamp table column from text to a timestamp datatype (and retain the subseconds and timezone if possible).
The end result should be a timestamp datatype so that I can do date-range queries using PostgreSQL data functions.
Is this doable in PostgreSQL 11? Or is it just better to re-create the table with the correct timestamp column datatype to begin with?
Thanks in advance for any pointers, advice, places to look, or snippets of code.
Relevant rsyslog config:
$template CustomFormat,"%timegenerated:::date-rfc3339% %syslogseverity-text:::uppercase% %hostname% %syslogtag% %msg%\n"
$ActionFileDefaultTemplate CustomFormat
...
template(name="rsyslog" type="list" option.sql="on") {
constant(value="INSERT INTO log (timestamp, severity, hostname, syslogtag, message)
values ('")
property(name="timegenerated" dateFormat="rfc3339") constant(value="','")
property(name="syslogseverity-text" caseConversion="upper") constant(value="','")
property(name="hostname") constant(value="','")
property(name="syslogtag") constant(value="','")
property(name="msg") constant(value="')")
}
and the log table structure:
CREATE TABLE public.log
(
id integer NOT NULL DEFAULT nextval('log_id_seq'::regclass),
"timestamp" text COLLATE pg_catalog."default" DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP),
severity character varying(10) COLLATE pg_catalog."default",
hostname character varying(20) COLLATE pg_catalog."default",
syslogtag character varying(24) COLLATE pg_catalog."default",
program character varying(24) COLLATE pg_catalog."default",
process text COLLATE pg_catalog."default",
message text COLLATE pg_catalog."default",
CONSTRAINT log_pkey PRIMARY KEY (id)
)
Some sample data has already been fed into the table (ignore the timestamps inside the message text; they come from an independent handmade logging system built by my predecessor).
You can in theory convert the TEXT column to TIMESTAMP WITH TIME ZONE with ALTER TABLE .. ALTER COLUMN ... SET DATA TYPE ... USING, e.g.:
postgres=# CREATE TABLE tstest (tsval TEXT NOT NULL);
CREATE TABLE
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# ALTER TABLE tstest
ALTER COLUMN tsval SET DATA TYPE TIMESTAMP WITH TIME ZONE
USING tsval::TIMESTAMPTZ;
ALTER TABLE
postgres=# \d tstest
Table "public.tstest"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
tsval | timestamp with time zone | | not null |
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
(1 row)
PostgreSQL can parse the RFC3339 format, so subsequent inserts should just work:
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
2020-10-12 12:01:18.162329+02
(2 rows)
But note that any bad data in the table (i.e. values which cannot be parsed as timestamps) will cause the ALTER TABLE operation to fail, so you should consider verifying the values before converting the data. Something like SELECT "timestamp"::TIMESTAMPTZ FROM public.log would fail with an error like invalid input syntax for type timestamp with time zone: "somebadvalue".
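If you want to find every offending row rather than stop at the first error, a small helper can attempt the cast row by row (a sketch; is_valid_tstz is a made-up name):
CREATE FUNCTION is_valid_tstz(val text) RETURNS boolean
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM val::timestamptz;   -- try the cast
    RETURN true;
EXCEPTION WHEN OTHERS THEN
    RETURN false;               -- value cannot be parsed as a timestamp
END;
$$;
SELECT id, "timestamp"
FROM   public.log
WHERE  NOT is_valid_tstz("timestamp");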
Also bear in mind this kind of ALTER TABLE requires a table rewrite which may take some time to complete (depending on how large the table is), and which requires an ACCESS EXCLUSIVE lock, rendering the table inaccessible for the duration of the operation.
If you want to avoid a long-running ACCESS EXCLUSIVE lock, you could probably do something like this (not tested):
1. add a new TIMESTAMPTZ column (adding a column doesn't rewrite the table and is fairly cheap provided you don't use a volatile default value)
2. create a trigger that copies any value inserted into the original column into the new one
3. copy the existing values (using a bunch of batched updates like UPDATE public.log SET timestamp_tz = "timestamp"::TIMESTAMPTZ over a range of rows at a time)
4. in a single transaction, drop the trigger and the existing column, and rename the new column to the old name
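A rough sketch of those four steps against the log table above (untested, like the outline itself; timestamp_tz and log_sync_timestamp are made-up names, and EXECUTE FUNCTION needs PostgreSQL 11):
-- 1. new column: cheap, no table rewrite
ALTER TABLE public.log ADD COLUMN timestamp_tz timestamptz;
-- 2. keep newly inserted rows in sync while the backfill runs
CREATE FUNCTION log_sync_timestamp() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.timestamp_tz := NEW."timestamp"::timestamptz;
    RETURN NEW;
END;
$$;
CREATE TRIGGER log_sync_timestamp
    BEFORE INSERT OR UPDATE ON public.log
    FOR EACH ROW EXECUTE FUNCTION log_sync_timestamp();
-- 3. backfill in batches; repeat until no rows remain
UPDATE public.log
   SET timestamp_tz = "timestamp"::timestamptz
 WHERE id IN (SELECT id FROM public.log
              WHERE timestamp_tz IS NULL
              ORDER BY id
              LIMIT 10000);
-- 4. swap the columns in one short transaction
BEGIN;
DROP TRIGGER log_sync_timestamp ON public.log;
DROP FUNCTION log_sync_timestamp();
ALTER TABLE public.log DROP COLUMN "timestamp";
ALTER TABLE public.log RENAME COLUMN timestamp_tz TO "timestamp";
COMMIT;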

Can I change the type of an auto-increment id in postgresql without breaking everything around it?

I have a postgresql table with an auto-increment id which is an integer (it was created by Python Django). I have now reached the integer max value of about 2.14 billion rows.
According to this answer (postgresql - integer out of range) I want to change the type of the id column to biginteger. My general question is: if I change it, will it break the auto-increment? Will it change the data in this table? Should I create new sequences after the change?
This is the PostgreSQL description of this column:
Column | Type    | Modifiers                                                          | Storage
id     | integer | not null default nextval('parsedata_app_ticket_id_seq'::regclass) | plain
No, it will not break the auto-increment. You just need to change the data type of the column; the sequence generates bigint values anyway:
alter table the_table
  alter column id type bigint;
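One version-dependent caveat: on PostgreSQL 10 and later the sequence behind a serial column is created AS integer, so it has an integer maximum of its own. If that applies, the sequence (its name is visible in the column default above) can be widened as well:
alter sequence parsedata_app_ticket_id_seq as bigint;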

Why does PostgreSQL not like UPPERCASE table names?

I have recently tried to create some tables in PostgreSQL with names all in uppercase. However, in order to query them I need to put the table name inside quotation marks: "TABLE_NAME". Is there any way to avoid this and tell Postgres to work with uppercase names as normal?
UPDATE
this query creates a table with the lowercase name table_name
create table TABLE_NAME
(
id integer,
name varchar(255)
)
However, this query creates a table with uppercase name "TABLE_NAME"
create table "TABLE_NAME"
(
id integer,
name varchar(255)
)
the problem is that the quotation marks now seem to be part of the name!
In my case I do not create the tables manually; another application creates them and the names are in capital letters. This causes problems when I want to use CQL filters via GeoServer.
Put the table name into double quotes if you want Postgres to preserve case for relation names.
Quoting an identifier also makes it case-sensitive, whereas unquoted
names are always folded to lower case. For example, the identifiers
FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo"
and "FOO" are different from these three and each other. (The folding
of unquoted names to lower case in PostgreSQL is incompatible with the
SQL standard, which says that unquoted names should be folded to upper
case. Thus, foo should be equivalent to "FOO" not "foo" according to
the standard. If you want to write portable applications you are
advised to always quote a particular name or never quote it.)
from docs (emphasis mine)
example with quoting:
t=# create table "UC_TNAME" (i int);
CREATE TABLE
t=# \dt+ UC_TNAME
Did not find any relation named "UC_TNAME".
t=# \dt+ "UC_TNAME"
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------+-------+----------+---------+-------------
public | UC_TNAME | table | postgres | 0 bytes |
(1 row)
example without quoting:
t=# create table UC_TNAME (i int);
CREATE TABLE
t=# \dt+ UC_TNAME
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------+-------+----------+---------+-------------
public | uc_tname | table | postgres | 0 bytes |
(1 row)
So if you created the table with quotes, you cannot skip the quotes when querying it. But if you skipped the quotes when creating the object, the name was folded to lowercase, and an uppercase name in a query will be folded the same way - which is why you "won't notice" it.
The question implies that double quotes, when used to force PostgreSQL to recognize casing for an identifier name, actually become part of the identifier name. That's not correct. What does happen is that if you use double quotes to force casing, then you must always use double quotes to reference that identifier.
Background:
In PostgreSQL, names of identifiers are always folded to lowercase unless you surround the identifier name with double quotes. This can lead to confusion.
Consider what happens if you run these two statements in sequence:
CREATE TABLE my_table (
t_id serial,
some_value text
);
That creates a table named my_table.
Now, try to run this:
CREATE TABLE My_Table (
t_id serial,
some_value text
);
PostgreSQL ignores the uppercasing (because the table name is not surrounded by quotes) and tries to make another table called my_table. When that happens, it throws an error:
ERROR: relation "my_table" already exists
To make a table with uppercase letters, you'd have to run:
CREATE TABLE "My_Table" (
t_id serial,
some_value text
);
Now you have two tables in your database:
Schema | Name | Type | Owner
--------+---------------------------+-------+----------
public | My_Table | table | postgres
public | my_table | table | postgres
The only way to ever access My_Table is to then surround the identifier name with double quotes, as in:
SELECT * FROM "My_Table"
If you leave the identifier unquoted, then PostgreSQL would fold it to lowercase and query my_table.
In simple words, Postgres treats identifiers in double quotes ("") as case-sensitive, while everything unquoted is folded to lowercase.
Example: we can create two columns named "DETAILS" and details in the same table, and while querying:
select "DETAILS"
returns the DETAILS column's data, and
select details/DETAILS/Details/"details"
returns the details column's data.
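A quick way to see this (case_demo is a made-up table name):
CREATE TABLE case_demo (
    "DETAILS" text,   -- quoted: case is preserved
    details   text    -- unquoted: folded to lowercase
);
INSERT INTO case_demo VALUES ('from the quoted column', 'from the unquoted column');
SELECT "DETAILS" FROM case_demo;   -- 'from the quoted column'
SELECT details   FROM case_demo;   -- 'from the unquoted column'
SELECT DETAILS   FROM case_demo;   -- unquoted, folded to details: same as the previous line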

PostgreSQL 9.1.3 database hangs when the application tries to insert into a table

The PostgreSQL 9.1.3 database hangs when the application tries to insert into the EventLogEntry table.
1. The schema of the table is:
Column | Type | Modifiers
-------------+-----------------------+-----------
tableindex | integer |
object | character varying(80) |
method | character varying(80) |
bgwuser | character varying(80) |
time | character(23) |
realuser | character varying(80) |
host | character varying(80) |
application | character varying(80) |
Indexes:
"ind1_eventlogentry" UNIQUE, btree (tableindex), tablespace "mmindex"
Tablespace: "mmdata"
2. On the database which we took from the customer site, the following query hangs when we run it:
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex)
VALUES (E'Server', E'Start', E'bgw', E'20140512122404', NULL, NULL, NULL, 539 );
3. But when we use a tableindex value after 586, for example 587, the insert works correctly on the same database:
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex )
VALUES (E'Server', E'Start', E'bgw', E'20140512122404', NULL, NULL, NULL, 587 );
4. When we debugged, we found that PostgreSQL has stored some other transaction id and makes the current transaction wait until that previous transaction either gets committed or rolled back. But we are not sure how and where PostgreSQL has stored that transaction id.
5. It seems that the index is somehow corrupted: PostgreSQL has stored some transaction id somewhere and so it is not allowing the current transaction to proceed for that particular index range.
6. When the index is dropped and the table is recreated, everything works fine.
Please help me with this issue by answering the following questions:
Why is PostgreSQL not allowing the insert for a particular index range, but allowing it after that range?
Where does PostgreSQL store these transaction ids, and why are they not getting cleared?