I have logs from various linux servers being fed by rsyslog to a PostgreSQL database. The incoming timestamp is an rsyslog'd RFC3339 formatted time like so: 2020-10-12T12:01:18.162329+02:00.
In the original test setup of the database logging table, I created that timestamp field as 'text'. Most things I need parsed are working right, so I was hoping to convert that timestamp table column from text to a timestamp datatype (and retain the subseconds and timezone if possible).
The end result should be a timestamp datatype so that I can do date-range queries using PostgreSQL data functions.
Is this doable in PostgreSQL 11? Or is it just better to re-create the table with the correct timestamp column datatype to begin with?
Thanks in advance for any pointers, advice, places to look, or snippets of code.
Relevant rsyslog config:
$template CustomFormat,"%timegenerated:::date-rfc3339% %syslogseverity-text:::uppercase% %hostname% %syslogtag% %msg%\n"
$ActionFileDefaultTemplate CustomFormat
...
template(name="rsyslog" type="list" option.sql="on") {
constant(value="INSERT INTO log (timestamp, severity, hostname, syslogtag, message)
values ('")
property(name="timegenerated" dateFormat="rfc3339") constant(value="','")
property(name="syslogseverity-text" caseConversion="upper") constant(value="','")
property(name="hostname") constant(value="','")
property(name="syslogtag") constant(value="','")
property(name="msg") constant(value="')")
}
and the log table structure:
CREATE TABLE public.log
(
id integer NOT NULL DEFAULT nextval('log_id_seq'::regclass),
"timestamp" text COLLATE pg_catalog."default" DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP),
severity character varying(10) COLLATE pg_catalog."default",
hostname character varying(20) COLLATE pg_catalog."default",
syslogtag character varying(24) COLLATE pg_catalog."default",
program character varying(24) COLLATE pg_catalog."default",
process text COLLATE pg_catalog."default",
message text COLLATE pg_catalog."default",
CONSTRAINT log_pkey PRIMARY KEY (id)
)
some sample data already fed into the table (ignore the timestamps inside the message itself; they were produced by an independent, hand-made logging system written by my predecessor):
You can in theory convert the TEXT column to TIMESTAMP WITH TIME ZONE with ALTER TABLE .. ALTER COLUMN ... SET DATA TYPE ... USING, e.g.:
postgres=# CREATE TABLE tstest (tsval TEXT NOT NULL);
CREATE TABLE
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# ALTER TABLE tstest
ALTER COLUMN tsval SET DATA TYPE TIMESTAMP WITH TIME ZONE
USING tsval::TIMESTAMPTZ;
ALTER TABLE
postgres=# \d tstest
Table "public.tstest"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
tsval | timestamp with time zone | | not null |
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
(1 row)
PostgreSQL can parse the RFC3339 format, so subsequent inserts should just work:
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
2020-10-12 12:01:18.162329+02
(2 rows)
But note that any bad data in the table (i.e. values which cannot be parsed as timestamps) will cause the ALTER TABLE operation to fail, so you should consider verifying the values before converting the data. Something like SELECT "timestamp"::TIMESTAMPTZ FROM public.log will fail with an error like invalid input syntax for type timestamp with time zone: "somebadvalue" if any such values are present.
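One rough way to spot such rows up front (just a sketch; the regular expression only checks the general shape of an RFC3339 value, it is not a full validation):
SELECT id, "timestamp"
FROM public.log
WHERE "timestamp" !~ '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}';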
Also bear in mind that this kind of ALTER TABLE requires a table rewrite, which may take some time to complete (depending on how large the table is), and which requires an ACCESS EXCLUSIVE lock, rendering the table inaccessible for the duration of the operation.
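To get a feel for how long the rewrite might take, you can check the current table size first, for example:
SELECT pg_size_pretty(pg_relation_size('public.log'));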
If you want to avoid a long-running ACCESS EXCLUSIVE lock, you could probably do something like this (not tested; a rough sketch follows the list):
add a new TIMESTAMPTZ column (adding a column doesn't rewrite the table and is fairly cheap provided you don't use a volatile default value)
create a trigger to copy any values inserted into the original column over to the new one
copy the existing values (using a bunch of batched updates like UPDATE public.foo SET newlog = log::TIMESTAMPTZ)
(in a single transaction) drop the trigger and the existing column, and rename the new column to the old name
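A rough sketch of those steps (untested, like the outline above; the column, function and trigger names are placeholders I made up):
-- 1. new column, cheap because it has no default
ALTER TABLE public.log ADD COLUMN timestamp_new TIMESTAMPTZ;

-- 2. trigger to keep newly inserted rows in sync
CREATE FUNCTION log_copy_ts() RETURNS trigger AS $$
BEGIN
    NEW.timestamp_new := NEW."timestamp"::TIMESTAMPTZ;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_copy_ts_trg
    BEFORE INSERT ON public.log
    FOR EACH ROW EXECUTE FUNCTION log_copy_ts();

-- 3. backfill existing rows in batches, e.g. by id range
UPDATE public.log
   SET timestamp_new = "timestamp"::TIMESTAMPTZ
 WHERE id BETWEEN 1 AND 100000
   AND timestamp_new IS NULL;
-- ...repeat for the remaining id ranges...

-- 4. swap the columns in a single transaction
BEGIN;
DROP TRIGGER log_copy_ts_trg ON public.log;
DROP FUNCTION log_copy_ts();
ALTER TABLE public.log DROP COLUMN "timestamp";
ALTER TABLE public.log RENAME COLUMN timestamp_new TO "timestamp";
COMMIT;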
Related
I have a simple table that we use to record debug logs.
There are only 1000 rows in the table.
There are a handful of columns - id (primary), date (indexed), level, a few other small fields and message which could be a large string.
if I query:
select id from public.log
the query completes very quickly (less than 1 second)
if I query:
select id,date from public.log
or
select * from public.log
it takes 1 minute and 28 seconds to complete!
90 Seconds to read 1000 records from a database!
however if I query:
select *
from public.log
where id in (select id from public.log)
it completes in about 1 second.
And here is the CREATE - I just had pgAdmin generate it for me
-- Table: public.inettklog
-- DROP TABLE public.inettklog;
CREATE TABLE public.inettklog
(
id integer NOT NULL DEFAULT nextval('inettklog_id_seq'::regclass),
date timestamp without time zone NOT NULL,
thread character varying(255) COLLATE pg_catalog."default" NOT NULL,
level character varying(20) COLLATE pg_catalog."default" NOT NULL,
logger character varying(255) COLLATE pg_catalog."default" NOT NULL,
customlevel integer,
source character varying(64) COLLATE pg_catalog."default",
logentry json,
CONSTRAINT inettklog_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.inettklog
OWNER to postgres;
-- Index: inettklog_ix_logdate
-- DROP INDEX public.inettklog_ix_logdate;
CREATE INDEX inettklog_ix_logdate
ON public.inettklog USING btree
(date)
TABLESPACE pg_default;
Your table is extremely bloated. Given your other question, this extreme bloat is not surprising.
You can fix this with a VACUUM FULL.
Going forward, you should avoid getting into this situation in the first place, by deleting records as they become obsolete rather than waiting until 99.998% of them are obsolete before acting.
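For example (the 30-day cutoff below is only an illustration; use whatever retention makes sense for your logs):
-- one-off cleanup; rewrites the table and takes an exclusive lock
VACUUM FULL public.inettklog;

-- afterwards, prune regularly instead of letting dead rows pile up
DELETE FROM public.inettklog WHERE "date" < now() - interval '30 days';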
In my PostgreSQL DB applications, I sometimes need to retrieve the next value of a sequence BEFORE running an insert.
I used to do this by granting the "usage" privilege on such sequences to my users and using the "nextval" function.
I recently began to use "GENERATED BY DEFAULT AS IDENTITY" columns as primary keys. I am still able to retrieve nextval as superuser, but I cannot grant such a privilege to other users. Where's my mistake?
Here's an example:
-- <sequence>
CREATE SEQUENCE public.apps_apps_id_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;
ALTER SEQUENCE public.apps_apps_id_seq
OWNER TO postgres;
GRANT USAGE ON SEQUENCE public.apps_apps_id_seq TO udocma;
GRANT ALL ON SEQUENCE public.apps_apps_id_seq TO postgres;
-- </sequence>
-- <table>
CREATE TABLE public.apps
(
apps_id integer NOT NULL DEFAULT nextval('apps_apps_id_seq'::regclass),
apps_born timestamp without time zone NOT NULL DEFAULT now(),
apps_vrsn character varying(50) COLLATE pg_catalog."default",
apps_ipad character varying(200) COLLATE pg_catalog."default",
apps_dscr character varying(500) COLLATE pg_catalog."default",
apps_date timestamp without time zone DEFAULT now(),
CONSTRAINT apps_id_pkey PRIMARY KEY (apps_id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.apps
OWNER to postgres;
GRANT INSERT, SELECT, UPDATE, DELETE ON TABLE public.apps TO udocma;
-- </table>
The client application is connected as ‘udocma’ and can use the “nextval” function to retrieve the next key of the sequence.
If I use the identity column instead, I can still do this if I log in as postgres, but if I log in as udocma I don't have the privilege to execute nextval on the "hidden" sequence that generates values for the identity column.
Thank you. I realized that the statements
GRANT USAGE ON SEQUENCE public.apps_apps_id_seq TO udocma;
and
select nextval('apps_apps_id_seq'::regclass);
are still working if I define apps.apps_id as an identity column instead of serial. So I guess that a field named 'somefield', defined as an identity column in a table named 'sometable', should have some 'hidden' underlying sequence named 'sometable_somefield_seq'. Is that right?
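One way to check this, and to find the sequence to grant on (just a sketch, not from the original thread): pg_get_serial_sequence also reports the implicit sequence behind an identity column.
-- returns the qualified name of the underlying sequence, e.g. something
-- like public.apps_apps_id_seq1 (the exact name is generated by Postgres)
SELECT pg_get_serial_sequence('public.apps', 'apps_id');

-- then grant on whatever name it returns, for example:
-- GRANT USAGE ON SEQUENCE public.apps_apps_id_seq1 TO udocma;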
I have a table that is defined like this:
CREATE TABLE session_requests
(
id character varying(255) NOT NULL,
authorization_enc character varying(255),
auto_close integer,
date_created character varying(255) DEFAULT '1970-01-01 01:00:00'::character varying,
....
)
I'm trying to do
alter table session_requests alter column date_created type timestamp using date_created::timestamp;
the error that I'm getting is
ERROR: default for column "date_created" cannot be cast automatically to type timestamp
Does anyone have any suggestions?
Do it in one transaction. You can even do it in a single statement:
ALTER TABLE session_requests
ALTER date_created DROP DEFAULT
,ALTER date_created type timestamp USING date_created::timestamp
,ALTER date_created SET DEFAULT '1970-01-01 01:00:00'::timestamp;
Aside: character varying(255) is almost always a bad (pointless) choice in Postgres.
Hey, I have just started working with PostgreSQL, and I am wondering how we can change a column's data type. I tried the following command:
alter table tableName alter column columnName type timestamp with time zone;
However I got the following message:
column "columnName" cannot be cast to type timestamp with time zone
The current column's data type is int, and I would like to change it to timestamp.
Postgres doesn't know how to translate int to timestamp. There are several possible conventions, and they usually have different starting dates.
Create a temporary column of type timestamp
Update the table, copying the data from the old column to the temporary column using your own translation
Drop the old column
Rename the temporary column (a sketch of these steps follows below)
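A sketch of those four steps, assuming the integer column holds Unix epoch seconds (tableName and columnName are the same placeholders as above):
ALTER TABLE tableName ADD COLUMN columnName_tmp timestamp with time zone;
UPDATE tableName SET columnName_tmp = to_timestamp(columnName);  -- your own translation goes here
ALTER TABLE tableName DROP COLUMN columnName;
ALTER TABLE tableName RENAME COLUMN columnName_tmp TO columnName;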
If you look into the documentation, you will find a one-line syntax, together with an example of how to convert a Unix-time integer:
ALTER [ COLUMN ] column [ SET DATA ] TYPE type [ USING expression ]
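The documentation's example for that Unix-time case looks like this (foo and foo_timestamp are the documentation's placeholder names):
ALTER TABLE foo
    ALTER COLUMN foo_timestamp SET DATA TYPE timestamp with time zone
    USING timestamp with time zone 'epoch' + foo_timestamp * interval '1 second';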
Postgres doesn't allow an int column to be changed directly into timestamp with time zone. To achieve this, you have to first change the column type to varchar and then change it to timestamp with time zone.
alter table tableName alter column columnName type varchar(64);
alter table tableName alter column columnName type timestamp with time zone;
There is a better way to do this, with the USING clause. Like so:
ALTER TABLE tableName
ALTER columnName type TIMESTAMP WITH TIME ZONE
USING to_timestamp(columnName) AT TIME ZONE 'America/New_York';
I achieved the conversion from timestamp to timestamp with time zone with:
ALTER TABLE tableName ALTER COLUMN columnName SET DATA TYPE timestamp with time zone;
but if it is from timestamp to int or bigint, you may need to do something like this:
ALTER TABLE tableName ALTER COLUMN columnName SET DATA TYPE int8 USING extract(epoch FROM columnName)::bigint;
Hi,
We need to modify a column of a big product table. Usually normal DDL statements execute quickly, but the DDL statement below takes about 10 minutes, and I would like to know the reason. I just want to widen a varchar column. The following are the details:
--table size
wapreader_log=> select pg_size_pretty(pg_relation_size('log_foot_mark'));
pg_size_pretty
----------------
5441 MB
(1 row)
--table ddl
wapreader_log=> \d log_foot_mark
Table "wapreader_log.log_foot_mark"
Column | Type | Modifiers
-------------+-----------------------------+-----------
id | integer | not null
create_time | timestamp without time zone |
sky_id | integer |
url | character varying(1000) |
refer_url | character varying(1000) |
source | character varying(64) |
users | character varying(64) |
userm | character varying(64) |
usert | character varying(64) |
ip | character varying(32) |
module | character varying(64) |
resource_id | character varying(100) |
user_agent | character varying(128) |
Indexes:
"pk_log_footmark" PRIMARY KEY, btree (id)
--alter column
wapreader_log=> \timing
Timing is on.
wapreader_log=> ALTER TABLE wapreader_log.log_foot_mark ALTER column user_agent TYPE character varying(256);
ALTER TABLE
Time: 603504.835 ms
ALTER ... TYPE requires a complete table rewrite; that's why it might take some time to complete on large tables. If you don't need a length constraint, then don't use the constraint. Drop these constraints once and for all, and you will never run into new problems because of obsolete constraints. Just use TEXT or VARCHAR.
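For the table in question that would be something like the statement below; it still rewrites the table once, but removes the length limit for good:
ALTER TABLE wapreader_log.log_foot_mark ALTER COLUMN user_agent TYPE text;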
When you alter a table, PostgreSQL has to make sure the old version doesn't go away in some cases, to allow rolling back the change if the server crashes before it's committed and/or written to disk. For those reasons, what it actually does here even on what seems to be a trivial change is write out a whole new copy of the table somewhere else first. When that's finished, it then swaps over to the new one. Note that when this happens, you'll need enough disk space to hold both copies as well.
There are some types of DDL changes that can be made without making a second copy of the table, but this is not one of them. For example, you can add a new column that defaults to NULL quickly. But adding a new column with a non-NULL default requires making a new copy instead.
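For instance (the column names here are made up purely for illustration):
-- no table rewrite: existing rows simply read the new column as NULL
ALTER TABLE wapreader_log.log_foot_mark ADD COLUMN note text;

-- forces a rewrite on the Postgres versions discussed here, because
-- every existing row has to be filled with the default value
ALTER TABLE wapreader_log.log_foot_mark ADD COLUMN note_count integer DEFAULT 0;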
One way to avoid a table rewrite is to use SQL domains (see CREATE DOMAIN) instead of varchars in your table. You can then add and remove constraints on a domain.
Note that this does not work instantly either, since all tables using the domain are checked for constraint validity, but it is less expensive than full table rewrite and it doesn't need the extra disk space.
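A minimal sketch of that approach (the domain and constraint names are my own; adjust the lengths as needed):
-- a domain that mimics character varying(128)
CREATE DOMAIN user_agent_type AS text
    CONSTRAINT user_agent_len CHECK (char_length(VALUE) <= 128);

-- columns are then declared as user_agent_type instead of varchar(128);
-- widening the limit later means swapping the constraint on the domain,
-- not rewriting the table:
ALTER DOMAIN user_agent_type DROP CONSTRAINT user_agent_len;
ALTER DOMAIN user_agent_type ADD CONSTRAINT user_agent_len
    CHECK (char_length(VALUE) <= 256);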
Not sure if this is any faster, but it may be; you will have to test it out.
Try this until PostgreSQL can handle the type of alter you want without re-writing the entire stinking table.
ALTER TABLE log_foot_mark RENAME refer_url TO refer_url_old;
ALTER TABLE log_foot_mark ADD COLUMN refer_url character varying(256);
Then, using the indexed primary key or unique key of the table, do a looping transaction. I think you will have to do this via Perl or some language in which you can commit on every loop iteration.
WHILE (end < MAX_RECORDS) LOOP
    BEGIN TRANSACTION;
    UPDATE log_foot_mark
       SET refer_url = refer_url_old
     WHERE id >= start AND id <= end;
    COMMIT TRANSACTION;
    -- advance the id window for the next batch
    start := end + 1;
    end := end + batch_size;
END LOOP;
ALTER TABLE log_foot_mark DROP COLUMN refer_url_old;
Keep in mind that the loop logic will need to be in something other than PL/pgSQL to get it to commit on every loop iteration. Test it with no loop at all, and then with transaction sizes of 10k, 20k, 30k, etc., until you find the sweet spot.