I have a simple table that we use to record debug logs.
There are only 1000 rows in the table.
There are a handful of columns: id (primary key), date (indexed), level, a few other small fields, and message, which can be a large string.
If I query:
select id from public.log
the query completes very quickly (less than 1 second).
If I query:
select id,date from public.log
or
select * from public.log
it takes 1 minute and 28 seconds to complete!
90 seconds to read 1000 records from a database!
However, if I query:
select *
from public.log
where id in (select id from public.log)
it completes in about 1 second.
And here is the CREATE script - I just had pgAdmin generate it for me:
-- Table: public.inettklog
-- DROP TABLE public.inettklog;
CREATE TABLE public.inettklog
(
id integer NOT NULL DEFAULT nextval('inettklog_id_seq'::regclass),
date timestamp without time zone NOT NULL,
thread character varying(255) COLLATE pg_catalog."default" NOT NULL,
level character varying(20) COLLATE pg_catalog."default" NOT NULL,
logger character varying(255) COLLATE pg_catalog."default" NOT NULL,
customlevel integer,
source character varying(64) COLLATE pg_catalog."default",
logentry json,
CONSTRAINT inettklog_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.inettklog
OWNER to postgres;
-- Index: inettklog_ix_logdate
-- DROP INDEX public.inettklog_ix_logdate;
CREATE INDEX inettklog_ix_logdate
ON public.inettklog USING btree
(date)
TABLESPACE pg_default;
Your table is extremely bloated. Given your other question, this extreme bloat is not surprising.
You can fix this with a VACUUM FULL.
Going forward, you should avoid getting into this situation in the first place, by deleting records as they become obsolete rather than waiting until 99.998% of them are obsolete before acting.
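For example, you could confirm the bloat and then reclaim the space with something like this (a rough sketch using the inettklog table from the question; note that VACUUM FULL takes an ACCESS EXCLUSIVE lock while it rewrites the table):
-- Compare the on-disk size with the live row count; a 1000-row log table
-- should only occupy a few hundred kB at most.
SELECT pg_size_pretty(pg_total_relation_size('public.inettklog')) AS total_size,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'inettklog';
-- Rewrite the table to reclaim the space, then refresh the statistics.
VACUUM FULL public.inettklog;
ANALYZE public.inettklog;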
I have logs from various linux servers being fed by rsyslog to a PostgreSQL database. The incoming timestamp is an rsyslog'd RFC3339 formatted time like so: 2020-10-12T12:01:18.162329+02:00.
In the original test setup of the database logging table, I created that timestamp field as 'text'. Most of the parsing I need is working, so I was hoping to convert that timestamp column from text to a timestamp datatype (and retain the subseconds and time zone if possible).
The end result should be a timestamp datatype so that I can do date-range queries using PostgreSQL data functions.
Is this doable in PostgreSQL 11? Or is it just better to re-create the table with the correct timestamp column datatype to begin with?
Thanks in advance for any pointers, advice, places to look, or snippets of code.
Relevant rsyslog config:
$template CustomFormat,"%timegenerated:::date-rfc3339% %syslogseverity-text:::uppercase% %hostname% %syslogtag% %msg%\n"
$ActionFileDefaultTemplate CustomFormat
...
template(name="rsyslog" type="list" option.sql="on") {
constant(value="INSERT INTO log (timestamp, severity, hostname, syslogtag, message)
values ('")
property(name="timegenerated" dateFormat="rfc3339") constant(value="','")
property(name="syslogseverity-text" caseConversion="upper") constant(value="','")
property(name="hostname") constant(value="','")
property(name="syslogtag") constant(value="','")
property(name="msg") constant(value="')")
}
and the log table structure:
CREATE TABLE public.log
(
id integer NOT NULL DEFAULT nextval('log_id_seq'::regclass),
"timestamp" text COLLATE pg_catalog."default" DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP),
severity character varying(10) COLLATE pg_catalog."default",
hostname character varying(20) COLLATE pg_catalog."default",
syslogtag character varying(24) COLLATE pg_catalog."default",
program character varying(24) COLLATE pg_catalog."default",
process text COLLATE pg_catalog."default",
message text COLLATE pg_catalog."default",
CONSTRAINT log_pkey PRIMARY KEY (id)
)
Some sample data has already been fed into the table (ignore the timestamps in the message; they come from an independent handmade logging system written by my predecessor).
You can in theory convert the TEXT column to TIMESTAMP WITH TIME ZONE with ALTER TABLE .. ALTER COLUMN ... SET DATA TYPE ... USING, e.g.:
postgres=# CREATE TABLE tstest (tsval TEXT NOT NULL);
CREATE TABLE
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# ALTER TABLE tstest
ALTER COLUMN tsval SET DATA TYPE TIMESTAMP WITH TIME ZONE
USING tsval::TIMESTAMPTZ;
ALTER TABLE
postgres=# \d tstest
Table "public.tstest"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
tsval | timestamp with time zone | | not null |
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
(1 row)
PostgreSQL can parse the RFC3339 format, so subsequent inserts should just work:
postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1
postgres=# SELECT * FROM tstest ;
tsval
-------------------------------
2020-10-12 12:01:18.162329+02
2020-10-12 12:01:18.162329+02
(2 rows)
But note that any bad data in the table (i.e. values which cannot be parsed as timestamps) will cause the ALTER TABLE operation to fail, so you should consider verifying the values before converting the data. Something like SELECT "timestamp"::TIMESTAMPTZ FROM public.log would fail with an error like invalid input syntax for type timestamp with time zone: "somebadvalue".
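If you want to find the offending rows before attempting the conversion, one option (a sketch; try_timestamptz is a hypothetical helper, not a built-in) is a small PL/pgSQL function that swallows the cast error and returns NULL instead:
CREATE OR REPLACE FUNCTION try_timestamptz(val text) RETURNS timestamptz
LANGUAGE plpgsql AS $$
BEGIN
    -- attempt the cast; return NULL for anything that cannot be parsed
    RETURN val::timestamptz;
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END;
$$;
-- rows whose text value would make the ALTER TABLE ... USING conversion fail
SELECT id, "timestamp"
FROM public.log
WHERE "timestamp" IS NOT NULL
  AND try_timestamptz("timestamp") IS NULL;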
Also bear in mind that this kind of ALTER TABLE requires a table rewrite, which may take some time to complete (depending on how large the table is), and which requires an ACCESS EXCLUSIVE lock, rendering the table inaccessible for the duration of the operation.
If you want to avoid a long-running ACCESS EXCLUSIVE lock, you could probably do something like this (not tested; see the sketch after this list):
add a new TIMESTAMPTZ column (adding a column doesn't rewrite the table and is fairly cheap, provided you don't use a volatile default value)
create a trigger that copies any values inserted into the original column over to the new one
copy the existing values using a series of batched updates, like UPDATE public.foo SET newlog = log::TIMESTAMPTZ over successive ranges of rows
(in a single transaction) drop the trigger and the old column, and rename the new column to the old name
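A minimal sketch of those steps against the log table from the question (ts_new, the trigger, and the batch boundaries are hypothetical placeholders, not tested):
-- 1. Add the new column (no table rewrite)
ALTER TABLE public.log ADD COLUMN ts_new timestamptz;
-- 2. Keep newly inserted rows in sync while the backfill runs
CREATE OR REPLACE FUNCTION log_sync_ts() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.ts_new := NEW."timestamp"::timestamptz;
    RETURN NEW;
END;
$$;
CREATE TRIGGER log_sync_ts_trg
    BEFORE INSERT OR UPDATE ON public.log
    FOR EACH ROW EXECUTE PROCEDURE log_sync_ts();
-- 3. Backfill existing rows in batches; repeat with the next id range
--    until every row has been converted
UPDATE public.log
SET ts_new = "timestamp"::timestamptz
WHERE id BETWEEN 1 AND 10000
  AND ts_new IS NULL;
-- 4. Swap the columns in a single transaction
BEGIN;
DROP TRIGGER log_sync_ts_trg ON public.log;
DROP FUNCTION log_sync_ts();
ALTER TABLE public.log DROP COLUMN "timestamp";
ALTER TABLE public.log RENAME COLUMN ts_new TO "timestamp";
COMMIT;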
I'm just getting into PostGIS and I'm running into an interesting problem:
I've added a unique constraint to my table across a varchar name column and a generic geography geom column:
CREATE TABLE public.locations
(
id uuid NOT NULL,
name character varying(255) COLLATE pg_catalog."default",
geom geography,
inserted_at timestamp(0) without time zone NOT NULL,
updated_at timestamp(0) without time zone NOT NULL,
CONSTRAINT locations_pkey PRIMARY KEY (id)
)
I've added a unique index (plus a plain index on geom) using btree:
CREATE INDEX locations_geom_index
ON public.locations USING btree
(geom ASC NULLS LAST)
TABLESPACE pg_default;
-- Index: locations_name_geom_index
-- DROP INDEX public.locations_name_geom_index;
CREATE UNIQUE INDEX locations_name_geom_index
ON public.locations USING btree
(name COLLATE pg_catalog."default" ASC NULLS LAST, geom ASC NULLS LAST)
TABLESPACE pg_default;
It looks like the unique index is not being respected. I read online that I need to use a GiST index (but that won't allow a unique index). How can I properly add a unique constraint so I can be sure that something with the same name and GPS location won't be duplicated?
Since I will be storing points, should I change this to be a geography(Point, 4326)?
I have a trigger function that automatically creates child tables based on a date column from the parent table (table 1). However, I have to modify it to do that based on a date column from another table (table 2)!
Is this possible at all? I have a foreign key in table 1 which is linked to an id column in table 2.
I searched the internet but mostly found scripts for the task I have already solved (date column in the parent table, not in another table).
EXAMPLE: Make monthly partitions of table invoice_details based on invoice_date in table invoice (foreign key invoice_details.invoice_id -> invoice.invoice_id)
CREATE TABLE public.invoice_details
(
id integer NOT NULL,
invoice_id integer NOT NULL,
charge_type integer,
charge_amount numeric(15,5),
charge_tax numeric(15,5),
charge_status character varying COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
CREATE TABLE public.invoice
(
invoice_id integer NOT NULL,
customer character varying COLLATE pg_catalog."default",
invoice_date date NOT NULL
)
I created two tables using Django models, and the scripts look something like this.
I am using PostgreSQL 10.
production table:
CREATE TABLE public.foods_food(
id integer NOT NULL DEFAULT nextval('foods_food_id_seq'::regclass),
code character varying(100) COLLATE pg_catalog."default",
product_name character varying(255) COLLATE pg_catalog."default",
brands character varying(255) COLLATE pg_catalog."default",
quantity character varying(255) COLLATE pg_catalog."default",
last_modified_datetime timestamp with time zone NOT NULL,
created_at timestamp with time zone NOT NULL
)
staging table:
CREATE TABLE public.foods_temp(
id integer NOT NULL DEFAULT nextval('foods_temp_id_seq'::regclass),
code character varying(100) COLLATE pg_catalog."default",
product_name character varying(255) COLLATE pg_catalog."default"
)
I copied a CSV file to the staging table and then I tried to copy the columns from the staging table to the production table using the following query.
INSERT INTO foods_food
SELECT * FROM foods_temp;
But I got this error.
ERROR: null value in column "created_at" violates not-null constraint
I could set the created_at column to accept NULL to make it work, but I want the created_at values to be auto-populated when entries are inserted.
Is there another way to copy the columns to the production table and automatically insert the timestamps?
Then you need to set default values:
ALTER TABLE public.foods_food ALTER last_modified_datetime
SET DEFAULT current_timestamp;
ALTER TABLE public.foods_food ALTER created_at
SET DEFAULT current_timestamp;
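With those defaults in place, the copy should work. If you would rather not rely on matching column positions, you can also name the columns explicitly and let the defaults fill in the timestamps (a sketch using the column names from the two tables above, leaving out the staging id so the production sequence assigns its own):
INSERT INTO foods_food (code, product_name)
SELECT code, product_name
FROM foods_temp;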
I'm experiencing a peculiar problem with a Postgres table. When I try to perform a simple INSERT, it returns an error: duplicate key value violates unique constraint.
For starters, here's the schema for the table:
CREATE TABLE app.guardians
(
guardian_id serial NOT NULL,
first_name character varying NOT NULL,
middle_name character varying,
last_name character varying NOT NULL,
id_number character varying NOT NULL,
telephone character varying,
email character varying,
creation_date timestamp without time zone NOT NULL DEFAULT now(),
created_by integer,
active boolean NOT NULL DEFAULT true,
occupation character varying,
address character varying,
marital_status character varying,
modified_date timestamp without time zone,
modified_by integer,
CONSTRAINT "PK_guardian_id" PRIMARY KEY (guardian_id ),
CONSTRAINT "U_id_number" UNIQUE (id_number )
)
WITH (
OIDS=FALSE
);
ALTER TABLE app.guardians
OWNER TO postgres;
The table has 400 rows. Now suppose I try to perform this simple INSERT:
INSERT INTO app.guardians(first_name, last_name, id_number) VALUES('This', 'Fails', '123456');
I get the error:
ERROR: duplicate key value violates unique constraint "PK_guardian_id"
DETAIL: Key (guardian_id)=(2) already exists.
If I try running the same query again, the detail on the error message will be:
DETAIL: Key (guardian_id)=(3) already exists.
And
DETAIL: Key (guardian_id)=(4) already exists.
and so on, incrementing until it reaches a guardian_id that doesn't exist yet.
What could have gone wrong with this particular table, and how can it be rectified? I reckon it might have to do with the fact that the table had earlier been dropped with CASCADE and the data re-entered afresh, but I'm not sure about this theory.
The reason for this error is that the sequence behind guardian_id is out of sync with the data in the table. It happens when you insert values into an auto-increment column manually, because the sequence is never advanced.
So you have to reset the sequence to the current maximum. Note that ALTER SEQUENCE ... RESTART WITH only accepts a literal value, not a subquery, and the sequence created by the serial column is app.guardians_guardian_id_seq ("PK_guardian_id" is the primary key constraint, not the sequence), so the simplest way is setval():
SELECT setval(
    pg_get_serial_sequence('app.guardians', 'guardian_id'),
    (SELECT max(guardian_id) FROM app.guardians)
);
Note:
To avoid blocking of concurrent transactions that obtain numbers from the same sequence, ALTER SEQUENCE's effects on the sequence generation parameters are never rolled back; those changes take effect immediately and are not reversible. However, the OWNED BY, OWNER TO, RENAME TO, and SET SCHEMA clauses cause ordinary catalog updates that can be rolled back.
ALTER SEQUENCE will not immediately affect nextval results in backends, other than the current one, that have preallocated (cached) sequence values. They will use up all cached values prior to noticing the changed sequence generation parameters. The current backend will be affected immediately.
Documentation:
https://www.postgresql.org/docs/9.6/static/sql-altersequence.html
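After resetting the sequence, you can check what it will hand out next (this assumes the default serial sequence name, app.guardians_guardian_id_seq):
SELECT last_value, is_called
FROM app.guardians_guardian_id_seq;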