Optimize the PostgreSQL query

I have a table like the one below.
id  session_id  start_time           answer_time
1   111         2022-12-06 13:40:50  2022-12-06 13:40:55
2   111         2022-12-06 13:40:51  NULL
3   111         2022-12-06 13:40:57  NULL
4   222         2022-12-06 13:40:58  NULL
5   222         2022-12-06 13:41:10  NULL
6   222         2022-12-06 13:41:10  NULL
7   333         2022-12-06 13:46:10  2022-12-06 13:46:15
8   333         2022-12-06 13:46:18  2022-12-06 13:46:20
There are three sessions in the table, with session ids 111, 222, and 333. Each session has multiple records that share the same session_id, and whether a session is successful or unsuccessful depends on whether answer_time is NULL in the record with the smallest id of that session.
In the sample table above, the records with id 1, 4, and 7 determine whether their sessions are successful or unsuccessful.
I have the below SQL to query it, and it works well.
WITH t AS
(
    SELECT DISTINCT ON (session_id) start_time, answer_time
    FROM logs
    WHERE ((SELECT NOW() AT TIME ZONE 'UTC') - start_time < interval '24 HOURS')
    ORDER BY logs.session_id, id
)
SELECT
    COUNT(*) FILTER (WHERE answer_time IS NOT NULL) AS success_count,
    COUNT(*) FILTER (WHERE answer_time IS NULL) AS fail_count
FROM t;
But when the table holds about 50M records, the query takes 20 seconds, which is unacceptable in a production environment. How can I optimize it? My goal is under one second for 50M records.
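Before touching indexes, it may help to confirm where the time actually goes; a minimal first step is running the same query under EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS)
WITH t AS
(
    SELECT DISTINCT ON (session_id) start_time, answer_time
    FROM logs
    WHERE ((SELECT NOW() AT TIME ZONE 'UTC') - start_time < interval '24 HOURS')
    ORDER BY logs.session_id, id
)
SELECT
    COUNT(*) FILTER (WHERE answer_time IS NOT NULL) AS success_count,
    COUNT(*) FILTER (WHERE answer_time IS NULL) AS fail_count
FROM t;
If the plan shows a sort over millions of rows feeding the DISTINCT ON, an index matching that sort order is the usual suspect (see the note after the table definition below).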
Edit: below is the SQL dump of the table:
/*
Navicat PostgreSQL Data Transfer
Source Server : Test Server
Source Server Type : PostgreSQL
Source Server Version : 140004 (140004)
Source Host : localhost:5832
Source Catalog : pserver1
Source Schema : public
Target Server Type : PostgreSQL
Target Server Version : 140004 (140004)
File Encoding : 65001
Date: 07/12/2022 19:53:07
*/
-- ----------------------------
-- Table structure for logs
-- ----------------------------
DROP TABLE IF EXISTS "public"."logs";
CREATE TABLE "public"."logs" (
"id" int8 NOT NULL,
"company_id" int8 NOT NULL,
"session_id" int8 NOT NULL,
"start_time" timestamp(6) NOT NULL,
"answer_time" timestamp(6)
)
;
-- ----------------------------
-- Indexes structure for table logs
-- ----------------------------
CREATE INDEX "calllog_start_time_idx_copy1" ON "public"."logs" USING btree (
"start_time" "pg_catalog"."timestamp_ops" DESC NULLS FIRST
);
CREATE INDEX "idx_calllog_session_id_copy1" ON "public"."logs" USING btree (
"session_id" "pg_catalog"."int8_ops" ASC NULLS LAST
);
-- ----------------------------
-- Triggers structure for table logs
-- ----------------------------
CREATE TRIGGER "ts_insert_blocker" BEFORE INSERT ON "public"."logs"
FOR EACH ROW
EXECUTE PROCEDURE "_timescaledb_internal"."insert_blocker"();
-- ----------------------------
-- Primary Key structure for table logs
-- ----------------------------
ALTER TABLE "public"."logs" ADD CONSTRAINT "calllog_copy1_pkey" PRIMARY KEY ("id", "start_time");

Related

Is there a way to populate only column which is updated for Audit

We are trying to capture every action that happens on a table, like insert/update/delete, using PostgreSQL and PL/pgSQL. There will be more than 20 columns in the audit table.
Code:
ELSIF (TG_OP = 'UPDATE') THEN
insert into logs (
Emp_name, Leave_status, Reason, Mangr_remark, actions
)
values
(
new.Emp_name, new.Leave_status, new.Reason, new.Mangr_remark, 'UPDATE'
);
Sample TableA (before):
id   Emp_name  Leave_status  Reason  Mangr_remark
123  stack     pending       ABC     null
TableA after approval:
id   Emp_name  Leave_status  Reason  Mangr_remark
123  stack     Approved      ABC     xyz
Desired data in the audit table:
id   Emp_name  Leave_status  Reason  Mangr_remark
123  stack     pending       ABC     null
123  null      Approved      null    xyz
Concepts used: triggers, TG_OP, NEW., OLD.
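To populate only the columns that actually changed (matching the desired audit rows above), one option is IS DISTINCT FROM, which compares correctly even when one side is NULL. A minimal sketch, not the asker's exact function; it assumes the audit table logs has columns mirroring TableA plus an actions column, as in the question:
CREATE OR REPLACE FUNCTION audit_tablea() RETURNS trigger AS $$
BEGIN
    IF (TG_OP = 'UPDATE') THEN
        -- Each CASE yields NULL when the column is unchanged,
        -- so the audit row carries only the modified values.
        INSERT INTO logs (id, Emp_name, Leave_status, Reason, Mangr_remark, actions)
        VALUES (
            NEW.id,
            CASE WHEN NEW.Emp_name     IS DISTINCT FROM OLD.Emp_name     THEN NEW.Emp_name     END,
            CASE WHEN NEW.Leave_status IS DISTINCT FROM OLD.Leave_status THEN NEW.Leave_status END,
            CASE WHEN NEW.Reason       IS DISTINCT FROM OLD.Reason       THEN NEW.Reason       END,
            CASE WHEN NEW.Mangr_remark IS DISTINCT FROM OLD.Mangr_remark THEN NEW.Mangr_remark END,
            'UPDATE'
        );
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tablea_audit
AFTER UPDATE ON TableA
FOR EACH ROW EXECUTE PROCEDURE audit_tablea();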

Why does a Postgres serial4 id column not increment by 1, even without deletes?

In Postgres 11.2 I created a table with id serial4 NOT NULL. I thought the id would always increment by 1, but I noticed it doesn't.
Table Order creation SQL
CREATE TABLE public.order (
"createdAt" timestamptz NOT NULL DEFAULT now(),
"updatedAt" timestamptz NOT NULL DEFAULT now(),
"deletedAt" timestamptz NULL,
id serial4 NOT NULL,
.....
);
For example, when I queried select id from device d where id < 4000000 order by "id" desc limit 10, some ids do not increase by 1:
id
3999794
3999791
3999668
3999660
3999585
3999578
3999543
3999541
3999334
3999023
Although it looks like, in some id ranges, the ids do increment by 1.
For example:
select id from device d where id < 2000 order by "id" desc limit 10 or
select id from device d where id < 5000000 order by "id" desc limit 10
id
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
id
4999984
4999983
4999982
4999981
4999980
4999979
4999978
4999977
4999976
4999975
both return ids that increment by 1.
I always soft-delete via the deletedAt column, and I've checked that my code never deletes rows.
Would Postgres 11.2 increment a serial4 column by something other than 1 under some conditions? How can we prevent that from happening?
Thanks @Laurenz Albe for the link to the awesome article GAPS IN SEQUENCES IN POSTGRESQL.
After testing, in my case the gaps are caused by a unique constraint: whenever an inserted row violates the unique constraint, the serial4 column still advances to the next value.
Some scripts below for reproducing my case:
Run Postgres 11.2 in Docker:
#!/bin/bash
docker rm -f pg-test
docker run -d --name pg-test postgres:11.2
sleep 5
docker exec -it pg-test bash -c 'psql -U postgres'
In psql
\e
CREATE TABLE public.order (
"createdAt" timestamptz NOT NULL DEFAULT now(),
"updatedAt" timestamptz NOT NULL DEFAULT now(),
"deletedAt" timestamptz NULL,
id serial4 NOT NULL,
n1 varchar UNIQUE
);
INSERT INTO "order" (n1) VALUES (1);
// return INSERT 0 1
INSERT INTO "order" (n1) VALUES (1);
// return ERROR: duplicate key value violates unique constraint "order_n1_key" DETAIL: Key (n1)=(4) already exists.
INSERT INTO "order" (n1) VALUES (2);
// return INSERT 0 1
select * from "order";
The select result would be:
createdAt | updatedAt | deletedAt | id | n1
-------------------------------+-------------------------------+-----------+----+----
2022-08-26 02:18:03.716084+00 | 2022-08-26 02:18:03.716084+00 | | 1 | 1
2022-08-26 02:18:46.831221+00 | 2022-08-26 02:18:46.831221+00 | | 3 | 2
The id serial4 column has a gap from 1 to 3 because of the INSERT that violated the unique constraint.
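A quick way to see the consumed value (a small sketch; the sequence follows Postgres' <table>_<column>_seq naming convention, so here it is order_id_seq):
-- Reading the sequence relation directly shows the value that the
-- failed INSERT consumed.
SELECT last_value, is_called FROM order_id_seq;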

How to retrieve data from multiple tables (PostgreSQL)

I have 4 different tables that are linked to each other in the following way (I only kept the essential columns in each table to emphasise the relationships between them):
create TABLE public.country (
country_code varchar(2) NOT NULL PRIMARY KEY,
country_name text NOT NULL
);
create table public.address
(
id integer generated always as identity primary key,
country_code text not null,
CONSTRAINT FK_address_2 FOREIGN KEY (country_code) REFERENCES public.country (country_code)
);
create table public.client_order
(
id integer generated always as identity primary key,
address_id integer null,
CONSTRAINT FK_client_order_1 FOREIGN KEY (address_id) REFERENCES public.address (id)
);
create table public.client_order_line
(
id integer generated always as identity primary key,
client_order_id integer not null,
product_id integer not null,
client_order_status_id integer not null default 0,
quantity integer not null,
CONSTRAINT FK_client_order_line_0 FOREIGN KEY (client_order_id) REFERENCES public.client_order (id)
);
I want to get the data in the following way: for each client order line to show the product_id, quantity and country_name(corresponding to that client order line).
I tried this so far:
SELECT country_name FROM public.country WHERE country_code = (
SELECT country_code FROM public.address WHERE id = (
SELECT address_id FROM public.client_order WHERE id= 5
)
)
to get the country name given a client_order_id from the client_order_line table. I don't know how to extend this to get all the information mentioned above from the client_order_line table, which looks like this:
id  client_order_id  product_id  status  quantity
1   1                122         0       1000
2   2                122         0       3000
3   2                125         0       3000
4   3                445         0       2000
Thanks a lot!
You need a few joins.
select col.client_order_id,
col.product_id,
col.client_order_status_id as status,
col.quantity,
c.country_name
from client_order_line col
left join client_order co on col.client_order_id = co.id
left join address a on co.address_id = a.id
left join country c on a.country_code = c.country_code
order by col.client_order_id;
Alternatively you can use your select query as a scalar subquery expression.
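For completeness, a sketch of that scalar-subquery variant (same tables as above; it yields NULL for country_name when an order has no address):
select col.id,
       col.product_id,
       col.quantity,
       (select c.country_name
        from client_order co
        join address a on a.id = co.address_id
        join country c on c.country_code = a.country_code
        where co.id = col.client_order_id) as country_name
from client_order_line col
order by col.client_order_id;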

Populating records from one table into two tables and retrieving the ID to be used

I have records in a SOURCE1 table and I need to move those records into two different tables called DESTINATION1 and DESTINATION2.
I know how to copy records from the SOURCE1 table into the DESTINATION1 table using an INSERT INTO ... SELECT statement, but I run into a problem. When copying the REMARKS data from SOURCE1, I need to copy it into the DESTINATION2 table, retrieve the REFID, and copy that REFID into the respective record in my DESTINATION1 table in the column FK_DESTINATION2_REFID.
The criteria are to copy only the records in the SOURCE1 table with a STATUS of 1, and to copy the respective REMARKS data into the DESTINATION2 table only if it's not NULL. Also, is it possible to do this without a stored procedure? If not, it's not a big deal.
CREATE TABLE #Source1 (
RefID int IDENTITY(1,1) NOT NULL,
Status bit NULL,
ProviderID int NULL,
Remarks varchar(max) NULL
)
Create Table #Destination1 (
RefID int IDENTITY(1,1) NOT NULL,
Status bit NULL,
ProviderID int NULL,
FK_Destination2_RefID int
)
Create Table #Destination2 (
RefID int IDENTITY(1,1) NOT NULL,
Remarks varchar(max) NULL
)
-- Insert Records into #Source1
Insert Into #Source1 values (1,100,'Test 555')
Insert Into #Source1 values (0,400,'Test 123')
Insert Into #Source1 values (1,300,NULL)
Insert Into #Source1 values (1,500,'Test 999')
Insert Into #Source1 values (1,200,NULL)
--Drop table #Source1
--Drop table #Destination1
--Drop table #Destination2
Results would look like this:
Source1 Table
RefID Status ProviderID Remarks
----------- ------ ----------- -----------
1 1 100 Test 555
2 0 400 Test 123
3 1 300 NULL
4 1 500 Test 999
5 1 200 NULL
Destination1 Table
RefID Status ProviderID FK_Destination2_RefID
----------- ------ ----------- ---------------------
1 1 100 1
2 1 300 NULL
3 1 500 2
4 1 200 NULL
Destination2 Table
RefID Remarks
------ ---------
1 Test 555
2 Test 999
EDIT: My #SOURCE1 table will hold a dynamically sized batch of records. In this instance I have 5 records, but next time it could be 50. Each time I use the #SOURCE1 table I truncate it, so RefID starts back at 1. Since this is a temporary holding table for a batch of records, I need to move them permanently to the two destination tables as indicated, so that in the end they look like the original #SOURCE1 table.
Well, you are using the IDENTITY property on the #Destination tables. This means you are assigning a new PK to them, which would break the uniqueness / PK-to-FK link back to the #Source table, and it's unnecessary since your source table already handles this. So just remove that property from the #Destination tables and do your inserts as you expect. You can still add a UNIQUE constraint on the destination tables if you want, but if this is all they're used for, you should never run into non-uniqueness. Your FK will not be sequential, but that's because you are restricting which data to insert. If you want another IDENTITY PK column, keep it separate. I have included one below as an example.
CREATE TABLE #Source1 (
RefID int IDENTITY(1,1) NOT NULL,
Status bit NULL,
ProviderID int NULL,
Remarks varchar(max) NULL
)
Create Table #Destination1 (
SomePK int IDENTITY(1,1),
RefID int ,
Status bit NULL,
ProviderID int NULL,
FK_Destination2_RefID int
)
Create Table #Destination2 (
SomePK int IDENTITY(1,1),
RefID int ,
Remarks varchar(max) NULL
)
-- Insert Records into #Source1
Insert Into #Source1 values (1,100,'Test 555')
Insert Into #Source1 values (0,400,'Test 123')
Insert Into #Source1 values (1,300,NULL)
Insert Into #Source1 values (1,500,'Test 999')
Insert Into #Source1 values (1,200,NULL)
insert into #Destination2
select
RefID
,Remarks
from #Source1
where
Remarks is not null and Status = 1
insert into #Destination1
select
s.RefID
,s.Status
,s.ProviderID
,d.RefID
from
#Source1 s
left join #Destination2 d on d.RefID = s.RefID
where
s.Status = 1
select * from #Source1
select * from #Destination1
select * from #Destination2
Drop table #Source1
Drop table #Destination1
Drop table #Destination2
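As a side note on the original table definitions (with RefID kept as IDENTITY on the destination tables), a hedged alternative sketch: MERGE with an OUTPUT clause can capture the source RefID alongside the identity value generated in #Destination2 in a single pass, something a plain INSERT ... OUTPUT cannot do. The @RefMap variable name is made up here; this assumes SQL Server 2008 or later.
DECLARE @RefMap TABLE (SourceRefID int, Dest2RefID int);

-- The never-matching ON predicate forces an insert for every source row,
-- while OUTPUT records which generated RefID belongs to which source row.
MERGE #Destination2 AS d
USING (
    SELECT RefID, Remarks
    FROM #Source1
    WHERE Status = 1 AND Remarks IS NOT NULL
) AS s
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Remarks) VALUES (s.Remarks)
OUTPUT s.RefID, inserted.RefID INTO @RefMap (SourceRefID, Dest2RefID);

-- Use the captured mapping to fill the FK column in #Destination1.
INSERT INTO #Destination1 (Status, ProviderID, FK_Destination2_RefID)
SELECT s.Status, s.ProviderID, m.Dest2RefID
FROM #Source1 AS s
LEFT JOIN @RefMap AS m ON m.SourceRefID = s.RefID
WHERE s.Status = 1;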

What's the PostgreSQL datatype equivalent to MySQL AUTO INCREMENT?

I'm switching from MySQL to PostgreSQL and I was wondering how can I have an INT column with AUTO INCREMENT. I saw in the PostgreSQL docs a datatype called SERIAL, but I get syntax errors when using it.
Yes, SERIAL is the equivalent.
CREATE TABLE foo (
id SERIAL,
bar varchar
);
INSERT INTO foo (bar) VALUES ('blah');
INSERT INTO foo (bar) VALUES ('blah');
SELECT * FROM foo;
+----+------+
| id | bar  |
+----+------+
|  1 | blah |
|  2 | blah |
+----+------+
SERIAL is just a CREATE TABLE-time macro around sequences. You cannot alter SERIAL onto an existing column.
You can use any other integer data type, such as smallint.
Example:
CREATE SEQUENCE user_id_seq;
CREATE TABLE "user" (
user_id smallint NOT NULL DEFAULT nextval('user_id_seq')
);
ALTER SEQUENCE user_id_seq OWNED BY "user".user_id;
(Note that user is a reserved word, hence the quoting.) It is better to pick the data type you actually need than to default to the serial type.
If you want to add a sequence-backed default to an id column in a table that already exists, you can use:
CREATE SEQUENCE user_id_seq;
ALTER TABLE "user" ALTER user_id SET DEFAULT NEXTVAL('user_id_seq');
Starting with Postgres 10, identity columns as defined by the SQL standard are also supported:
create table foo
(
id integer generated always as identity
);
creates an identity column that can't be overridden unless explicitly asked for. The following insert will fail with a column defined as generated always:
insert into foo (id)
values (1);
This can however be overruled:
insert into foo (id) overriding system value
values (1);
When using the option generated by default this is essentially the same behaviour as the existing serial implementation:
create table foo
(
id integer generated by default as identity
);
When a value is supplied manually, the underlying sequence needs to be adjusted manually as well - the same as with a serial column.
An identity column is not a primary key by default (just like a serial column). If it should be one, a primary key constraint needs to be defined manually.
Whilst it looks like sequences are the equivalent to MySQL auto_increment, there are some subtle but important differences:
1. Failed Queries Increment The Sequence/Serial
The serial column gets incremented on failed queries. This leads to fragmentation from failed queries, not just row deletions. For example, run the following queries on your PostgreSQL database:
CREATE TABLE table1 (
uid serial NOT NULL PRIMARY KEY,
col_b integer NOT NULL,
CHECK (col_b>=0)
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
SELECT * FROM table1;
You should get the following output:
uid | col_b
-----+-------
1 | 1
3 | 2
(2 rows)
Notice how uid goes from 1 to 3 instead of 1 to 2.
This still occurs if you were to manually create your own sequence with:
CREATE SEQUENCE table1_seq;
CREATE TABLE table1 (
col_a smallint NOT NULL DEFAULT nextval('table1_seq'),
col_b integer NOT NULL,
CHECK (col_b>=0)
);
ALTER SEQUENCE table1_seq OWNED BY table1.col_a;
If you wish to test how MySQL is different, run the following on a MySQL database:
CREATE TABLE table1 (
uid int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
col_b int unsigned NOT NULL
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
You should get the following, with no fragmentation:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
+-----+-------+
2 rows in set (0.00 sec)
2. Manually Setting the Serial Column Value Can Cause Future Queries to Fail.
This was pointed out by @trev in a previous answer.
To simulate this, manually set the uid to 5, which will "clash" later.
INSERT INTO table1 (uid, col_b) VALUES(5, 5);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
(3 rows)
Run another insert:
INSERT INTO table1 (col_b) VALUES(6);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
4 | 6
Now if you run another insert:
INSERT INTO table1 (col_b) VALUES(7);
It will fail with the following error message:
ERROR: duplicate key value violates unique constraint "table1_pkey"
DETAIL: Key (uid)=(5) already exists.
In contrast, MySQL will handle this gracefully as shown below:
INSERT INTO table1 (uid, col_b) VALUES(4, 4);
Now insert another row without setting uid
INSERT INTO table1 (col_b) VALUES(3);
The query doesn't fail, uid just jumps to 5:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
| 5 | 3 |
+-----+-------+
Testing was performed on MySQL 5.6.33, for Linux (x86_64) and PostgreSQL 9.4.9
Sorry to rehash an old question, but this was the first Stack Overflow question/answer that popped up on Google.
This post (which came up first on Google) talks about using the more updated syntax for PostgreSQL 10:
https://blog.2ndquadrant.com/postgresql-10-identity-columns/
which happens to be:
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);
Hope that helps :)
You have to be careful not to insert directly into your SERIAL or sequence field, otherwise your write will fail when the sequence reaches the inserted value:
-- Table: "test"
-- DROP TABLE test;
CREATE TABLE test
(
"ID" SERIAL,
"Rank" integer NOT NULL,
"GermanHeadword" "text" [] NOT NULL,
"PartOfSpeech" "text" NOT NULL,
"ExampleSentence" "text" NOT NULL,
"EnglishGloss" "text"[] NOT NULL,
CONSTRAINT "PKey" PRIMARY KEY ("ID", "Rank")
)
WITH (
OIDS=FALSE
);
-- ALTER TABLE test OWNER TO postgres;
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das", "den", "dem", "des"}', 'art', 'Der Mann küsst die Frau und das Kind schaut zu', '{"the", "of the" }');
INSERT INTO test("ID", "Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (2, 1, '{"der", "die", "das"}', 'pron', 'Das ist mein Fahrrad', '{"that", "those"}');
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das"}', 'pron', 'Die Frau, die nebenen wohnt, heißt Renate', '{"that", "who"}');
SELECT * from test;
In the context of the question, and in reply to the comment by @sereja1c: creating a SERIAL column implicitly creates a sequence, so for the example above,
CREATE TABLE foo (id SERIAL, bar varchar);
CREATE TABLE would implicitly create the sequence foo_id_seq for the serial column foo.id. Hence, SERIAL (4 bytes) is good for its ease of use unless you need a specific data type for your id.
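If you'd rather not hard-code that generated name, pg_get_serial_sequence() resolves it from the table and column names; a small sketch against the foo example above:
-- Returns 'public.foo_id_seq' for the serial column foo.id.
SELECT pg_get_serial_sequence('foo', 'id');
-- Last value handed out in this session (errors if nextval() has not
-- been called in this session yet).
SELECT currval(pg_get_serial_sequence('foo', 'id'));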
Since PostgreSQL 10
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
payload text
);
This way will work for sure, I hope it helps:
CREATE TABLE fruits(
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL
);
INSERT INTO fruits(id,name) VALUES(DEFAULT,'apple');
or
INSERT INTO fruits VALUES(DEFAULT,'apple');
You can check the details at the following link:
http://www.postgresqltutorial.com/postgresql-serial/
Create Sequence.
CREATE SEQUENCE user_role_id_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 3
CACHE 1;
ALTER SEQUENCE user_role_id_seq
OWNER TO postgres;
and alter table
ALTER TABLE user_roles ALTER COLUMN user_role_id SET DEFAULT nextval('user_role_id_seq'::regclass);
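One caveat worth adding, as a hedged sketch: if user_roles already contains rows, move the sequence past the current maximum id before relying on the new default, so the next nextval() cannot collide with existing values:
-- The third argument false means the next nextval() returns exactly this value.
SELECT setval('user_role_id_seq', COALESCE(MAX(user_role_id) + 1, 1), false)
FROM user_roles;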