How can I make this DELETE query run faster on Postgres?

There are two schemas in the same database: oatarchival and oat.
The two schemas have the same structure.
Here is the query I am running, which takes a very long time:
DELETE FROM oat.oat_user_tag_verification
using oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id and f.is_archived=false
and oat_user_tag_verification.user_id = outv.user_id and
oat_user_tag_verification.tag_id = outv.tag_id and
oat_user_tag_verification.verification_status = outv.verification_status
and oat_user_tag_verification.created_at=outv.created_at
and oat_user_tag_verification.updated_at=outv.updated_at
Here is the EXPLAIN VERBOSE output of this query:
"Delete on oat.oat_user_tag_verification (cost=14989031.30..16227081.67 rows=1 width=18)"
" -> Nested Loop (cost=14989031.30..16227081.67 rows=1 width=18)"
" Output: oat_user_tag_verification.ctid, outv.ctid, f.ctid"
" Join Filter: (outv.tag_id = f.tag_id)"
" -> Merge Join (cost=14989031.30..16021422.32 rows=1 width=28)"
" Output: oat_user_tag_verification.ctid, oat_user_tag_verification.tag_id, outv.ctid, outv.tag_id"
" Merge Cond: ((oat_user_tag_verification.tag_id = outv.tag_id) AND (oat_user_tag_verification.user_id = outv.user_id) AND (oat_user_tag_verification.verification_status = outv.verification_status) AND (oat_user_tag_verification.created_at = ou (...)"
" -> Sort (cost=13223314.06..13368102.38 rows=57915328 width=38)"
" Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
" Sort Key: oat_user_tag_verification.tag_id, oat_user_tag_verification.user_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
" -> Seq Scan on oat.oat_user_tag_verification (cost=0.00..1005001.28 rows=57915328 width=38)"
" Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
" -> Materialize (cost=1765717.25..1812477.56 rows=9352062 width=38)"
" Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
" -> Sort (cost=1765717.25..1789097.40 rows=9352062 width=38)"
" Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
" Sort Key: outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
" -> Seq Scan on oatarchival.oat_user_tag_verification outv (cost=0.00..171454.62 rows=9352062 width=38)"
" Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
" -> Seq Scan on oat.fp_archived f (cost=0.00..191863.83 rows=1103642 width=14)"
" Output: f.ctid, f.tag_id"
" Filter: (NOT f.is_archived)"
Here is the create table structure of all tables involved:
Table fp_archived:
CREATE TABLE fp_archived
(
tag_id bigint NOT NULL,
detection_url text,
image_id bigint NOT NULL,
pixel_x smallint NOT NULL,
camera_num smallint NOT NULL,
pixel_y smallint NOT NULL,
width smallint NOT NULL,
height smallint NOT NULL,
is_archived boolean DEFAULT false,
id bigint NOT NULL DEFAULT nextval('fp_archived_seq'::regclass),
drive_id character varying(255),
CONSTRAINT fp_archived_pkey PRIMARY KEY (id)
)
Table oat_user_tag_verification:
CREATE TABLE oatarchival.oat_user_tag_verification
(
user_id integer NOT NULL,
tag_id bigint NOT NULL,
verification_status integer NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
updated_at timestamp without time zone DEFAULT now(),
CONSTRAINT oat_user_tag_verification_pkey PRIMARY KEY (user_id, tag_id, verification_status, created_at),
CONSTRAINT oat_user_tag_verification_tag_id_fkey FOREIGN KEY (tag_id)
REFERENCES oatarchival.oat_tags (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT oat_user_tag_verification_user_id_fkey FOREIGN KEY (user_id)
REFERENCES oatarchival.oat_users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT oat_user_tag_verification_verification_status_fkey FOREIGN KEY (verification_status)
REFERENCES oatarchival.oat_tag_verification_status (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
The delete query runs for hours and hours. How can I optimize it?
What indexes should I create to make this query faster?

Based on your EXPLAIN output (unfortunately you didn't run EXPLAIN (ANALYZE)) I'd suggest the following indexes:
CREATE INDEX ON oatarchival.oat_user_tag_verification(
tag_id,
user_id,
verification_status,
created_at,
updated_at
);
CREATE INDEX ON oat.oat_user_tag_verification(
tag_id,
user_id,
verification_status,
created_at,
updated_at
);
These can help with the merge join.
Then I'd create the following index:
CREATE INDEX ON oat.fp_archived(tag_id);
This will speed up the nested loop join.
Not sure if that is the best way to run the query, but it's a starting point.
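Because EXPLAIN (ANALYZE) actually executes the statement, one way to collect real timings without losing data is to wrap it in a transaction and roll back. This is only a sketch: it reuses the query from the question unchanged, and it will take about as long as the real delete.

```sql
BEGIN;

-- EXPLAIN (ANALYZE) really runs the DELETE, so the rows are removed
-- inside this transaction until the ROLLBACK below undoes it.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM oat.oat_user_tag_verification
USING oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id
  AND f.is_archived = false
  AND oat_user_tag_verification.user_id = outv.user_id
  AND oat_user_tag_verification.tag_id = outv.tag_id
  AND oat_user_tag_verification.verification_status = outv.verification_status
  AND oat_user_tag_verification.created_at = outv.created_at
  AND oat_user_tag_verification.updated_at = outv.updated_at;

ROLLBACK;
```

Comparing the estimated and actual row counts in that output should show whether the planner's rows=1 estimate for the merge join is realistic.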

One hint from bad experience: try adjusting the work_mem setting for the session. I had a similar problem with enormous query costs on a new PostgreSQL 9.6 installation and found that it simply needed a higher work_mem limit.
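A minimal sketch of changing work_mem for the current session only. The value 256MB is an assumption for illustration, not a recommendation. work_mem applies per sort or hash operation, and the plan in the question contains two large sorts.

```sql
SHOW work_mem;            -- check the current value first

SET work_mem = '256MB';   -- assumed value; affects this session only

-- ... run the DELETE (or EXPLAIN it) here ...

RESET work_mem;           -- back to the configured default
```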

Related

SQLAlchemy seems to have no support for an INSERT CTE

Given the table definition and query below, I need to get the old values before the update:
CREATE TABLE IF NOT EXISTS products(
id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL PRIMARY KEY,
product_id INT UNIQUE,
image_link CHARACTER VARYING NOT NULL,
additional_image_links CHARACTER VARYING[] NOT NULL
);
WITH temp AS (
INSERT INTO products(product_id, image_link, additional_image_links)
VALUES(1, 'http://www.e1xazm1ple1k113.com',ARRAY['http://www.examkple1113.com','http://www.example2.com'])
ON CONFLICT (product_id) DO UPDATE SET image_link = EXCLUDED.image_link, additional_image_links = EXCLUDED.additional_image_links
WHERE products.image_link != EXCLUDED.image_link OR products.additional_image_links != EXCLUDED.additional_image_links OR products.image_link != EXCLUDED.image_link
RETURNING id, image_link, additional_image_links
)
SELECT image_link, additional_image_links FROM products WHERE id IN (SELECT id FROM temp);
If a conflict happens and the new values meet the criteria, a result is returned. However, I need to do this with SQLAlchemy. Here is an approximate, non-working example:
from sqlalchemy.dialects.postgresql import insert

def upsert(table, rows, constraint, update_cols):
    query = insert(table).values(rows)
    return query.on_conflict_do_update(
        constraint=constraint,
        set_={c: getattr(query.excluded, c) for c in update_cols},
        where=getattr(table.c, "additional_image_link") != getattr(query.excluded, "additional_image_link"),
    ).cte("upsert")
Calling it raises this exception:
sesh = session(autocommit=False, autoflush=False, engine=DEFAULT)
sesh.execute(upsert(*args))
sqlalchemy.exc.ArgumentError: Executable SQL or text() construct expected, got <sqlalchemy.sql.selectable.CTE at 0x1042c3f10; upsert>.

TSQL - Select values with same ID

I have a view like this:
Table
The "NDocumento" column is populated only in the first row of a transaction by design. The rows are grouped by the "NMov" column, which is the ID.
Since this is a view, I would like to fill each empty "NDocumento" value with the corresponding value from the first row of its transaction, using a SELECT statement.
As you can see by the picture this is MS-SQL Server 2008, so the lack of LAG makes the game harder.
I would immensely appreciate any help,
thanks
Try this:
SELECT
T1.NDocumento
, T2.NMov
, T2.NRiga
-- , T2. Rest of the fields
FROM NDocumentoTable T1
JOIN NDocumentoTable T2 ON T2.NMov = T1.NMov
WHERE T1.NRiga = 1
I used LAG() partitioned by NMov and Causale, based on your data. You can change the partition to suit your requirements. The logic is: if NDocument is empty, take the value from the previous row in the same partition. Note that LAG() requires SQL Server 2012 or later.
CREATE TABLE myTable_1
(
NMov int
,NRiga int
,CodiceAngrafica varchar(100)
,Causale varchar(100)
,DateRegistration date
,DateDocumented date
,NDocument varchar(100)
)
INSERT INTO myTable_1 VALUES (5133, 1, '', 'V05', '01/14/2021', '01/14/2021', 'VI-2100001')
,(5133, 2, '', 'V05', null, null, '')
,(5134, 1, '', 'V05', '01/14/2021', '01/14/2021', 'VI-2100002')
,(5134, 2, '', 'V05', null, null, '')
SELECT
NMov
,NRiga
,CASE WHEN ISNULL(NDocument,'') = ''
THEN LAG(NDocument) OVER (PARTITION BY NMov,Causale ORDER BY NRiga)
ELSE NDocument END AS [NDocument]
FROM myTable_1
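Since the question targets SQL Server 2008, where LAG() is not available, a windowed aggregate is one possible alternative. The sketch below reuses myTable_1 from the answer above and assumes each NMov has exactly one non-empty NDocument value.

```sql
SELECT
    NMov
   ,NRiga
    -- NULLIF turns empty strings into NULL so that MAX ignores them;
    -- the single non-empty value per NMov is then copied to every row.
   ,MAX(NULLIF(NDocument, '')) OVER (PARTITION BY NMov) AS [NDocument]
FROM myTable_1
```

Unlike LAG(), this also fills rows that are more than one position after the populated row, because it does not depend on row order within the partition.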

Postgres - Update Performance degraded

Can someone please help identify why the statement below, which used to take 2 hours, is now taking 6 hours, without a volume increase being a factor?
with P as
(SELECT DISTINCT CD.CASE_DETAIL_ID, SVL.SERVICE_LEVEL_ID
FROM report_fct CD LEFT JOIN SERVICE_LEVEL SVL ON SVL.ORDER_TYPE_CD = CD.ORDER_TYPE_CD
AND SVL.SOURCE_ID = CD.SOURCE_ID
AND SVL.AREA_ID = CD.HQ_AREA_ID
AND SVL.CATEGORY_ID = CD.CATEGORY_ID
AND SVL.STATE_CD = CD.CUST_STATE
WHERE CD.LINE_OF_BIZ = 'CLOTH'
AND CD.HQ_AREA_ID is NOT NULL
AND CD.SOURCE_ID is NOT NULL
AND CD.CATEGORY_ID is NOT NULL
AND CD.CUST_STATE is NOT NULL)
update report_fct rpt
set service_level_id = P.service_level_id
from P
where rpt.case_detail_id = P.case_detail_id;
CREATE TABLE report_fct
...
..
case_detail_id bigint NOT NULL,
...
CREATE INDEX report_fct_ix1
ON report_fct USING btree
(case_detail_id)
TABLESPACE pg_default;
CREATE INDEX report_fct_ix2
ON report_fct USING btree
(insert_dt)
TABLESPACE pg_default;
One suspicion I have is that stale or skewed statistics on this table are causing the degradation.
relname     inserts    updates     deletes  live_tuples  dead_tuples  last autovacuum  last autoanalyze
report_fct  262746347  5387849450  0        2473523      3573914      5/19/20 3:38     5/19/20 1:13
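The counters above look like they come from pg_stat_user_tables (that is an assumption, since the source query is not shown). A sketch of how to re-read them and then refresh the table manually follows. Whether this explains the slowdown is not certain, but the table currently reports more dead tuples than live ones.

```sql
-- Re-read the activity counters and last maintenance times for the table.
SELECT relname,
       n_tup_ins, n_tup_upd, n_tup_del,
       n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'report_fct';

-- Remove dead tuples and refresh planner statistics in one pass.
VACUUM (VERBOSE, ANALYZE) report_fct;
```

Running EXPLAIN on the UPDATE again afterwards would show whether the row estimates (405 rows from the sequential scan) change.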
EXPLAIN:
"Update on report_fct rpt (cost=24847.47..27881.35 rows=415 width=3772)"
" CTE p"
" -> Unique (cost=24844.02..24847.05 rows=405 width=16)"
" -> Sort (cost=24844.02..24845.03 rows=405 width=16)"
" Sort Key: cd.case_detail_id, svl.service_level_id"
" -> Nested Loop Left Join (cost=0.41..24826.48 rows=405 width=16)"
" -> Seq Scan on report_fct cd (cost=0.00..21915.21 rows=405 width=44)"
" Filter: ((hq_area_id IS NOT NULL) AND (source_id IS NOT NULL) AND (category_id IS NOT NULL) AND (cust_state IS NOT NULL) AND ((line_of_biz)::text = 'CLOTH'::text))"
" -> Index Scan using service_level_unq on service_level svl (cost=0.41..7.18 rows=1 width=45)"
" Index Cond: ((area_id = cd.hq_area_id) AND ((order_type_cd)::text = (cd.order_type_cd)::text) AND (source_id = cd.source_id) AND (state_cd = (cd.cust_state)::bpchar) AND (category_id = cd.category_id))"
" -> Nested Loop (cost=0.41..3034.30 rows=415 width=3772)"
" -> CTE Scan on p (cost=0.00..8.10 rows=405 width=56)"
" -> Index Scan using report_fct_ix1 on report_fct rpt (cost=0.41..7.46 rows=1 width=3724)"
" Index Cond: (case_detail_id = p.case_detail_id)"

COALESCE failing following CTE Deletion. (PostgreSQL)

PostgreSQL 11.1 PgAdmin 4.1
This works some of the time:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('JONES')), UPPER(TRIM('TOM')), '1952-12-30'::date,
64::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
63::integer,
LOWER(TRIM('adipex 37.5mg tab')), LOWER(TRIM('one tab po qd')), LOWER(TRIM('tab')),
33::integer
)
),
_s AS ( -- RESOLVE ALL SURROGATE KEYS.
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _s n
WHERE (r.trx::date, r.disp, r.patient_recid, r.medication_recid)=(n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
RETURNING r.*
),
_u AS( -- GET NEW SURROGATE KEY.
SELECT COALESCE(_t.recid, r.recid) as target_recid, r.recid as old_recid
FROM _s n
JOIN rx r ON (r.trx, r.disp, r.patient_recid, r.medication_recid) = (n.trx, n.old_disp, n.patient_recid, n.old_medication_recid)
LEFT JOIN _t ON (_t.trx::date, _t.disp, _t.patient_recid, _t.medication_recid) = (n.trx::date, n.new_disp, n.patient_recid, n.new_medication_recid)
)
UPDATE rx r -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = n.new_disp, medication_recid = n.new_medication_recid, refills = n.new_refills, recid = _u.target_recid
FROM _s n, _u
WHERE r.recid = _u.old_recid
RETURNING r.*;
COMMIT;
Where table rx is defined as:
CREATE TABLE phoenix.rx
(
recid integer NOT NULL DEFAULT nextval('rx_recid_seq'::regclass),
trx timestamp without time zone NOT NULL,
disp integer NOT NULL,
refills integer,
tprinted timestamp without time zone,
tstop timestamp without time zone,
modified timestamp without time zone DEFAULT now(),
patient_recid integer NOT NULL,
medication_recid integer NOT NULL,
dposted date NOT NULL,
CONSTRAINT pk_rx_recid PRIMARY KEY (recid),
CONSTRAINT rx_unique UNIQUE (dposted, disp, patient_recid, medication_recid)
DEFERRABLE,
CONSTRAINT rx_medication_fk FOREIGN KEY (medication_recid)
REFERENCES phoenix.medications (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
DEFERRABLE,
CONSTRAINT rx_patients FOREIGN KEY (patient_recid)
REFERENCES phoenix.patients (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
)
After many hours of debugging, I found that the DELETE of a conflicting record works as expected, but the COALESCE expression seems to fail when deciding on the new surrogate key (primary key) rx.recid: it does not seem to receive the result of the delete. (Or maybe the timing is wrong?)
Any help would be most appreciated.
TIA
This is documented:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
Don't use the same table twice in a statement with a CTE if it occurs in a DML statement. Rather, use DELETE ... RETURNING and use the returned values in the other parts of the statement.
If you cannot rewrite the statement like that, use more than one statement instead of putting everything into a single CTE.
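The snapshot behaviour can be seen in a small, self-contained sketch. The temporary table item and its contents are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only for this demonstration.
CREATE TEMP TABLE item (id int PRIMARY KEY, val text);
INSERT INTO item VALUES (1, 'a'), (2, 'b');

WITH removed AS (
    DELETE FROM item
    WHERE id = 1
    RETURNING id, val
)
SELECT
    (SELECT count(*) FROM item)    AS rows_seen_in_item,  -- 2: same snapshot, delete not visible
    (SELECT count(*) FROM removed) AS rows_returned;      -- 1: data passed via RETURNING
```

Reading item in the same statement still sees both rows, while the removed CTE carries the deleted row. That is why the deleted values have to be taken from RETURNING rather than from the table itself.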
@LaurenzAlbe is totally correct in his answer. Below is a working solution to my problem. There are a few things to note:
The unique constraint uses a column in rx (dposted) that is defined as a date and populated by an insert/update trigger that casts the trx timestamp to a date, as in trx::date. For reasons I am not clear on, using r.trx::date in place of r.dposted matches many records rather than the one record I want. So the first fix was to use r.dposted, not r.trx::date.
Although the CTEs are designed to be independent of each other, by using "RETURNING..." and incorporating the CTEs in a step-wise fashion, one can be built upon another to obtain a final result set.
The working code is:
WITH _in(trx, lastname, firstname, birthdate, old_disp, old_medname, old_sig, old_form, new_disp, new_medname, new_sig, new_form, new_refills) AS (
VALUES ('2001-06-07 00:00:00'::timestamp,
UPPER(TRIM('smith')), UPPER(TRIM('john')), '1957-12-30'::date,
28::integer,
LOWER(TRIM('test')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
28::integer,
LOWER(TRIM('test 1')), LOWER(TRIM('i am sig')), LOWER(TRIM('tab')),
8::integer
)
),
_m AS (
SELECT n.*, d1.recid as old_medication_recid, d2.recid as new_medication_recid, pt.recid as patient_recid
FROM _in n
JOIN patients pt ON (pt.lastname, pt.firstname, pt.birthdate) = (n.lastname, n.firstname, n.birthdate)
JOIN medications d1 ON (n.old_medname, n.old_sig, n.old_form) = (d1.medname, d1.sig, d1.form)
LEFT JOIN medications d2 ON (n.new_medname, n.new_sig, n.new_form) = (d2.medname, d2.sig, d2.form)
),
_t AS ( -- REMOVE CONFLICTING RECORD, IF ANY.
DELETE FROM rx r
USING _m
WHERE (r.dposted, r.disp, r.patient_recid, r.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
RETURNING r.*
),
_s AS ( -- GET NEW SURROGATE KEY
SELECT _m.*, r1.recid as old_recid, r2.recid as new_recid, COALESCE(r2.recid, r1.recid) as target_recid
FROM _m
JOIN rx r1 ON (r1.dposted, r1.disp, r1.patient_recid, r1.medication_recid) = (_m.trx::date,_m.old_disp, _m.patient_recid, _m.old_medication_recid)
LEFT JOIN rx r2 ON (r2.dposted, r2.disp, r2.patient_recid, r2.medication_recid) = (_m.trx::date,_m.new_disp, _m.patient_recid, _m.new_medication_recid)
LEFT JOIN _t ON (_t.recid = r2.recid)
)
UPDATE rx -- UPDATE ORIGINAL RECORD WITH NEW VALUES.
SET disp = _s.new_disp, medication_recid = _s.new_medication_recid, refills = _s.new_refills, recid = _s.target_recid
FROM _s
WHERE rx.recid = _s.old_recid
RETURNING rx.*;
COMMIT;
Hope this helps somebody.

T-Sql update and avoid conflict

I'm trying to migrate a Tomcat app from using Postgres 9.5 to SQL Server 2016 and I've got a problem statement I can't seem to duplicate.
It's basically an upsert but one of the complications is the request supplies arguments to do the update, but when there is conflict I need to use some of the existing values from conflicting rows to insert/update.
The primary keys in the table can sometimes cause a conflict, which requires updating rows and deleting the old ones.
The table schema in MS SQL looks like:
CREATE TABLE [dbo].[signup](
[site_key] [varchar](32) NOT NULL,
[list_id] [bigint] NOT NULL,
[email_address] [varchar](256) NOT NULL,
[customer_id] [bigint] NULL,
[attribute1] [varchar](64) NULL,
[date1] [datetime] NOT NULL,
[date2] [datetime] NULL,
CONSTRAINT [pk_signup] PRIMARY KEY CLUSTERED
(
[site_key] ASC,
[list_id] ASC,
[email_address] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The old Postgres SQL looked like this:
WITH updated_rows AS (
INSERT INTO signup
(site_key, list_id, email_address, customer_id, attribute1, date1, date2)
SELECT site_key, list_id, :emailAddress, customer_id, attribute1, date1, date2
FROM signup WHERE customer_id = :customerId and email_address <> :emailAddress
ON CONFLICT (site_key, list_id, email_address) DO UPDATE SET customer_id = excluded.customer_id
RETURNING site_key, customer_id, email_address, list_id
)
DELETE FROM signup AS signup_delete USING updated_rows
WHERE
signup_delete.site_key = updated_rows.site_key
AND signup_delete.customer_id = updated_rows.customer_id
AND signup_delete.list_id = updated_rows.list_id
AND signup_delete.email_address <> :emailAddress;
Two arguments are supplied, customer id and email address, shown here as Spring NamedParameterJdbcTemplate values :customerId and :emailAddress
It's trying to change the email address of the customer id to be the supplied one, but sometimes the supplied email address already exists in the primary key constraint.
In that case it needs to change the existing customer id to the supplied one, and remove the rows that don't match the new email address.
I also need to try and maintain isolation so that nothing can change the data whilst I'm updating.
I'm trying to do it with a MERGE statement but I can't seem to get it to work. It complains that I can't use values that aren't in the clause scope, but I think I've probably got other issues here too.
This is what I had so far. It doesn't even address the deleting part - only the upserting, but I can't even get this part to work. I was planning to use the OUTPUT from this as input to something to delete the rows similar to the postgres version.
WITH source AS (
SELECT cs.[site_key] as existing_site_key,
cs.list_id as existing_list_id,
cs.email_address as existing_email,
cs.customer_id as existing_customer_id,
cs.attribute1 as existing_attribute1,
cs.date1 as existing_date1,
cs.date2 as existing_date2,
cs2.email_address as conflicting_email,
cs2.customer_id AS conflicting_customer_id
FROM [dbo].[signup] cs
LEFT JOIN [dbo].[signup] cs2 ON cs2.email_address = :emailAddress
AND cs.site_key = cs2.site_key
AND cs.list_id = cs2.list_id
WHERE cs.customer_id = :customerId
)
MERGE signup WITH (HOLDLOCK) AS target
USING source
ON ( source.conflicting_customer_id is not null )
WHEN MATCHED AND source.existing_site_key = target.site_key AND source.existing_list_id = target.list_id AND source.conflicting_email = target.email_address THEN UPDATE
SET customer_id = :customerId
WHEN NOT MATCHED BY target AND source.existing_site_key = target.site_key AND source.existing_list_id = target.list_id AND source.conflicting_customer_id = :customerId THEN INSERT
(site_key, list_id, email_address, customer_id, attribute1, date1, date2) VALUES
(source.existing_site_key, source.existing_list_id, :emailAddress, source.customer_id, source.existing_attribute1, source.existing_date1, source.existing_date2)
Thanks,
mikee