Need help understanding NpgSQL connection opening process - postgresql

I have been trying to optimize a web service that uses Npgsql 3.2.7 to connect to a PostgreSQL 9.3 database. Today I installed pgBouncer and noticed, when running "select * from pg_stat_activity;", that all of my Npgsql connections had this query listed:
SELECT ns.nspname, a.typname, a.oid, a.typrelid, a.typbasetype,
CASE WHEN pg_proc.proname='array_recv' THEN 'a' ELSE a.typtype END AS type,
CASE
WHEN pg_proc.proname='array_recv' THEN a.typelem
WHEN a.typtype='r' THEN rngsubtype
ELSE 0
END AS elemoid,
CASE
WHEN pg_proc.proname IN ('array_recv','oidvectorrecv') THEN 3 /* Arrays last */
WHEN a.typtype='r' THEN 2 /* Ranges before */
WHEN a.typtype='d' THEN 1 /* Domains before */
ELSE 0 /* Base types first */
END AS ord
FROM pg_type AS a
JOIN pg_namespace AS ns ON (ns.oid = a.typnamespace)
JOIN pg_proc ON pg_proc.oid = a.typreceive
LEFT OUTER JOIN pg_type AS b ON (b.oid = a.typelem)
LEFT OUTER JOIN pg_range ON (pg_range.rngtypid = a.oid)
WHERE
(
a.typtype IN ('b', 'r', 'e', 'd') AND
(b.typtype IS NULL OR b.typtype IN ('b', 'r', 'e', 'd')) /* Either non-array or array of supported element type */
)
When I run this query in pgAdmin it takes 3 to 5 seconds to complete, even on the second run when everything should be cached. When I step through my code interactively, the first Open call in a web service call also takes 3 to 5 seconds.
Does this run every time a connection is created? It looks to me like an expensive query to fetch some relatively static data. If it does have to run every time a connection is created, does anyone have suggestions on how to architect around this in a web service? 3 to 5 seconds is too much overhead for every call to a web service. Does pooling have any effect on whether this query is run?
ADDED: 03/14/2018
These are the log entries I am seeing after creating a table to hold the results of the types query. The query against that table runs successfully and then, two seconds later, fails because the table cannot be found for some reason.
2018-03-14 15:35:42 EDT LOG: duration: 0.715 ms parse : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: duration: 0.289 ms bind : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: execute : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: duration: 0.391 ms
2018-03-14 15:35:44 EDT ERROR: relation "public.npgsqltypes" does not exist at character 71
2018-03-14 15:35:44 EDT STATEMENT: select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:44 EDT LOG: statement: DISCARD ALL
2018-03-14 15:35:44 EDT LOG: duration: 0.073 ms
ADDED: 03/15/2018
Explain output of types query:
Sort (cost=3015139.78..3018795.67 rows=1462356 width=213)
Sort Key: (CASE WHEN (pg_proc.proname = ANY ('{array_recv,oidvectorrecv}'::name[])) THEN 3 WHEN (a.typtype = 'r'::"char") THEN 2 WHEN (a.typtype = 'd'::"char") THEN 1 ELSE 0 END)
-> Hash Left Join (cost=920418.37..2779709.53 rows=1462356 width=213)
Hash Cond: (a.oid = pg_range.rngtypid)
-> Hash Join (cost=920417.24..2752289.21 rows=1462356 width=209)
Hash Cond: ((a.typreceive)::oid = pg_proc.oid)
-> Hash Join (cost=919817.78..2724270.58 rows=1462356 width=149)
Hash Cond: (a.typnamespace = ns.oid)
-> Hash Left Join (cost=919305.50..2687199.40 rows=1462356 width=89)
Hash Cond: (a.typelem = b.oid)
Filter: (((a.typtype = ANY ('{b,r,e,d}'::"char"[])) AND ((b.typtype IS NULL) OR (b.typtype = ANY ('{b,r,e,d}'::"char"[])))) OR ((a.typname = ANY ('{record,void}'::name[])) AND (a.typtype = 'p'::"char")))
-> Seq Scan on pg_type a (cost=0.00..694015.89 rows=13731889 width=89)
-> Hash (cost=694015.89..694015.89 rows=13731889 width=5)
-> Seq Scan on pg_type b (cost=0.00..694015.89 rows=13731889 width=5)
-> Hash (cost=388.79..388.79 rows=9879 width=68)
-> Seq Scan on pg_namespace ns (cost=0.00..388.79 rows=9879 width=68)
-> Hash (cost=465.87..465.87 rows=10687 width=68)
-> Seq Scan on pg_proc (cost=0.00..465.87 rows=10687 width=68)
-> Hash (cost=1.06..1.06 rows=6 width=8)
-> Seq Scan on pg_range (cost=0.00..1.06 rows=6 width=8)

You're right, this query is issued by Npgsql to load all the types from a PostgreSQL backend, since different databases can have different data types (due to extensions, user-defined types, etc.).
However, this query is sent only on the first physical connection to a specific database, as identified by its connection string. In other words, if you connect to the same database X times with the same connection string, you should only see this query being sent once. Npgsql caches this information internally. I just verified that this is the behavior in 3.2.7. Are you seeing something else?
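Separately from the caching question, the EXPLAIN output in the question estimates about 13.7 million rows in pg_type and 9,879 in pg_namespace, which is far more than a freshly created database contains. Every table contributes a row type and an array type to pg_type, so a very large number of tables (or catalog bloat, or stale statistics) could explain why the query itself is slow. A minimal sketch for checking this, assuming you can query the system catalogs directly:

```sql
-- Actual number of types, as opposed to the planner's estimate.
SELECT count(*) AS pg_type_rows FROM pg_type;

-- Breakdown by kind: 'b' base, 'c' composite (one per table), 'd' domain,
-- 'e' enum, 'p' pseudo, 'r' range.
SELECT typtype, count(*) AS n
FROM pg_type
GROUP BY typtype
ORDER BY n DESC;

-- On-disk size of the catalog table including its indexes.
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_type')) AS pg_type_size;
```

If the count is in the millions, the cost is in the catalog rather than in Npgsql, and reducing the number of tables or schemas is probably a more useful direction than working around the type-loading query.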

Related

Avoid Materialize in Explain Plan while running Postgres Query

I am trying to understand the explain plan and optimize my query. Here is the query I am using. Once I join with the pd_ontology table, the cost increases heavily.
explain create table l1.test as
select
null as empi,
coalesce(nullif(a.pid_2_1,''),nullif(a.pid_3_1,''),nullif(a.pid_4_1,'')) as id,
coalesce(nullif(pid_3_5,''),'Patient ID') as idt,
upper(trim(pid_5_2)) as fn,
upper(trim(pid_5_3)) as mn,
upper(trim(pid_5_1)) as ln,
nullif(pid_7_1,'')::date as dob,
upper(trim(pid_8_1)) as gn,
nullif(pid_29_1,'')::date as dod,
upper(trim(pid_30_1)) as df,
upper(trim(pid_11_1)) as psa1,
upper(trim(pid_11_2)) as psa2,
upper(trim(pid_11_3)) as pci,
upper(trim(pid_11_4)) as pst,
upper(trim(pid_11_5)) as pz,
upper(trim(pid_11_6)) as pcy,
coalesce(nullif(a.pid_13_1,''),nullif(a.pid_13_2,''),nullif(a.pid_13_3,''),nullif(a.pid_14_1,''),nullif(a.pid_14_2,''),nullif(a.pid_14_3,'')) as tel1,
coalesce(nullif(a.pid_13_1,''),nullif(a.pid_13_2,''),nullif(a.pid_13_3,''),nullif(a.pid_14_1,''),nullif(a.pid_14_2,''),nullif(a.pid_14_3,'')) as cell1,
lower(trim(pid_13_4)) as eml1,
upper(trim(pid_10_1)) as race,
upper(trim(pid_10_2)) as racen,
upper(trim(pid_22_1)) as ethn,
upper(trim(pid_22_2)) as ethnm,
upper(trim(pid_24_1)) as mbi,
upper(trim(pid_16_1)) as ms,
upper(trim(pid_16_2)) as msn,
coalesce(nullif(a.pid_11_9,''),nullif(a.pid_12_1,'')) as pct,
upper(trim(pid_15_1)) as pl,
upper(trim(pid_17_1)) as rel,
upper(trim(pid_19_1)) as ssn,
trim(obx_3_1) as rc,
--trim(o.cdscs) as rn,
null as rn,
trim(obx_3_3) as rcs,
trim(obx_5_1) as rv,
obx_6_1 as uru,
obx_8_1 as oac,
obr_25_1 as rst,
rtrim(trim(replace(replace(regexp_replace(replace(obx_7_1,'x10E3','*10^3'),'[a-zA-Z%]','','g'),'^','E'),'*','x')),'/') as onrr,
trim(split_part(rtrim(trim(replace(replace(regexp_replace(replace(obx_7_1,'x10E3','*10^3'),'[a-zA-Z%]','','g'),'^','E'),'*','x')),'/'),'-',1)) as rrl,
trim(split_part(rtrim(trim(replace(replace(regexp_replace(replace(obx_7_1,'x10E3','*10^3'),'[a-zA-Z%]','','g'),'^','E'),'*','x')),'/'),'-',2)) as rrh,
obx_10_1 as natc,
orc_2_1 as "pon",
left(nullif(obx_14_1,''),8)::date as rdt,
case when to_date(nullif(obx_14_1,''),'yyyyMMddHH24miss') not between '1800-01-01' and current_date then null else to_date(nullif(obx_14_1,''),'yyyyMMddHH24miss') end as efdt,
case when to_date(nullif(obx_14_1,''),'yyyyMMddHH24miss') not between '1800-01-01' and current_date then null else to_date(nullif(obx_14_1,''),'yyyyMMddHH24miss') end as eldt,
coalesce(obr_16_1,'') as opid,
nullif(obr_16_13,'null') as opidt,
trim(orc_12_1) as opnpi,
--trim(upper(n.name)) as opn,
null as opn,
trim(nullif(obr_4_1,'null')) as oc,
trim(nullif(obr_4_3,'null')) as ocs,
trim(nullif(obr_4_2,'null')) as on,
to_date(nullif(obr_7_1,''),'yyyyMMddHH24miss') as ofdt,
trim(orc_5_1) as os,
--left(e.nte_3_1,512) as cmd,
split_part(b.filename,'/',5) as sfn,
'Clinical' as st,
now() AS ingdt,
'4' as acoid ,
'Test' as acon,
'result' as cltp,
'Test' as sstp,
'202' as sid
from l1.vipn_pal_historical_all_oru_pid a
join l1.vipn_pal_historical_all_oru_obx b
on a.control_id = b.control_id
and b.cross_join_tuple_count = '0'
left join l1.vipn_pal_historical_all_oru_obr c
on a.control_id = c.control_id
and b.order_observation_order = c.order_observation_order
and a.cross_join_tuple_count = '1'
left join l1.vipn_pal_historical_all_oru_orc d
on a.control_id = d.control_id
and d.order_observation_order = b.order_observation_order
and a.cross_join_tuple_count = '1'
left join (select control_id ,order_observation_order ,observation_order,replace(string_agg(nte_3_1 ,' '),'\.br\',chr(13)||chr(10)) as nte_3_2
from l1.vipn_pal_historical_all_oru_nte
group by control_id ,order_observation_order ,observation_order ) e
on a.control_id = e.control_id and e.observation_order = b.observation_order
and e.order_observation_order = b.order_observation_order
--and e.cross_join_tuple_count = '1'
left join (select * from l2.pd_ontology where dtp = 'result') o
on (b.obx_3_1 = o.cval or b.obx_3_1 = cvald)
left join l2.pd_npi n
on d.orc_12_1 = n.npi;
Here is the generated explain plan, where you can see that the Materialize nodes carry most of the cost.
Merge Left Join (cost=106313.03..7599360149686.98 rows=329075452 width=1641)
Merge Cond: ((a.control_id)::text = (c.control_id)::text)
Join Filter: (((a.cross_join_tuple_count)::text = '1'::text) AND ((b.order_observation_order)::text = (c.order_observation_order)::text))
-> Merge Left Join (cost=106311.69..7599175158271.60 rows=329075452 width=244)
Merge Cond: ((a.control_id)::text = (d.control_id)::text)
Join Filter: (((a.cross_join_tuple_count)::text = '1'::text) AND ((d.order_observation_order)::text = (b.order_observation_order)::text))
-> Merge Join (cost=106310.57..7599144659758.97 rows=329075452 width=236)
Merge Cond: ((a.control_id)::text = (b.control_id)::text)
-> Index Scan using vipn_pal_historical_all_oru_pid_control_id_idx on vipn_pal_historical_all_oru_pid a (cost=0.56..800918.31 rows=9353452 width=96)
-> Materialize (cost=106309.92..7599139604853.41 rows=282211264 width=161)
-> Nested Loop Left Join (cost=106309.92..7599138899325.25 rows=282211264 width=161)
Join Filter: (((b.obx_3_1)::text = (pd_ontology.cval)::text) OR ((b.obx_3_1)::text = (pd_ontology.cvald)::text))
-> Index Scan using vipn_pal_historical_all_oru_obx_control_id_idx on vipn_pal_historical_all_oru_obx b (cost=0.57..53285968.32 rows=282211264 width=161)
Filter: ((cross_join_tuple_count)::text = '0'::text)
-> Materialize (cost=106309.35..1255207.79 rows=1538682 width=19)
-> Bitmap Heap Scan on pd_ontology (cost=106309.35..1247514.38 rows=1538682 width=19)
Recheck Cond: ((dtp)::text = 'result'::text)
-> Bitmap Index Scan on pd_ont_idx_dtp (cost=0.00..105924.68 rows=1538682 width=0)
Index Cond: ((dtp)::text = 'result'::text)
-> Materialize (cost=1.12..14373643.76 rows=18706904 width=29)
-> Nested Loop Left Join (cost=1.12..14326876.50 rows=18706904 width=29)
-> Index Scan using vipn_pal_historical_all_oru_orc_control_id_idx on vipn_pal_historical_all_oru_orc d (cost=0.56..2587122.40 rows=18706904 width=29)
-> Index Only Scan using idx_pd_npi_npi on pd_npi n (cost=0.56..0.62 rows=1 width=11)
Index Cond: (npi = (d.orc_12_1)::text)
-> Materialize (cost=0.57..12676277.17 rows=80915472 width=60)
-> Index Scan using vipn_pal_historical_all_oru_obr_control_id_idx on vipn_pal_historical_all_oru_obr c (cost=0.57..12473988.49 rows=80915472 width=60)
Is there a way to avoid Materialize in query and optimize it?
I removed the index to solve this issue. The Materialize node was re-scanning, so I dropped the index to avoid that. Without the index scan underneath it, Materialize no longer needs to re-scan. Saving cost!!
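Another angle that may be worth testing: the join to pd_ontology uses an OR between two equality conditions, and an OR clause cannot be used as a hash or merge condition, so the planner is left with the nested loop and Join Filter seen in the plan. The sketch below uses two hypothetical miniature tables (obx_demo and ontology_demo, not part of the original schema) to compare the OR shape with one equality join per column.

```sql
-- Hypothetical stand-ins for the OBX table and l2.pd_ontology.
CREATE TEMP TABLE obx_demo (obx_3_1 text);
CREATE TEMP TABLE ontology_demo (cval text, cvald text, cdscs text);

-- Original shape: the OR condition is neither hashable nor mergeable,
-- so only a nested loop with a join filter is possible.
EXPLAIN
SELECT b.obx_3_1, o.cdscs
FROM obx_demo b
LEFT JOIN ontology_demo o
       ON b.obx_3_1 = o.cval OR b.obx_3_1 = o.cvald;

-- Alternative shape: one plain equality join per column, which lets the
-- planner consider hash or merge joins for each of them.
EXPLAIN
SELECT b.obx_3_1, coalesce(o1.cdscs, o2.cdscs) AS cdscs
FROM obx_demo b
LEFT JOIN ontology_demo o1 ON b.obx_3_1 = o1.cval
LEFT JOIN ontology_demo o2 ON b.obx_3_1 = o2.cvald;
```

The two shapes are not strictly equivalent: when a code matches rows through both cval and cvald, the second form produces a different number of rows, so the results would need to be checked against the real data before switching.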

Simple batch DELETE then INSERT procedure some 1000 times slower than executing the statements one after the other

In a rather simple table with a composite primary key (see DDL) there are about 40k records.
create table la_ezg
(
be_id integer not null,
usage text not null,
area numeric(18, 6),
sk_area numeric(18, 6),
count_ezg numeric(18, 6),
...
...
constraint la_ezg_pkey
primary key (be_id, usage)
);
There is also a simple procedure whose purpose is to delete the rows with a certain be_id and persist the rows from another view where they are "generated":
CREATE OR REPLACE function pr_create_la_ezg(pBE_ID numeric) returns void as
$$
begin
delete from la_ezg where be_id = pBE_ID;
insert into la_ezg(BE_ID, USAGE, ...)
select be_id, usage, ...
from vw_la_ezg_with_usage
where be_id = pBE_ID;
END;
$$ language plpgsql;
The procedure needs about 7 minutes to execute...
Both statements (DELETE and INSERT) execute in less than 100ms on the very same be_id when run on their own.
There are a lot of different locks showing up in pg_locks during those 7 minutes, but I wasn't able to figure out what exactly is going on inside this transaction or whether there is some kind of deadlock. In the end the procedure returns successfully, but it takes far too long.
EDIT (activated 'auto_explain' and ran all three queries again):
duration: 1.420 ms plan:
Query Text: delete from la_ezg where be_id=790696
Delete on la_ezg (cost=4.33..22.89 rows=5 width=6)
-> Bitmap Heap Scan on la_ezg (cost=4.33..22.89 rows=5 width=6)
Output: ctid
Recheck Cond: (la_ezg.be_id = 790696)
-> Bitmap Index Scan on sys_c0073325 (cost=0.00..4.33 rows=5 width=0)
Index Cond: (la_ezg.be_id = 790696)
1 row affected in 107 ms
duration: 71.645 ms plan:
Query Text: insert into la_ezg(BE_ID,USAGE,...)
select be_id,USAGE,... from vw_la_ezg_with_usage where be_id=790696
Insert on la_ezg (cost=1343.71..2678.87 rows=1 width=228)
-> Nested Loop (cost=1343.71..2678.87 rows=1 width=228)
Output: la_ezg_geo.be_id, usage.nutzungsart, COALESCE(round(((COALESCE(st_area(la_ezg_geo.geometry), '3'::double precision) / '10000'::double precision))::numeric, 2), '0'::numeric), NULL::numeric, COALESCE((count(usage.nutzungsart)), '0'::bigint), COALESCE(round((((sum(st_area(st_intersection(ezg.geometry, usage.geom)))) / '10000'::double precision))::numeric, 2), '0'::numeric), COALESCE(round(((((sum(st_area(st_intersection(ezg.geometry, usage.geom)))) * '100'::double precision) / COALESCE(st_area(la_ezg_geo.geometry), '3'::double precision)))::numeric, 2), '0'::numeric), NULL::character varying, NULL::timestamp without time zone, NULL::character varying, NULL::timestamp without time zone
-> GroupAggregate (cost=1343.71..1343.76 rows=1 width=41)
Output: ezg.be_id, usage.nutzungsart, sum(st_area(st_intersection(ezg.geometry, usage.geom))), count(usage.nutzungsart)
Group Key: ezg.be_id, usage.nutzungsart
-> Sort (cost=1343.71..1343.71 rows=1 width=1834)
Output: ezg.be_id, usage.nutzungsart, ezg.geometry, usage.geom
Sort Key: usage.nutzungsart
-> Nested Loop (cost=0.42..1343.70 rows=1 width=1834)
Output: ezg.be_id, usage.nutzungsart, ezg.geometry, usage.geom
-> Seq Scan on la_ezg_geo ezg (cost=0.00..1335.00 rows=1 width=1516)
Output: ezg.objectid, ezg.be_id, ezg.name, ezg.se_anno_cad_data, ezg.benutzer_geaendert, ezg.datum_geaendert, ezg.status, ezg.benutzer_erstellt, ezg.datum_erstellt, ezg.len, ezg.geometry, ezg.temp_char, ezg.vulgo, ezg.flaeche, ezg.hauptgemeinde, ezg.prozessart, ezg.verbauungsgrad, ezg.verordnung_txt, ezg.gemeinden_txt, ezg.hinderungsgrund, ezg.kompetenz, ezg.seehoehe_min, ezg.seehoehe_max, ezg.neigung_min, ezg.neigung_max, ezg.exposition
Filter: (ezg.be_id = 790696)
-> Index Scan using dkm_nutz_fl_geom_1551355663100174000 on dkm.dkm_nutz_fl nutzung (cost=0.42..8.69 rows=1 width=318)
Output: usage.gdo_gid, usage.gst, usage.nutzungsart, usage.nutzungsabschnitt, usage.statistik, usage.flaeche, usage.kennung, usage.von_datum, usage.bis_datum, usage.von_az, usage.bis_az, usage.projekt, usage.fme_basename, usage.fme_dataset, usage.fme_feature_type, usage.fme_type, usage.oracle_srid, usage.geom
Index Cond: ((usage.geom && ezg.geometry) AND (usage.geom && ezg.geometry))
Filter: _st_intersects(usage.geom, ezg.geometry)
-> Seq Scan on la_ezg_geo (cost=0.00..1335.00 rows=1 width=1516)
Output: la_ezg_geo.objectid, la_ezg_geo.be_id, la_ezg_geo.name, la_ezg_geo.se_anno_cad_data, la_ezg_geo.benutzer_geaendert, la_ezg_geo.datum_geaendert, la_ezg_geo.status, la_ezg_geo.benutzer_erstellt, la_ezg_geo.datum_erstellt, la_ezg_geo.len, la_ezg_geo.geometry, la_ezg_geo.temp_char, la_ezg_geo.vulgo, la_ezg_geo.flaeche, la_ezg_geo.hauptgemeinde, la_ezg_geo.prozessart, la_ezg_geo.verbauungsgrad, la_ezg_geo.verordnung_txt, la_ezg_geo.gemeinden_txt, la_ezg_geo.hinderungsgrund, la_ezg_geo.kompetenz, la_ezg_geo.seehoehe_min, la_ezg_geo.seehoehe_max, la_ezg_geo.neigung_min, la_ezg_geo.neigung_max, la_ezg_geo.exposition
Filter: (la_ezg_geo.be_id = 790696)
1 row affected in 149 ms
duration: 421851.819 ms plan:
Query Text: select pr_create_la_ezg(790696)
Result (cost=0.00..0.26 rows=1 width=4)
Output: pr_create_la_ezg('790696'::numeric)
1 row retrieved starting from 1 in 7 m 1 s 955 ms (execution: 7 m 1 s 929 ms, fetching: 26 ms)
P.S. I shortened some of the queries and names for the sake of readability
P.P.S. This database is a legacy migration project. As in this case, there are often views that depend on views in multiple layers. I'd like to streamline all this, but I am in desperate need of a way to debug what's going on inside such a transaction; otherwise I would have to rebuild nearly everything, with the risk of breaking things.
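One way to see inside the function call, sketched under the assumption that the session is allowed to load auto_explain (this normally requires superuser rights or a preloaded library): turn on auto_explain.log_nested_statements so that the DELETE and INSERT executed inside pr_create_la_ezg are logged with their own plans. The last two statements compare an integer literal with a numeric value, because the function declares pBE_ID as numeric while be_id is an integer column; whether that difference is the actual cause here is a hypothesis to verify, not a confirmed diagnosis.

```sql
-- Log the plans of statements that run inside functions as well.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SET auto_explain.log_nested_statements = on;

-- The server log should now contain one plan per statement in the body.
SELECT pr_create_la_ezg(790696);

-- Plain EXPLAIN does not execute the DELETE. Compare the plan for an
-- integer literal with the plan for a numeric value, which is what the
-- function's numeric parameter supplies.
EXPLAIN DELETE FROM la_ezg WHERE be_id = 790696;
EXPLAIN DELETE FROM la_ezg WHERE be_id = 790696::numeric;
```

If the numeric variant shows the column being cast and the index no longer used, the same comparison applied to the view's be_id filter would be the next thing to look at.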

Postgres - Update Performance degraded

Can someone please help identify why the statement below, which used to take 2 hours, is now taking 6 hours, without a volume increase being a factor?
with P as
(SELECT DISTINCT CD.CASE_DETAIL_ID, SVL.SERVICE_LEVEL_ID
FROM report_fct CD LEFT JOIN SERVICE_LEVEL SVL ON SVL.ORDER_TYPE_CD = CD.ORDER_TYPE_CD
AND SVL.SOURCE_ID = CD.SOURCE_ID
AND SVL.AREA_ID = CD.HQ_AREA_ID
AND SVL.CATEGORY_ID = CD.CATEGORY_ID
AND SVL.STATE_CD = CD.CUST_STATE
WHERE CD.LINE_OF_BIZ = 'CLOTH'
AND CD.HQ_AREA_ID is NOT NULL
AND CD.SOURCE_ID is NOT NULL
AND CD.CATEGORY_ID is NOT NULL
AND CD.CUST_STATE is NOT NULL)
update report_fct rpt
set service_level_id = P.service_level_id
from P
where rpt.case_detail_id = P.case_detail_id;
CREATE TABLE report_fct
...
..
case_detail_id bigint NOT NULL,
...
CREATE INDEX report_fct_ix1
ON report_fct USING btree
(case_detail_id)
TABLESPACE pg_default;
CREATE INDEX report_fct_ix2
ON report_fct USING btree
(insert_dt)
TABLESPACE pg_default;
One suspicion I have is that the statistics on this table are skewed, which could be causing the degradation.
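relname    | inserts   | updates    | deletes | live_tuples | dead_tuples | last autovacuum | last autoanalyze
-----------+-----------+------------+---------+-------------+-------------+-----------------+-----------------
report_fct | 262746347 | 5387849450 | 0       | 2473523     | 3573914     | 5/19/20 3:38    | 5/19/20 1:13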
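The row above shows more dead tuples (3,573,914) than live ones (2,473,523) and over five billion updates, so table bloat and stale statistics are both plausible suspects. A minimal sketch for re-checking those counters and refreshing the statistics, assuming the table lives in a schema on the search path:

```sql
-- Current tuple counters, maintenance timestamps and on-disk size.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'report_fct';

-- Reclaim dead tuples for reuse and refresh planner statistics,
-- printing details about what was found.
VACUUM (VERBOSE, ANALYZE) report_fct;
```

If total_size is much larger than roughly 2.5 million rows would justify, the table has probably bloated, and every sequential scan and index lookup in the update pays for that.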
EXPLAIN:
"Update on report_fct rpt (cost=24847.47..27881.35 rows=415 width=3772)"
" CTE p"
" -> Unique (cost=24844.02..24847.05 rows=405 width=16)"
" -> Sort (cost=24844.02..24845.03 rows=405 width=16)"
" Sort Key: cd.case_detail_id, svl.service_level_id"
" -> Nested Loop Left Join (cost=0.41..24826.48 rows=405 width=16)"
" -> Seq Scan on report_fct cd (cost=0.00..21915.21 rows=405 width=44)"
" Filter: ((hq_area_id IS NOT NULL) AND (source_id IS NOT NULL) AND (category_id IS NOT NULL) AND (cust_state IS NOT NULL) AND ((line_of_biz)::text = 'CLOTH'::text))"
" -> Index Scan using service_level_unq on service_level svl (cost=0.41..7.18 rows=1 width=45)"
" Index Cond: ((area_id = cd.hq_area_id) AND ((order_type_cd)::text = (cd.order_type_cd)::text) AND (source_id = cd.source_id) AND (state_cd = (cd.cust_state)::bpchar) AND (category_id = cd.category_id))"
" -> Nested Loop (cost=0.41..3034.30 rows=415 width=3772)"
" -> CTE Scan on p (cost=0.00..8.10 rows=405 width=56)"
" -> Index Scan using report_fct_ix1 on report_fct rpt (cost=0.41..7.46 rows=1 width=3724)"
" Index Cond: (case_detail_id = p.case_detail_id)"

psql: intermittent segmentation fault: server closed the connection unexpectedly

I looked at similar-sounding questions but none seemed to address my case:
On macOS Sierra, 16 GB RAM, localhost (no other postgres running anywhere):
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The logs say:
2019-03-23 08:12:04.076 MDT [841] LOG: server process (PID 1175) was terminated by signal 11: Segmentation fault
2019-03-23 07:13:10.459 MDT [841] LOG: terminating any other active server processes
2019-03-23 07:13:10.459 MDT [951] WARNING: terminating connection because of crash of another server process
2019-03-23 07:13:10.459 MDT [951] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-03-23 07:13:10.459 MDT [951] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2019-03-23 07:13:10.460 MDT [980] FATAL: the database system is in recovery mode
2019-03-23 07:13:10.461 MDT [841] LOG: all server processes terminated; reinitializing
2019-03-23 07:13:10.470 MDT [981] LOG: database system was interrupted; last known up at 2019-03-23 07:06:47 MDT
2019-03-23 07:13:10.744 MDT [981] LOG: database system was not properly shut down; automatic recovery in progress
2019-03-23 07:13:10.746 MDT [981] LOG: redo starts at 28/15BF74F0
2019-03-23 07:13:10.746 MDT [981] LOG: invalid record length at 28/15BF7528: wanted 24, got 0
2019-03-23 07:13:10.746 MDT [981] LOG: redo done at 28/15BF74F0
2019-03-23 07:13:10.755 MDT [841] LOG: database system is ready to accept connections
PSQL version:
psql --version
psql (PostgreSQL) 11.1
Happens in both psql terminal and pgAdmin. No CPU or memory spikes when this happens.
It doesn't happen on simple result sets. See this example: it's the same query, the first time returning a count, the second time returning rows (which triggers the error):
shill=# with yards_manual as (
select device_id,loc, sum(sq_meters)*10.7639 as manual_yard_sq_ft from device d
inner join zones z on (z.device_id=d.id)
where z.enabled and z.sq_meters<46 or z.sq_meters>47
group by 1,2
)
select count(device_id) from yards_manual;
count
-------
84983
shill=# with yards_manual as (
shill(# select device_id,loc, sum(sq_meters)*10.7639 as manual_yard_sq_ft from device d
shill(# inner join zones z on (z.device_id=d.id)
shill(# where z.enabled and z.sq_meters<46 or z.sq_meters>47 --and z.crop_type in ('WARM_SEASON_GRASS','COOL_SEASON_GRASS')
shill(# group by 1,2
shill(# )
shill-#
shill-# select distinct device_id, y.manual_yard_sq_ft, build_area_ft2 , prop_area_ft2,(prop_area_ft2-build_area_ft2) as gis_yard_sq_ft2 --, st_npoints(property_geom) as corners
shill-# from yards_manual y inner join yards b on st_contains(b.property_geom,y.loc)
shill-# where (prop_area_ft2-build_area_ft2)>0 and (prop_area_ft2-build_area_ft2)<20000
shill-# ;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
That said, this last query sometimes returns. Once it errors out, it keeps erroring out until I stop and start the db, but stopping and starting does not always help. I have tried restarting postgres and backing up and restoring the database, to no avail. The problem just started happening. VACUUM FULL ran fine, but the error still happens. The db is 24GB.
Here is the same query now randomly returning:
device_id | manual_yard_sq_ft | build_area_ft2 | prop_area_ft2 | gis_yard_sq_ft2
----------+-------------------+------------------+------------------+------------------
0022682e | 3999.9944068 | 1666.25757779497 | 12948.051385913 | 11281.793808118
002a4379 | 1934.99812741536 | 2907.60847006035 | 15872.352961764 | 12964.7444917037
002adeb4 | 1599.9984516096 | 2856.54321331877 | 9800.49184470172 | 6943.94863138295
But when I ran it a second time, it errored out as described above.
Here's the SQL execution plan:
Unique (cost=137590686.48..137602981.21 rows=819649 width=548)
Output: y.device_id, y.manual_yard_sq_ft, b.build_area_ft2, b.prop_area_ft2, ((b.prop_area_ft2 - b.build_area_ft2))
CTE yards_manual
-> Finalize GroupAggregate (cost=163766.01..227836.10 rows=519752 width=77)
Output: z.device_id, d.loc, (sum(z.sq_meters) * '10.7639'::double precision)
Group Key: z.device_id, d.loc
-> Gather Merge (cost=163766.01..218090.75 rows=433126 width=77)
Output: z.device_id, d.loc, (PARTIAL sum(z.sq_meters))
Workers Planned: 2
-> Partial GroupAggregate (cost=162765.98..167097.24 rows=216563 width=77)
Output: z.device_id, d.loc, PARTIAL sum(z.sq_meters)
Group Key: z.device_id, d.loc
-> Sort (cost=162765.98..163307.39 rows=216563 width=77)
Output: z.device_id, d.loc, z.sq_meters
Sort Key: z.device_id, d.loc
-> Parallel Hash Join (cost=8564.46..133948.71 rows=216563 width=77)
Output: z.device_id, d.loc, z.sq_meters
Hash Cond: ((z.device_id)::text = (d.id)::text)
-> Parallel Seq Scan on public.zones z (cost=0.00..118450.79 rows=216563 width=45)
Output: z.device_id, z.sq_meters
Filter: ((z.enabled AND (z.sq_meters < '46'::double precision)) OR (z.sq_meters > '47'::double precision))
-> Parallel Hash (cost=5648.76..5648.76 rows=120376 width=69)
Output: d.loc, d.id
-> Parallel Seq Scan on public.device d (cost=0.00..5648.76 rows=120376 width=69)
Output: d.loc, d.id
-> Sort (cost=137362850.38..137364899.50 rows=819649 width=548)
Output: y.device_id, y.manual_yard_sq_ft, b.build_area_ft2, b.prop_area_ft2, ((b.prop_area_ft2 - b.build_area_ft2))
Sort Key: y.device_id, y.manual_yard_sq_ft, b.build_area_ft2, b.prop_area_ft2, ((b.prop_area_ft2 - b.build_area_ft2))
-> Nested Loop (cost=0.41..136878917.80 rows=819649 width=548)
Output: y.device_id, y.manual_yard_sq_ft, b.build_area_ft2, b.prop_area_ft2, (b.prop_area_ft2 - b.build_area_ft2)
-> CTE Scan on yards_manual y (cost=0.00..10395.04 rows=519752 width=556)
Output: y.device_id, y.loc, y.manual_yard_sq_ft
-> Index Scan using prop_geom_idx on public.yards b (cost=0.41..263.31 rows=2 width=173)
Output: b.block_id, b.property_geom, b.building_geom, b.prop_area_ft2, b.build_area_ft2, b.yard_area_ft, b.vegetation, b.yard_id
Index Cond: (b.property_geom ~ y.loc)
Filter: (((b.prop_area_ft2 - b.build_area_ft2) > '0'
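To narrow down where the crash comes from, two things could be checked first. The plan above uses parallel workers (Gather Merge, Workers Planned: 2), and the query calls st_contains, which suggests PostGIS is installed (an assumption based on the function name). The sketch below is a diagnostic aid only and does not fix anything by itself.

```sql
-- Record the exact server and PostGIS builds involved in the crash.
SELECT version();
SELECT postgis_full_version();

-- Disable parallel query for this session only, then re-run the failing
-- query to see whether the segmentation fault still occurs.
SET max_parallel_workers_per_gather = 0;
```

If the query stops crashing with parallelism off, that narrows the problem to the parallel code path; if it still crashes, the version output is what a bug report to the PostgreSQL or PostGIS lists would need.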

postgresql- slow update query

I'm working with a table that has about 19 million rows and 60 columns (bigtable). Of the 19 million records, about 17 million have x and y coordinates (about 1.8 million distinct combinations of x and y). I needed to add some additional geocoded information to the table from another file (census_geocode). I've created a lookup table (distinct_xy) that has a list of all the distinct x and y coordinate pairs and an ID. I have indexes on bigtable (x_coord, y_coord), census_geocode (x_coord, y_coord), and distinct_xy (x_coord, y_coord), and a primary key in distinct_xy (xy_id) and census_geocode (xy_id). So here's the query:
Update bigtable
set block_grp = cg.blkgrp,
block = cg.block,
tract = cg.tractce10
from census_geocode cg, distinct_xy xy
where bigtable.x_coord = xy.x_coord and
bigtable.y_coord=xy.y_coord and cg.xy_id=xy.xy_id;
This is running extremely slowly. Here is the plan:
"Update on bigtable (cost=17675751.51..17827040.74 rows=22 width=327)"
" -> Nested Loop (cost=17675751.51..17827040.74 rows=22 width=327)"
" -> Merge Join (cost=17675751.51..17826856.26 rows=22 width=312)"
" Merge Cond: ((bigtable.x_coord = xy.x_coord) AND (bigtable.y_coord = xy.y_coord))"
" -> Sort (cost=17318145.58..17366400.81 rows=19302092 width=302)"
" Sort Key: bigtable.x_coord, bigtable.y_coord"
" -> Seq Scan on bigtable (cost=0.00..1457709.92 rows=19302092 width=302)"
" -> Materialize (cost=357588.42..366887.02 rows=1859720 width=26)"
" -> Sort (cost=357588.42..362237.72 rows=1859720 width=26)"
" Sort Key: xy.x_coord, xy.y_coord"
" -> Seq Scan on distinct_xy xy (cost=0.00..30443.20 rows=1859720 width=26)"
" -> Index Scan using census_geocode_pkey on census_geocode cg (cost=0.00..8.37 rows=1 width=23)"
" Index Cond: (xy_id = xy.xy_id)"
I've also tried splitting this apart and inserting the lookup key back into the big table to avoid the multi-table join.
Update bigtable
set xy_id = xy.xy_id
from distinct_xy xy
where bigtable.x_coord = xy.x_coord and bigtable.y_coord=xy.y_coord;
This also runs for hours without completing.
"Update on bigtable (cost=0.00..20577101.71 rows=22 width=404)"
" -> Nested Loop (cost=0.00..20577101.71 rows=22 width=404)"
" -> Seq Scan on distinct_xy xy (cost=0.00..30443.20 rows=1859720 width=26)"
" -> Index Scan using rae_xy_idx on bigtable (cost=0.00..11.03 rows=1 width=394)"
" Index Cond: ((x_coord = xy.x_coord) AND (y_coord = xy.y_coord))"
Can someone please help me improve this query's performance?
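One detail in both plans stands out: the planner expects the join to produce 22 rows, although about 17 million rows in bigtable have coordinates that should match distinct_xy. An estimate that far off often points at missing or stale statistics. A minimal sketch, assuming nothing beyond the tables named in the question, that refreshes the statistics and re-checks the plan without running the update:

```sql
-- Refresh planner statistics on all three tables involved.
ANALYZE bigtable;
ANALYZE distinct_xy;
ANALYZE census_geocode;

-- Plain EXPLAIN only plans the statement; it does not execute the update.
EXPLAIN
UPDATE bigtable
SET block_grp = cg.blkgrp,
    block     = cg.block,
    tract     = cg.tractce10
FROM census_geocode cg
JOIN distinct_xy xy ON cg.xy_id = xy.xy_id
WHERE bigtable.x_coord = xy.x_coord
  AND bigtable.y_coord = xy.y_coord;
```

If the row estimate moves into the millions after ANALYZE, the planner may pick a different join strategy. Even with a good plan, rewriting 17 million wide rows in place is expensive, so some runtime is inherent in the update itself.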