Long-running query hangs application despite multiple cores - PostgreSQL

Our server has 8 cores and runs a web application (DHIS2) which uses PostgreSQL as its database.
There is a big SELECT query which takes a few hours to execute. (The query is run from the terminal.)
While that query runs, the CPU utilization of its backend process stays at a constant 100%.
This hangs the application, and the application's page does not even load in the browser. This must be because the other Postgres processes are waiting for that query's process to complete.
BUT, when the machine has multiple cores, why should a single core at 100% utilization stop the rest of the processes from executing?
I am unable to understand the concept of multiple CPUs and cores in this context. Does Postgres not recognize them? What dependency can one SELECT query have on another query?
Could somebody please explain this behaviour and suggest ways to manage the execution of big queries, perhaps through some kind of Postgres configuration?
Postgres Version - 9.6
OS - Ubuntu 16
Database Size - 200GB on disk
DHIS2 Version - 2.30
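Since we are on 9.6, here is a minimal diagnostic sketch we can run while the big query is executing, to see whether the other backends are really waiting on locks (the columns chosen are just an assumption about what is useful):
-- List active backends, what they are waiting on, and which PIDs block them.
-- wait_event_type/wait_event and pg_blocking_pids() are available from 9.6.
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY pid;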
Query (calculates outliers):
select datasets,
max(ou1.name) Country, ou1.organisationunitid as Country__Id ,
max(ou2.name) state, ou2.organisationunitid as State__Id ,
max(ou3.name) Division, ou3.organisationunitid as Division__Id ,
max(ou4.name) District, max(ou4.code) as District__Code, ou4.organisationunitid as District__Id,
max(ou5.name) Block, max(ou5.code) as Block__Code, ou5.organisationunitid as Block__Id,
max(ou6.name) Facility, max(ou6.code) as Facility__Code,ou6.organisationunitid as Facility__Id,
max(ou.name) as outlierfacility,
max(de.name) as dataelement,
max(coc.name) as category,
concat(max(p.startdate),':',max(p.enddate)) as period,
max(pt.name) as frequency,
_dv.value,
u upperbound,
l lowerbound,
mean,
std
from
(
with stats as (
select dv.sourceid,
dv.dataelementid,
dv.categoryoptioncomboid,
dv.attributeoptioncomboid,
array_agg(distinct dv.periodid) as periods,
array_agg(distinct ds.name) as datasets,
avg(dv.value::float) as mean,
stddev(dv.value::float) as std
from datavalue dv
inner join datasetmembers dsm on dsm.dataelementid = dv.dataelementid
inner join dataelement de on de.dataelementid = dsm.dataelementid
inner join dataset ds on ds.datasetid = dsm.datasetid
inner join period pe on pe.periodid = dv.periodid
inner join periodtype pt on pt.periodtypeid = pe.periodtypeid
inner join categoryoptioncombo coc on dv.categoryoptioncomboid = coc.categoryoptioncomboid
inner join _orgunitstructure ous on ous.organisationunitid = dv.sourceid
where pe.startdate between date('2019-04-29') - interval '6 months' and date('2019-04-29') and pt.name='Monthly'
and de.valueType in ('NUMBER','INTEGER')
and ds.uid in ('123qwe123','123ewq123')
group by dv.sourceid,dv.dataelementid,dv.categoryoptioncomboid,dv.attributeoptioncomboid
)
select dv.*,datasets,mean,std,mean+3*std u,mean-3*std l
from datavalue dv
inner join period pe on pe.periodid = dv.periodid
inner join periodtype pt on pt.periodtypeid = pe.periodtypeid
inner join stats on
stats.dataelementid = dv.dataelementid and
stats.sourceid= dv.sourceid and
stats.categoryoptioncomboid = dv.categoryoptioncomboid and
stats.attributeoptioncomboid = dv.attributeoptioncomboid
where dv.periodid = any(periods)
and (dv.value::float > mean+3*std or dv.value::float < mean-3*std)
) _dv
inner join dataelement de on _dv.dataelementid = de.dataelementid
inner join categoryoptioncombo coc on _dv.categoryoptioncomboid = coc.categoryoptioncomboid
inner join _orgunitstructure ous on _dv.sourceid = ous.organisationunitid
inner join organisationunit ou on ou.organisationunitid = ous.organisationunitid
left join organisationunit ou1 on ou1.organisationunitid = ous.idlevel1
left join organisationunit ou2 on ou2.organisationunitid = ous.idlevel2
left join organisationunit ou3 on ou3.organisationunitid = ous.idlevel3
left join organisationunit ou4 on ou4.organisationunitid = ous.idlevel4
left join organisationunit ou5 on ou5.organisationunitid = ous.idlevel5
left join organisationunit ou6 on ou6.organisationunitid = ous.idlevel6
inner join period p on _dv.periodid = p.periodid
inner join periodtype pt on p.periodtypeid = pt.periodtypeid
group by ou1.organisationunitid,
ou2.organisationunitid,
ou3.organisationunitid,
ou4.organisationunitid,
ou5.organisationunitid,
ou6.organisationunitid,
_dv.dataelementid,_dv.sourceid,_dv.categoryoptioncomboid,_dv.attributeoptioncomboid,_dv.periodid,_dv.value,u,l,mean,std,datasets
order by country,state,division,district,block,facility,dataelement,category
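For what it's worth, we understand that in 9.6 a single query only uses more than one core when the planner chooses a parallel plan, and parallel workers are disabled unless max_parallel_workers_per_gather is raised above its default of 0. Below is a hedged sketch of that check; the worker count is illustrative, not a recommendation.
-- Check whether parallel query is enabled at all (9.6 default is 0 = off),
-- then allow a few workers for this session only and look for Gather nodes
-- in the plan. The value 4 is illustrative, not a tuned recommendation.
SHOW max_parallel_workers_per_gather;
SET max_parallel_workers_per_gather = 4;
EXPLAIN
SELECT count(*) FROM datavalue;  -- stand-in for the real outlier query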

Related

Unable to improve query performance in postgresql

I am trying to join 9 tables together. The row count and indexes of each table are given below along with the query (green in the screenshot indicates the keys used to join). Please note that for the visit_occurrence table I join on an additional column, visit_occurrence_id, which is not indexed.
DROP MATERIALIZED VIEW IF EXISTS cdm.dummy CASCADE;
CREATE MATERIALIZED VIEW cdm.dummy as
select f.person_id, f.gender_id
from cdm.visit_occurrence a
left join cdm.condition_occurrence b
  on a.person_id = b.person_id and a.visit_occurrence_id = b.visit_occurrence_id
left join cdm.measurement c
  on a.person_id = c.person_id and a.visit_occurrence_id = c.visit_occurrence_id
left join cdm.drug_exposure d
  on a.person_id = d.person_id and a.visit_occurrence_id = d.visit_occurrence_id
left join cdm.procedure_occurrence e
  on a.person_id = e.person_id and a.visit_occurrence_id = e.visit_occurrence_id
left join cdm.person f
  on a.person_id = f.person_id
left join cdm.observation g
  on a.person_id = g.person_id and a.visit_occurrence_id = g.visit_occurrence_id
left join cdm.observation_period h
  on a.person_id = g.person_id
left join cdm.death i
  on a.person_id = i.person_id
Explain output:
Explain output with enable_nestloop = off:
Please note that visit_occurrence is the base table. I pick the columns person_id and visit_occurrence_id from the visit_occurrence table to join with the other tables, as shown in the query. I see that visit_occurrence_id, which is used to join from the base table to the other tables, is not an indexed column in the base table.
a) Is this the reason for the slow performance, because it's the base table? In all the other tables the joining keys are indexed, as shown in the screenshot above (green indicates the keys used to join).
b) Is the record count an issue?
Can you help me adapt my query to fix this?
It has been running for more than 5-6 hours with no output yet.
Any help is much appreciated.
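Whether the missing index on the base table matters depends on the join strategy the planner chooses, so a hedged sketch (table and column names assumed from the query above) is simply to create it and compare plans before re-running the full build:
-- Hedged sketch, assuming the table/column names from the query above.
-- An index on the base table's join keys only helps if the planner can use it
-- (e.g. for nested-loop or merge joins), so compare the plans before and after.
CREATE INDEX idx_visit_occurrence_person_visit
    ON cdm.visit_occurrence (person_id, visit_occurrence_id);

-- Check the estimated plan without actually building the materialized view.
EXPLAIN
SELECT f.person_id, f.gender_id
FROM cdm.visit_occurrence a
LEFT JOIN cdm.observation g
    ON a.person_id = g.person_id AND a.visit_occurrence_id = g.visit_occurrence_id
LEFT JOIN cdm.person f
    ON a.person_id = f.person_id;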

Avoid duplication in SQL Server

I get the result below when I run this query.
SELECT DISTINCT PT.F_PRO AS F_PRODUCT, PT.F_TEXT_CODE AS F_TEXT_CODE, PHT.F_PHRASE AS F_PHRASE
FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
    ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
    ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN'
OUTPUT
F_PRODUCT F_TEXT_CODE F_PHRASE
294264_B MANU0008 Alcoa, Inc
294264_B MANU0012 BioSensory
00091A MANU0006 3M Company
00094A MANU0006 4M Company
00094A MANU0006 5M Company
The above query returns duplicates in the F_PRODUCT column. I want to display F_PRODUCT without duplication: only one record (the first one) should be displayed for each F_PRODUCT, without using the TOP command.
Required Output
F_PRODUCT F_TEXT_CODE F_PHRASE
294264_B MANU0008 Alcoa, Inc.
00091A MANU0006 3M Company
You can use ROW_NUMBER() to assign a number to each row within a group of F_PRO, then retrieve only the rows numbered 1. You can change the ORDER BY if something else determines the order.
SELECT *
FROM
(SELECT PT.F_PRO AS F_PRODUCT, PT.F_TEXT_CODE AS F_TEXT_CODE, PHT.F_PHRASE AS F_PHRASE, ROW_NUMBER() OVER (PARTITION BY PT.F_PRO ORDER BY PHT.F_PHRASE ASC) AS RowNum
FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN') dt
WHERE RowNum = 1
SELECT PT.F_PRO AS F_PRODUCT,
MIN(PT.F_TEXT_CODE) AS F_TEXT_CODE,
MIN(PHT.F_PHRASE) AS F_PHRASE FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN'
group By PT.F_PRO;
is one way to do that. It doesn't pick the "first" record, since it is vague how you would define "first".

PostgreSQL - weird query planner behavior

Assume I have a query like this:
SELECT *
FROM clients c
INNER JOIN clients_balances cb ON cb.id_clients = c.id
LEFT JOIN clients com ON com.id = c.id_companies
LEFT JOIN clients com_real ON com_real.id = c.id_companies_real
LEFT JOIN rate_tables rt_orig ON rt_orig.id = c.orig_rate_table
LEFT JOIN rate_tables rt_term ON rt_term.id = c.term_rate_table
LEFT JOIN payment_terms pt ON pt.id = c.id_payment_terms
LEFT JOIN paygw_clients_profiles cpgw ON (cpgw.id_clients = c.id AND cpgw.id_companies = c.id_companies_real)
WHERE
EXISTS (SELECT * FROM accounts WHERE (name LIKE 'x' OR accname LIKE 'x' OR ani LIKE 'x') AND id_clients = c.id)
AND c."type" = '0'
AND c."id" > 0
ORDER BY c."name";
This query takes around 35 seconds to run in the production environment ("clients" has about 1 million records). However, if I take out any one of the joins, the query takes only about 300 ms to execute.
I've played around with the query planner settings, but to no avail.
Here are a few explain analyze outputs:
http://explain.depesz.com/s/hzy (slow - 48049.574 ms)
http://explain.depesz.com/s/FWCd (fast - 286.234 ms, rate_tables JOIN removed)
http://explain.depesz.com/s/MyRf (fast - 539.733 ms, paygw_clients_profiles JOIN removed)
It looks like in the fast case the planner starts from the EXISTS subquery and has to perform the joins for only two rows in total. However, in the slow case it first joins all the tables and then filters by EXISTS.
What I need is to make this query run in a reasonable time with all seven joins in place.
Postgres version is 9.3.10 on CentOS 6.3.
Thanks.
UPDATE
Rewriting the query like this:
SELECT *
FROM clients c
INNER JOIN clients_balances cb ON cb.id_clients = c.id
INNER JOIN accounts a ON a.id_clients = c.id AND (a.name = 'x' OR a.accname = 'x' OR a.ani = 'x')
LEFT JOIN clients com ON com.id = c.id_companies
LEFT JOIN clients com_real ON com_real.id = c.id_companies_real
LEFT JOIN rate_tables rt_orig ON rt_orig.id = c.orig_rate_table
LEFT JOIN rate_tables rt_term ON rt_term.id = c.term_rate_table
LEFT JOIN payment_terms pt ON pt.id = c.id_payment_terms
LEFT JOIN paygw_clients_profiles cpgw ON (cpgw.id_clients = c.id AND cpgw.id_companies = c.id_companies_real)
WHERE
c."type" = '0' AND c.id > 0
ORDER BY c."name";
makes it run fast; however, this is not acceptable, since the account filter parameters are optional and I still need results when there are no matches in that table. Using LEFT JOIN accounts instead of INNER JOIN accounts kills the performance again.
As suggested by Tom Lane, I changed two parameters, join_collapse_limit and from_collapse_limit, from their default of 8 to 10, and this solved the issue.
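For reference, a minimal sketch of that change (shown at session level here; the same settings can be made permanent in postgresql.conf):
-- Raise the planner's collapse limits so all eight relations are planned together.
-- Session-level; set the same values in postgresql.conf to make them permanent.
SET join_collapse_limit = 10;
SET from_collapse_limit = 10;
-- Then re-run the query (or EXPLAIN ANALYZE it) in the same session.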

how to solve this complicated sql query

These are the five given tables:
http://i58.tinypic.com/53wcxe.jpg
This is the expected result:
http://i58.tinypic.com/2vsrts7.jpg
Please help: how can I write a query to produce this result? I have no idea how.
SELECT K.* , COUNT (A.Au_ID) AS AnzahlAuftr
FROM Kunde K
LEFT JOIN Auftrag A ON K.Kd_ID = A.Au_Kd_ID
GROUP BY K.Kd_ID,K.Kd_Firma,K.Kd_Strasse,K.Kd_PLZ,K.Kd_Ort
ORDER BY K.Kd_PLZ DESC;
SELECT COUNT (F.F_ID) AS AnzahlFahrt
FROM Fahrten F
RIGHT JOIN Auftrag A ON A.Au_ID = F.F_Au_ID
SELECT SUM (T.Ts_Strecke) AS SumStrecke
FROM Teilstrecke T
LEFT JOIN Fahrten F ON F.F_ID = T.Ts_F_ID
How do I join these 3 queries into one?
Grouping on Strasse etc. is not necessary and can be quite expensive. What about this approach:
SELECT K.*, ISNULL(Au.AnzahlAuftr,0) AS AnzahlAuftr, ISNULL(Au.AnzahlFahrt,0) AS AnzahlFahrt, ISNULL(Au.SumStrecke,0) AS SumStrecke
FROM Kunde K
LEFT OUTER JOIN
(SELECT A.Au_Kd_ID, COUNT(*) AS AnzahlAuftr, SUM(Fa.AnzahlFahrt1) AS AnzahlFahrt, SUM(Fa.SumStrecke2) AS SumStrecke
FROM Auftrag A LEFT OUTER JOIN
(SELECT F.F_Au_ID, COUNT(*) AS AnzahlFahrt1, SUM(Ts.SumStrecke1) AS SumStrecke2
FROM Fahrten F LEFT OUTER JOIN
(SELECT T.Ts_F_ID, SUM(T.Ts_Strecke) AS SumStrecke1
FROM Teilstrecke T
GROUP BY T.Ts_F_ID) AS Ts
ON Ts.Ts_F_ID = F.F_ID
GROUP BY F.F_Au_ID) AS Fa
ON Fa.F_Au_ID = A.Au_ID
GROUP BY A.Au_Kd_ID) AS Au
ON Au.Au_Kd_ID = K.Kd_ID

Eliminating NULL rows in TSQL query [duplicate]

Possible Duplicate:
How to eliminate NULL fields in TSQL
I am using SSMS 2008 R2 and am developing a T-SQL query. I want just one record per profile_name. Because some of these values are NULL, I am currently doing LEFT JOINs on most of the tables. But the problem with the LEFT JOINs is that now I get more than one record for some profile_names!
But if I change these to INNER JOINs, then some profile_names are excluded entirely because they have NULL values for those columns. How do I limit the query result to just one record per profile_name regardless of NULL values? And if there are non-NULL values, I want it to choose the record with the non-NULL values. Here is the initial query:
select distinct
gp.group_profile_id,
gp.profile_name,
gp.license_number,
gp.is_accepting,
case when gp.is_accepting = 1 then 'Yes'
when gp.is_accepting = 0 then 'No '
end as is_accepting_placement,
mo.profile_name as managing_office,
regions.[region_description] as region,
pv.vendor_name,
pv.id as vendor_id,
at.description as applicant_type,
dbo.GetGroupAddress(gp.group_profile_id, null, 0) as [Office Address],
gsv.status_description
from group_profile gp With (NoLock)
inner join group_profile_type gpt With (NoLock) on gp.group_profile_type_id = gpt.group_profile_type_id and gpt.type_code = 'FOSTERHOME' and gp.agency_id = #agency_id and gp.is_deleted = 0
inner join group_profile mo With (NoLock) on gp.managing_office_id = mo.group_profile_id
left outer join payor_vendor pv With (NoLock) on gp.payor_vendor_id = pv.payor_vendor_id
left outer join applicant_type at With (NoLock) on gp.applicant_type_id = at.applicant_type_id and at.is_foster_home = 1
inner join group_status_view gsv With (NoLock) on gp.group_profile_id = gsv.group_profile_id and gsv.status_value = 'OPEN' and gsv.effective_date =
(Select max(b.effective_date) from group_status_view b With (NoLock)
where gp.group_profile_id = b.group_profile_id)
left outer join regions With (NoLock) on isnull(mo.regions_id, gp.regions_id) = regions.regions_id
left join enrollment en on en.group_profile_id = gp.group_profile_id
join event_log el on el.event_log_id = en.event_log_id
left join people client on client.people_id = el.people_id
As you can see, the result of the above query is one row per profile_name:
group_profile_id profile_name license_number is_accepting is_accepting_placement managing_office region vendor_name vendor_id applicant_type Office Address status_description Cert Date2
But now watch what happens when I add in 2 LEFT JOINs and 1 additional column:
select distinct
gp.group_profile_id,
gp.profile_name,
gp.license_number,
gp.is_accepting,
case when gp.is_accepting = 1 then 'Yes'
when gp.is_accepting = 0 then 'No '
end as is_accepting_placement,
mo.profile_name as managing_office,
regions.[region_description] as region,
pv.vendor_name,
pv.id as vendor_id,
at.description as applicant_type,
dbo.GetGroupAddress(gp.group_profile_id, null, 0) as [Office Address],
gsv.status_description,
ri.[description] as race
from group_profile gp With (NoLock)
inner join group_profile_type gpt With (NoLock) on gp.group_profile_type_id = gpt.group_profile_type_id and gpt.type_code = 'FOSTERHOME' and gp.agency_id = #agency_id and gp.is_deleted = 0
inner join group_profile mo With (NoLock) on gp.managing_office_id = mo.group_profile_id
left outer join payor_vendor pv With (NoLock) on gp.payor_vendor_id = pv.payor_vendor_id
left outer join applicant_type at With (NoLock) on gp.applicant_type_id = at.applicant_type_id and at.is_foster_home = 1
inner join group_status_view gsv With (NoLock) on gp.group_profile_id = gsv.group_profile_id and gsv.status_value = 'OPEN' and gsv.effective_date =
(Select max(b.effective_date) from group_status_view b With (NoLock)
where gp.group_profile_id = b.group_profile_id)
left outer join regions With (NoLock) on isnull(mo.regions_id, gp.regions_id) = regions.regions_id
left join enrollment en on en.group_profile_id = gp.group_profile_id
join event_log el on el.event_log_id = en.event_log_id
left join people client on client.people_id = el.people_id
left join race With (NoLock) on el.people_id = race.people_id
left join race_info ri with (nolock) on ri.race_info_id = race.race_info_id
The above query results in all of the same profile_names, but some with NULL race values:
group_profile_id profile_name license_number is_accepting is_accepting_placement managing_office region vendor_name vendor_id applicant_type Office Address status_description Cert Date2 race
Unfortunately, it complicates matters that I need to join two additional tables for this one additional field (race). If I simply change the last two LEFT JOINs above to INNER JOINs, I eliminate the NULL rows above, but I also eliminate some of the profile_names:
group_profile_id profile_name license_number is_accepting is_accepting_placement managing_office region vendor_name vendor_id applicant_type Office Address status_description Cert Date2 race
Hopefully I have provided all of the details that you need for this question.
Not the most elegant solution, but one that will work:
select [stuff]
from group_profile gp With (NoLock)
inner join group_profile_type gpt With (NoLock) on gp.group_profile_type_id = gpt.group_profile_type_id and gpt.type_code = 'FOSTERHOME' and gp.agency_id = #agency_id and gp.is_deleted = 0
inner join group_profile mo With (NoLock) on gp.managing_office_id = mo.group_profile_id
join payor_vendor pv on ISNULL(gp.payor_vendor_id, 'THISVALUEWILLNEVEROCCUR') = ISNULL(pv.payor_vendor_id, 'THISVALUEWILLNEVEROCCUR')
...etc...
Biggest issue with what I posted is that you'll be doing a whole lot of table scans.
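An alternative not taken from the answer above, sketched under the assumption that the table and column names in the question are accurate: pull at most one race per person with OUTER APPLY, so the extra join cannot multiply rows and a missing race still comes back as NULL.
-- Hedged sketch: at most one race per person via OUTER APPLY (TOP 1),
-- so joining it never multiplies rows; people without a race get NULL.
-- Table and column names are assumptions carried over from the question.
select el.people_id,
       r.race
from event_log el with (nolock)
outer apply (
    select top 1 ri.[description] as race
    from race with (nolock)
    join race_info ri with (nolock) on ri.race_info_id = race.race_info_id
    where race.people_id = el.people_id
    order by ri.[description]
) r;
The same OUTER APPLY block could replace the two race LEFT JOINs in the big query, keeping one row per profile_name.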