Optimising HQL for better performance - PostgreSQL

I came across some legacy HQL. The query takes around 150 ms. As you can see, the code is complex and almost unreadable, not to mention the performance issue.
select distinct pub.id, ret.sector.id, ret.poiSector.id, ret.id, man.id, 0
from Publisher pub
left join pub.retailer ret
left join pub.manufacturer man
where (
pub.id in (
select publisher_id from Store AS s
where s.lat >= 52.297382 and s.lat <= 52.746616 and s.lng >= 13.047624 and s.lng <= 13.785276
group by 1
having min( sqrt( pow(111.3*(s.lat - 52.522), 2) + pow(67.7*(s.lng - 13.416), 2) ) ) is null or min( sqrt( pow(111.3*(s.lat - 52.522), 2) + pow(67.7*(s.lng - 13.416), 2) ) ) < 25
)
or
(man is not null)
)
and (ret is null or ret.hidden is false)
group by 1, 2, 3, 4, 5, 6
I am planning to break the query into smaller segments and modify it to improve performance. My question is: how can I break it up efficiently, and will that help? I don't have full visibility of the system, so this is all the information I can give for now.

DB2 - SQL0347W The recursive common table expression "IINYGBKY.TBNEW" may contain an infinite loop. SQLSTATE=01605

with tbNew(sts) as (
select
s
FROM
(values(timestamp('${startDate}','00:00:00'))) t(s)
union all
select
sts + ${period} SECONDS
FROM
tbNew
WHERE
sts + ${period} SECONDS < timestamp('${endDate}','23:59:59')
)
select
sts AS dummy_interval
FROM
tbNew
The above query runs fine in DBeaver but throws this error when run from code:
The recursive common table expression "IINYGBKY.TBNEW" may contain an infinite loop. SQLSTATE=01605
I can't work out what I am doing wrong here.
You can adjust your code to something like the following, which adds a counter n that caps the depth of the recursion:
with tbNew(sts, n) as (
select
s, 0
FROM
(values(timestamp('${startDate}','00:00:00'))) t(s)
union all
select
sts + ${period} SECONDS, n+1
FROM
tbNew
WHERE
sts + ${period} SECONDS < timestamp('${endDate}','23:59:59')
AND n<100000
)
select
sts AS dummy_interval
FROM
tbNew
For what it's worth, you can simplify your base case to:
with tbNew(sts, n) as (
values (timestamp('${startDate}','00:00:00'), 0)
union all
...
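The effect of the n < 100000 guard can be sketched outside the database. The following is a rough Python equivalent of the bounded recursion, not DB2 code; the function name generate_intervals and the max_steps parameter are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

def generate_intervals(start, end, period_seconds, max_steps=100_000):
    """Mimic the recursive CTE: emit start, then keep adding the period
    while the next value is before `end` and the counter is under the cap."""
    step = timedelta(seconds=period_seconds)
    sts, n = start, 0          # base case: (start, 0)
    rows = [sts]
    # Recursive step: (sts + period, n + 1), guarded by both conditions.
    while sts + step < end and n < max_steps:
        sts, n = sts + step, n + 1
        rows.append(sts)
    return rows

if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0)
    end = datetime(2024, 1, 1, 23, 59, 59)
    for ts in generate_intervals(start, end, 6 * 3600):
        print(ts)
```

Without the counter, a period of 0 seconds would never reach the end timestamp. With it, the loop stops after max_steps iterations whatever the inputs are.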

I keep getting an unexpected SELECT error in my SnowSQL statement

I keep getting an unexpected SELECT error as well as an unexpected ON error on rows 61 and 64 of my SnowSQL statement.
I'm not sure why; if anyone can help, that would be great. I've added the relevant portion of the statement below.
I'm trying to use a SELECT statement within a WHERE clause. Is there a way to do this?
AS select
t1.sunday_date,
t1.sunday_year_month,
t1.sunday_month,
t1.dc,
t1.source_sku,
t1.Product_Family,
t1.Product_type,
t1.Product_Subtype,
t1.Material,
t1.Color,
t1.Size,
t1.EOL_Date,
t1.NPI_Date,
t1.period_start,
t1.period_month,
IIF( t4.period_start < t1.sunday_date, iif(ISNULL(ta.actual_quantity), 0, ta.actual_quantity),
IIF(ISNULL(tfc.SOPFCSTOVERRIDE ), iif(ISNULL(tf.Period_Start), 0, tf.dc_forecast) , tfc.SOPFCSTOVERRIDE
)) AS forecast_updated,
iif(ISNULL(tf.Period_Start),t4.period_start,tf.Period_Start) AS period_start_forecast,
iif(ISNULL(ti.VALUATED_UNRESTRICTED_USE_STOCK), 0, ti.VALUATED_UNRESTRICTED_USE_STOCK) AS inventory_quantity,
iif(ISNULL(ti.HCI_DS_KEYFIGURE_QUANTITY), 0, ti.HCI_DS_KEYFIGURE_QUANTITY) AS in_transit_quantity,
iif(ISNULL(ti.planned_quantity), 0, ti.planned_quantity) AS inbound_quantity,
iif(ISNULL(tbac.backlog_ecomm ), 0, tbac.backlog_ecomm) + iif(ISNULL(tbac_sap.backlog_sap_open), 0, tbac_sap.backlog_sap_open) AS backlog_quantity,
iif(ISNULL(ta.actual_quantity), 0, ta.actual_quantity) AS actual_quantity,
iif(ISNULL(tso.open_orders), 0, tso.open_orders) AS open_orders,
iif(ISNULL(tf.Period_Start), 0, tf.dc_forecast) AS forecast,
tfc.SOPFCSTOVERRIDE AS forecast_consumption,
iif(ISNULL(tpc.SHIP_DATE), 0, tpc.SHIP_DATE) AS production_current_week,
iif(ISNULL(tpc.SHIP_DATE), 0, tpc.SHIP_DATE) AS production_next_week,
NOW() AS updated_timestamp
FROM ( ( ( ( ( ( ( ( (
SELECT
e.sunday_date,
e.sunday_month,
e.sunday_year_month,
d.dc,
c.SOURCE_SKU,
c.Product_Family,
c.Product_Type,
c.Product_Subtype,
c.Material,
c.Color,
c.Size,
c.EOL_Date,
c.NPI_Date,
b.period_start,
b.period_month
FROM
(SELECT sunday_date, sunday_month, sunday_year_month FROM bas_report_date) AS e,
(SELECT distinct Week_Date AS period_start, DateSerial('445_Year','445_Month',1) AS period_month from inv_bas_445_Month_Alignment) AS b,
(SELECT source_sku AS source_sku, Product_Family, Product_Type, Product_Subtype, Material, Color, Size, EOL_Date, NPI_Date from inv_vw_product_dev ) AS c,
(SELECT dc AS dc FROM inv_bas_dc_site_lookup) AS d
WHERE b.period_start >=
( select
MIN(mt.Reference_Date )
FROM BAS_report_date tr
INNER JOIN inv_bas_445_Month_Alignment mt ON tr.sunday_month = DateSerial(mt.'445_Year',mt.'445_Month,1')
)
AND b.period_start <= DateAdd("ww", 26,e.sunday_date)
) t1
LEFT JOIN
(
SELECT
MATERIAL_NUMBER,
CINT(LOCATION_NUMBER) AS Int_Location_ID,
HCI_DS_KEYFIGURE_DATE,
HCI_DS_KEYFIGURE_QUANTITY,
PLANNED_QUANTITY,
VALUATED_UNRESTRICTED_USE_STOCK
FROM inv_vw_ibp_transit_inventorry_dev
) ti
You can replace the DateSerial() function (which comes from VBA / MS Access / Excel in the Microsoft universe) with DATE_FROM_PARTS().
DATE_FROM_PARTS() also supports the non-obvious functionality of DateSerial():
DateSerial(2020, 1, 1 - 1) gets you New Year's Eve - the day before New Year's Day
DATE_FROM_PARTS(2020, 1 - 1, 1 - 1) is the month before the day before New Year's Day
DATE_FROM_PARTS(y, m + 1, 0) is End Of Month (EOM).
etc., etc.
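The overflow behaviour described above can be illustrated with a small Python emulation. This is only a sketch of the arithmetic, not Snowflake's implementation; the helper name date_from_parts is hypothetical and simply mirrors the SQL function's argument order.

```python
from datetime import date, timedelta

def date_from_parts(year, month, day):
    """Emulate out-of-range month/day handling: month 0 is December of the
    previous year, day 0 is the last day of the previous month, and so on."""
    # Normalise the month first (months counted from year 0, zero-based).
    total_months = year * 12 + (month - 1)
    y, m0 = divmod(total_months, 12)
    # Then offset from the first day of that month.
    return date(y, m0 + 1, 1) + timedelta(days=day - 1)

if __name__ == "__main__":
    print(date_from_parts(2020, 1, 1 - 1))      # day before New Year's Day
    print(date_from_parts(2020, 1 - 1, 1 - 1))  # a month before that
    print(date_from_parts(2020, 2 + 1, 0))      # end of February 2020
```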

Replace elements in a 2D array

Given a 2d array
select (ARRAY[[1,2,3], [4,0,0], [7,8,9]]);
{{1,2,3},{4,0,0},{7,8,9}}
Is there a way to replace the slice at [2:2][2:] (the {{0,0}}) with values 5 and 6? array_replace replaces a specific value so I'm not sure how to approach this.
I believe it's more readable to code a function in plpgsql. However, a pure SQL solution also exists:
select (
select array_agg(inner_array order by outer_index)
from (
select outer_index,
array_agg(
case
when outer_index = 2 and inner_index = 2 then 5
when outer_index = 2 and inner_index = 3 then 6
else item
end
order by inner_index
) inner_array
from (
select item,
1 + (n - 1) % array_length(a, 2) inner_index,
1 + (n - 1) / array_length(a, 2) outer_index
from
unnest(a) with ordinality x (item, n)
) _
group by outer_index
)_
)
from (
select (ARRAY[[1,2,3], [4,0,0], [7,8,9]]) a
)_;
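The index arithmetic in the query can be hard to follow, so here is the same idea as a rough Python sketch: flatten the array, number the items from 1 (as WITH ORDINALITY does), and derive the row and column from that position and the row length. The function name replace_cells and its replacements argument are hypothetical.

```python
def replace_cells(grid, replacements):
    """Rebuild a 2D list, swapping in values for selected 1-based (row, col) cells."""
    cols = len(grid[0])  # row length, like array_length(a, 2)
    flat = [item for row in grid for item in row]  # like unnest(a)
    rows = {}
    for n, item in enumerate(flat, start=1):  # n plays the ordinality role
        outer_index = 1 + (n - 1) // cols
        inner_index = 1 + (n - 1) % cols
        value = replacements.get((outer_index, inner_index), item)
        rows.setdefault(outer_index, []).append(value)
    return [rows[i] for i in sorted(rows)]

if __name__ == "__main__":
    a = [[1, 2, 3], [4, 0, 0], [7, 8, 9]]
    print(replace_cells(a, {(2, 2): 5, (2, 3): 6}))
```

Both indexes divide by the row length. Using the number of rows instead only gives the same answer when the array is square.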

Slow running query comparing dates

Could someone please offer some advice? I have the following query, which runs over roughly 200,000 records. I need to evaluate a DATETIME field to determine whether the revenue falls in the correct time slot. I am currently using CASE expressions to evaluate the field and it is an absolute pig: it runs for over 5 minutes. Is there a faster, more efficient way to do this? Note that the variables @cur_date, @end_date, @prev_yr_qtr_start, @cur_date_yr_prev etc. are all strings and r.pw_ship_date is of type DATETIME, so in essence I'm comparing r.pw_ship_date to strings such as '2017-01-01 00:00'.
Note: the query took 4:00 minutes to run even when I added 'SELECT TOP(500)'. For all 200,000 records it would take forever.
Thanks in advance
DECLARE @total TABLE
(
acct_number VARCHAR(50),
pro_nbr VARCHAR(50),
sales_rep VARCHAR(50),
bill_to_name VARCHAR(50),
billing_addr1 VARCHAR(50),
billing_addr2 VARCHAR(50),
billing_city CHAR(50),
billing_state CHAR(2),
billing_zip CHAR(10),
cur_month_bills INT,
cur_month_rev DECIMAL(30, 6),
cur_qtr_bills INT,
cur_qtr_rev DECIMAL(30, 6),
prev_yr_qtr_bills INT,
prev_yr_qtr_rev DECIMAL(30, 6),
cur_ytd_bills INT,
cur_ytd_rev DECIMAL(30, 6),
prev_ytd_bills INT
)
INSERT INTO @total
SELECT TOP(50000) f.acct_number ,
r.pro_nbr ,
r.sales_rep ,
r.bill_to_name ,
r.billing_addr1 ,
r.billing_addr2 ,
r.billing_city ,
r.billing_state ,
r.billing_zip ,
'cur_month_bills' = MAX(( CASE WHEN r.pw_ship_date BETWEEN @cur_date AND @end_date THEN 1 ELSE 0 END )) ,
'cur_month_rev' = MAX(ROUND(( CASE WHEN r.pw_ship_date BETWEEN @cur_date AND @end_date THEN f.tot_revenue ELSE 0 END ), 2)) ,
'cur_qtr_bills' = MAX((CASE WHEN r.pw_ship_date BETWEEN @cur_date AND @end_date THEN 1 ELSE 0 END )) ,
'cur_qtr_rev' = MAX(ROUND(CASE WHEN r.pw_ship_date BETWEEN @cur_date AND @end_date THEN f.tot_revenue ELSE 0 END, 2)) ,
'prev_yr_qtr_bills' = MAX(CASE WHEN r.pw_ship_date BETWEEN @prev_yr_qtr_start AND @cur_date_yr_prev THEN 1 ELSE 0 END ) ,
'prev_yr_qtr_rev' = MAX(ROUND(CASE WHEN r.pw_ship_date BETWEEN @prev_yr_qtr_start AND @cur_date_yr_prev THEN f.tot_revenue ELSE 0 END , 2)) ,
'cur_ytd_bills' = MAX(CASE WHEN r.pw_ship_date BETWEEN @first_day_cur_yr AND @end_date THEN 1 ELSE 0 END ),
'cur_ytd_rev' = MAX(ROUND(CASE WHEN r.pw_ship_date BETWEEN @first_day_cur_yr AND @end_date THEN f.tot_revenue ELSE 0 END , 2)) ,
'prev_ytd_bills' = MAX(CASE WHEN r.pw_ship_date BETWEEN @first_day_prev_yr AND @end_date THEN 1 ELSE 0 END )
FROM @summed f
INNER JOIN @raw r ON f.acct_number = r.acct_number AND f.pro_nbr = r.pro_nbr
GROUP BY f.acct_number ,
r.pro_nbr ,
r.sales_rep ,
r.bill_to_name ,
r.billing_addr1 ,
r.billing_addr2 ,
r.billing_city ,
r.billing_state ,
r.billing_zip;
Change your table variables @raw and @summed to temporary tables. Table variables have no statistics and are extremely limited with regard to indexing (you can only have one). Because of this, SQL Server assumes that your table variables have only one row (2012 and older) or 100 rows (2014+). This means that you are almost certainly getting a bad execution plan for your query, and that's going to ruin you.
Once you've changed @raw and @summed into #raw and #summed, put an index on them - at a minimum, index your foreign keys (the fields you're joining on), acct_number and pro_nbr. It may be worth creating a clustered index and/or a primary key as well, but that's something you'll need to experiment with to find the performance you require.
The other thing that is killing your performance is comparing datetimes to strings. This is causing a type conversion and that can drag you down significantly. If you're working with a date/time, use the appropriate data type - not a string that looks like a date.
If this is still not running quickly enough, move your CASE statements out of your aggregate functions.
MAX(( CASE WHEN r.pw_ship_date BETWEEN @cur_date AND @end_date THEN 1 ELSE 0 END ))
Move the CASE statement into the query that populates #raw.pw_ship_date so that when you're performing the aggregate, you're just looking at integers all the way down.

Return values for X and Y where X-Y = Max(X-Y)

(SQL SERVER 2005)
I have a table of multiple products that relate to an ItemCode. I can establish the best saving using the query below (I think) but what I need to include are the RRP and SellingPrice fields for the combination that provides the best saving.
Apologies in advance this is probably a common issue but I can't find a solution that fits.
SELECT ItemCode, MAX(RRP - [SellingPrice]) AS BestSaving
FROM ItemCodePricingDetail
WHERE ([ProductGroup] = N'SHOES') AND ([Stock Flag] = N'Y')
AND (RRP > 0) AND ([SellingPrice] > 0)
GROUP BY ItemCode
Many Thanks
select * from ItemCodePricingDetail
JOIN
(
SELECT ItemCode, MAX(RRP - [SellingPrice]) AS BestSaving
FROM ItemCodePricingDetail
WHERE ([ProductGroup] = N'SHOES') AND ([Stock Flag] = N'Y')
AND (RRP > 0) AND ([SellingPrice] > 0)
GROUP BY ItemCode
) as t1 on ItemCodePricingDetail.ItemCode=t1.ItemCode
and RRP - [SellingPrice]= t1.BestSaving
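To see what the join returns, here is a small sketch that runs an adapted version of the query against an in-memory SQLite database from Python. The sample rows are invented for illustration, the N'...' literals are written as plain strings because this is SQLite rather than SQL Server, and the filter is repeated in the outer query (an addition of mine, not part of the answer) so that rows excluded from the subquery cannot match by coincidence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ItemCodePricingDetail (
        ItemCode TEXT, ProductGroup TEXT, [Stock Flag] TEXT,
        RRP INTEGER, SellingPrice INTEGER)
""")
# Hypothetical sample data: two in-stock A1 rows, one in-stock and one
# out-of-stock B2 row.
conn.executemany(
    "INSERT INTO ItemCodePricingDetail VALUES (?, ?, ?, ?, ?)",
    [("A1", "SHOES", "Y", 100, 80),
     ("A1", "SHOES", "Y", 120, 70),
     ("B2", "SHOES", "Y", 60, 55),
     ("B2", "SHOES", "N", 90, 10)])

query = """
    SELECT d.ItemCode, d.RRP, d.SellingPrice, t1.BestSaving
    FROM ItemCodePricingDetail AS d
    JOIN (
        SELECT ItemCode, MAX(RRP - SellingPrice) AS BestSaving
        FROM ItemCodePricingDetail
        WHERE ProductGroup = 'SHOES' AND [Stock Flag] = 'Y'
          AND RRP > 0 AND SellingPrice > 0
        GROUP BY ItemCode
    ) AS t1
      ON d.ItemCode = t1.ItemCode
     AND d.RRP - d.SellingPrice = t1.BestSaving
    WHERE d.ProductGroup = 'SHOES' AND d.[Stock Flag] = 'Y'
    ORDER BY d.ItemCode
"""
best = conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in best:
        print(row)
```

If two rows for the same ItemCode tie on the best saving, a join like this returns both of them.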