Our system is a SaaS application; we use ClientID to scope data fetching.
The DB load depends on the size of each client's company, so we partitioned the DB by ClientID.
Example: Before Partition
clienttable

clientid | clientname | clientaddress
---------+------------+--------------
1        | ABC        | ...
2        | EMN        | ...
3        | XYZ        | ...

employeetable

clientid | employeeid | employeename
---------+------------+-------------
1        | 123        | AAA
1        | 124        | BBB
2        | 125        | CCC
2        | 126        | DDD
3        | 127        | EEE

jobtable

clientid | jobid | jobname
---------+-------+--------
1        | 234   | YTR
1        | 235   | DER
2        | 236   | SWE
3        | 237   | VFT
3        | 238   | GHJ
Example: After Partition
clienttable (not partitioned)

clientid | clientname | clientaddress
---------+------------+--------------
1        | ABC        | ...
2        | EMN        | ...
3        | XYZ        | ...

employeetable

employeetable_1
clientid | employeeid | employeename
---------+------------+-------------
1        | 123        | AAA
1        | 124        | BBB

employeetable_2
clientid | employeeid | employeename
---------+------------+-------------
2        | 125        | CCC
2        | 126        | DDD

employeetable_3
clientid | employeeid | employeename
---------+------------+-------------
3        | 127        | EEE

jobtable

jobtable_1
clientid | jobid | jobname
---------+-------+--------
1        | 234   | YTR
1        | 235   | DER

jobtable_2
clientid | jobid | jobname
---------+-------+--------
2        | 236   | SWE

jobtable_3
clientid | jobid | jobname
---------+-------+--------
3        | 237   | VFT
3        | 238   | GHJ
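A layout like this maps naturally onto PostgreSQL's declarative list partitioning (available since version 10). The following is only a sketch with assumed column types, since the question does not show the actual DDL:

```sql
-- Parent table, partitioned by clientid (names from the example, types assumed)
CREATE TABLE employeetable (
    clientid     int NOT NULL,
    employeeid   int NOT NULL,
    employeename text
) PARTITION BY LIST (clientid);

-- One partition per client, matching the employeetable_1..3 scheme above
CREATE TABLE employeetable_1 PARTITION OF employeetable FOR VALUES IN (1);
CREATE TABLE employeetable_2 PARTITION OF employeetable FOR VALUES IN (2);
CREATE TABLE employeetable_3 PARTITION OF employeetable FOR VALUES IN (3);
```

With a `WHERE clientid = 2` filter, the planner can then prune the scan down to employeetable_2 alone.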
When we write SELECT queries:
Select employeeid, employeename from employeetable where clientid = 2;
This query runs faster after partitioning. The problem we face is that we have some user-defined functions that manipulate data.
CREATE OR REPLACE FUNCTION GET_JOB_COUNT(NUMERIC, NUMERIC) RETURNS NUMERIC AS $BODY$
DECLARE
    p_client_id   ALIAS FOR $1;
    p_employee_id ALIAS FOR $2;
    v_is_count    NUMERIC := 0;
BEGIN
    SELECT COUNT(JOB_ID) INTO v_is_count
    FROM JOBTABLE
    WHERE CLIENTID = p_client_id AND CREATEDBY = p_employee_id;
    RETURN v_is_count;
END; $BODY$
LANGUAGE plpgsql;
Select employeeid, employeename, GET_JOB_COUNT(2, employeeid) from employeetable where clientid = 2;
This query is slow after partitioning. Does this mean the GET_JOB_COUNT function runs across all partitions?
If that is the problem, does it mean we can't use functions like this in SELECT queries after partitioning?
The function will be called once for each and every row from the employeetable that is selected through the WHERE clause. I doubt you can improve the performance in any significant way with that approach.
It's better to do the aggregation (= count) for all rows at once, rather than for each row separately:
select e.employeeid, e.employeename, j.cnt
from employeetable e
left join (
    select clientid, createdby, count(job_id) as cnt
    from jobtable
    group by clientid, createdby
) j on j.clientid = e.clientid and j.createdby = e.employeeid
where e.clientid = 2;
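To see where the time actually goes, it may be worth comparing the plans of the function-based query and the join-based rewrite; a minimal sketch, with table and function names taken from the question:

```sql
-- Per-row function calls are opaque to the planner; their cost shows up
-- only in the total run time, not as a separate plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT employeeid, employeename, GET_JOB_COUNT(2, employeeid)
FROM employeetable
WHERE clientid = 2;

-- The join-based version exposes the aggregation to the planner, so the
-- plan shows whether only the matching partition (e.g. jobtable_2) is scanned.
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.employeeid, e.employeename, j.cnt
FROM employeetable e
LEFT JOIN (SELECT clientid, createdby, count(job_id) AS cnt
           FROM jobtable
           GROUP BY clientid, createdby) j
       ON j.clientid = e.clientid AND j.createdby = e.employeeid
WHERE e.clientid = 2;
```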
Another option to try is to use a lateral join to eliminate rows from the jobtable early - I am not sure if the optimizer is smart enough for that in the query above. So you can try this as an alternative:
select e.employeeid, e.employeename, j.cnt
from employeetable e
left join lateral (
    select count(jt.job_id) as cnt
    from jobtable jt
    where jt.clientid = e.clientid
      and jt.createdby = e.employeeid
) j on true
where e.clientid = 2;
If you really do want to stick with the function, maybe making it a SQL function helps the optimizer. It at least removes the overhead of calling PL/pgSQL code:
CREATE OR REPLACE FUNCTION get_job_count(p_client_id numeric, p_employee_id numeric)
  RETURNS bigint
AS
$body$
    SELECT COUNT(JOB_ID)
    FROM JOBTABLE
    WHERE CLIENTID = p_client_id
      AND CREATEDBY = p_employee_id;
$body$
LANGUAGE sql
STABLE
PARALLEL SAFE;
But I doubt that you will see a substantial improvement from that.
As a side note: using numeric for an "ID" column seems like a rather strange choice. Why aren't you using int or bigint for that?
I have the following table:

id | key1 | key2 | key3
---+------+------+-----
1  | 101  | 102  | 103
2  | 201  | 202  | 203

I need a query which will create the following output:

id | key
---+----
1  | 101
1  | 102
1  | 103
2  | 201
2  | 202
2  | 203
Is there anything other than UNION ALL? When I used UNION ALL, I came across a "disk utilization full" error... I have billions of records.
Since the question is tagged Oracle, you could do:
SELECT id, key
FROM table_name
UNPIVOT INCLUDE NULLS ( key FOR key_name IN ( key1, key2, key3 ) );
UNION ALL is very efficient on Redshift. I doubt there's anything much better:
create table new_table as
select id, key1 as key from old_table
union all
select id, key2 as key from old_table
union all
select id, key3 as key from old_table
If you want to try something like Mottor suggests, you can replace the Oracle-specific "dual connect by level" bit with a Redshift hack like so:
(select row_number() over (order by true) as l from stv_blocklist limit 3) b
The stv_blocklist table reference there could be any table with at least 3 rows.
For reference, Mottor's Oracle version:
select a.id, case b.l when 1 then a.key1 when 2 then a.key2 else a.key3 end key
from mytable a
cross join (select level l from dual connect by level < 4) b
order by 1,2
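Putting the Redshift replacement into that query, the whole statement would look something like this (only a sketch; mytable and the three-key layout are taken from the question):

```sql
-- Redshift version: stv_blocklist stands in for Oracle's
-- "dual connect by level" as a 3-row generator
create table new_table as
select a.id,
       case b.l when 1 then a.key1
                when 2 then a.key2
                else a.key3
       end as key
from mytable a
cross join (select row_number() over (order by true) as l
            from stv_blocklist
            limit 3) b;
```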
I have an employees table where all employees are located. I need to extract a subset of the employees with their corresponding supervisors. The table looks similar to this:

Emp_id | F_name | L_name | Superv_id | Superv_flg
-------+--------+--------+-----------+-----------
123    | john   | doe    | 456       | N
456    | jane   | doe    | 278       | Y
234    | Jack   | smith  | 268       | N
My query looks like this so far:

with cte as
(
    select f_name + ' ' + l_name as supervisor, superv_id, emp_id
    from [dbo].[SAP_worker_all]
    where superv_flg = 'Y'
)
select distinct w.[f_name]
     , w.[l_name]
     , cte.supervisor
from [dbo].[SAP_worker_all] w
join cte
  on w.[superv_id] = cte.[superv_id];
I am getting duplicate values and the supervisors returned are not the correct values. What did I do wrong?
If Emp_id is unique, you should not get duplicates. Your CTE joins superv_id to superv_id, which matches workers to supervisors who share the same boss; join the worker's Superv_id to the supervisor's Emp_id instead:
SELECT w.*, s.*
FROM [SAP_worker_all] w
JOIN [SAP_worker_all] s
ON s.[Emp_id] = w.[Superv_id]
AND s.[Superv_flg] = 'Y'
Following is my sample table and rows:
create table com (company text,val int);
insert into com values ('com1',1),('com1',2),('com1',3),('com1',4),('com1',5);
insert into com values ('com2',11),('com2',22),('com2',33),('com2',44),('com2',55);
insert into com values ('com3',111),('com3',222),('com3',333),('com3',444),('com3',555);
I want to get the top 3 values for each company; the expected output is:

company | val
--------+----
com1    | 5
com1    | 4
com1    | 3
com2    | 55
com2    | 44
com2    | 33
com3    | 555
com3    | 444
com3    | 333
Try this:

SELECT company, val
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY val DESC) AS Row_ID
    FROM com
) AS A
WHERE Row_ID < 4
ORDER BY company;
Since PostgreSQL 9.3 you can do a lateral join:
select distinct com_outer.company, com_top.val from com com_outer
join lateral (
select * from com com_inner
where com_inner.company = com_outer.company
order by com_inner.val desc
limit 3
) com_top on true
order by com_outer.company;
It might be faster but, of course, you should test performance specifically on your data and use case.
You can try arrays, which are available since PostgreSQL 9.0. Ordering inside the aggregate is more reliable than ordering in a CTE, and array subscripts start at 1:

SELECT company, unnest((array_agg(val ORDER BY val DESC))[1:3])
FROM com
GROUP BY company;
After a complex operation (some database merging) I have a table that needs to be updated based on a timestamp.

JobsTable

Id  | Time_stamp | Resource | RunNumber
----+------------+----------+----------
121 | 1          | A        | 1
122 | 2          | A        | 1
123 | 3          | B        | 1
124 | 4          | B        | 1
125 | 5          | A        | 2

The point is to update the RunNumber column incrementally for each resource, based on the timestamp. So in the end the expected result is:

Id  | Time_stamp | Resource | RunNumber
----+------------+----------+----------
121 | 1          | A        | 1
122 | 2          | A        | 2   <- changed
123 | 3          | B        | 1
124 | 4          | B        | 2   <- changed
125 | 5          | A        | 3   <- changed
I tried doing this in multiple ways. Since DB2's UPDATE does not support JOIN or WITH statements, I tried something like:
update JOBSTABLE JT
SET RunNumber =
(SELECT RunNumber
FROM (Select ID, ROW_NUMBER() OVER (ORDER BY TIME_STAMP ) RunNumber from JobsTable, ORDER BY TIME_STAMP) AS AAA
WHERE AAA.ID = JT.ID)
WHERE ID = ?
Error:
Assignment of a NULL value to a NOT NULL column "TBSPACEID=6, TABLEID=16, COLNO=2" is not allowed.. SQLCODE=-407, SQLSTATE=23502, DRIVER=3.64.82 SQL Code: -407, SQL State: 23502
Is this even possible? (I am aiming to do this operation in a single query rather than using cursors, etc.)
Thank you
Firstly, your subselect has a syntax error (the stray comma before ORDER BY), which tells me it's not the exact statement that you are trying to run. The error message is pretty clear -- in your actual statement the subselect sometimes returns NULL.
Secondly, you should probably be numbering rows within a partition by resource.
Thirdly, you could probably do it with a single subselect anyway -- this is based on the statement you published:
update JOBSTABLE JT
SET RunNumber =
    (SELECT rn
     FROM (SELECT ID,
                  ROW_NUMBER() OVER (PARTITION BY resource ORDER BY TIME_STAMP) AS rn
           FROM JobsTable) AS AAA
     WHERE AAA.ID = JT.ID)
My table is like:

ID  | FName | LName | Date (mm/dd/yy) | Sequence | Value
----+-------+-------+-----------------+----------+------
101 | A     | B     | 1/10/2010       | 1        | 10
101 | A     | B     | 1/10/2010       | 2        | 20
101 | X     | Y     | 1/2/2010        | 1        | 15
101 | Z     | X     | 1/3/2010        | 5        | 10
102 | A     | B     | 1/10/2010       | 2        | 10
102 | X     | Y     | 1/2/2010        | 1        | 15
102 | Z     | X     | 1/3/2010        | 5        | 10
I need a query that should return these 2 records:

101 | A | B | 1/10/2010 | 2 | 20
102 | A | B | 1/10/2010 | 2 | 10

That is, the max of sequence within the max of date, grouped by id. Could anyone assist with this?
-----------------------
-- get me my rows...
-----------------------
select * from myTable t
-----------------------
-- limiting them...
-----------------------
inner join
----------------------------------
-- ...by joining to a subselection
----------------------------------
(select m.id, m.date, max(m.sequence) as max_seq from myTable m inner join
----------------------------------------------------
-- first group on id and date to get max-date-per-id
----------------------------------------------------
(select id, max(date) as date from myTable group by id) y
on m.id = y.id and m.date = y.date
group by m.id, m.date) x
on t.id = x.id
and t.date = x.date
and t.sequence = x.max_seq
This is a simple solution, which does not take account of ties, nor of rows where sequence is NULL.
EDIT: I've added an extra group to first select max-date-per-id, and then join on this to get max-sequence-per-max-date-per-id before joining to the main table to get all columns.
Assuming your table is named employee, check whether the following helps. The maximum sequence has to be taken within each id's maximum date -- a global max(sequence) per id can fall on a different date, and the join would then return no rows:

select emp1.*
from employee emp1
join (select id, max(date) as dat
      from employee group by id) emp2
  on emp1.id = emp2.id and emp1.date = emp2.dat
join (select id, date, max(sequence) as seq
      from employee group by id, date) emp3
  on emp1.id = emp3.id and emp1.date = emp3.date and emp1.sequence = emp3.seq
I'm a fan of using the WITH clause in SELECT statements to organize the different steps. I find that it makes the code easier to read. Note that both maxima have to be computed per ID:

WITH max_date (id, max_date)
AS (
    SELECT ID, MAX(Date)
    FROM my_table
    GROUP BY ID
),
max_seq (id, max_seq)
AS (
    SELECT t.ID, MAX(t.Sequence)
    FROM my_table t
    JOIN max_date md
      ON md.id = t.ID AND t.Date = md.max_date
    GROUP BY t.ID
)
SELECT t.*
FROM my_table t
JOIN max_date md
  ON md.id = t.ID AND t.Date = md.max_date
JOIN max_seq ms
  ON ms.id = t.ID AND t.Sequence = ms.max_seq;

You should be able to optimize this further as needed.