Distributing the number of records in PostgreSQL cursors

I have a stored procedure as below:
CREATE OR REPLACE FUNCTION DELETE_REDUNDANT_RECORDS_STORED_PROCEDURE()
RETURNS void AS
$func$
DECLARE
    interval_time BIGINT DEFAULT 0;
    min_time BIGINT DEFAULT 0;
    max_time BIGINT DEFAULT 0;
    rec_old RECORD;
    rec_new RECORD;
    rec_start RECORD;
    cursor_file CURSOR FOR
        SELECT DISTINCT filename, systemuid FROM BOOKMARK.MONITORING_TESTING;
    -- parameterized cursor; the arguments are supplied when it is opened
    cursor_data CURSOR (v_filename text, v_systemuid text) FOR
        SELECT * FROM BOOKMARK.MONITORING_TESTING
        WHERE filename = v_filename AND systemuid = v_systemuid
        ORDER BY mindatetime, maxdatetime;
BEGIN
-- Use cursors for iteration
-- Business logic to delete and update the table records based on certain conditions
END;
$func$
LANGUAGE plpgsql;
The DISTINCT query returns around a million records, which are then iterated over with the second cursor.
I want to split these million records into configurable chunks, for example 200k records each, until all the records have been read.
How can I achieve this within my stored procedure?

You can add a window function call to the cursor's SELECT list:
(row_number() OVER ()) / 10000 AS chunk
That will add a number that you can use to split the result into chunks of 10000.
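For example, the first cursor in the question could be declared like this (a sketch only; 200000 stands in for the configurable chunk size, and subtracting 1 from row_number() keeps each chunk at exactly that many rows):
cursor_file CURSOR FOR
    SELECT filename, systemuid,
           (row_number() OVER (ORDER BY filename, systemuid) - 1) / 200000 AS chunk
    FROM (SELECT DISTINCT filename, systemuid
          FROM BOOKMARK.MONITORING_TESTING) AS f;
Rows sharing the same chunk value then form one batch, so the loop can process chunk 0, then chunk 1, and so on.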

Related

Fetch records as batches in Postgres using a cursor

I need to fetch data in batches of 100 records from a PostgreSQL table. I have tried the following function:
CREATE OR REPLACE FUNCTION fetch_compare_prices(n integer)
RETURNS SETOF varchar AS $$
DECLARE
    curs CURSOR FOR SELECT * FROM compareprices LIMIT n;
    row RECORD;
BEGIN
    OPEN curs;
    LOOP
        FETCH FROM curs INTO row;
        EXIT WHEN NOT FOUND;
        RETURN NEXT row.deal_id;
    END LOOP;
END; $$ LANGUAGE plpgsql;
I ran this query to get results from the above function: select fetch_compare_prices(100); But it always gives me the same 100 records. Is there a way to fetch 100 records at a time, in batches, using a cursor?
Also, with the return next row.deal_id; statement I can only return the deal_id and no other columns. Is there a way to get all the columns of the row?
It should also work like this: when I run select fetch_compare_prices(100); the 1st time, it should return the first 100 rows; when I run it a 2nd time, it should give rows 100 to 200 (the next 100). What's the correct usage of a cursor to do this?
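A cursor declared inside a plpgsql function cannot remember its position across calls; every call re-opens it, which is why you always get the same 100 rows. One stateless alternative, as a sketch, is to pass the batch number in and page with LIMIT/OFFSET (the p_batch and p_size parameter names are illustrative, not from the original function); declaring RETURNS SETOF compareprices also returns all columns rather than just deal_id:
CREATE OR REPLACE FUNCTION fetch_compare_prices(p_batch integer, p_size integer)
RETURNS SETOF compareprices AS $$
BEGIN
    -- a stable ORDER BY is needed so batches neither overlap nor skip rows
    RETURN QUERY
        SELECT *
        FROM compareprices
        ORDER BY deal_id
        LIMIT p_size
        OFFSET p_batch * p_size;
END;
$$ LANGUAGE plpgsql;
-- first batch:  select * from fetch_compare_prices(0, 100);
-- second batch: select * from fetch_compare_prices(1, 100);
For large tables, a keyset condition (WHERE deal_id > the last value seen) scales better than a growing OFFSET.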

how to delete 10 rows from a specific index in a table in PostgreSQL

I'm new to PostgreSQL. Assume that I have a table (tbl_box) with thousands of records, and it is growing. I want to delete 10 rows starting at a specific index (for example, I want to delete 10 records from the 50th row to the 59th row). I wrote a function, which you can see below:
-- Function: public.signalreject()
-- DROP FUNCTION public.signalreject();
CREATE OR REPLACE FUNCTION public.signalreject()
RETURNS void AS
$BODY$
DECLARE
    rec RECORD;
    cur CURSOR FOR
        SELECT barcode, id
        FROM tbl_box
        WHERE gf IS NULL
        ORDER BY id DESC;
    counter int;
BEGIN
    -- Open the cursor
    OPEN cur;
    counter := 0;
    LOOP
        -- fetch row into rec
        FETCH cur INTO rec;
        -- exit when no more rows to fetch
        EXIT WHEN NOT FOUND;
        counter := counter + 1;
        -- delete the rows at positions 50..59
        IF counter >= 50 AND counter < 60 THEN
            DELETE FROM tbl_box WHERE barcode = rec.barcode;
        END IF;
    END LOOP;
    -- Close the cursor
    CLOSE cur;
END; $BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.signalreject()
    OWNER TO Morteza;
I found that the cursor consumes memory and causes high CPU usage. What would you suggest other than a cursor?
Is this a good way to do this?
I need the fastest way, because it is important for me to delete the rows in the shortest possible time.
This seems pretty elaborate, why not just do:
delete from tbl_box
where barcode in
    ( select barcode
      from tbl_box
      where gf is null
      order by id desc
      limit 10 offset 49
    );
assuming that barcode is unique. We skip 49 rows so that the delete starts at the 50th row and removes 10 rows.

Plpgsql - Iterate over a recordset multiple times

I have a table with series of months with cumulative activity e.g.
month | activity
Jan-15 | 20
Feb-15 | 22
I also have a series of thresholds in another table e.g. 50, 100, 200. I need to get the date when the threshold is reached i.e. activity >= threshold.
The way I thought of doing this is to have a plpgsql function that reads in the thresholds table, iterates over that cursor, and reads the months table into a second cursor, iterating over those rows to work out the month in which each threshold is reached. For performance reasons, rather than selecting all rows in the months table each time, I would go back to the first row of the cursor and re-iterate over it with the new value from the thresholds table.
Is this a sensible way to approach the problem? This is what I have so far, but I am getting an
ERROR: cursor "curs" already in use
CREATE OR REPLACE FUNCTION schema.function()
RETURNS SETOF schema.row_type AS
$BODY$
DECLARE
    rec RECORD;
    rectimeline RECORD;
    notification_threshold int;
    notification_text text;
    notification_date date;
    output_rec schema.row_type;
    curs SCROLL CURSOR FOR select * from schema.another_function_returning_set(); -- this is the months table
    curs2 CURSOR FOR select * from schema.notifications_table;
BEGIN
    OPEN curs;
    FOR rec IN curs2 LOOP
        notification_threshold := rec.threshold;
        LOOP
            FETCH curs INTO rectimeline; -- this line seems to be the problem - not sure why the cursor is closing
            IF notification_threshold >= rectimeline.activity_total THEN
                notification_text := rec.housing_notification_text;
                notification_date := rectimeline.active_date;
                SELECT notification_text, notification_date INTO output_rec.notification_text, output_rec.notification_date;
                MOVE FIRST FROM curs;
                RETURN NEXT output_rec;
            END IF;
        END LOOP;
    END LOOP;
    RETURN;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
A single query can do this with no cursors at all:
select distinct on (t.threshold) *
from thresholds t
inner join months m on t.threshold <= m.activity
order by t.threshold desc, m.month;
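For example, with thresholds 50 and 100 and months Jan-15 = 20, Feb-15 = 55, Mar-15 = 120, the join produces every (threshold, month) pair where the activity reaches the threshold, and distinct on (t.threshold) keeps only the earliest qualifying month per threshold: Feb-15 for 50 and Mar-15 for 100.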

postgres count from table efficient way

In my application we are using PostgreSQL; the summary table now has one million records.
When I run the following query, it takes 80,927 ms:
SELECT COUNT(*) AS count
FROM summary_views
GROUP BY question_id, category_type_id
Is there a more efficient way to do this?
COUNT(*) in PostgreSQL tends to be slow; that is a consequence of MVCC. One workaround for the problem is a row-counting trigger with a helper table:
create table table_count(
    table_count_id text primary key,
    rows int default 0
);
CREATE OR REPLACE FUNCTION table_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        update table_count set rows = rows + 1
        where table_count_id = TG_TABLE_NAME;
    elsif tg_op = 'DELETE' then
        update table_count set rows = rows - 1
        where table_count_id = TG_TABLE_NAME;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
The next step is to add the proper trigger declaration for each table you'd like to use it with. For example, for table tab_name:
begin;
insert into table_count values
    ('tab_name', (select count(*) from tab_name));
create trigger tab_name_table_count after insert or delete
    on tab_name for each row execute procedure table_count_update();
commit;
It is important to run this in a transaction block, to keep the actual count and the helper table in sync in case rows are inserted or deleted between the initial count and the trigger creation; the transaction guarantees this. From now on, to get the current count instantly, just invoke:
select rows from table_count where table_count_id = 'tab_name';
Edit: for your GROUP BY clause you will need a more sophisticated trigger function and count table, as sketched below.
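A sketch of what that could look like, assuming PostgreSQL 9.5+ for INSERT ... ON CONFLICT; the helper table is keyed by the question's GROUP BY columns (adjust names and types to the real schema):
create table summary_views_count(
    question_id int,
    category_type_id int,
    rows int default 0,
    primary key (question_id, category_type_id)
);
CREATE OR REPLACE FUNCTION summary_views_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        -- create the group's counter on first sight, otherwise increment it
        insert into summary_views_count values (new.question_id, new.category_type_id, 1)
        on conflict (question_id, category_type_id)
            do update set rows = summary_views_count.rows + 1;
    elsif tg_op = 'DELETE' then
        update summary_views_count set rows = rows - 1
        where question_id = old.question_id
          and category_type_id = old.category_type_id;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
create trigger summary_views_count_trg after insert or delete
    on summary_views for each row execute procedure summary_views_count_update();
Then select * from summary_views_count; replaces the slow GROUP BY count.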

Stored function with temporary table in postgresql

I'm new to writing stored functions in PostgreSQL, and in general. I'm trying to write one with an input parameter that returns a set of results stored in a temporary table.
I do the following in my function:
1) Get a list of all the consumers and store their ids in a temp table.
2) Iterate over a particular table, retrieve values corresponding to each value from the above list, and store them in a temp table.
3) Return the temp table.
Here's the function that I've tried to write myself:
create or replace function getPumps(status varchar)
returns setof record as $$ -- setof record?
DECLARE
    cons_id integer[];
    i integer;
    temp table tmp_table; -- Point B
BEGIN
    select consumer_id into cons_id from db_consumer_pump_details;
    FOR i in select * from cons_id LOOP
        select objectid, pump_id, pump_serial_id, repdate, pumpmake,
               db_consumer_pump_details.status, db_consumer.consumer_name,
               db_consumer.wenexa_id, db_consumer.rr_no
        into tmp_table
        from db_consumer_pump_details
        inner join db_consumer on db_consumer.consumer_id = db_consumer_pump_details.consumer_id
        where db_consumer_pump_details.consumer_id = i
          and db_consumer_pump_details.status = $1 -- Point A
        order by db_consumer_pump_details.consumer_id, pump_id, createddate desc
        limit 2;
    END LOOP;
    return tmp_table;
END;
$$
LANGUAGE plpgsql;
However, I'm not sure about my approach, in particular whether I'm right at points A and B as marked in the code above. I'm also getting a load of errors while trying to create the temporary table.
EDIT: I got the function to work, but I get the following error when I try to run it:
ERROR: array value must start with "{" or dimension information
Here's my revised function.
    create temp table tmp_table(objectid integer, pump_id integer, pump_serial_id varchar(50),
        repdate timestamp with time zone, pumpmake varchar(50), status varchar(2),
        consumer_name varchar(50), wenexa_id varchar(50), rr_no varchar(25));
    select consumer_id into cons_id from db_consumer_pump_details;
    FOR i in select * from cons_id LOOP
        insert into tmp_table
        select objectid, pump_id, pump_serial_id, repdate, pumpmake,
               db_consumer_pump_details.status, db_consumer.consumer_name,
               db_consumer.wenexa_id, db_consumer.rr_no
        from db_consumer_pump_details
        inner join db_consumer on db_consumer.consumer_id = db_consumer_pump_details.consumer_id
        where db_consumer_pump_details.consumer_id = i
          and db_consumer_pump_details.status = $1
        order by db_consumer_pump_details.consumer_id, pump_id, createddate desc
        limit 2;
    END LOOP;
    return query (select * from tmp_table);
    drop table tmp_table;
END;
$$
LANGUAGE plpgsql;
AFAIK one can't declare tables as variables in Postgres. What you can do is create one in your function body and use it throughout (or even outside the function). Beware, though: temporary tables aren't dropped until the end of the session or until commit.
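If you do want the table to go away on its own, you can ask for that when creating it; for example (column list shortened for illustration):
create temp table tmp_table(objectid integer, pump_id integer) on commit drop;
ON COMMIT DROP removes the table at the end of the enclosing transaction, so no explicit drop table is needed.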
The way to go is to use RETURN NEXT or RETURN QUERY.
As for the function result type, I always found RETURNS TABLE to be more readable.
edit:
Your cons_id array is unnecessary; just iterate over the values returned by the select.
Also, you can have multiple RETURN QUERY statements in a single function; each one appends the result of its query to the result returned by the function.
In your case:
CREATE OR REPLACE FUNCTION getPumps(status varchar)
RETURNS TABLE (objectid INTEGER, pump_id INTEGER, pump_serial_id INTEGER....)
AS
$$
DECLARE
    i integer;
BEGIN
    FOR i IN SELECT consumer_id FROM db_consumer_pump_details LOOP
        RETURN QUERY(
            SELECT objectid, pump_id, pump_serial_id, repdate, pumpmake,
                   db_consumer_pump_details.status, db_consumer.consumer_name,
                   db_consumer.wenexa_id, db_consumer.rr_no
            FROM db_consumer_pump_details
            INNER JOIN db_consumer ON db_consumer.consumer_id = db_consumer_pump_details.consumer_id
            WHERE db_consumer_pump_details.consumer_id = i
              AND db_consumer_pump_details.status = $1
            ORDER BY db_consumer_pump_details.consumer_id, pump_id, createddate DESC
            LIMIT 2
        );
    END LOOP;
END;
$$ LANGUAGE plpgsql;
edit2:
You probably want to take a look at this solution to the groupwise-k-maximum problem, as that's exactly what you're dealing with here.
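As a sketch of that approach here, a single window-function query returns the top 2 rows per consumer with no loop at all ($1 stands for the status parameter, as in the function above; to run it standalone, replace it with a literal):
SELECT objectid, pump_id, pump_serial_id, repdate, pumpmake, status,
       consumer_name, wenexa_id, rr_no
FROM (
    SELECT d.objectid, d.pump_id, d.pump_serial_id, d.repdate, d.pumpmake,
           d.status, c.consumer_name, c.wenexa_id, c.rr_no,
           row_number() OVER (PARTITION BY d.consumer_id
                              ORDER BY d.pump_id, d.createddate DESC) AS rn
    FROM db_consumer_pump_details d
    INNER JOIN db_consumer c ON c.consumer_id = d.consumer_id
    WHERE d.status = $1
) sub
WHERE rn <= 2;  -- keep only the first two rows per consumer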
It might be easier to just return a table (or query):
CREATE FUNCTION extended_sales(p_itemno int)
RETURNS TABLE(quantity int, total numeric) AS $$
BEGIN
RETURN QUERY SELECT quantity, quantity * price FROM sales
WHERE itemno = p_itemno;
END;
$$ LANGUAGE plpgsql;
(copied from the PostgreSQL docs)