Returning the nth largest value in PostgreSQL

I want to return the nth largest salary in the table using for-loops, and I am still having trouble with the syntax. Please help.
create function get_nth_max(n integer)
returns real
as
$$
declare
nth_max real = max(salary) from company;
pay real;
begin
for pay in select salary from company
order by salary = desc limit n
loop
if pay.salary < nth_max then
nth_max = pay.salary and
end loop;
return nth_max
end;
$$
language plpgsql;
Error message:
ERROR: syntax error at or near "desc"
LINE 10: order by salary = desc limit n loop
^
SQL state: 42601
Character: 182

First you need to define "the nth largest salary in the table" clearly.
What about duplicates? If two people earn 1000 and one earns 800, is 800 then the 2nd or 3rd highest salary? Can salary be NULL? If so, ignore NULL values? What if there are not enough rows?
Assuming ...
Duplicate entries only count once - so 800 is considered 2nd in my example.
salary can be NULL. Ignore NULL values.
Return NULL or nothing (no row) if there are not enough rows.
Proper solution
SELECT salary
FROM (
SELECT salary, dense_rank() OVER (ORDER BY salary DESC NULLS LAST) AS drnk
FROM company
) sub
WHERE drnk = 8;
That's basically the same as Tim's answer, but NULLS LAST prevents NULL from coming first in descending order. Only needed if salary can be NULL, obviously, but never wrong. Best supported with an index using the same sort order DESC NULLS LAST. See:
Sort by column ASC, but NULL values first?
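For example, a matching index could look like this (the index name is made up; table and column are from the query above):
CREATE INDEX company_salary_desc_idx ON company (salary DESC NULLS LAST);
With that in place, Postgres can read the top salaries in order straight off the index instead of sorting.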
FOR loop as proof of concept
If you insist on using a FOR loop for training purposes, this would be the minimal correct form:
CREATE OR REPLACE FUNCTION get_nth_max(n integer, OUT nth_max numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
FOR nth_max IN
SELECT DISTINCT salary
FROM company
ORDER BY salary DESC NULLS LAST
LIMIT n
LOOP
-- do nothing
END LOOP;
END
$func$;
Working with numeric instead of real, because we don't want to use a floating-point number for monetary values.
Your version does a lot of unnecessary work. You don't need the overall max, you don't need to check anything in the loop, just run through it, the last assignment will be the result. Still inefficient, but not as much.
Notably, SELECT get_nth_max(10) returns NULL if there are only 9 rows, while the above SQL query returns no row. A subtle difference that may be relevant. (You could devise a function with RETURNS SETOF numeric to return no row for no result ...)
Using an OUT parameter to shorten the syntax. See:
Returning from a function with OUT parameter

You don't need a UDF for this, just use DENSE_RANK:
WITH cte AS (
SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) drnk
FROM company
)
SELECT salary
FROM cte
WHERE drnk = <value of n here>

So I have missing semicolons, did not end my if-statement, and a 'dangling' AND. The following code is the working version:
create or replace function get_nth_max(n integer)
returns real
as
$$
declare
nth_max real = max(salary) from company;
pay record;
begin
for pay in select salary from company
order by salary desc limit n
loop
if pay.salary < nth_max then
nth_max = pay.salary;
end if;
end loop;
return nth_max;
end;
$$
language plpgsql;

Related

Execute multiple select query based on a query count output

I want to query the "test" table below only once. But I'm querying it to check the count, and if the count is more than 1, then I need to execute another query.
Any leads on how to store the test table result in a variable and then return from that variable to the function output, so that I query the "test" table only once?
Below is an example; my prod table has millions of records and I run into performance issues when I query it multiple times :(
CREATE OR REPLACE FUNCTION lcdm_main.test(IN p_member_crn character varying)
RETURNS TABLE(fname character varying, lname character varying) AS
$BODY$
DECLARE
countRow integer Default 0;
BEGIN
PERFORM
t.fname, t.lastname
from test t where t.email =p_member_crn
;
GET DIAGNOSTICS countRow := ROW_COUNT;
if countRow > 1 then
RETURN QUERY select concat( error_msg_cd , error_msg_nm )::character varying , NULL:: character varying
from lcdm_main.error where error_Cndtn_nm ='SMRF';
ELSE
return query
select
t.fname, t.lastname
from test t where t.email =p_member_crn
;
END IF ;
END;
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER
COST 100
ROWS 1000;
A similar question remains without a solution in the thread below:
https://dba.stackexchange.com/questions/40214/postgresql-stored-procedure-to-return-rows-or-empty-set
Thanks everyone for the different approaches provided.
Edit:
I might have misunderstood and missed some things at first.
According to your example, I assume that you don't care about the precise record count of the "test" query. You only need firstname and lastname if it returns one record, or to execute and return a different query if it returns more than one record, right? So why not wrap the "test" query in an outer query, limit the result set to 2, and add a count(*) OVER () column to check whether the record count is more than 1? Insert the result into plain variables and then use an IF block to return either the variables or the other query. LIMIT 2 may also improve the performance of the "test" query by reducing the work done per row if no ordering is done at the end of the query. This way you run the test query only once, get the necessary values, and reduce the resource footprint of the function. For example:
CREATE OR REPLACE FUNCTION lcdm_main.test(IN p_member_crn character varying)
RETURNS TABLE(fname character varying, lname character varying) AS
$BODY$
declare
vfname text;
vlastname text;
countRow integer;
BEGIN
select q1.fname,q1.lastname, count(*) over() as cc
from (
select t.fname, t.lastname
from testq t
where t.email = p_member_crn limit 2
)q1
into vfname,vlastname,countRow;
if countRow > 1 then
RETURN QUERY select concat( error_msg_cd , error_msg_nm )::character varying , NULL:: character varying
from lcdm_main.error where error_Cndtn_nm ='SMRF';
ELSE
fname := vfname;
lname := vlastname;
return next;
END IF ;
END;
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER
COST 100
ROWS 1000;

Recursive query with cursor in PostgreSQL, no data found

How can I use a recursive query and then a cursor to update multiple rows in PostgreSQL? I try to return data, but no data is found. Is there an alternative to using a recursive query and a cursor, or maybe better code? Please help me.
drop function proses_stock_invoice(varchar, varchar, character varying);
create or replace function proses_stock_invoice
(p_medical_cd varchar,p_post_cd varchar, p_pstruserid character varying)
returns void
language plpgsql
as $function$
declare
cursor_data refcursor;
cursor_proses refcursor;
v_medicalCd varchar(20);
v_itemCd varchar(20);
v_quantity numeric(10);
begin
open cursor_data for
with recursive hasil(idnya, level, pasien_cd, id_root) as (
select medical_cd, 1, pasien_cd, medical_root_cd
from trx_medical
where medical_cd = p_pstruserid
union all
select A.medical_cd, level + 1, A.pasien_cd, A.medical_root_cd
from trx_medical A, hasil B
where A.medical_root_cd = B.idnya
)
select idnya from hasil where level >=1;
fetch next from cursor_data into v_medicalCd;
return v_medicalCd;
while (found)
loop
open cursor_proses for
select B.item_cd, B.quantity from trx_medical_resep A
join trx_resep_data B on A.medical_resep_seqno = B.medical_resep_seqno
where A.medical_cd = v_medicalCd and B.resep_tp = 'RESEP_TP_1';
fetch next from cursor_proses into v_itemCd, v_quantity;
while (found)
loop
update inv_pos_item
set quantity = quantity - v_quantity, modi_id = p_pstruserid, modi_id = now()
where item_cd = v_itemCd and pos_cd = p_post_cd;
end loop;
close cursor_proses;
end loop;
close cursor_data;
end
$function$;
But no data is found?
You have a function with return void so it will never return any data to you. Still you have the statement return v_medicalCd after fetching the first record from the first cursor, so the function will return from that point and never reach the lines below.
When analyzing your function you have (1) a cursor that yields a number of idnya values from table trx_medical, which is input for (2) a cursor that yields a number of v_itemCd, v_quantity from tables trx_medical_resep, trx_resep_data for each idnya, which is then used to (3) update some rows in table inv_pos_item. You do not need cursors to do that and it is, in fact, extremely inefficient. Instead, turn the whole thing into a single update statement.
I am assuming here that you want to update an inventory of medicines by subtracting the medicines prescribed to patients from the stock in the inventory. This means that you will have to sum up prescribed amounts by type of medicine. That should look like this (note the comments):
CREATE FUNCTION proses_stock_invoice
-- VVV parameter not used
(p_medical_cd varchar, p_post_cd varchar, p_pstruserid varchar)
RETURNS void AS $function$
UPDATE inv_pos_item -- VVV column repeated VVV
SET quantity = quantity - prescribed.quantity, modi_id = p_pstruserid, modi_id = now()
FROM (
WITH RECURSIVE hasil(idnya, level, pasien_cd, id_root) AS (
SELECT medical_cd, 1, pasien_cd, medical_root_cd
FROM trx_medical
WHERE medical_cd = p_pstruserid
UNION ALL
SELECT A.medical_cd, level + 1, A.pasien_cd, A.medical_root_cd
FROM trx_medical A, hasil B
WHERE A.medical_root_cd = B.idnya
)
SELECT B.item_cd, sum(B.quantity) AS quantity
FROM trx_medical_resep A
JOIN trx_resep_data B USING (medical_resep_seqno)
JOIN hasil ON A.medical_cd = hasil.idnya
WHERE B.resep_tp = 'RESEP_TP_1'
--AND hacil.level >= 1 Useless because level is always >= 1
GROUP BY 1
) prescribed
WHERE item_cd = prescribed.item_cd
AND pos_cd = p_post_cd;
$function$ LANGUAGE sql STRICT;
Important
As with all UPDATE statements, test this code before you run the function. You can do that by running the prescribed sub-query separately as a stand-alone query to ensure that it does the right thing.
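For instance, the prescribed sub-query can be lifted out verbatim and run with a literal in place of the function parameter (the value 'some_medical_cd' here is only a placeholder):
WITH RECURSIVE hasil(idnya, level, pasien_cd, id_root) AS (
   SELECT medical_cd, 1, pasien_cd, medical_root_cd
   FROM   trx_medical
   WHERE  medical_cd = 'some_medical_cd'  -- placeholder for p_pstruserid
   UNION ALL
   SELECT A.medical_cd, level + 1, A.pasien_cd, A.medical_root_cd
   FROM   trx_medical A, hasil B
   WHERE  A.medical_root_cd = B.idnya
)
SELECT B.item_cd, sum(B.quantity) AS quantity
FROM   trx_medical_resep A
JOIN   trx_resep_data B USING (medical_resep_seqno)
JOIN   hasil ON A.medical_cd = hasil.idnya
WHERE  B.resep_tp = 'RESEP_TP_1'
GROUP  BY 1;
If the sums look right, the surrounding UPDATE should subtract the correct amounts.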

Dynamic Limit in PostgreSQL

I have a function returning a list of employees. My requirement: if I pass a limit to the function, I should get the result with that limit and an offset; if I don't pass a limit, all the rows should be returned.
For example, when the limit is greater than 0 (say I pass a limit of 10):
Select * from Employees
Limit 10 offset 0
When the limit is equal to 0:
Select * from Employees
Is there any way to do such logic in a function?
Yes, you can pass an expression for the LIMIT and OFFSET clauses, which includes using a parameter passed in to a function.
CREATE FUNCTION employees_limited(_limit integer) RETURNS SETOF employees AS $$
-- parameter named _limit because LIMIT is a reserved word
BEGIN
IF _limit = 0 THEN
RETURN QUERY SELECT * FROM employees;
ELSE
RETURN QUERY SELECT * FROM employees LIMIT (_limit);
END IF;
RETURN;
END; $$ LANGUAGE plpgsql STRICT;
Note the parentheses around the LIMIT clause. You can similarly pass in an OFFSET value.
This example is very trivial, though. You could achieve the same effect by doing the LIMIT outside of the function:
SELECT * FROM my_function() LIMIT 10;
Doing this inside of a function would really only be useful for a complex query, potentially involving a large amount of data.
Also note that a LIMIT clause without an ORDER BY produces unpredictable results.
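So a deterministic version of the call above would pin down the order first (assuming the returned rows have an id column to sort by):
SELECT * FROM my_function() ORDER BY id LIMIT 10;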
Sorry, I can't comment. The solution is almost provided by Patrick.
First we should write a function that returns the result without any limitation.
CREATE FUNCTION test ()
RETURNS TABLE (val1 varchar, val2 integer) AS $$
BEGIN
RETURN QUERY SELECT val1, val2 FROM test_table;
END;
$$ LANGUAGE plpgsql;
Then we have to write a wrapper function, which will handle the limitation.
CREATE FUNCTION test_wrapper (l integer DEFAULT 0)
RETURNS TABLE (name varchar, id integer) AS $$
BEGIN
IF l = 0 THEN
RETURN QUERY SELECT * FROM test(); -- returns everything
ELSE
RETURN QUERY SELECT * FROM test() LIMIT (l); -- returns accordingly
END IF;
END;
$$ LANGUAGE plpgsql;
In my case I needed to return tables as the final result, but one can return anything required from the wrapper function.
Please note that a SELECT with a LIMIT should always include an ORDER BY, because if it is not explicitly specified, the order of the returned rows can be undefined.
Instead of LIMIT you could use ROW_NUMBER() like in the following query:
SELECT *
FROM (
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM Employees
) AS s
WHERE
(rn>:offset AND rn<=:limit+:offset) OR :limit=0
I used the approach below in Postgres:
CREATE FUNCTION employees_limited(_limit integer, _offset integer, _pagination boolean) RETURNS SETOF employees AS $$
-- parameters named _limit/_offset because LIMIT and OFFSET are reserved words
BEGIN
IF _pagination = false THEN -- skip offset and limit
_offset = 0;
_limit = 2147483647; -- int max value in Postgres
END IF;
RETURN QUERY
SELECT * FROM employees ORDER BY createddate
LIMIT (_limit) OFFSET _offset;
RETURN;
END; $$ LANGUAGE plpgsql STRICT;
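As a side note, LIMIT NULL behaves like having no LIMIT at all in Postgres, so the special-casing above can be collapsed into a single query. A sketch under the same assumptions (an employees table with a createddate column; the function name is made up):
CREATE FUNCTION employees_paged(_limit integer, _offset integer, _pagination boolean)
RETURNS SETOF employees AS $$
BEGIN
RETURN QUERY
SELECT * FROM employees
ORDER BY createddate
LIMIT CASE WHEN _pagination THEN _limit END -- CASE yields NULL when not paginating = no limit
OFFSET CASE WHEN _pagination THEN _offset ELSE 0 END;
END; $$ LANGUAGE plpgsql;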

Postgres FOR LOOP

I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
FOR i IN 1..25 LOOP
INSERT INTO playtime.meta_random_sample
(col_i, col_id) -- declare target columns!
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000;
END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000
) t;
About generate_series():
What is the expected behaviour for multiple set-returning functions in SELECT clause?
About optimizing performance of random selections:
Best way to select random rows PostgreSQL
Below is an example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause on WHERE. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
select * from playtime tablesample system_rows (15);
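For a percentage-based sample, the BERNOULLI method mentioned above works without any extension; for example, to get roughly 1% of the table's rows:
select * from playtime tablesample bernoulli (1);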
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2

connection_psql = psycopg2.connect( user="admin_user"
                                  , password="***"
                                  , port="5432"
                                  , database="myDB"
                                  , host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()

myList = [...]
for item in myList:
    cursor_psql.execute('''
      -- The query goes here
    ''')

connection_psql.commit()
cursor_psql.close()
Here is a more complex Postgres function involving a UUID array, a FOR loop, a CASE condition, and an enum data update. The function walks through each order, checks the condition, and updates the row individually.
CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
DECLARE
oid_list uuid[];
oid uuid;
BEGIN
-- "order" is a reserved word, so the table name must be quoted
SELECT array_agg(order_id) FROM "order" INTO oid_list;
FOREACH oid IN ARRAY oid_list
LOOP
WITH status_cmp AS (SELECT COUNT(sku)=0 AS empty,
COUNT(sku)<COUNT(sku_order_id) AS partial,
COUNT(sku)=COUNT(sku_order_id) AS full
FROM fulfillment
WHERE order_id=oid)
UPDATE "order"
SET status=CASE WHEN status_cmp.empty THEN 'EMPTY'::orderstatus
WHEN status_cmp.full THEN 'FULL'::orderstatus
WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
ELSE null
END
FROM status_cmp
WHERE order_id=oid;
END LOOP;
END;
$$ LANGUAGE plpgsql;
To run the above function
SELECT order_status_update();
Using a procedure.
CREATE or replace PROCEDURE pg_temp_3.insert_data()
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO meta_random_sample(col_serial, parent_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, parent_id
FROM parent_tree order by random() limit 2
) t;
END;
Call the procedure.
call pg_temp_3.insert_data();
PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html

PostgreSQL: How to figure out missing numbers in a column using generate_series()?

SELECT commandid
FROM results
WHERE NOT EXISTS (
SELECT *
FROM generate_series(0,119999)
WHERE generate_series = results.commandid
);
I have a column in results of type int but various tests failed and were not added to the table. I would like to create a query that returns a list of commandid that are not found in results. I thought the above query would do what I wanted. However, it does not even work if I use a range that is outside the expected possible range of commandid (like negative numbers).
Given sample data:
create table results ( commandid integer primary key);
insert into results (commandid) select * from generate_series(1,1000);
delete from results where random() < 0.20;
This works:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);
as does this alternative formulation:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
LEFT OUTER JOIN results ON (results.commandid = s.i)
WHERE results.commandid IS NULL;
Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.
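For example, to compare the plans on the sample data above:
EXPLAIN ANALYZE
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);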
Explanation
Note that instead of NOT IN I've used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It's much easier for the DB server to optimise these and it avoids the confusing issues that can arise with NULLs in NOT IN.
I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimizes to the same plan.
Both will perform better than the NOT IN formulation below when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.
The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);
so I'd avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
As I mentioned in the comment, you need to do the reverse of the above query.
SELECT
generate_series
FROM
generate_series(0, 119999)
WHERE
NOT generate_series IN (SELECT commandid FROM results);
At that point, you should find values that do not exist within the commandid column within the selected range.
I am not such an experienced SQL guru, but I like other ways of solving problems.
Just today I had a similar problem: finding unused numbers in a character column.
I solved it using PL/pgSQL and was very interested in how fast my procedure would be.
I used @Craig Ringer's way to generate a table with a serial column, added one million records, and then deleted every 99th record. This procedure takes about 3 seconds to search for the missing numbers:
-- creating table
create table results (commandid character(7) primary key);
-- populating table with serial numbers formatted as characters
insert into results (commandid) select cast(num_id as character(7)) from generate_series(1,1000000) as num_id;
-- delete some records
delete from results where cast(commandid as integer) % 99 = 0;
create or replace function unused_numbers()
returns setof integer as
$body$
declare
i integer;
r record;
begin
-- looping through table with synchronized counter:
i := 1;
for r in
(select distinct cast(commandid as integer) as num_value
from results
order by num_value asc)
loop
if not (i = r.num_value) then
while true loop
return next i;
i = i + 1;
if (i = r.num_value) then
i = i + 1;
exit;
else
continue;
end if;
end loop;
else
i := i + 1;
end if;
end loop;
return;
end;
$body$
language plpgsql volatile
cost 100
rows 1000;
select * from unused_numbers();
Maybe it will be usable for someone.
If you're on AWS redshift, you might end up needing to defy the question, since it doesn't support generate_series. You'll end up with something like this:
select
startpoints.id gapstart,
min(endpoints.id) resume
from (
select id+1 id
from yourtable outer_series
where not exists
(select null
from yourtable inner_series
where inner_series.id = outer_series.id + 1
)
order by id
) startpoints,
yourtable endpoints
where
endpoints.id > startpoints.id
group by
startpoints.id;