Postgres: The best way to optimize "greater than" query

What is the best way to optimize a query that joins a table with itself on the next id value within the same subgroup? For now I have something like this:
CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
DECLARE
_id bigint;
BEGIN
SELECT id INTO _id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
RETURN _id;
END;
$body$ LANGUAGE plpgsql;
And the JOIN query:
SELECT * FROM table t1
JOIN table t2 ON t2.id = select_next_id(t1.id, t1.id_group)
The table has more than 2 million rows, and the query takes a very long time. Is there a faster way to do this? I also have a UNIQUE INDEX on column id, but I guess that is not very helpful here.
Some sample data:
id | id_group
=============
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
20 | 4
25 | 4
37 | 4
40 | 1
55 | 2
And I want to receive something like this:
id | id_next
1 | 2
2 | 3
3 | 40
4 | 5
5 | 6
6 | 55
and so on.

For the query in the function, you need an index on (id_group, id), not just (id).
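A minimal sketch of such an index, assuming the table is called the_table (a placeholder, since `table` itself is a reserved word) and using a made-up index name:

```sql
-- Composite index: equality column first, range/sort column second.
-- the_table and the index name are placeholders, not names from the question.
CREATE INDEX the_table_id_group_id_idx
    ON the_table (id_group, id);
```

With this column order, the planner can descend to the entries for one id_group and read the first id greater than the given value in index order, which is what the ORDER BY id LIMIT 1 lookup needs.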
Next, you don't need the overhead of plpgsql in the function itself, and you can give the planner a few hints by marking the function as STABLE and giving it a small cost:
CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
SELECT id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
$body$ LANGUAGE sql STABLE COST 10;
In the final query, depending on what you're actually trying to do, you might be able to get rid of the join and the function call by using lead() as highlighted by the horse:
http://www.postgresql.org/docs/current/static/tutorial-window.html

I'm not entirely sure, but I think you want something like this:
select id,
lead(id) over (partition by id_group order by id) as id_next
from the_table
order by id, id_next;
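To try this against the sample data from the question, here is a small self-contained sketch. The temporary table definition (column types, primary key) is an assumption, since the question does not show the real DDL:

```sql
-- Temporary table holding the sample rows from the question.
-- Column types and the primary key are assumed.
CREATE TEMP TABLE the_table (
    id       bigint PRIMARY KEY,
    id_group bigint NOT NULL
);

INSERT INTO the_table (id, id_group) VALUES
    (1, 1), (2, 1), (3, 1),
    (4, 2), (5, 2), (6, 2),
    (20, 4), (25, 4), (37, 4),
    (40, 1), (55, 2);

-- For every row, look up the next id within the same id_group.
SELECT id,
       lead(id) OVER (PARTITION BY id_group ORDER BY id) AS id_next
FROM the_table
ORDER BY id;
```

With this data, id 3 is paired with 40 and id 6 with 55, because those are the next ids in groups 1 and 2. The last id of each group (37, 40 and 55) gets NULL as id_next.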

Related

Why subqueried function does not insert new rows?

I need a function to insert rows because the default value of one column (serialno) should be the same as the PK id.
I have defined table:
CREATE SEQUENCE some_table_id_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;
CREATE TABLE some_table
(
id bigint NOT NULL DEFAULT nextval('some_table_id_seq'::regclass),
itemid integer NOT NULL,
serialno bigint,
CONSTRAINT stockitem_pkey PRIMARY KEY (id),
CONSTRAINT stockitem_serialno_key UNIQUE (serialno)
);
and a function that inserts a given number of rows:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
id bigint;
BEGIN
FOR counter IN 1..count LOOP
id := NEXTVAL( 'some_table_id_seq' );
INSERT INTO some_table (id, itemid, serialno) VALUES (id, itemid, id);
ids := array_append(ids, id);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
And inserting with it works fine:
$ select insert_item(123, 10);
insert_item
-------------
1
2
3
4
5
6
7
8
9
10
(10 rows)
$ select * from some_table;
id | itemid | serialno
----+--------+----------
1 | 123 | 1
2 | 123 | 2
3 | 123 | 3
4 | 123 | 4
5 | 123 | 5
6 | 123 | 6
7 | 123 | 7
8 | 123 | 8
9 | 123 | 9
10 | 123 | 10
(10 rows)
But if I want to use function insert_item as subquery, it seems not to work anymore:
$ select id, itemid from some_table where id in (select insert_item(123, 10));
id | itemid
----+--------
(0 rows)
I created dumb function insert_dumb to test in a subquery:
CREATE OR REPLACE FUNCTION insert_dumb(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
BEGIN
FOR counter IN 1..count LOOP
ids := array_append(ids, counter::bigint);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
and this works in a subquery as expected:
$ select id, itemid from some_table where id in (select insert_dumb(123, 10));
id | itemid
----+--------
1 | 123
2 | 123
3 | 123
4 | 123
5 | 123
6 | 123
7 | 123
8 | 123
9 | 123
10 | 123
(10 rows)
Why does the insert_item function not insert new rows when it is called in a subquery? I tried adding RAISE NOTICE to the loop, and it runs as expected, printing the new id every time (and advancing the sequence), but no new rows are added to the table.
I made all the setup available as fiddle
I am using Postgres 11 on Ubuntu.
EDIT
Of course, I left out my real reason, and it pays off...
I need the insert_item function to return the ids so that I can use it in an UPDATE statement, like:
update some_table set some_text = 'x' where id in (select insert_item(123, 10));
And an addition to the why-question: it is understandable that I get no ids in return (because the queries share the same snapshot), but the function runs all the needed INSERTs without affecting the table. Shouldn't those rows be available in the next query?

The problem is that the subquery and the surrounding query share the same snapshot, that is, they see the same state of the database. Hence the outer query cannot see the rows inserted by the inner query.
See the documentation (which explains that in the context of WITH, although it also applies here):
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
In addition, there is a second problem with your approach: if you run EXPLAIN (ANALYZE) on your statement, you will find that the subquery is not executed at all! Since the table is empty, there is no id, and running the subquery is not necessary to calculate the (empty) result.
You will have to run that in two different statements. Or, better, do it in a different fashion: updating a row that you just inserted is unnecessarily wasteful.
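A minimal sketch of the two-statement variant, assuming the insert_item function from the question and a some_text column on some_table (the column appears in the question's UPDATE but not in its CREATE TABLE, so its existence is an assumption). The temporary table name new_ids is made up for illustration:

```sql
BEGIN;

-- Statement 1: run the inserting function and keep the generated ids.
CREATE TEMP TABLE new_ids ON COMMIT DROP AS
SELECT insert_item(123, 10) AS id;

-- Statement 2: a later statement in the same transaction sees the rows
-- that the earlier statement inserted.
-- some_text is assumed to exist on some_table.
UPDATE some_table
SET some_text = 'x'
WHERE id IN (SELECT id FROM new_ids);

COMMIT;
```

If the value is already known when the rows are created, setting it directly in the INSERT avoids the second write entirely.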

Laurenz explained the visibility problem, but you don't need the sub-query at all if you rewrite your function to return the actual table rows rather than just the IDs:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1)
RETURNS setof some_table
AS
$func$
INSERT INTO some_table (id, itemid, serialno)
select NEXTVAL( 'some_table_id_seq' ), itemid, currval('some_table_id_seq')
from generate_series(1,count)
returning *;
$func$
LANGUAGE sql;
Then you can use it like this:
select id, itemid
from insert_item(123, 10);
And you get the complete inserted rows.
Online example

Fetch data from two different tables in a single function (PostgreSQL)

I want to write a function to get data from two different tables.
My code:
create function
return table(a integer,b integer,c integer,k integer,l integer,m integer);
if(x=1) then
select a,b,c from mst_1
else
select k,l,m from mst_2
end IF;
end;
The problem is that the two tables have different columns, so I'm getting an error.

I replicated a case similar to yours, and it's just a matter of using the correct syntax.
Suppose you have two tables, test and test_other, as in my case:
create table test (id serial, name varchar, surname varchar);
insert into test values(1,'Carlo', 'Rossi');
insert into test values(2,'Giovanni', 'Galli');
create table test_other (id_other serial, name_other varchar, surname_other varchar);
insert into test_other values(1,'Beppe', 'Bianchi');
insert into test_other values(2,'Salmo', 'Verdi');
you now want a function that returns the 3 columns from test if an input parameter is 1, the 3 columns from test_other otherwise.
Your function will look like the following
create or replace function case_return(x integer)
returns table(id integer,value_1 varchar, value_2 varchar)
language plpgsql
as
$$
begin
if(x=1) then
return query select test.id,test.name,test.surname from test;
else
return query select test_other.id_other, test_other.name_other, test_other.surname_other from test_other;
end IF;
end;
$$
;
The function always returns the columns id, value_1 and value_2 as per definition even if your source columns are different
defaultdb=> select * from case_return(0);
id | value_1 | value_2
----+---------+---------
1 | Beppe | Bianchi
2 | Salmo | Verdi
(2 rows)
defaultdb=> select * from case_return(1);
id | value_1 | value_2
----+----------+---------
1 | Carlo | Rossi
2 | Giovanni | Galli
(2 rows)

Postgres cascade delete on non-unique column

I have a table like this:
id | group_id | parent_group
---+----------+-------------
1 | 1 | null
2 | 1 | null
3 | 2 | 1
4 | 2 | 1
Is it possible to add a constraint such that a row is automatically deleted when there is no row with a group_id equal to the row's parent_group? For example, if I delete rows 1 and 2, I want rows 3 and 4 to be deleted automatically because there are no more rows with group_id 1.

The answer that clemens posted led me to the following solution. I'm not very familiar with triggers though; could there be any problems with this and is there a better way to do it?
CREATE OR REPLACE FUNCTION on_group_deleted() RETURNS TRIGGER AS $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM my_table WHERE group_id = OLD.group_id) THEN
DELETE FROM my_table WHERE parent_group = OLD.group_id;
END IF;
RETURN OLD;
END;
$$ LANGUAGE PLPGSQL;
CREATE TRIGGER my_table_delete_trigger AFTER DELETE ON my_table
FOR EACH ROW
EXECUTE PROCEDURE on_group_deleted();

Use array of IDs to insert records into table if it does not already exist

I have created a postgresql function that takes a comma separated list of ids as input parameter. I then convert this comma separated list into an array.
CREATE FUNCTION myFunction(csvIDs text)
RETURNS void AS $$
DECLARE ids INT[];
BEGIN
ids = string_to_array(csvIDs,',');
-- INSERT INTO tableA
END; $$
LANGUAGE PLPGSQL;
What I want to do now is INSERT a record into TableA for each of the ids in the array, if the id does not already exist in the table. The new records should have the value field set to 0.
The table is created like this:
CREATE TABLE TableA (
id int PRIMARY KEY,
value int
);
Is this possible to do?

You can use the unnest() function to get each element of your array.
create table tableA (id int);
insert into tableA values(13);
select t.ids
from (select unnest(string_to_array('12,13,14,15', ',')::int[]) ids) t
| ids |
| --: |
| 12 |
| 13 |
| 14 |
| 15 |
Now you can check whether each id already exists before inserting a new row.
CREATE FUNCTION myFunction(csvIDs text)
RETURNS int AS
$myFunction$
DECLARE
r_count int;
BEGIN
insert into tableA
select t.ids
from (select unnest(string_to_array(csvIDs,',')::int[]) ids) t
where not exists (select 1 from tableA where id = t.ids);
GET DIAGNOSTICS r_count = ROW_COUNT;
return r_count;
END;
$myFunction$
LANGUAGE PLPGSQL;
select myFunction('12,13,14,15') as inserted_rows;
| inserted_rows |
| ------------: |
| 3 |
select * from tableA;
| id |
| -: |
| 13 |
| 12 |
| 14 |
| 15 |
dbfiddle here
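The question also asks for the value field to be set to 0, which the example above does not cover because its test table has only an id column. Here is a hedged sketch against the TableA definition from the question, using INSERT ... ON CONFLICT DO NOTHING instead of NOT EXISTS. This is a swapped-in technique, not the answer's method, and it assumes Postgres 9.5 or later. The function name insert_missing_ids is made up for illustration:

```sql
-- Table definition as given in the question.
CREATE TABLE IF NOT EXISTS TableA (
    id    int PRIMARY KEY,
    value int
);

-- insert_missing_ids is a hypothetical name, not from the question.
CREATE OR REPLACE FUNCTION insert_missing_ids(csv_ids text)
RETURNS int AS
$$
DECLARE
    r_count int;
BEGIN
    -- Split the CSV text, cast each element to int, and insert it with
    -- value = 0. Ids that already exist are skipped via the primary key.
    INSERT INTO TableA (id, value)
    SELECT t.x::int, 0
    FROM unnest(string_to_array(csv_ids, ',')) AS t(x)
    ON CONFLICT (id) DO NOTHING;

    -- Number of rows actually inserted.
    GET DIAGNOSTICS r_count = ROW_COUNT;
    RETURN r_count;
END;
$$ LANGUAGE plpgsql;

SELECT insert_missing_ids('12,13,14,15') AS inserted_rows;
```

Unlike the NOT EXISTS check, ON CONFLICT also skips rows that a concurrent session inserts between the check and the insert, at the cost of requiring the unique constraint on id.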

Custom Postgres Update Function

I have a table which contains multiple rows for a user, holding their station_ids. When a station ID is changed from the front end via a drop-down, I want to update the station ID in the table. I only want there to ever be one TRUE value for "is_default_station" per user, but a user can have multiple FALSE values. I am using Postgres 9.5 and the pg driver for Node.js.
My table looks like this:
station_id | station_name | user_id | is_default_station
-----------+--------------+---------+-------------------
1 | station 1 | 1 | TRUE
2 | station 2 | 1 | FALSE
3 | station 3 | 1 | FALSE
4 | station 4 | 2 | FALSE
5 | station 5 | 2 | FALSE
6 | station 6 | 2 | TRUE
Here is my function:
CREATE OR REPLACE FUNCTION UPDATE_All_STATIONS_FUNC (
userId INTEGER,
stationId INTEGER
)
RETURNS RECORD AS $$
DECLARE
ret RECORD;
BEGIN
--Find all the user stations associated to a user, and set them to false. Then, update one to TRUE
UPDATE user_stations SET (is_default_station) = (FALSE) WHERE station_id = ALL (SELECT station_id FROM user_stations WHERE user_id =$1 AND is_default_station = TRUE);
UPDATE user_stations SET (is_default_station) = (TRUE) WHERE station_id =$2 AND user_id = $1 RETURNING user_id, station_id INTO ret;
RETURN ret;
END;
$$ LANGUAGE plpgsql;
I am accessing the function like so:
SELECT user_id, station_id FROM update_all_stations_func($1, $2) AS (user_id INTEGER, station_id INTEGER)
The function is not updating anything in the DB, and it returns null values for user_id and station_id, like so: rows: [ { user_id: null, dashboard_id: null } ].
I am guessing that the initial update query with the nested SELECT is not finding anything inside the function, but if I use the first query alone to update, I find results and it updates as expected. What am I missing?

I simplified the first update statement, and the following works as expected:
CREATE OR REPLACE FUNCTION UPDATE_All_STATIONS_FUNC (
userId INTEGER,
stationId INTEGER
)
RETURNS RECORD AS $$
DECLARE
ret RECORD;
BEGIN
--Find all the user stations associated to a user, and set them to false. Then, update one to TRUE
UPDATE user_stations SET (is_default_station) = (FALSE) WHERE user_id = $1 AND station_id <> $2;
UPDATE user_stations SET (is_default_station) = (TRUE) WHERE user_id = $1 AND station_id = $2 RETURNING user_id, station_id INTO ret;
RETURN ret;
END;
$$ LANGUAGE plpgsql;
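To have the database itself guarantee the "only one TRUE per user" rule from the question, a partial unique index is one option. This is a sketch, not part of the answer above, and the index name is made up:

```sql
-- Hypothetical index name. Only rows with is_default_station = TRUE are
-- indexed, so each user_id can appear at most once among the defaults,
-- while any number of FALSE rows per user remains allowed.
CREATE UNIQUE INDEX user_stations_one_default_idx
    ON user_stations (user_id)
    WHERE is_default_station;
```

Because the function clears the other stations before setting the new default, its two UPDATE statements stay within this constraint.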