Unexpected difference between two pieces of SQL - PostgreSQL

I have two pieces of SQL which I think should give identical results, but they don't. Both involve this function:
create or replace function viewFromList( lid integer, offs integer, lim integer ) returns setof resultsView as $xxx$
BEGIN
    return query select resultsView.*
    from resultsView, list, files
    where list_id = lid
      and resultsView.basename = files.basename
      and idx(imgCmnt, file_id) > 0
    order by idx(imgCmnt, file_id)
    limit lim offset offs;
    return;
END;
$xxx$ language plpgsql;
The first is:
drop table if exists t1;
create temp table t1 as select * from viewFromList( lid, frst, nmbr::integer );
select count(*) into rv.height from t1;
The second is:
DECLARE
t1 resultsView;
....
select viewFromList( lid, frst, nmbr::integer ) into t1;
select count(*) into rv.height from t1;
It seems to me that rv.height should get the same value in both cases, but it doesn't. If it matters, the correct answer in my case is 7; the second version produces 12. I have, of course, looked at the result of calling viewFromList with the appropriate values. When run in psql, it returns the expected 7 rows.
Can someone tell me what's going on?
Thanks.

Related

PostgreSQL function returning a table without specifying each field in a table

I'm moving from SQL Server to PostgreSQL. In SQL Server I can define a table-valued function as an alias for a query. Example:
Create Function Example(@limit int)
Returns Table As Return
Select t1.*, t2.*, price * 0.5 discounted
From t1
Inner Join t2 on t1.id = t2.id
Where t1.price < @limit;
GO
Select * From Example(100);
This gives me a way to return all fields without specifying their types. I can easily change a table's field types, add new fields, or delete fields, and then re-create the function.
So the question is: how do I do such a thing in PostgreSQL? I found out that PostgreSQL requires all field names and types to be specified explicitly when writing a function. Maybe I need something else, not a function?
Postgres implicitly creates a type for each table. So, if you are just selecting from one table, it's easiest to use that type in your function definition:
CREATE TABLE test (id int, value int);
CREATE FUNCTION mytest(p_id int)
RETURNS test AS
$$
SELECT * FROM test WHERE id = p_id;
$$ LANGUAGE SQL;
You are then free to add, remove, or alter columns in test and your function will still return the correct columns.
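For example, a sketch illustrating that claim (the added column name is purely illustrative):
ALTER TABLE test ADD COLUMN note text;
SELECT * FROM mytest(1);  -- now returns (id, value, note)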
EDIT:
The question was updated to use the function parameter in the limit clause and to use a more complex query. I would still recommend a similar approach, but you could use a view as @Bergi recommends:
CREATE TABLE test1 (a int, b int);
CREATE TABLE test2 (a int, c int);
CREATE VIEW test_view as SELECT a, b, c from test1 JOIN test2 USING (a);
CREATE FUNCTION mytest(p_limit int)
RETURNS SETOF test_view AS
$$
SELECT * FROM test_view FETCH FIRST p_limit ROWS ONLY
$$ LANGUAGE SQL;
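A call would then look something like this (the limit value is illustrative):
SELECT * FROM mytest(10);  -- returns at most 10 rows of test_view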
You aren't going to find an exact replacement for the behavior in SQL Server, it's just not how Postgres works.
If you change the function frequently, I'd suggest using a view instead of a function, because every time you re-create a function it gets compiled again, which is a bit expensive. Otherwise you're right: Postgres requires field names and types in functions:
CREATE OR REPLACE VIEW example AS
SELECT t1.*, t2.*, price * 0.5 discounted
FROM t1 INNER JOIN t2 ON t1.id = t2.id;
then
SELECT * FROM example WHERE price < 100;
You can do something like this
CREATE OR REPLACE FUNCTION Example(_id int)
RETURNS RECORD AS
$$
DECLARE
data_record RECORD;
BEGIN
SELECT * INTO data_record FROM SomeTable WHERE id = _id;
RETURN data_record;
END;
$$
LANGUAGE plpgsql;
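One caveat with RETURNS RECORD: the function has no fixed output columns, so a caller selecting from it must supply a column definition list. A sketch (the column names and types are illustrative and must match what SomeTable actually returns):
SELECT * FROM Example(1) AS t(id int, name text, price numeric);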

Using a recursive CTE within a function

I wanted to try out a recursive CTE for the first time, so I wrote a query to show the notes in a musical scale based on the root note and the step pattern of the chosen scale.
When I run the script by itself all is well, but as soon as I try to make it into a function, I get the error: relation "temp_scale_steps" does not exist.
I am using PostgreSQL 9.4.1. I can't see any reason why this would not work. Any advice would be gratefully received.
The code below:
create or replace function scale_notes(note_id int, scale_id int)
returns table(ordinal int, note varchar(2))
as
$BODY$
drop table if exists temp_min_note_seq;
create temp table temp_min_note_seq
as
select min(note_seq_id) as min_note_id from note_seq where note_id = $1
;
drop table if exists temp_scale_steps;
create temp table temp_scale_steps
as
with recursive steps (ordinal, step) as
(
select ordinal
,step
from scale_steps
where scale_id = $2
union all
select ordinal+1
,step
from steps
where ordinal < (select max(ordinal) from scale_steps where scale_id = $2)
)
select ordinal
,sum(step) as temp_note_seq_id
from steps
group by 1
order by 1
;
select x.ordinal
,n.note
from
(
select ordinal
,min_note_id + temp_note_seq_id as temp_note_seq_id
from temp_scale_steps
join temp_min_note_seq on (1=1)
) x
join note_seq ns on (x.temp_note_seq_id = ns.note_seq_id)
join notes n on (ns.note_id = n.note_id)
order by ordinal;
$BODY$
language sql volatile;
In response to the comments I have changed the script so that the query is done in one step, and now everything works. However, I would still be interested to know why the version above does not work.
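The likely reason is that the entire body of a LANGUAGE sql function is parsed before any of it is executed, so temp_scale_steps does not yet exist when the later statements that reference it are parsed (a PL/pgSQL function would not hit this, because it plans each statement only when it is first executed). Folding both temp tables into CTEs gives a single-statement version; the following is an untested sketch that reuses the question's tables and columns:
create or replace function scale_notes(p_note_id int, p_scale_id int)
returns table(ordinal int, note varchar(2))
as
$BODY$
-- temp_min_note_seq and temp_scale_steps become CTEs of one query
with recursive min_note as
(
select min(note_seq_id) as min_note_id from note_seq where note_id = p_note_id
),
steps (ordinal, step) as
(
select ordinal, step from scale_steps where scale_id = p_scale_id
union all
select ordinal + 1, step
from steps
where ordinal < (select max(ordinal) from scale_steps where scale_id = p_scale_id)
),
scale_steps_sum as
(
select ordinal, sum(step) as temp_note_seq_id
from steps
group by 1
)
select x.ordinal, n.note
from
(
select s.ordinal, m.min_note_id + s.temp_note_seq_id as temp_note_seq_id
from scale_steps_sum s join min_note m on (1 = 1)
) x
join note_seq ns on (x.temp_note_seq_id = ns.note_seq_id)
join notes n on (ns.note_id = n.note_id)
order by ordinal;
$BODY$
language sql;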

Postgres FOR LOOP

I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop, which I fully understand is not the optimal use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
FOR i IN 1..25 LOOP
INSERT INTO playtime.meta_random_sample
(col_i, col_id) -- declare target columns!
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000;
END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000
) t;
About generate_series():
What is the expected behaviour for multiple set-returning functions in SELECT clause?
About optimizing performance of random selections:
Best way to select random rows PostgreSQL
Below is an example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause in FROM. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
select * from playtime tablesample system_rows (15);
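For the Bernoulli method mentioned above, the sample size is a percentage of rows rather than a row count; a minimal sketch (the table name and percentage are illustrative):
select * from tbl tablesample bernoulli (1);  -- roughly 1% of the rows, chosen row by row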
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2

connection_psql = psycopg2.connect(user="admin_user",
                                   password="***",
                                   port="5432",
                                   database="myDB",
                                   host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()

myList = [...]
for item in myList:
    cursor_psql.execute('''
        -- The query goes here
    ''')

connection_psql.commit()
cursor_psql.close()
Here is a more complex Postgres function involving a UUID array, a FOR loop, a CASE condition, and an enum update. The function goes through each row, checks the condition, and updates the individual row accordingly.
CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
DECLARE
  oid_list uuid[];
  oid uuid;
BEGIN
  -- collect every order id ("order" is quoted because it is a reserved word)
  SELECT array_agg(order_id) INTO oid_list FROM "order";
  FOREACH oid IN ARRAY oid_list
  LOOP
    WITH status_cmp AS (SELECT COUNT(sku) = 0                   AS empty,
                               COUNT(sku) < COUNT(sku_order_id) AS partial,
                               COUNT(sku) = COUNT(sku_order_id) AS full
                        FROM fulfillment
                        WHERE order_id = oid)
    UPDATE "order"
    SET status = CASE WHEN status_cmp.empty   THEN 'EMPTY'::orderstatus
                      WHEN status_cmp.full    THEN 'FULL'::orderstatus
                      WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
                      ELSE null
                 END
    FROM status_cmp
    WHERE order_id = oid;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
To run the above function
SELECT order_status_update();
Using a procedure (the BEGIN ATOMIC syntax requires PostgreSQL 14 or later):
CREATE or replace PROCEDURE pg_temp_3.insert_data()
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO meta_random_sample(col_serial, parent_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, parent_id
FROM parent_tree order by random() limit 2
) t;
END;
Call the procedure.
call pg_temp_3.insert_data();
PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html

How to call PostgreSQL stored procs from inside another stored proc and include return values in queries

I have a PostgreSQL function / stored proc that does the following:
1. calls another function and saves the value into a variable.
2. executes another SQL statement, using the value from step one as an argument.
My problem is that the query is not returning any data. No errors are returned either.
I'm new to PostgreSQL, so I don't know the best way to debug... but I added a RAISE NOTICE command right after step 1, like so:
SELECT INTO active_id get_widget_id(widget_desc);
RAISE NOTICE 'Active ID is:(%)', active_id;
In the "Messages" section of the pgadmin3 screen, I see the debug message with the data:
NOTICE: Active ID is:(2)
I'm wondering whether or not the brackets are causing the problem for me.
Here's the SQL I'm trying to run in step 2:
SELECT d.id, d.contact_id, d.priority, cp.contact
FROM widget_details d, contact_profile cp, contact_type ct
WHERE d.rule_id=active_id
AND d.active_yn = 't'
AND cp.id=d.contact_id
AND cp.contact_type_id=ct.id
AND ct.name = 'email'
Order by d.priority ASC
You'll notice that in my where clause I am referencing the variable "active_id".
I know that this query should return at least one row, because when I run a straight SQL select (vs. using this function) and substitute the value 2 for the variable "active_id", I get back the data I'm looking for.
Any suggestions would be appreciated.
Thanks.
EDIT 1:
Here's the full function definition:
CREATE TYPE custom_return_type AS (
widgetnum integer,
contactid integer,
priority integer,
contactdetails character varying
);
CREATE OR REPLACE FUNCTION test(widget_desc integer)
RETURNS SETOF custom_return_type AS
$BODY$
DECLARE
active_id integer;
rec custom_return_type ;
BEGIN
SELECT INTO active_id get_widget_id(widget_desc);
RAISE NOTICE 'Active ID is:(%)', active_id;
FOR rec IN
SELECT d.id, d.contact_id, d.priority, cp.contact
FROM widget_details d, contact_profile cp, contact_type ct
WHERE d.rule_id=active_id
AND d.active_yn = 't'
AND cp.id=d.contact_id
AND cp.contact_type_id=ct.id
AND ct.name = 'email'
Order by d.priority ASC
LOOP
RETURN NEXT rec;
END LOOP;
END
$BODY$ LANGUAGE plpgsql;
That's several levels of too-complicated (edit: as it turns out, Erwin already explained this to you the last time you posted the same thing). Start by using RETURNS TABLE and RETURN QUERY:
CREATE OR REPLACE FUNCTION test(fmfm_number integer)
RETURNS TABLE (
widgetnum integer,
contactid integer,
priority integer,
contactdetails character varying
) AS
$BODY$
BEGIN
RETURN QUERY SELECT d.id, d.contact_id, d.priority, cp.contact
FROM widget_details d, contact_profile cp, contact_type ct
WHERE d.rule_id = get_widget_id(widget_desc)
AND d.active_yn = 't'
AND cp.id=d.contact_id
AND cp.contact_type_id=ct.id
AND ct.name = 'email'
Order by d.priority ASC;
END
$BODY$ LANGUAGE plpgsql;
at which point it's probably simple enough to be turned into a trivial SQL function or even a view. Hard to be sure, since the function doesn't make tons of sense as written:
You never use the parameter fmfm_number anywhere; and
widget_desc is never defined
so this function could never run. Clearly you haven't shown us the real source code, but some kind of "simplified" code that doesn't match the code you're really having issues with.
There is a difference between:
SELECT INTO ... (see http://www.postgresql.org/docs/current/interactive/sql-selectinto.html)
and
SELECT select_expressions INTO [STRICT] target FROM ...; (see http://www.postgresql.org/docs/current/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW)
I think you want:
SELECT get_widget_id(widget_desc) INTO active_id;
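For contrast, the first form the links describe is the table-creating SELECT INTO of plain SQL, not a variable assignment; a sketch with illustrative table names:
SELECT * INTO new_table FROM old_table;  -- same effect as CREATE TABLE new_table AS SELECT * FROM old_table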

PostgreSQL: How to figure out missing numbers in a column using generate_series()?

SELECT commandid
FROM results
WHERE NOT EXISTS (
SELECT *
FROM generate_series(0,119999)
WHERE generate_series = results.commandid
);
I have a column in results of type int, but various tests failed and were not added to the table. I would like to create a query that returns a list of commandid values that are not found in results. I thought the above query would do what I wanted. However, it does not work even if I use a range that is outside the expected possible range of commandid (like negative numbers).
Given sample data:
create table results ( commandid integer primary key);
insert into results (commandid) select * from generate_series(1,1000);
delete from results where random() < 0.20;
This works:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);
as does this alternative formulation:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
LEFT OUTER JOIN results ON (results.commandid = s.i)
WHERE results.commandid IS NULL;
Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.
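To compare them on your own data, prefix either query with EXPLAIN ANALYZE, for example:
EXPLAIN ANALYZE
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);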
Explanation
Note that instead of NOT IN I've used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It's much easier for the DB server to optimise these and it avoids the confusing issues that can arise with NULLs in NOT IN.
I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimizes to the same plan.
Both will perform better than the NOT IN formulation below when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.
The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);
so I'd avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
As I mentioned in the comment, you need to do the reverse of the above query.
SELECT
generate_series
FROM
generate_series(0, 119999)
WHERE
NOT generate_series IN (SELECT commandid FROM results);
At that point, you should find values that do not exist within the commandid column within the selected range.
I am not an experienced SQL guru, but I like other ways to solve problems.
Just today I had a similar problem: finding unused numbers in a character column.
I solved it with PL/pgSQL and was very interested in how fast my procedure would be.
I used @Craig Ringer's approach to generate a table with a serial column, added one million records, and then deleted every 99th record. The procedure takes about 3 seconds to find the missing numbers:
-- creating table
create table results (commandid character(7) primary key);
-- populating table with serial numbers formatted as characters
insert into results (commandid) select cast(num_id as character(7)) from generate_series(1,1000000) as num_id;
-- delete some records
delete from results where cast(commandid as integer) % 99 = 0;
create or replace function unused_numbers()
returns setof integer as
$body$
declare
i integer;
r record;
begin
-- looping through the table with a synchronized counter:
i := 1;
for r in
(select distinct cast(commandid as integer) as num_value
from results
order by num_value asc)
loop
if not (i = r.num_value) then
while true loop
return next i;
i = i + 1;
if (i = r.num_value) then
i = i + 1;
exit;
else
continue;
end if;
end loop;
else
i := i + 1;
end if;
end loop;
return;
end;
$body$
language plpgsql volatile
cost 100
rows 1000;
select * from unused_numbers();
Maybe it will be useful for someone.
If you're on AWS Redshift, you may need to approach the question differently, since Redshift doesn't support generate_series. You'll end up with something like this:
select
startpoints.id gapstart,
min(endpoints.id) resume
from (
select id+1 id
from yourtable outer_series
where not exists
(select null
from yourtable inner_series
where inner_series.id = outer_series.id + 1
)
order by id
) startpoints,
yourtable endpoints
where
endpoints.id > startpoints.id
group by
startpoints.id;