Ordering a query on a field in a return record

I've got a query that calls a function in its select clause. The function returns a record type. In the calling query, I want to order by one of the fields in the returned record and if possible I'd also like to return the fields of the record as fields of the calling query. To make this clear, here's a simplified version of the code:
CREATE OR REPLACE FUNCTION getStatus(lastContact timestamptz, lastAlTime timestamptz, lastGps timestamptz, out status varchar, out toelichting varchar, out colorLevel integer)
RETURNS record AS
$BODY$
BEGIN
status := 'controle_status_ok';
toelichting := '';
colorLevel := 3;
END
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;
ALTER FUNCTION getStatus(timestamptz, timestamptz, timestamptz, out varchar, out varchar, out integer) OWNER TO xyz;
Using this function, I want to have a query like this one:
SELECT
id,
name,
getStatus(tabel3.lastcontact, tabel4.lastchanged, tabel5.lastfound) as status
FROM
tabel1
left join tabel2 on ...
left join tabel3 on ...
left join tabel4 on ...
left join tabel5 on ...
ORDER BY
status
PostgreSQL responds with the following error:
ERROR: could not identify an ordering operator for type record
HINT: Use an explicit ordering operator or modify the query.
The question: how should I order by the value of colorLevel that's been returned by getStatus?
Additional question: can I return the three fields of the getStatus function as fields of the query that calls the getStatus function?

Use
ORDER BY (status).colorlevel
to reference a column of your record type.
As an aside: I used lower case (colorlevel instead of colorLevel) because identifiers are folded to lower case anyway unless they are double-quoted, and using mixed-case identifiers is generally a bad idea in PostgreSQL.
As to your additional question, the same syntax applies. I also use a subquery so the function is only called once:
SELECT id
, name
, (x.status).status
, (x.status).toelichting
, (x.status).colorLevel
FROM tabel
, (SELECT getStatus(now(), now(), now()) as status) x
ORDER BY (x.status).colorlevel
Read about accessing composite types in the manual.
Answer after additional input
To use columns from your tables, put it all in a subquery. I am trying to avoid calling the function multiple times, because that may be expensive.
SELECT
id,
name,
(status).status,
(status).toelichting,
(status).colorLevel
FROM (
SELECT
id,
name,
getStatus(tabel3.lastcontact, tabel4.lastchanged, tabel5.lastfound) as status
FROM
tabel1
left join tabel2 on ...
left join tabel3 on ...
left join tabel4 on ...
left join tabel5 on ...
) x
ORDER BY
(status).colorlevel
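For reference, the subquery pattern can be tried end to end with a small self-contained sketch. The table demo_devices and the function demo_status are hypothetical stand-ins for the joined tables and getStatus in the question; only the shape of the outer query follows the answer.

```sql
-- Hypothetical stand-in for the joined tables in the question.
CREATE TEMP TABLE demo_devices (id int, name text, lastcontact timestamptz);
INSERT INTO demo_devices VALUES
    (1, 'alpha', now()),
    (2, 'beta',  now() - interval '2 days');

-- Hypothetical, simplified variant of getStatus with a single input.
-- Two OUT parameters make the function return a record.
CREATE FUNCTION demo_status(lastcontact timestamptz,
                            OUT status varchar,
                            OUT colorlevel integer)
AS $$
BEGIN
    IF lastcontact > now() - interval '1 day' THEN
        status := 'ok';
        colorlevel := 3;
    ELSE
        status := 'stale';
        colorlevel := 1;
    END IF;
END
$$ LANGUAGE plpgsql;

-- Call the function once per row in the subquery, then unpack the
-- record's fields and sort on one of them in the outer query.
SELECT id, name, (st).status, (st).colorlevel
FROM (
    SELECT id, name, demo_status(lastcontact) AS st
    FROM demo_devices
) x
ORDER BY (st).colorlevel;
-- beta (colorlevel 1) sorts before alpha (colorlevel 3)
```

The parentheses around st are required; without them PostgreSQL would read st as a table name rather than a composite column.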

Related

How to handle differing parent json keys in postgresql jsonb

I have JSON records ingested in jsonb format that have varying parent keys I want to access. Most of the parent keys refer to a document schema:
SELECT id, COALESCE(data->'TEXPORT'->'FORM_SECTION'->'F03_2014',
data->'TEXPORT'->'FORM_SECTION'->'F02_2014',
data->'TEXPORT'->'FORM_SECTION'->'NOTICE_UUID',
data->'TEXPORT'->'FORM_SECTION'->'F01_2014',
data->'TEXPORT'->'FORM_SECTION'->'F14_2014',
data->'TEXPORT'->'FORM_SECTION'->'F21_2014',
data->'TEXPORT'->'FORM_SECTION'->'F15_2014')->'OBJECT'->'SHORT_DESCR'->'P' from json_table
How can I make this cleaner, and how do I do multiple coalesces? For example, sometimes the SHORT_DESCR key is also called something else.

You can write your own helper function:
CREATE FUNCTION first_property(value jsonb, VARIADIC keys text[]) RETURNS jsonb AS $$
SELECT value -> key
FROM UNNEST(keys) WITH ORDINALITY AS _(key, i)
WHERE value ? key
ORDER BY i
LIMIT 1;
$$ LANGUAGE SQL;
With that, you can shorten your query to
SELECT
id,
first_property(
data->'TEXPORT'->'FORM_SECTION',
'F03_2014', 'F02_2014', 'NOTICE_UUID', 'F01_2014', 'F14_2014', 'F21_2014', 'F15_2014'
)->'OBJECT'->'SHORT_DESCR'->'P'
FROM json_table
and you can call it multiple times, like
SELECT
id,
first_property(
first_property(
data->'TEXPORT'->'FORM_SECTION',
'F03_2014', 'F02_2014', 'NOTICE_UUID', 'F01_2014', 'F14_2014', 'F21_2014', 'F15_2014'
)->'OBJECT',
'SHORT_DESCR', 'SDCR', 'DESC'
)->'P'
FROM json_table
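
A quick way to check how the helper behaves is to call it on inline jsonb literals. This sketch assumes the first_property function above has already been created; the keys a, b and c are made up for illustration.

```sql
-- Assumes the first_property helper defined above already exists.

-- Both keys exist: the first one in the argument list wins.
SELECT first_property('{"b": 2, "a": 1}'::jsonb, 'a', 'b');  -- 1

-- 'a' is absent, so the lookup falls through to 'b'.
SELECT first_property('{"b": 2}'::jsonb, 'a', 'b');          -- 2

-- None of the listed keys exist: the function yields NULL,
-- so any -> chained onto the result is NULL as well.
SELECT first_property('{"c": 3}'::jsonb, 'a', 'b');          -- NULL
```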

Avoid putting PostgreSQL function result into one field

What I am after is a query that calls a function, where the function returns a set of records and each field of those records ends up in its own column. I can call the function, but its results all end up in one field.
That is: http://i.stack.imgur.com/ETLCL.png whereas the results I am after are: http://i.stack.imgur.com/wqRQ9.png
Here's the code to create the table
CREATE TABLE tbl_1_hm
(
tbl_1_hm_id bigserial NOT NULL,
tbl_1_hm_f1 VARCHAR (250),
tbl_1_hm_f2 INTEGER,
CONSTRAINT tbl_1_hm_pkey PRIMARY KEY (tbl_1_hm_id)
);
-- do that for a few times to get some data
INSERT INTO tbl_1_hm (tbl_1_hm_f1, tbl_1_hm_f2)
VALUES ('hello', 1);
CREATE OR REPLACE FUNCTION proc_1_hm(id BIGINT)
RETURNS TABLE(tbl_1_hm_f1 VARCHAR (250), tbl_1_hm_f2 int) AS $$
SELECT tbl_1_hm_f1, tbl_1_hm_f2
FROM tbl_1_hm
WHERE tbl_1_hm_id = id
$$ LANGUAGE SQL;
--And here is the current query I am running for my results:
SELECT t1.tbl_1_hm_id, proc_1_hm(t1.tbl_1_hm_id) AS t3
FROM tbl_1_hm AS t1
Thanks for having a read. Please don't haggle about the semantics of hitting the same table twice or about my naming convention; this is a simplified test.

When a function returns a set of records, you should treat it as a table source:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1, proc_1_hm(t1.tbl_1_hm_id) AS t3;
Note that a function call in the FROM clause is implicitly a LATERAL join, so it can use fields from tables listed earlier in FROM without an explicit JOIN condition.
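The implicit form can also be spelled out. A minimal sketch, assuming PostgreSQL 9.3 or later (where the LATERAL keyword is available) and the tbl_1_hm table and proc_1_hm function from the question:

```sql
-- Same result as the comma-separated form, with the join made explicit.
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
CROSS JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3;

-- Variant that keeps every row of t1, even when the function returns
-- no rows for it; the t3 columns are then NULL.
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
LEFT JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3 ON true;
```

The LEFT JOIN variant matters when the function can return an empty set, because the comma and CROSS JOIN forms drop such rows from the output.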

SELECT clause in FOR LOOP control using plpgsql

I am trying to write a script that outputs all the foos that are used by only one user. If a foo is used by more than one user, it shouldn't be output.
Here are my tables:
foos (id, value)
users (id, name)
used (foo_id, user_id)
and my non-working script:
FUNCTION output_unshared_foos ()
RETURNS foos AS
$a$
DECLARE
foocounts RECORD;
BEGIN
SELECT u.foo_id, count(*)
INTO foocounts -- store in the local variable
FROM used u
GROUP BY u.foo_id;
FOR f IN SELECT * FROM foos
LOOP
IF (SELECT fc.count < 2 FROM foocounts fc WHERE fc.foo_id = f.id) THEN
RETURN NEXT f;
END IF;
END LOOP;
END
$a$ language plpgsql;
It doesn't seem to work: every row is returned, and the condition seems to always be true.

Your first problem is that a single record variable cannot hold the result of a query that returns more than one row (the SELECT u.foo_id, count(*) INTO ... part). Without INTO STRICT, PL/pgSQL keeps only the first row and discards the rest instead of raising a runtime error.
Your function also doesn't compile, because the record f is not declared and a function defined as RETURNS foos can't use RETURN NEXT (that requires RETURNS SETOF foos).
But your approach is wrong (even if it worked). Doing row-by-row processing is almost always the wrong choice in SQL. SQL and relational databases are meant to handle sets, not single rows.
Your problem can be solved with a single query:
select foo_id
from used
group by foo_id
having count(distinct user_id) = 1
This returns all foo ids that are used by exactly one user.
If you need the additional information from the foos table, you can join the above query to the foos table:
select f.*
from foos f
join (
select foo_id
from used
group by foo_id
having count(distinct user_id) = 1
) u on f.id = u.foo_id
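
To see the effect of count(distinct user_id), here is a small sketch with made-up sample rows. The table and column names follow the question; the data is an assumption for illustration only.

```sql
-- Temporary tables shaped like the ones in the question.
CREATE TEMP TABLE foos (id int PRIMARY KEY, value text);
CREATE TEMP TABLE used (foo_id int, user_id int);

INSERT INTO foos VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO used VALUES
    (1, 10),            -- foo 1: one user
    (2, 10), (2, 11),   -- foo 2: two different users
    (3, 12), (3, 12);   -- foo 3: the same user twice

SELECT f.*
FROM foos f
JOIN (
    SELECT foo_id
    FROM used
    GROUP BY foo_id
    HAVING count(DISTINCT user_id) = 1
) u ON f.id = u.foo_id
ORDER BY f.id;
-- returns foos 1 and 3; foo 2 is shared by two users
```

Foo 3 shows why DISTINCT is needed: a plain count(*) would see two rows and wrongly treat it as shared.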

Join 2 sets based on default order

How do I join 2 sets of records solely based on the default order?
So if I have a table x(col(1,2,3,4,5,6,7)) and another table z(col(a,b,c,d,e,f,g))
it will return
c1 c2
-- --
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Actually, I wanted to join a pair of one dimensional arrays from parameters and treat them like columns from a table.
Sample code:
CREATE OR REPLACE FUNCTION "Test"(timestamp without time zone[],
timestamp without time zone[])
RETURNS refcursor AS
$BODY$
DECLARE
curr refcursor;
BEGIN
OPEN curr FOR
SELECT DISTINCT "Start" AS x, "End" AS y, COUNT("A"."id")
FROM UNNEST($1) "Start"
INNER JOIN
(
SELECT "End", ROW_NUMBER() OVER(ORDER BY ("End")) rn
FROM UNNEST($2) "End" ORDER BY ("End")
) "End" ON ROW_NUMBER() OVER(ORDER BY ("Start")) = "End".rn
LEFT JOIN "A" ON ("A"."date" BETWEEN x AND y)
GROUP BY 1,2
ORDER BY "Start";
return curr;
END
$BODY$

Now, to answer the real question that was revealed in comments, which appears to be something like:
Given two arrays 'a' and 'b', how do I pair up their elements so I can get the element pairs as column aliases in a query?
There are a couple of ways to tackle this:
If and only if the arrays are of equal length, use multiple unnest functions in the SELECT clause (a deprecated approach that should only be used for backward compatibility);
Use generate_subscripts to loop over the arrays;
Use generate_series over subqueries against array_lower and array_upper to emulate generate_subscripts if you need to support versions too old to have generate_subscripts;
Rely on the order that unnest returns tuples in and hope, as in my other answer and as shown below. It'll work, but it's not guaranteed to work in future versions.
Use the WITH ORDINALITY functionality added in PostgreSQL 9.4 to get a row number for unnest.
Use multiple-array UNNEST, which is SQL-standard and which PostgreSQL supports from 9.4 onward, but not in earlier versions.
So, say we have function arraypair with array parameters a and b:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
-- blah code here blah
$$ LANGUAGE whatever IMMUTABLE;
and it's invoked as:
SELECT * FROM arraypair( ARRAY[1,2,3,4,5,6,7], ARRAY['a','b','c','d','e','f','g'] );
possible function definitions would be:
SRF-in-SELECT (deprecated)
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT unnest(a), unnest(b);
$$ LANGUAGE sql IMMUTABLE;
Will produce bizarre and unexpected results if the arrays aren't equal in length; see the documentation on set returning functions and their non-standard use in the SELECT list to learn why, and what exactly happens.
generate_subscripts
This is likely the safest option:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT
a[i], b[i]
FROM generate_subscripts(CASE WHEN array_length(a,1) >= array_length(b,1) THEN a::text[] ELSE b::text[] END, 1) i;
$$ LANGUAGE sql IMMUTABLE;
If the arrays are of unequal length, as written it'll return null elements for the shorter, so it works like a full outer join. Reverse the sense of the case to get an inner-join like effect. The function assumes the arrays are one-dimensional and that they start at index 1. If an entire array argument is NULL then the function returns NULL.
A more generalized version would be written in PL/PgSQL and would check array_ndims(a) = 1, check array_lower(a, 1) = 1, test for null arrays, etc. I'll leave that to you.
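One possible shape for such a generalized version is sketched below. The name arraypair_checked is hypothetical, and treating NULL arrays as an empty result and multi-dimensional arrays as an error are assumptions; adjust both to your needs.

```sql
-- Hypothetical name; a sketch of the checks described above,
-- not a drop-in replacement.
CREATE OR REPLACE FUNCTION arraypair_checked (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
BEGIN
    -- A NULL array yields an empty result instead of an error.
    IF a IS NULL OR b IS NULL THEN
        RETURN;
    END IF;

    -- array_ndims() returns NULL for an empty array, so this only
    -- rejects arrays that really have more than one dimension.
    IF array_ndims(a) > 1 OR array_ndims(b) > 1 THEN
        RAISE EXCEPTION 'arraypair_checked expects one-dimensional arrays';
    END IF;

    -- Offset from each array's own lower bound, so arrays that do not
    -- start at index 1 are still paired position by position.
    RETURN QUERY
    SELECT a[array_lower(a, 1) + i], b[array_lower(b, 1) + i]
    FROM generate_series(
             0,
             greatest(array_length(a, 1), array_length(b, 1)) - 1
         ) AS g(i);
END
$$ LANGUAGE plpgsql IMMUTABLE;

-- The shorter array is padded with NULL, like a full outer join.
SELECT * FROM arraypair_checked(ARRAY[1,2,3], ARRAY['a','b']);
```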
Hoping for pair-wise returns:
This isn't guaranteed to work, but does with PostgreSQL's current query executor:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (), c1.col
FROM unnest(a) c1(col)
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (), c2.col
FROM unnest(b) c2(col)
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
$$ LANGUAGE sql IMMUTABLE;
I would consider using generate_subscripts much safer.
Multi-argument unnest:
This is the SQL-standard form. It doesn't work on PostgreSQL versions before 9.4, because their unnest doesn't accept multiple input arrays:
SELECT * FROM unnest(a,b);
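On PostgreSQL 9.4 or later, both the multi-array unnest and WITH ORDINALITY are available in the FROM clause. A minimal sketch, with literal arrays and column aliases chosen for illustration:

```sql
-- Requires PostgreSQL 9.4 or later.

-- Multi-array unnest pairs elements by position.
SELECT *
FROM unnest(ARRAY[1,2,3], ARRAY['a','b']) AS t(col_a, col_b);
-- the shorter array is padded with NULL: (1,a), (2,b), (3,NULL)

-- WITH ORDINALITY adds a row number that follows the array order.
SELECT t.elem, t.n
FROM unnest(ARRAY['x','y','z']) WITH ORDINALITY AS t(elem, n);
-- n is 1 for 'x', 2 for 'y', 3 for 'z'

-- arraypair written with multi-array unnest.
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
    SELECT * FROM unnest(a, b);
$$ LANGUAGE sql IMMUTABLE;
```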

select x.c1, z.c2
from
x
inner join
(
select
c2,
row_number() over(order by c2) rn
from z
order by c2
) z on x.c1 = z.rn
order by x.c1
If x.c1 is not 1, 2, 3, ... you can number the rows of x with row_number() in the same way as was done for z.
The middle order by is not necessary, as pointed out by Erwin. I tested it like this:
create table t (i integer);
insert into t
select ceil(random() * 100000)
from generate_series(1, 100000);
select
i,
row_number() over(order by i) rn
from t
;
And i comes out ordered. Before running this simple test, which I had never executed before, I thought it was possible that the rows would be numbered in any order.

By "default order" it sounds like you probably mean the order in which the rows are returned by select * from tablename without an ORDER BY.
If so, this ordering is undefined. The database can return rows in any order that it feels like. You'll find that if you UPDATE a row, it probably moves to a different position in the table.
If you're stuck in a situation where you assumed tables had an order and they don't, you can as a recovery option add a row number based on the on-disk ordering of the tuples within the table:
select row_number() OVER (), *
from the_table
order by ctid
If the output looks right, I recommend that you CREATE TABLE a new table with an extra field, then do an INSERT INTO ... SELECT to insert the data ordered by ctid, then ALTER TABLE ... RENAME the tables and finally fix any foreign key references so they point to the new table.
ctid can be changed by autovacuum, UPDATE, CLUSTER, etc, so it is not something you should ever be using in applications. I'm using it here only because it sounds like you don't have any real ordering or identifier key.
If you need to pair up rows based on their on-disk ordering (an unreliable and unsafe thing to do, as noted above), you could try:
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c1.col
FROM c1
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c2.col
FROM c2
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
but never rely on this in a production app. If you're really stuck you can use this with CREATE TABLE AS to construct a new table that you can start with when you're working on recovering data from a DB that lacks a required key, but that's about it.
The same approach given above might work with an empty window clause () instead of (ORDER BY ctid) when using sets that lack a ctid, like interim results from functions. It's even less safe then though, and should be a matter of last resort only.
(See also this newer related answer: https://stackoverflow.com/a/17762282/398670)

Postgresql Slow on custom function, php but fast if directly input on psql using text search with gin index

I have three tables: Person, Names, and Notes. Each person has multiple names and optional notes. I have full text search on some columns of Names and Notes (see below). It works perfectly when the word I search for exists in the database, whether I run the query through the custom function, PHP, or psql. The problem is that when the search word is not present in the database, the query gets very slow in PHP and in the custom function but stays fast in psql: less than 1 s in psql, more than 10 s in the others.
Tables:
Person | id, birthday
Name | person_id, name, fs_name
Notes | person_id, note, fs_note
Besides the PK and FK indexes, there are GIN indexes on fs_name and fs_note.
Function/Query
create or replace function queryNameFunc (TEXT)
returns TABLE(id int, name TEXT) as $$
select id, name
from person_name pnr
inner join person pr on (pnr.person_id=pr.id)
left join personal_notes psr on (psr.person_id = pr.id)
where pr.id in
(select distinct(id)
from person_name pn
inner join person p on (p.id = pn.person_id)
left join personal_notes ps on (ps.person_id = p.id)
where tname @@ to_tsquery($1)
limit 20);
$$ language SQL;
The WHERE condition is trimmed down here. For example, if I pass 'john & james' as $1 and matching data is in the database, the results come back fast, but if 'john' and 'james' are not in the database, it's slow. It got slower once I had 1M records in person and 3M+ in names (all dummy records). Any idea how to fix this? I tried restarting the server and restarting PostgreSQL.

The database has to prepare the inner query before it knows anything about the parameter, which can result in a bad query plan. To avoid this problem in a function, use the plpgsql language and EXECUTE inside the function:
CREATE OR REPLACE FUNCTION queryNameFunc (TEXT) RETURNS TABLE(id INT, name TEXT) AS $$
BEGIN
RETURN QUERY EXECUTE '
SELECT
id,
name
FROM
person_name pnr
INNER JOIN person pr ON (pnr.person_id=pr.id)
LEFT JOIN personal_notes psr ON (psr.person_id = pr.id)
WHERE
pr.id IN(
SELECT
DISTINCT(id)
FROM
person_name pn
INNER JOIN person p ON (p.id = pn.person_id)
LEFT JOIN personal_notes ps ON (ps.person_id = p.id)
WHERE tname @@ to_tsquery($1)
LIMIT 20)' USING $1;
END;
$$ LANGUAGE plpgsql;
This works in version 8.4, where you first have to install plpgsql:
CREATE LANGUAGE plpgsql;