Postgresql: cannot get result using sub-query at FROM ... place - postgresql

What I'm trying to accomplish is to get aggregated data for all unique combinations of sendercompid, targetcompid, msgtype through all tables in inner SQL.
I expect to have from 20mil to 40mil unique rows in resulting output.
I cannot succeed in running next query on Postgresql 8.3.13:
SELECT
sendercompid, targetcompid, count(msgtype), msgtype
FROM
(SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'test'
AND table_schema = 'msg'
AND (table_name like 'fix_aee_20121214%') OR
(table_name like 'fix_aee2_20121214%')
)
WHERE
(sendercompid LIKE '%201%') OR
(targetcompid LIKE '%201%')
GROUP BY
sendercompid, targetcompid, msgtype ;
If this select is being split on 2 : outer and inner, then :
inner will provide list of tables and outer will do select and grouping from each table .
If I run those two SQLs as one, I have an alias error from pgsql db
ERROR: subquery in FROM must have an alias
I tried use alias, but this error not disappear.
Any thoughts what I am missing there?
Thank you.

FROM doesn't work the way you think it does. A sub-select works like any other query: it produces a set of rows. The outer SELECT works with those rows as though they were a table. There's no special magic beyond that; it has no idea that the values you're returning are table names, and won't treat them as such.
You can probably accomplish what you want using the catalog tables, but that would be complicated and hack-y.
Since your sub-tables appear to be date-based partitions, I think what you really want to use is the partitioning support built into Postgres, described in these docs. Essentially your partitions inherit from a parent table, and you set up range constraints on each child. When you query from the parent table with constraint_exclusion enabled, Postgres automatically selects the appropriate partition.

You can "run" such query using dynamic SQL. The idea - to form the proper query with table name as string inside PL\pgSQL procedure and EXECUTE it. Something like:
CREATE OR REPLACE FUNCTION public.function1 (
)
RETURNS TABLE (
"field1" NUMERIC,
"field2" NUMERIC,
...
) AS
$body$
BEGIN
RETURN EXECUTE 'SELECT * FROM '|| (SELECT table_name from information_schema.tables
where table_catalog = 'test' AND
table_schema='msg' AND
(table_name like 'fix_aee_20121214%') OR
(table_name like 'fix_aee2_20121214%'));
END;
$body$
LANGUAGE 'plpgsql';
Then use something like:
SELECT sendercompid, targetcompid, count(msgtype), msgtype
FROM function1
WHERE (sendercompid LIKE '%201%') OR
(targetcompid LIKE '%201%')
GROUP BY sendercompid, targetcompid, msgtype ;
Or you can create function with full query and some parameters to build WHERE clause.
Details: EXECUTE,

If this is just a concept then:
ERROR: subquery in FROM must have an alias.
There's is no information where columns sendercompid, targetcompid, count(msgtype), msgtype are come from.
SELECT sendercompid, targetcompid, count(msgtype), msgtype from (
SELECT table_name from information_schema.tables
where table_catalog = 'test' AND
table_schema='msg' AND
(table_name like 'fix_aee_20121214%') OR
(table_name like 'fix_aee2_20121214%')
) a
WHERE (sendercompid LIKE '%201%') OR
(targetcompid LIKE '%201%')
GROUP BY sendercompid, targetcompid, msgtype ;

Use alias for subquery. Better you alias for all tables to avoid confusion.

Related

Undefined column error when using the output of a function as the table name

I have a multi-tenant database where each tenant gets their own schema. Each schema has a set of materialized views used in full-text searches.
The following function takes a schema name and a table name and concatenates them into schema.table_name format:
CREATE OR REPLACE FUNCTION create_table_name(_schema text, _tbl text, OUT result text)
AS 'select $1 || ''.'' || $2'
LANGUAGE SQL
It works as expected in PGAdmin:
I'm trying to use this function in a prepared statement, like this:
SELECT p.id AS id,
ts_rank(
p.document, plainto_tsquery(unaccent(?))
) AS rank
FROM create_table_name(?, 'project_search') AS p
WHERE p.document ## plainto_tsquery(unaccent(?))
OR p.name ILIKE ?
However, when I run it, I get the following error:
ERROR 42703 (undefined_column) column p.id does not exist
If I "hard-code" the schema and table name though, it works.
Why am I getting this error?
P.S. I should note that I am aware of the dangers of this approach, but the schema name always comes from inside my application so I'm not worried about SQL injection.
You want to use the function result as table name in a query, but what you are actually doing is using the function as a table function. This “table” has only one row and one column called result, which explains the error message.
You need dynamic SQL for that, for example by using PL/pgSQL code in a DO statement:
DO
$$DECLARE
...
BEGIN
EXECUTE
format(
E'SELECT p.id AS id,\n'
' ts_rank(\n'
' p.document,\n'
' plainto_tsquery(unaccent(?))\n'
' ) AS rank\n'
'FROM %I.project_search AS p\n'
'WHERE p.document ## plainto_tsquery(unaccent($1))\n'
'OR p.name ILIKE $2',
schema_name
)
USING fts_query, like_pattern
INTO var1, ...;
...
$$;
To handle more than one result row, you'd use a FOR loop — this is just a simple example to show the principle.
Note how I use format with the %I pattern to avoid SQL injection. Your function is vulnerable.

Execute a SELECT with dynamic ORDER BY expression inside a function

I'm trying to EXECUTE some SELECTs to use inside a function, my code is something like this:
DECLARE
result_one record;
BEGIN
EXECUTE 'WITH Q1 AS
(
SELECT id
FROM table_two
INNER JOINs, WHERE, etc, ORDER BY... DESC
)
SELECT Q1.id
FROM Q1
WHERE, ORDER BY...DESC';
RETURN final_result;
END;
I know how to do it in MySQL, but in PostgreSQL I'm failing. What should I change or how should I do it?
For a function to be able to return multiple rows it has to be declared as returns table() (or returns setof)
And to actually return a result from within a PL/pgSQL function you need to use return query (as documented in the manual)
To build dynamic SQL in Postgres it is highly recommended to use the format() function to properly deal with identifiers (and to make the source easier to read).
So you need something like:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
'with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc', p_sort_column);
end;
$$
language plpgsql;
Note that the order by inside the CTE is pretty much useless if you are sorting the final query unless you use a LIMIT or distinct on () inside the query.
You can make your life even easier if you use another level of dollar quoting for the dynamic SQL:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
$query$
with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc
$query$, p_sort_column);
end;
$$
language plpgsql;
What a_horse said. And:
How to return result of a SELECT inside a function in PostgreSQL?
Plus, to pick a column for ORDER BY dynamically, you have to add that column to the SELECT list of your CTE, which leads to complications if the column can be duplicated (like with passing 'id') ...
Better yet, remove the CTE entirely. There is nothing in your question to warrant its use anyway. (Only use CTEs when needed in Postgres, they are typically slower than equivalent subqueries or simple queries.)
CREATE OR REPLACE FUNCTION get_data(p_sort_column text)
RETURNS TABLE (id integer) AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$
SELECT t2.id -- assuming you meant t2?
FROM table_two t2
JOIN table_three t3 on ...
ORDER BY t2.%I DESC NULL LAST -- see below!
$q$, $1);
END
$func$ LANGUAGE plpgsql;
I appended NULLS LAST - you'll probably want that, too:
PostgreSQL sort by datetime asc, null first?
If p_sort_column is from the same table all the time, hard-code that table name / alias in the ORDER BY clause. Else, pass the table name / alias separately and auto-quote them separately to be safe:
Define table and column names as arguments in a plpgsql function?
I suggest to table-qualify all column names in a bigger query with multiple joins (t2.id not just id). Avoids various kinds of surprising results / confusion / abuse.
And you may want to schema-qualify your table names (myschema.table_two) to avoid similar troubles when calling the function with a different search_path:
How does the search_path influence identifier resolution and the "current schema"

Dynamic SELECT INTO in PL/pgSQL function

How can I write a dynamic SELECT INTO query inside a PL/pgSQL function in Postgres?
Say I have a variable called tb_name which is filled in a FOR loop from information_schema.tables. Now I have a variable called tc which will be taking the row count for each table. I want something like the following:
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) into' || tc 'from' || tb_name);
END LOOP
What should be the data type of tb_name and tc in this case?
CREATE OR REPLACE FUNCTION myfunc(_tbl_pattern text, _schema text = 'public')
RETURNS void AS -- or whatever you want to return
$func$
DECLARE
_tb_name information_schema.tables.table_name%TYPE; -- currently varchar
_tc bigint; -- count() returns bigint
BEGIN
FOR _tb_name IN
SELECT table_name
FROM information_schema.tables
WHERE table_schema = _schema
AND table_name ~ _tbl_pattern -- see below!
LOOP
EXECUTE format('SELECT count(*) FROM %I.%I', _schema, _tb_name)
INTO _tc;
-- do something with _tc
END LOOP;
END
$func$ LANGUAGE plpgsql;
Notes
I prepended all parameters and variables with an underscore (_) to avoid naming collisions with table columns. Just a useful convention.
_tc should be bigint, since that's what the aggregate function count() returns.
The data type of _tb_name is derived from its parent column dynamically: information_schema.tables.table_name%TYPE. See the chapter Copying Types in the manual.
Are you sure you only want tables listed in information_schema.tables? Makes sense, but be aware of implications. See:
How to check if a table exists in a given schema
a_horse already pointed to the manual and Andy provided a code example. This is how you assign a single row or value returned from a dynamic query with EXECUTE to a (row) variable. A single column (like count in the example) is decomposed from the row type automatically, so we can assign to the scalar variable tc directly - in the same way we would assign a whole row to a record or row variable. Related:
How to get the value of a dynamically generated field name in PL/pgSQL
Schema-qualify the table name in the dynamic query. There may be other tables of the same name in the current search_path, which would result in completely wrong (and very confusing!) results without schema-qualification. Sneaky bug! Or this schema is not in the search_path at all, which would make the function raise an exception immediately.
How does the search_path influence identifier resolution and the "current schema"
Always quote identifiers properly to defend against SQL injection and random errors. Schema and table have to be quoted separately! See:
Table name as a PostgreSQL function parameter
Truncating all tables in a Postgres database
I use the regular expression operator ~ in table_name ~ _tbl_pattern instead of table_name LIKE ('%' || _tbl_pattern || '%'), that's simpler. Be wary of special characters in the pattern parameter either way! See:
PostgreSQL Reverse LIKE
Escape function for regular expression or LIKE patterns
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
I set a default for the schema name in the function call: _schema text = 'public'. Just for convenience, you may or may not want that. See:
Assigning default value for type
Addressing your comment: to pass values, use the USING clause like:
EXECUTE format('SELECT count(*) FROM %I.%I
WHERE some_column = $1', _schema, _tb_name,column_name)
USING user_def_variable;
Related:
INSERT with dynamic table name in trigger function
It looks like you want the %I placeholder for FORMAT so that it treats your variable as an identifier. Also, the INTO clause should go outside the prepared statement.
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) from %I', tb_name) INTO tc;
END LOOP

IF... ELSE... two mutually exclusive inserts INTO #temptable

I need to insert either set A or set B of records into a #temptable, depending on certain condition
My pseudo-code:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF {some-condition}
SELECT {columns}
INTO #t1
FROM {some-big-table}
WHERE {some-filter}
ELSE
SELECT {columns}
INTO #t1
FROM {some-other-big-table}
WHERE {some-other-filter}
The two SELECTs above are exclusive (guaranteed by the ELSE operator). However, SQL compiler tries to outsmart me and throws the following message:
There is already an object named '#t1' in the database.
My idea of "fixing" this is to create #t1 upfront and then executing a simple INSERT INTO (instead of SELECT... INTO). But I like minimalism and am wondering whether this can be achieved in an easier way i.e. without explicit CREATE TABLE #t1 upfront.
Btw why is it NOT giving me an error on a conditional DROP TABLE in the first line? Just wondering.
You can't have 2 temp tables with the same name in a single SQL batch. One of the MSDN article says "If more than one temporary table is created inside a single stored procedure or batch, they must have different names". You can have this logic with 2 different temp tables or table variable/temp table declared outside the IF-Else block.
Using a Dyamic sql we can handle this situation. As a developoer its not a good practice. Best to use table variable or temp table.
IF 1=2
BEGIN
EXEC ('SELECT 1 ID INTO #TEMP1
SELECT * FROM #TEMP1
')
END
ELSE
EXEC ('SELECT 2 ID INTO #TEMP1
SELECT * FROM #TEMP1
')

Possible to use a PostgreSQL TYPE to define a dblink table?

In Postgres, you can link to your other databases using dblink like so:
SELECT *
FROM dblink (
'dbname=name port=1234 host=host user=user password=password',
'select * from table'
) AS users([insert each column name and its type here]);
But this is quite verbose.
I've shortened it up by using dblink_connect and dblink_disconnect to abstract the connection string from my dblink queries. However, that still leaves me with the manual table definition (i.e., [insert each column name and its type here]).
Instead of defining the table manually, is there a way I can define it with a TYPE or anything else that'd be re-usable?
In my case, the number of remote tables I have to join and the number of columns involved makes my query massive.
I tried something along the lines of:
SELECT *
FROM dblink (
'myconn',
'select * from table'
) AS users(postgres_pre_defined_type_here);
But I received the following error:
ERROR: a column definition list is required for functions returning "record"
As you considered creating several types for dblink, you can accept creating several functions as well. The functions will be well defined and very easy to use.
Example:
create or replace function dblink_tables()
returns table (table_schema text, table_name text)
language plpgsql
as $$
begin
return query select * from dblink (
'dbname=test password=mypassword',
'select table_schema, table_name from information_schema.tables')
as tables (table_schema text, table_name text);
end $$;
select table_name
from dblink_tables()
where table_schema = 'public'
order by 1