I have a PostGIS query where I really need to have nested queries inside PostGIS function calls:
UPDATE raw.geocoding
SET the_geom = ST_Centroid(
ST_Collect(
SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C000081610F9CC5DC3341EE672E6E723B3241')::varchar),
SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C00002CF887E0C5DC3341C9E5B2DF2A383241')::varchar)
)
)
WHERE hash = ((E'3e638a27c6c38f05026252f4a0b57b2e')::varchar)
Unfortunately, this doesn't work. I get a syntax error at the beginning of the nested query:
ERROR: syntax error at or near "SELECT"
LINE 4: SELECT the_geom
^
********** Error **********
ERROR: syntax error at or near "SELECT"
SQL state: 42601
Character: 86
Looks like I cannot have a nested query as a PostGIS function parameter?
I've perused through the PostGIS documentation and cannot find any clear guidance for dealing with this.
It appears Postgres has a way of doing variables in pgSQL, but it's unclear to me how this would be pulled off in a standard query. This is a query that will be run tens or hundreds of thousands of times from a C# program. That aside, I could do a pgSQL stored procedure if required; just wanted to make sure there wasn't a simpler alternative first.
In case you were wondering, the query looks messy because it's the result of a npgsql-generated parameterized query. I think it's fair to say that npgsql is being extra-cautious with redundant typing and escaping.
I am running PostGIS 2.0.1, Postgres 9.1.5, and npgsql 2.0.12.
It sounds like you want a scalar subquery, an expression written like (SELECT ....) (note enclosing parentheses) that contains a query returning either zero rows (NULL result) or one field from one row.
You were most of the way there, you just needed the parens:
UPDATE raw.geocoding
SET the_geom = ST_Centroid(
ST_Collect(
(SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C000081610F9CC5DC3341EE672E6E723B3241')::varchar)),
(SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C00002CF887E0C5DC3341C9E5B2DF2A383241')::varchar))
)
)
WHERE hash = ((E'3e638a27c6c38f05026252f4a0b57b2e')::varchar)
Note that subqueries can be used in other places too - table returning subqueries can appear in FROM, for example. The PostgreSQL manual teaches about all this, and is well worth a cover-to-cover read.
If you're doing a lot of these updates, you may find it more efficient to formulate the UPDATE as a join using the PostgreSQL extension UPDATE ... FROM ... WHERE rather than running lots of individual UPDATEs over and over. I just wanted to raise the possibility. See from-list in UPDATE
Related
I'm using a PostgreSQL with a Go driver. Sometimes I need to query not existing fields, just to check - maybe something exists in a DB. Before querying I can't tell whether that field exists. Example:
where size=10 or length=10
By default I get an error column "length" does not exist, however, the size column could exist and I could get some results.
Is it possible to handle such cases to return what is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and not created by me directly, I can only modify them.
That means the query can be simple like the previous example and can be much more complex like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If missing columns are length and c - does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, then remove any part with missing columns - and in the end to generate a new SQL query?
Any easier way?
I would try to check within information schema first
"select column_name from INFORMATION_SCHEMA.COLUMNS where table_name ='table_name';"
And then based on result do query
Why don't you get a list of columns that are in the table first? Like this
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, except for constructing an SQL string from the list of available columns, which can be got by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation or no short-circuiting, so you get an error if a non-existing column is referenced.
I create one sequence in postgres and fire one query which is mentioned below
SELECT M_PRODUCTSEQ.NEXTVAL from DUAL;
but it gives me the below error:
ERROR: relation "dual" does not exist.
Kindly help me out. How can i made the relation with dual?
PostgreSQL does NOT support the from DUAL syntax. It does however make the from portion of a query like this optional, so getting the next value (nextval) of a sequence you would do something like this:
SELECT nextval('m_productseq');
I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
#Erwin Brandstetter, Many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things, min, max values, counts, count of nulls, mean etc... so a bunch of queries have to be ran for each columns, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types, i was sort of expecting this to throw a spanner in the works at some point, my main concern was with how to automate the query generation, and its execution, this last bit being my main concern.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get a error at the SELECT (t::tbl) :
ERROR: type "tbl" does not exist
btw, what is the (t::abc) notation referred as, in python this would be a list slice, but it’s probably not the case in PLPGSQL
SELECT peridle, CPU
FROM (SELECT MAX(peridle) FROM try2);
While executing this query in hive I am getting following error
Parse Error: line 1:47 cannot recognize input near 'select' 'MAX' '(' in expression specification
Please suggest a solution how to use aggregate functions in hive subquery
At least two things need to be fixed here:
You are not returning fields named peridle or CPU from the sub-query, yet you are trying to select them.
Hive requires you to alias all sub-queries, even if you don't reference the alias. You can quickly do this by changing the ); at the end to ) x; (or however you want to call it).
CREATE OR REPLACE FUNCTION fnMyFunction(recipients recipient[]) ...
FOREACH v_recipient IN ARRAY recipients
LOOP
v_total := v_total + v_recipient.amount;
INSERT INTO tmp_recipients(id, amount)
VALUES(v_recipient.id, v_recipient.amount::numeric(10,2));
END LOOP;
...
This works great in dev environment, but just found out the release environment is 8.4 which doesn't appear to support the FOREACH construct. I was hoping someone might shed some light on an alternative implementation for loop though the array parameter set and using values from array in a similar fashion to avoid a complete refactor.
The error message I am receiving is:
ERROR: syntax error at or near "FOREACH" SQL state: 42601 Context: SQL
statement in PL/PgSQL function "fnMyFunction" near line ##
The db environment is on a shared host so I have no options for platform upgrade.
I tagged postgres 9.1 and 8.4 because the function works properly in 9.x but fails on 8.4.
Use unnest; I think it was in 8.4. Untested but I think is right:
FOR v_recipient IN SELECT vr FROM unnest(recipients) x(vr)
LOOP
....
END LOOP;
If you can't do that you'll have to loop over array_length using indexing into the array.
The way you have it, you execute one INSERT at a time. With relational databases, set-based operations are regularly much faster than iterating through records one at a time.
This should be simpler, faster and work with PostgreSQL 8.4 or later:
INSERT INTO tmp_recipients(id, amount)
SELECT (r.col).*
FROM (SELECT unnest(recipients) AS col) r
This assumes that the composite base type of the array consists of (id, amount) - in that order - and the type of amount can be coerced to numeric(10,2). Else, or just to be sure, be more explicit:
INSERT INTO tmp_recipients(id, amount)
SELECT (r.col).id, (r.col).amount::numeric(10,2)
FROM (SELECT unnest(recipients) AS col) r
The parentheses around (r.col) are not optional. They are required to disambiguate the syntax for the composite type.