How can I know table (table_1) is being used in which all UDF?
Below query gives me table's details:
SELECT * FROM information_schema.tables;
Below Query gives UDF details:
select * FROM pg_proc;
But how can I know that table1 is used in which all UDF?
A string search in the prosrc column of pg_proc is the only way if you want to find dependencies between functions and tables.
Of course that is not very satisfying, because it would be rather difficult to say if – say – an occurrence of table_1 is a reference to the table or a variable name. Also, you cannot find the source of C functions in the catalog.
To get a reliable answer, you would need insight into the language in which the function is written, and here is the core of the problem: PostgreSQL does not have any insight into the language! PostgreSQL's fabled extensibility allows you to define new languages for functions, and only the language handler knows how to interpret the string that is the function body. That also holds for PL/pgSQL which is shipped with PostgreSQL.
That is also the reason why there are no pg_depend entries for objects used in functions.
Related
I use PostgreSQL exclusively. I have no plans to ever change this. However, I recognize that other people are not me, and they instead use MySQL, MS SQL, IBM SQL, SQLite SQL, Oracle SQL and ManyOthers SQL. I'm aware that they have different names in reality.
My queries look like:
SELECT * FROM table WHERE id = $1;
UPDATE table SET col = $1 WHERE col = $2;
INSERT INTO table (a, b, c) VALUES ($1, $2, $3);
My database wrapper functions currently support only PostgreSQL, by internally calling the pg_* functions.
I wish to support "the other databases" too. This would involve (the trivial part) to make my wrapper functions able to interact with the other databases by using the PHP functions for those.
The difficult part is to reconstruct the PostgreSQL-flavor SQL queries from the application into something that works identically yet will be understood by the other SQL database in use, such as MySQL. This obviously involves highly advanced parsing, analysis and final creation of the final query string. For example, this PostgreSQL SQL query:
SELECT * FROM table WHERE col ILIKE $1 ORDER BY random() LIMIT 1;
... will be turned into WeirdSQL like this:
SELECT * FROM table WHERE col ISEQUALTOKINDA %1 ORDER BY rnd() LIMIT 1;
I don't require support from any other input SQL flavor than PostgreSQL, but the output must be "all the big SQL database vendors".
Has anyone even attempted this? Or is it something that is never gonna happen as free software but might exist as a commercial offering? It sounds like it would be a thing. It would be insanely useful, and "crazier" projects have been attempted.
jOOQ is a Java library that aims to hide differences between databases. It has its own SQL grammar which tries to be compatible with everything (but parameter markers must be the JDBC ?), and generates DB-specific SQL from that.
There is an online translator, which generates the following from your query for Oracle:
select *
from table
where lower(cast(col as varchar2(4000))) like lower(cast(:1 as varchar2(4000)))
order by DBMS_RANDOM.RANDOM
fetch next 1 rows only
ODBC uses its own syntax on top on the database's syntax. ODBC drivers are required to convert ODBC parameter markers (?) to whatever the database uses, and to translate escape sequences for certain elements that are likely to have a non-standard syntax in the DB (time/GUID/interval literals, LIKE escape character, outer joins, procedure calls, function calls).
However, most escape sequences are optional, and this does not help with other syntax differences, such as the LIMIT 1.
ODBC drivers provide a long list of information about SQL syntax details, but it is the application's job to construct queries that conform to those restrictions, and not all differences can be described by this list. In practice, most ODBC applications restrict themselves to a commonly supported subset of SQL.
I'm reviewing log of executed PostgreSQL statements and stumble upon one statement I can't totally understand. Can somebody explain what PostgreSQL actually do when such query is executed? What is siq_query?
select *
from siq_query('', '21:1', '', '("my search string")', False, True, 'http://siqfindex:8080/storediq/findex')
I'm running PostgreSQL 9.2
siq_query(...) is a server-side function taking 7 input parameters (or more). It's not part of any standard Postgres distribution I know (certainly not mainline Postgres 9.2), so it has to be user-defined or part of some extension you installed. It does whatever is defined in the function. This can include basically anything your Postgres user is allowed to do. Unless it's a SECURITY DEFINER function, then it ca do whatever the owner of the function is allowed to do.
The way it is called (SELECT * FROM), only makes sense if it returns multiple rows and/or columns, most likely a set of rows, making it a "set-returning function", which can be used almost like a table in SQL queries.
Since the function name is not schema-qualified, it has to reside in a visible schema. See:
How does the search_path influence identifier resolution and the "current schema"
Long story short, you need to see the function definition to know what it does exactly. You can use psql (\df+ siq_query), pgAdmin (browse and select it to see its definition in the SQL pane) or any other client tool to look it up. Or query the system catalog pg_proc directly:
SELECT * FROM pg_proc WHERE proname = 'siq_query';
Pay special attention to the column prosrc, which holds the function body for some languages like plpgsql.
There might be multiple variants of that name, Postgres allows function overloading.
I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
#Erwin Brandstetter, Many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things, min, max values, counts, count of nulls, mean etc... so a bunch of queries have to be ran for each columns, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types, i was sort of expecting this to throw a spanner in the works at some point, my main concern was with how to automate the query generation, and its execution, this last bit being my main concern.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get a error at the SELECT (t::tbl) :
ERROR: type "tbl" does not exist
btw, what is the (t::abc) notation referred as, in python this would be a list slice, but it’s probably not the case in PLPGSQL
I would like to add some extension function in Postgresql like max/min, but I could not find the source function of them. Could anyone suggest which part of the source code should I view? thanks,
Here is an example. I have relation: model(id int) where model is a bunch of CAD models each one has an ID ; I want to find all models which id>5 and area>5. but I do not want to calculate all face area, so I uses having clause only calculate a subset. here is the query:
select model.id, model.face_number
from
model
where
id>5
group by model.id
having
area(model.id)>5;
I want to define function area(oid) function like max/min as FDW. but I do not know how to pass the input parameters, so I want to compare it with min/max.
This doesn't make much sense.
min and max are aggregate functions. They reduce a set of rows into a single value.
Your problem description doesn't seem to have much to do with aggregation. So it's not at all clear that aggregate functions have anything to do with it.
If you really do need to write an aggregate function, start with the PostgreSQL manual:
User-defined aggregates
C-language extension functions
Writing extensions
Extension-building infrastructure
I strongly recommend that you prototype your aggregate function in PL/PgSQL or another procedural language. Write it in C only if you've demonstrated that it can work using a quicker-to-work-with language, and determined that you need it faster than you can do with PL/PgSQL or PL/Python or whatever.
Anyway, if you want to find the implementation of min/max, start here:
select a.*, so.oprname as aggsortopname, tt.typname as aggtranstypename
from pg_aggregate a
inner join pg_proc p on (a.aggfnoid = p.oid)
inner join pg_type tt on (a.aggtranstype = tt.oid)
inner join pg_operator so on (a.aggsortop = so.oid)
where p.proname = 'max';
There you'll see that the aggregate is composed of multiple parts: a transform function, a sort operator, a transitional state type, an optional final function, etc. The documentation on user-defined aggregates explains that in detail.
So there's no single "max function". The definition of max in pg_proc.h actually just refers to a dummy function.
So for max(int4), it's defined as the transition function int4larger (src/backend/utils/adt/int.c) over transition type int4, with the sort operator >, with no final function.
You do not want an aggregate function for what you have described. You also should not worry about performance until you have a working version--likely PostgreSQL's query optimizer will do exactly what you want if you write this:
select model.id, model.face_number
from
model
where
id>5 and area(model.id)>5;
Here is an example function:
CREATE FUNCTION area(int in_id)
RETURNS double precision AS $$
SELECT length*width FROM model WHERE id=in_id;
$$ LANGUAGE SQL STABLE;
Of course you can replace length*width with some more appropriate calculation.
I cannot find any documentation regarding HSTORE data access using the C library. Currently I'm considering to just convert the HSTORE columns into arrays in my queries but is there a way to avoid such conversions?
libpqtypes appears to have some support for hstore.
Another option is to avoid directly interacting with hstore in your code. You can still benefit from it in the database without dealing with its text representation on the client side. Say you want to fetch a hstore field; you just use:
SELECT t.id, k, v FROM thetable t, LATERAL each(t.hstorefield);
or on old PostgreSQL versions you can use the quirky and nonstandard set-returning-function-in-SELECT form:
SELECT t.id, each(t.hstorefield) FROM thetable t;
(but watch out if selecting multiple records from t this way, you'll get weird results wheras LATERAL will be fine).
Another option is to use hstore_to_array or hstore_to_matrix when querying, if you're comfortable dealing with PostgreSQL array representation.
To create hstore values you can use the hstore constructors that take arrays. Those arrays can in turn be created with array_agg over a VALUES clause if you don't want to deal with PostgreSQL's array representation in your code.
All this mess should go away in future, as PostgreSQL 9.4 is likely to have much better interoperation between hstore and json types, allowing you to just use the json representation when interacting with hstore.
The binary protocol for hstore is not complicated.
See the _send and _recv functions from its IO code.
Of course, that means requesting (or binding) it in binary format in libpq.
(see the paramFormats[] and resultFormat arguments to PQexecParams)