Is there any equivalent to COST and VOLATILE in Oracle (migrating from postgres) - postgresql

My question is pretty straightforward:
I'm migrating functions from PostgreSQL to Oracle, and we have some functions defined with
COST 100
VOLATILE
Is there any way to do this in Oracle, or does Oracle manage it automatically?
COST n tells the optimizer that the cost of executing the function will be n (PostgreSQL has no idea how expensive a function is), and VOLATILE specifies that there are no guarantees that the function will return the same result for the same parameters, which is also used by the optimizer.
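For reference, this is roughly what those clauses look like on the PostgreSQL side (a minimal sketch; the function itself is made up):

CREATE FUNCTION get_random() RETURNS double precision AS $$
BEGIN
    RETURN random();  -- not repeatable for the same arguments, hence VOLATILE
END;
$$ LANGUAGE plpgsql COST 100 VOLATILE;

In Oracle, non-determinism is the default and CREATE FUNCTION has no COST clause; the closest related concept is the opposite declaration, DETERMINISTIC, for functions that always return the same result for the same arguments.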

Since both COST 100 and VOLATILE are the default values for user-defined functions, I suspect that you never consciously set them. I would simply omit both clauses when migrating to Oracle.

Related

Is it possible to monitor PostgreSQL server performance from inside a PL/PGSQL function?

This may sound exotic, but I would like to know programmatically whether it is a 'good moment' to execute a heavy-write PL/pgSQL function on a server. By 'good moment' I mean weighing some direct or calculated indicator of the load level, concurrency level, or any other metric of the PostgreSQL server.
I am aware there are a number of advanced applications specialized in performance tracking out there, like https://www.datadoghq.com. But I just want a simple internal KPI that alters or delays the execution of these heavy-write procedures until a 'better moment' comes.
Some of these procedures purge tables, some make average/sum calculations over millions of rows, some check remote tables, etc. They may wait for minutes or hours for a 'better moment' when the concurrent user pressure comes down.
Any idea?
You can see how many other sessions are active with something like:
select count(*) from pg_stat_activity where state='active';
But you have to be a superuser, or have the pg_monitor role, or else state will be NULL for sessions of other users. If that bothers you, you could write a function with SECURITY DEFINER to allow access to this info. (You should probably be putting this into its own function anyway, which means there is no reason that it needs to be implemented in plpgsql unless that is the only language available to you.)
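A minimal sketch of such a SECURITY DEFINER wrapper (create it as a superuser or a pg_monitor member; the app_user role in the GRANT is hypothetical):

CREATE FUNCTION active_backend_count() RETURNS bigint
LANGUAGE sql
SECURITY DEFINER
SET search_path = pg_catalog  -- recommended hardening for SECURITY DEFINER functions
AS $$
    SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
$$;

REVOKE ALL ON FUNCTION active_backend_count() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION active_backend_count() TO app_user;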
You can also invoke arbitrary OS operations by using a suitably privileged PL language. That includes plpgsql, by abusing COPY ... FROM PROGRAM.
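A hedged sketch of that trick (it needs superuser rights, or pg_execute_server_program on Postgres 11+; /proc/loadavg is Linux-specific):

CREATE FUNCTION os_load_average() RETURNS double precision AS $$
DECLARE
    _line text;
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS _loadavg(line text);
    TRUNCATE _loadavg;
    COPY _loadavg FROM PROGRAM 'cat /proc/loadavg';
    SELECT line INTO _line FROM _loadavg;
    RETURN split_part(_line, ' ', 1)::double precision;  -- 1-minute load average
END;
$$ LANGUAGE plpgsql;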

Why are functions that return tables so much slower than running the actual query?

I'm pretty new to PostgreSQL, so I guess I'm missing some basic information that I didn't quite find while googling; I guess I didn't really know the right keywords. Hopefully here I'll get the missing information :)
I'm using PostgreSQL 11.4.
I've encountered many issues when I create a function that returns a query result as a table: it executes about 50 times slower than running the actual query, sometimes even more than that.
I understand that IMMUTABLE can be used when there are no table scans, just when I manipulate and return data based on the function parameters, and STABLE when the query with the same parameters does a table scan and always returns the same results.
so the format of my function creation is this:
CREATE FUNCTION fnc_name(parameters...)
RETURNS TABLE ( columns.. ) STABLE AS $func$
BEGIN
    RETURN QUERY SELECT ...;
END
$func$ LANGUAGE plpgsql;
I can't show the query here since it's work related, but still... there is something I don't quite understand about creating functions: why is it so slow? I need to fully understand this issue because I need to create many more functions, and right now it seems that I have to run the actual query instead of using a function to get proper performance, and I still don't really have a clue as to why!
Any information regarding this issue would be greatly appreciated.
It all depends on how the function is used and on the size of the returned relation.
First I have to say: don't write these functions. It is a known antipattern; use views instead. I'll try to explain why.
The result of table functions written in higher PL languages like Perl, Python, or PLpgSQL is materialized. When the result is small (it fits into work_mem), it is stored in memory; bigger results are stored in a temp file. This can carry significant overhead.
A function is a black box for the optimizer: it is not possible to push down predicates, there are no correct statistics, and it is not possible to play with the form or order of joins. So some non-trivial queries can be slower (a little or significantly) because of the optimizations that become impossible.
There is an exception to these rules: simple SQL functions. SQL functions (functions with a single SQL statement) can be inlined (when certain prerequisites are met). Thanks to inlining, the body of the function is merged into the body of the outer SQL query, and the result is the same as if you had written the subquery directly. So the result is not materialized, and the function is not a barrier to optimization.
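A hedged illustration of the contrast (the orders table and its columns are made up):

-- PL/pgSQL version: the result is materialized and the function is a black box
CREATE FUNCTION big_orders_plpgsql() RETURNS TABLE (id int, total numeric) AS $$
BEGIN
    RETURN QUERY SELECT o.id, o.total FROM orders o WHERE o.total > 1000;
END;
$$ LANGUAGE plpgsql STABLE;

-- single-statement SQL version: the planner can inline it, so the
-- WHERE id = 42 below is pushed into the scan of orders
CREATE FUNCTION big_orders_sql() RETURNS TABLE (id int, total numeric) AS $$
    SELECT o.id, o.total FROM orders o WHERE o.total > 1000;
$$ LANGUAGE sql STABLE;

SELECT * FROM big_orders_sql() WHERE id = 42;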
There is a basic rule: use functions only when you cannot calculate some data with SQL alone. Don't try to hide or encapsulate SQL (instead, to simplify some complex queries, use views, not functions). The same rules are valid for all SQL databases (Oracle, DB2, MSSQL); Postgres is no exception.
This note is not directed against stored procedures (functions). They are a great technology, but they require a specific style of programming. Wrapping queries in functions (when there is no other reason to) is bad.

Why do we use different languages in PostgreSQL functions?

CREATE FUNCTION abc()
RETURNS SETOF ...
AS $$ ... $$
LANGUAGE sql;   -- or: LANGUAGE plpgsql
PostgreSQL has supported multiple languages for a long time. Originally there were functions in the SQL language. This language is great for simple tasks and can be used like a macro: some simple cases are inlined (and then the overhead of wrapping code in a function is zero). And everybody knows SQL.
PLpgSQL is based on Oracle's PL/SQL. It is a great procedural language with integrated SQL. Lots of people know PL/SQL and can quickly become productive with PL/pgSQL.
PLPerl, PLPerlu, PLPython: these are languages for external procedures. They are generic; they are not optimized for use inside databases, but they can do lots of other tasks very fast and can draw on a pretty wide set of libraries (although this usage can be a little dangerous). The "u" at the end means "untrusted".
Most of the time PostgreSQL developers use SQL for one-line functions (it works like a macro language) and PLpgSQL for manipulating data. Languages for external procedures are used only exceptionally; I like to use Python for XML parsing, for example.
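A small hedged illustration of that split (the function and table names are made up):

-- SQL as a macro language: a single expression the planner can inline
CREATE FUNCTION net_to_gross(net numeric) RETURNS numeric AS $$
    SELECT net * 1.21;
$$ LANGUAGE sql IMMUTABLE;

-- PL/pgSQL for data manipulation with procedural logic
CREATE FUNCTION archive_old_events() RETURNS integer AS $$
DECLARE
    _moved integer;
BEGIN
    WITH moved AS (
        DELETE FROM events
        WHERE created_at < now() - interval '1 year'
        RETURNING *
    )
    INSERT INTO events_archive SELECT * FROM moved;
    GET DIAGNOSTICS _moved = ROW_COUNT;
    RETURN _moved;
END;
$$ LANGUAGE plpgsql;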

Execute multiple functions together without losing performance

I have this process that has to make a series of queries, using pl/pgsql:
--process:
SELECT function1();
SELECT function2();
SELECT function3();
SELECT function4();
To be able to execute everything in one call, I created a process function as such:
CREATE OR REPLACE FUNCTION process()
RETURNS text AS
$BODY$
BEGIN
PERFORM function1();
PERFORM function2();
PERFORM function3();
PERFORM function4();
RETURN 'process ended';
END;
$BODY$
LANGUAGE plpgsql;
The problem is, when I sum the time that each function takes by itself, the total is 200 seconds, while the time that the function process() takes is more than one hour!
Maybe it's a memory issue, but I don't know which setting in postgresql.conf I should change.
The DB is running on PostgreSQL 9.4, on Debian 8.
You commented that the 4 functions have to run consecutively. So it's safe to assume that each function works with data from tables that have been modified by the previous function. That's my prime suspect.
Any Postgres function runs inside the transaction of the outer context. So all functions share the same transaction context if packed into another function. Each can see effects on data from previous functions, obviously. (Even though effects are still invisible to other concurrent transactions.) But statistics are not updated immediately.
Query plans are based on statistics on the involved objects. PL/pgSQL does not plan statements until they are actually executed, which would work in your favor. Per the documentation:
As each expression and SQL command is first executed in the function,
the PL/pgSQL interpreter parses and analyzes the command to create a
prepared statement, using the SPI manager's SPI_prepare function.
PL/pgSQL can cache query plans, but only within the same session and (in Postgres 9.2+ at least) only after a couple of executions have shown the same query plan to work best repeatedly. If you suspect this is going wrong for you, you can work around it with dynamic SQL, which forces a new plan every time:
EXECUTE 'SELECT function1()';
However, the most likely candidate I see is invalidated statistics that lead to inferior query plans. SELECT / PERFORM statements (same thing) inside the function are run in quick succession, so there is no chance for autovacuum to kick in and update statistics between one function and the next. If one function substantially alters data in a table the next function is working with, the next function might base its query plan on outdated information. Typical example: a table with a few rows is filled with many thousands of rows, but the next plan still thinks a sequential scan is fastest for the "small" table. You state:
when I sum the time that each function takes by itself, the total is
200 seconds, while the time that the function process() takes is more
than one hour!
What exactly does "by itself" mean? Did you run them in a single transaction or in individual transactions? Maybe even with some time in between? That would allow autovacuum to update statistics (it's typically rather quick) and possibly lead to completely different query plans based on the changed statistics.
You can inspect query plans inside plpgsql functions with auto_explain:
Postgres query plan of a UDF invocation written in pgpsql
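For example, a hedged sketch of enabling it for a single session (LOAD requires superuser privileges):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions

SELECT process();  -- the plans now show up in the server log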
If you can identify such an issue, you can force ANALYZE between statements. While you're at it, for just a couple of SELECT / PERFORM statements you might as well use a simpler SQL function and avoid plan caching altogether (but see below!):
CREATE OR REPLACE FUNCTION process()
RETURNS text
LANGUAGE sql AS
$func$
SELECT function1();
ANALYZE some_substantially_affected_table;
SELECT function2();
SELECT function3();
ANALYZE some_other_table;
SELECT function4();
SELECT 'process ended'; -- only last result is returned
$func$;
Also, as long as we don't see the actual code of your called functions, there can be any number of other hidden effects.
Example: you could SET LOCAL ... some configuration parameter to improve the performance of function1(). If the functions are called in separate transactions, that won't affect the rest; the effect only lasts until the end of the transaction. But if they are called in a single transaction, it affects the rest, too ...
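A minimal sketch of that difference (the parameter choice is arbitrary):

BEGIN;
SET LOCAL work_mem = '256MB';  -- visible to every later statement in this transaction
SELECT function1();
SELECT function2();            -- also runs with work_mem = '256MB'
COMMIT;                        -- work_mem reverts to its previous value here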
Basics:
Difference between language sql and language plpgsql in PostgreSQL functions
PostgreSQL Stored Procedure Performance
Plus: transactions accumulate locks, which bind an increasing amount of resources and may cause increasing friction with concurrent processes. All locks are released at the end of a transaction. It's better to run big functions in separate transactions if at all possible, not wrapped in a single function (and thus transaction). That last item is related to what @klin and IMSoP already covered.
Warning for future readers (2015-05-30).
The technique described in the question is one of the smartest ways to effectively block the server.
In some corporations the use of this technology can meet with the reward in the form of immediate termination of the employment contract.
Attempts to improve this method are useless. It is simple, beautiful and sufficiently effective.
In an RDBMS, transaction support is very expensive. When executing a transaction, the server must create and store information on all changes made to the database, both to make these changes visible to the environment (other concurrent processes) in case of successful completion and, in case of failure, to restore the state from before the transaction as soon as possible. Therefore a natural principle affecting server performance is to include in one transaction the minimum number of database operations, i.e., only as many as necessary.
A Postgres function is executed in one transaction. Placing many operations in it that could be run independently is a serious violation of the above rule.
The answer is simple: just do not do it. A function execution is not a mere execution of a script.
In the procedural languages used to write applications, there are many other ways to simplify code by using functions or scripts. There is also the option of running scripts from the shell.
Using a Postgres function for this purpose would make sense if it were possible to use transactions within the function. At present, no such possibility exists, although discussions on this issue already have a long history (you can read about it, e.g., in the Postgres mailing lists).

What are the advantages of using plpgsql in PostgreSQL?

Besides the syntactic sugar and expressive power, what are the differences in runtime efficiency? I mean, can plpgsql be faster than, let's say, plpythonu or pljava? Or are they all approximately equal?
We are using stored procedures for the task of detecting nearly-duplicate records of people in a moderately sized database (around 10M records).
plpgsql provides greater type safety, I believe; you have to perform explicit casts if you want to perform operations on two columns of similar but different types, like varchar and text or int4 and int8. This is important because, if you need your stored proc to use indexes, Postgres requires that the types match exactly in join conditions (edit: for equality checks too, I think).
There may be a facility for this in the other languages though, I haven't used them. In any case, I hope this gives you a better starting point for your investigation.
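To make the cast point concrete, a hedged sketch (the people table with its int4 id column is hypothetical):

-- without the explicit cast, the int8 parameter may not match the
-- indexed int4 column exactly; cast it to the column's type
CREATE FUNCTION find_person(_big_id int8) RETURNS SETOF people AS $$
    SELECT * FROM people WHERE id = _big_id::int4;
$$ LANGUAGE sql STABLE;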
plpgsql is very well integrated with SQL; the source code should be very clean and readable. In external languages like PLJava or PLPython, SQL statements have to be isolated: SQL isn't part of the language, so you have to write a little more code. If your procedure has a lot of SQL statements, then a plpgsql procedure should be cleaner, shorter, and a little bit faster. When your procedure has no SQL statements, then procedures in external languages can be faster, but external languages (interpreters) need some time for initialization, so for simple tasks procedures in the SQL or plpgsql language should be faster.
External languages are used when you need functionality like network access or filesystem access - http://www.postgres.cz/index.php/PL/Perlu_-_Untrusted_Perl_%28en%29
From what I know, people usually use a combination of PL languages: (SQL, plpgsql, plperl) or (SQL, plpgsql, plpython).
Without doing actual testing, I would expect plpgsql to be somewhat more efficient than other languages, because it's small. Having said that, remember that SQL functions are likely to be even faster than plpgsql, if a function is simple enough that you can write it in just SQL.