Why do we use different languages in PostgreSQL functions - plpgsql

CREATE FUNCTION abc()
RETURNS SETOF ...
AS $$ ... $$
LANGUAGE sql
or
LANGUAGE plpgsql

PostgreSQL has supported multiple languages for a long time. Originally, there were only functions in the SQL language. This language is great for simple tasks and can be used like a macro: some simple cases are inlined (and then the overhead of wrapping code in a function is zero). And everybody knows SQL.
PL/pgSQL is based on Oracle's PL/SQL. It is a great procedural language with integrated SQL. A lot of people know PL/SQL and can quickly start working with PL/pgSQL.
PL/Perl, PL/PerlU, PL/Python - these are languages for external procedures. They are generic languages, not optimized for use inside a database, but they can do a lot of other tasks very quickly and can use a wide set of libraries (although this can be a little dangerous). The "u" at the end means "untrusted".
Most of the time PostgreSQL developers use SQL for one-line functions (it works like a macro language) and PL/pgSQL for data manipulation. Languages for external procedures are used only exceptionally; I like to use Python for XML parsing, for example.
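To make the split above concrete, here is a minimal sketch of both styles. The table accounts and the functions full_name and transfer are hypothetical names made up for this illustration; they do not come from the question.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE accounts (
    id      int PRIMARY KEY,
    balance numeric NOT NULL
);

-- One-line SQL function: a single expression, a candidate for inlining,
-- so it behaves much like a macro.
CREATE FUNCTION full_name(p_first text, p_last text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT p_first || ' ' || p_last
$$;

-- PL/pgSQL function: procedural control flow wrapped around SQL statements.
CREATE FUNCTION transfer(p_from int, p_to int, p_amount numeric)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    -- Debit only if the source account exists and has enough money.
    UPDATE accounts
       SET balance = balance - p_amount
     WHERE id = p_from AND balance >= p_amount;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'account % is missing or has insufficient funds', p_from;
    END IF;

    UPDATE accounts
       SET balance = balance + p_amount
     WHERE id = p_to;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'account % not found', p_to;
    END IF;
END
$$;
```

The first function is a plain expression that the planner may fold into the calling query. The second needs branching and error handling, which is where PL/pgSQL fits.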

Related

Is there any equivalent to COST and VOLATILE in Oracle (migrating from postgres)

My question is pretty straightforward:
I'm migrating functions from PostgreSQL to Oracle, and we have some functions defined with
COST 100
VOLATILE
Is there any way to do this in Oracle, or does Oracle manage it automatically?
COST n tells the optimizer that the cost of executing the function will be n (PostgreSQL has no idea how expensive a function is), and VOLATILE specifies that there are no guarantees that the function will return the same result for the same parameters, which is also used by the optimizer.
Since both COST 100 and VOLATILE are the default values for user-defined functions, I suspect that you never consciously set them. I would simply omit both clauses when migrating to Oracle.
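If you want to check this on the PostgreSQL side before migrating, a sketch like the following should do. The function names add_tax and add_tax_explicit are hypothetical; pg_proc and its columns procost and provolatile are part of the system catalog.

```sql
-- Hypothetical function without any COST / volatility clauses.
CREATE FUNCTION add_tax(price numeric)
RETURNS numeric
LANGUAGE sql
AS $$ SELECT price * 1.2 $$;

-- The same function with the defaults spelled out explicitly.
CREATE FUNCTION add_tax_explicit(price numeric)
RETURNS numeric
LANGUAGE sql
COST 100
VOLATILE
AS $$ SELECT price * 1.2 $$;

-- Compare what the catalog stores for both functions.
-- provolatile: 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
SELECT proname, procost, provolatile
FROM pg_proc
WHERE proname IN ('add_tax', 'add_tax_explicit');
```

Both rows should carry the same procost and provolatile values, which suggests the clauses were generated by a tool (for example when dumping the function definition) rather than chosen on purpose.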

Postgres find types used in procedures and user-defined functions

Is it possible to get the list of user-defined Types & Domains that are used in all the stored procedures and user-defined functions?
Not easily, because the body of functions is stored as a string. One would have to parse that, which is particularly difficult, as there are so many procedural languages in PostgreSQL.
You could perform a substring search in the source code, but that is notoriously unreliable.
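For what it is worth, the substring search mentioned above could look roughly like this. It is only a sketch with all the unreliability described: it reports a match whenever the type name appears anywhere in the source text (comments, string literals, longer identifiers), and it only sees functions whose source is stored in pg_proc.prosrc.

```sql
-- Rough sketch: user-defined domains, enums and stand-alone composite types
-- whose name appears somewhere in the source text of a function.
SELECT n.nspname            AS type_schema,
       t.typname            AS type_name,
       p.proname            AS routine_name,
       p.oid::regprocedure  AS routine_signature
FROM pg_type t
JOIN pg_namespace n ON n.oid = t.typnamespace
LEFT JOIN pg_class c ON c.oid = t.typrelid
JOIN pg_proc p
  ON strpos(lower(p.prosrc), lower(t.typname::text)) > 0
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND (t.typtype IN ('d', 'e')                      -- domains and enums
       OR (t.typtype = 'c' AND c.relkind = 'c'))    -- composite types, not table row types
ORDER BY type_schema, type_name, routine_name;
```

Treat the output as a list of candidates to review by hand, not as a dependency list.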

Why are functions that return tables so much slower than running the actual query?

I'm pretty new to PostgreSQL, so I guess I'm missing some basic information that I didn't quite find while googling. I probably didn't know the right keywords; hopefully I'll get the missing information here :)
I'm using PostgreSQL 11.4.
I've encountered many issues when I create a function that returns a query result as a table: it executes about 50 times slower than running the actual query, sometimes even more than that.
I understand that IMMUTABLE can be used when there are no table scans, i.e. when I only manipulate and return data based on the function parameters, and STABLE when the query does a table scan and always returns the same results for the same parameters.
So the format of my function creation is this:
CREATE FUNCTION fnc_name(columns...)
RETURNS TABLE (columns...) STABLE AS $func$
BEGIN
   RETURN QUERY SELECT ...;
END $func$ LANGUAGE plpgsql;
I can't show the query here since it's work related, but still, there is something I don't quite understand about creating functions: why is it so slow? I need to fully understand this issue because I need to create many more functions, and right now it seems I need to run the actual query to get proper performance instead of using functions, and I still don't really have a clue why!
Any information regarding this issue would be greatly appreciated.
It all depends on how the function is used and on the size of the returned relation.
First I have to say: don't write these functions. It is a known antipattern. Use views instead. I'll try to explain why.
The result of a table function written in a higher-level PL language like Perl, Python or PL/pgSQL is materialized. When the result is small (fits in work_mem), it is stored in memory. Bigger results are stored in a temp file. This can have significant overhead.
A function is a black box for the optimizer: it is not possible to push down predicates, there are no correct statistics, and the planner cannot change the join method or the join order. So some nontrivial queries can be slower (a little bit or significantly) because of these missed optimizations.
There is an exception to these rules: simple SQL functions. SQL functions (functions with a single SQL statement) can be inlined (when some prerequisites are met). With inlining, the body of the function is merged into the body of the outer SQL query, and the result is the same as if you had written the subquery directly. So the result is not materialized and the function is not a barrier for optimization.
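The difference can be observed with EXPLAIN. The sketch below uses a hypothetical table orders and two hypothetical functions that wrap the same query, one in LANGUAGE sql and one in LANGUAGE plpgsql. The exact plans depend on your version, data and settings, so take the comments as what one would expect rather than guaranteed output.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE orders (
    id          int PRIMARY KEY,
    customer_id int NOT NULL,
    total       numeric NOT NULL
);
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- Single-statement SQL function, declared STABLE: a candidate for inlining.
CREATE FUNCTION orders_sql()
RETURNS TABLE (id int, customer_id int, total numeric)
LANGUAGE sql
STABLE
AS $$
    SELECT o.id, o.customer_id, o.total FROM orders o
$$;

-- The same query wrapped in PL/pgSQL: opaque to the planner.
CREATE FUNCTION orders_plpgsql()
RETURNS TABLE (id int, customer_id int, total numeric)
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN QUERY SELECT o.id, o.customer_id, o.total FROM orders o;
END
$$;

-- Expected: a scan on orders itself, with the customer_id condition applied
-- to the table (and possibly served by the index).
EXPLAIN SELECT * FROM orders_sql() WHERE customer_id = 42;

-- Expected: a "Function Scan on orders_plpgsql" node with the condition
-- applied as a filter on top of the function's full result.
EXPLAIN SELECT * FROM orders_plpgsql() WHERE customer_id = 42;
```

In the second plan the whole table is read and returned by the function before the filter is applied, which is one plausible source of the slowdown described in the question.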
There is a basic rule: use functions only when you cannot calculate the data with SQL. Don't try to hide or encapsulate SQL in functions (to simplify complex queries, use views, not functions). The same rules are valid for all SQL databases (Oracle, DB2, MSSQL). Postgres is not an exception.
This note is not against stored procedures (functions). It is a great technology. But it requires a specific style of programming. Wrapping queries into functions (when there is nothing else in them) is bad.
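As a sketch of the view-based alternative, assume a hypothetical table invoices and a hypothetical view customer_totals (neither comes from the question):

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE invoices (
    id          int PRIMARY KEY,
    customer_id int NOT NULL,
    total       numeric NOT NULL
);

-- The reusable query lives in a view instead of a function.
CREATE VIEW customer_totals AS
SELECT customer_id,
       count(*)   AS invoice_count,
       sum(total) AS total_spent
FROM invoices
GROUP BY customer_id;

-- The view is expanded into the outer query, so the planner can apply
-- the condition on customer_id before aggregating.
SELECT * FROM customer_totals WHERE customer_id = 42;
```

Callers still get a single named object to query, but the planner sees the underlying tables and can optimize the whole statement.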

pgsql stored procedure - which language is best: internal, c or sql?

I have a production pgsql server with the following stored procedure language support:
internal
c
sql
I cannot find examples for internal and c, just for pl/pgsql or, in rare cases, sql. I'll try to get the provider to install other languages, but providers usually don't do that, so I don't think this will happen... So I am stuck with these languages...
Which one should I choose and why?
(if you have a good tutorial too, then please write it in your answer or in comment)
select * from pg_language
By the way, I could not test c and internal without a tutorial, so maybe there is a simple answer: I cannot use them because they are not trusted.
Edit - after the solution
CREATE LANGUAGE is what worked for me. After that I checked which languages are available with the following query:
select * from pg_pltemplate
You can read more about CREATE LANGUAGE here.
I will use plpgsql. I found a good book about PostgreSQL here: The PostgreSQL Programmer's Guide, edited by Thomas Lockhart
Typically, you can use four or five PL languages: SQL, PL/pgSQL, PL/Python or PL/Perl, and C.
SQL - short one-line functions. They can be super fast thanks to inlining (like a macro).
PL/pgSQL - good for implementing business logic. Whether you like it or not, it can speed up your application thanks to less network traffic, fewer data type conversions and less interprocess communication: PL/pgSQL uses types compatible with Postgres, and functions are executed in the PostgreSQL SQL executor process. It is good for code with a lot of SQL queries thanks to its native SQL support (you may like that, or you may prefer an ORM; personally I dislike ORMs, they are the main performance killer I know of).
PL/Python or PL/Perl - good for special tasks where PL/pgSQL is not a good fit or lacks necessary features. I like PL/Perl because it makes the CPAN archive usable in PostgreSQL: if you need to send mail or make a SOAP call, it is all in CPAN.
C - use C functions when you need maximum performance or access to PostgreSQL internals. Fast implementations of generic string, date and math routines are simplest in C.
You can find examples of C code in:
documentation http://www.postgresql.org/docs/9.2/static/xfunc-c.html
contrib archive https://github.com/postgres/postgres/tree/master/contrib
PGXN archive http://pgxn.org/
pgfoundry archive http://pgfoundry.org/
The C language can be used to implement your own data types, with the necessary operations and index support. You can find a lot of PostgreSQL extensions; a very famous one is PostGIS.
Looking at your listing of pg_language, it shows the default values: if I create a new database using createdb on PostgreSQL 8.4/Debian, I get the same output. The listing may already contain another line for PL/pgSQL, depending on the version and/or your data center (pointed out by a_horse_with_no_name).
So you have
"built-in functions" (internal)
"Dynamically-loaded C functions" (c)
"SQL-language functions" (sql)
If you run
CREATE LANGUAGE plpgsql;
another line for PL/pgSQL will turn up (if you have the privilege).
If you installed PL/Java for example, you would get
"Java trusted" (java)
"Java untrusted" (javau)
which show up in the listing as well.
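A narrower query than select * makes the relevant part of the listing easier to read. This is a sketch; lanname, lanispl and lanpltrusted are columns of the pg_language catalog, and the CREATE EXTENSION form assumes PostgreSQL 9.1 or later, where procedural languages are packaged as extensions.

```sql
-- Which languages are installed, which are procedural languages,
-- and which are trusted (usable by non-superusers)?
SELECT lanname, lanispl, lanpltrusted
FROM pg_language
ORDER BY lanname;

-- On PostgreSQL 9.1 and later, a procedural language is usually added
-- as an extension; this does nothing if plpgsql is already installed.
CREATE EXTENSION IF NOT EXISTS plpgsql;
```

In such a listing, internal and c are expected to show lanpltrusted = false, which matches the suspicion in the question: only superusers may create functions in them.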
Some guidelines for choosing a language
If you want a higher-level language, consider Scala (requires support for PL/Java or JVM-based stored procedures respectively). Then you have the functional paradigm not only in SQL, but also in your stored function/procedure. Of course, like Java, you have OOP as well.
If you are using Java, have a look at Java stored procedures (requires PL/Java). As for an example, look here. In contrast to PL/pgSQL, you have full OOP.
PL/Java tends to be difficult to install, so it's not really appreciated by data centers. It's worth the trouble, because you can have the same language both for client/application servers and for stored procedures/functions: There is no need to learn another language. For example, you can access result sets the same way. The only thing that differs is the JDBC URL. In contrast to PL/pgSQL, these stored procedures are portable, if the other database supports JVM based stored procedures as well.
If you have to choose one of the already available languages, consider PL/pgSQL. It's normally installed, and you do not have to deal with memory allocations.
If you have to interface with the operating system/libraries, there is C. To get an impression, look here. It's not really difficult, it's just more boiler-plate around the functionality.
If you want C++, it gets harder, because the interface between PostgreSQL and the C/C++ modules uses the C calling convention, so you should have a C file which sits between PostgreSQL and your C++ module. To get an impression, look here.
If you are not using PL/pgSQL, the most difficult part is the installation (PL/Java), and the interfacing code (PL/Java, PL/C, PL/C++). If you have set it up initially, it's really a pleasure to have the language you really want in stored procedures/functions as well. It's worth the trouble.
If you access the database from some software tool that you also develop (for instance, from Java through JDBC), it may be better to simplify the queries, do more work on the client side and avoid database-side scripting.
The rationale is that these server-side scripts are more difficult to test (a database is required for unit tests), debug (normally way more complex than for your own code under a debugger) and maintain (upgrades, etc.). Bugs in server-side scripts are often overlooked for a longer time because, being separate, these scripts are only infrequently seen by the client-side developers.
However, if you prefer it anyway, we have used PL/pgSQL in the past, as it is possible to have automated scripts that install all code on the server just through a JDBC connection.

What are the advantages of using plpgsql in PostgreSQL

Besides syntactic sugar and expressive power, what are the differences in runtime efficiency? I mean, can plpgsql be faster than, let's say, plpythonu or pljava? Or are they all approximately equal?
We are using stored procedures for the task of detecting near-duplicate records of people in a moderately sized database (around 10M records).
plpgsql provides greater type safety, I believe: you have to perform explicit casts if you want to perform operations using two different columns of similar type, like varchar and text or int4 and int8. This is important because if you need your stored proc to use indexes, postgres requires that the types match exactly between join conditions (edit: for equality checks too, I think).
There may be a facility for this in the other languages though, I haven't used them. In any case, I hope this gives you a better starting point for your investigation.
plpgsql is very well integrated with SQL, so the source code should be very clean and readable. In external languages like PL/Java or PL/Python, SQL statements have to be isolated, because SQL isn't part of the language, so you have to write a little more code. If your procedure has a lot of SQL statements, then a plpgsql procedure should be cleaner, shorter and a little faster. When your procedure has no SQL statements, procedures in external languages can be faster, but external languages (interpreters) need some time for initialisation, so for simple tasks procedures in SQL or plpgsql should be faster.
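The point about isolated SQL statements can be seen by writing the same lookup twice. This is a sketch under assumptions: the table people and both function names are hypothetical, and plpython3u (the Python 3 variant of the plpythonu language mentioned in the question) must be available on the server and usually requires superuser rights to install.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE people (
    id      int PRIMARY KEY,
    surname text NOT NULL
);

-- PL/pgSQL: the SQL statement is part of the language.
CREATE FUNCTION count_people_plpgsql(p_surname text)
RETURNS bigint
LANGUAGE plpgsql
STABLE
AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n FROM people WHERE surname = p_surname;
    RETURN n;
END
$$;

-- PL/Python: the SQL statement is a string passed to the plpy module.
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE FUNCTION count_people_plpython(p_surname text)
RETURNS bigint
LANGUAGE plpython3u
STABLE
AS $$
# Prepare the statement with its parameter types, then execute it.
plan = plpy.prepare("SELECT count(*) AS n FROM people WHERE surname = $1", ["text"])
rows = plpy.execute(plan, [p_surname])
return rows[0]["n"]
$$;
```

Both functions do the same work. In the Python version the query has to be prepared, executed and unpacked through plpy, which is the extra code the answer refers to.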
External languages are used when you need some functionality like access to net, access to filesystem - http://www.postgres.cz/index.php/PL/Perlu_-_Untrusted_Perl_%28en%29
As far as I know, people usually use a combination of PL languages: (SQL, plpgsql, plperl) or (SQL, plpgsql, plpython).
Without doing actual testing, I would expect plpgsql to be somewhat more efficient than other languages, because it's small. Having said that, remember that SQL functions are likely to be even faster than plpgsql, if a function is simple enough that you can write it in just SQL.