pgsql stored procedures - internal, c or sql: which language is best? - postgresql

I have a production pgsql server with the following stored procedure language support:
internal
c
sql
I cannot find examples for internal and c, just PL/pgSQL or, in rare cases, sql. I'll try to get the provider to install other languages, but providers usually won't, so I don't think this will happen... So I am stuck with these languages...
Which one should I choose and why?
(if you have a good tutorial too, then please include it in your answer or in a comment)
select * from pg_language
Btw, I could not test c and internal without a tutorial, so maybe the answer is simple: I cannot use them because they are not trusted.
Edit - after the solution
The CREATE LANGUAGE command is what worked for me. After that, I checked which languages are available with the following query:
select * from pg_pltemplate
You can read more about create language here.
I will use plpgsql. I found a good book about PostgreSQL here: The PostgreSQL Programmer's Guide, edited by Thomas Lockhart

Typically, you can use four or five PL languages: SQL, PL/pgSQL, PL/Python or PL/Perl, and C.
SQL - short one-line functions - can be super fast thanks to inlining (they behave like macros); see the sketch after this list.
PL/pgSQL - good for implementing business logic (whether you like it or not, it can accelerate your application thanks to less network traffic, fewer data type conversions, and less interprocess communication - PL/pgSQL uses types compatible with Postgres and functions are executed in the PostgreSQL SQL executor process) - good for code with lots of SQL queries thanks to its native support of SQL (you may like it, or you may prefer an ORM - personally I dislike ORMs - they are the biggest performance killer I know).
PL/Python or PL/Perl - good for special tasks where PL/pgSQL is not a good fit or lacks necessary features - I like PL/Perl because of the possibility of using the CPAN archive from PostgreSQL - need to send mail or make a SOAP call? It's all in CPAN.
C - when you need maximum performance or access to PostgreSQL internals, use C functions. Fast implementations of generic string, date, or math routines are simplest in the C language.
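To give a feel for the SQL inlining point above, here is a minimal sketch (the net_price function and the products table are made up for illustration; referring to parameters by name in a SQL function body needs PostgreSQL 9.2 or later):
-- A short one-line SQL function: the planner can inline it like a macro,
-- so calling it costs about the same as writing the expression by hand.
CREATE OR REPLACE FUNCTION net_price(gross numeric, vat_rate numeric)
RETURNS numeric
LANGUAGE sql
AS $$ SELECT gross / (1 + vat_rate) $$;
-- Usage, as if the expression were written directly in the query:
SELECT name, net_price(price, 0.21) FROM products;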
Examples of C code can be found in:
documentation http://www.postgresql.org/docs/9.2/static/xfunc-c.html
contrib archive https://github.com/postgres/postgres/tree/master/contrib
PGXN archive http://pgxn.org/
pgfoundry archive http://pgfoundry.org/
The C language can be used to implement your own data types, the necessary operators, and index support. You can find lots of PostgreSQL extensions built this way - PostGIS is a very famous one.
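To give a rough idea of how a compiled C function is exposed to SQL, here is a minimal sketch, assuming you have already compiled a shared library funcs.so exporting an add_one symbol (the names follow the pattern in the documentation linked above and are assumptions, not code you can run as-is):
-- Register a C function that lives in a compiled shared library.
-- 'funcs' names the library file, 'add_one' the exported C symbol.
CREATE FUNCTION add_one(integer) RETURNS integer
    AS 'funcs', 'add_one'
    LANGUAGE C STRICT;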

Looking at your listing of pg_language, it shows the default values: if I create a new database using createdb (PostgreSQL 8.4/Debian), I get the same output. The listing may already contain another line for PL/pgSQL, depending on the version and/or your data center (as pointed out by a_horse_with_no_name).
So you have
"built-in functions" (internal)
"Dynamically-loaded C functions" (c)
"SQL-language functions" (sql)
If you run
CREATE LANGUAGE plpgsql;
another line for PL/pgSQL will show up (provided you have the privilege).
If you installed PL/Java for example, you would get
"Java trusted" (java)
"Java untrusted" (javau)
which show up in the listing as well.
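If you want to see which of the installed languages are trusted (relevant to the question's remark that c and internal cannot be used), a quick look at the pg_language catalog is enough:
select lanname, lanpltrusted from pg_language;
-- lanpltrusted is false for internal and c: only superusers may
-- create functions in untrusted languages.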
Some guidelines for choosing a language:
If you want a higher-level language, consider Scala (it requires support for PL/Java or JVM-based stored procedures). That way you have the functional paradigm not only in SQL, but also in your stored functions/procedures. Of course, as with Java, you have OOP as well.
If you are using Java, have a look at Java stored procedures (this requires PL/Java). For an example, look here. In contrast to PL/pgSQL, you get full OOP.
PL/Java tends to be difficult to install, so it's not really appreciated by data centers. It's worth the trouble, because you can have the same language both for client/application servers and for stored procedures/functions: there is no need to learn another language. For example, you can access result sets the same way. The only thing that differs is the JDBC URL. In contrast to PL/pgSQL, these stored procedures are portable, provided the other database supports JVM-based stored procedures as well.
If you have to choose one of the already available languages, consider PL/pgSQL. It's almost always installed, and you do not have to deal with memory allocation. A minimal PL/pgSQL sketch follows these guidelines.
If you have to interface with the operating system or libraries, there is C. To get an impression, look here. It's not really difficult; it's just more boilerplate around the functionality.
If you want C++, it gets harder, because the interface between PostgreSQL and the C/C++ modules uses the C calling convention, so you need a C file which sits between PostgreSQL and your C++ module. To get an impression, look here.
If you are not using PL/pgSQL, the most difficult part is the installation (PL/Java) and the interfacing code (PL/Java, PL/C, PL/C++). Once you have set it up, it's really a pleasure to have the language you want in stored procedures/functions as well. It's worth the trouble.
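A minimal PL/pgSQL sketch, just to show the flavour (the orders table and the function name are made up for illustration):
CREATE OR REPLACE FUNCTION get_order_total(p_customer_id integer)
RETURNS numeric
LANGUAGE plpgsql
AS $$
DECLARE
    v_total numeric;
BEGIN
    -- The query runs inside the server process, so no data leaves the database.
    SELECT coalesce(sum(amount), 0)
      INTO v_total
      FROM orders
     WHERE customer_id = p_customer_id;
    RETURN v_total;
END;
$$;
It is called like any other function: SELECT get_order_total(42);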

If you access the database from software you also develop (for instance, from Java through JDBC), it may be better to keep the queries simple, do more of the work on the client side, and avoid database-side scripting.
The rationale is that these server-side scripts are more difficult to test (a database is required for unit tests), debug (normally much more involved than stepping through your own code under a debugger), and maintain (upgrades, etc.). Bugs in server-side scripts are often overlooked for a long time because, being separate, these scripts are only infrequently seen by the client-side developers.
However, if server-side code is preferred anyway: we have used PL/pgSQL in the past, since it is possible to have automated scripts that install all of the code on the server through a plain JDBC connection.

Related

Using Postgres' external procedural languages over application code [closed]

I am trying to figure out the advantages and disadvantages of using non-plpgsql procedural languages (PL/Python, PL/Perl, PL/v8, etc.) to implement data manipulation logic at the database level, instead of going up to the model/ORM level of the application framework that interacts with the database (Rails, Entity Framework, Django, etc.) and implementing it there.
To give a concrete example, say, I have a table that contains Mustache templates, and I want to have them "rendered" somehow.
Table definition:
create table templates (
id serial primary key,
content text not null,
data jsonb not null
);
Usually I would go to the model code and add an extra method to render the template. Example in Rails:
class Template < ApplicationRecord
def rendered
Mustache.render(content, data)
end
end
However, I could also write a PL/Python function that would do just that but on the database level:
create or replace function fn_mustache(template text, data jsonb)
returns text
language plpython3u
as $$
import chevron
import json
return chevron.render(template, json.loads(data))
$$;
create view v_templates as
select id, content, data, fn_mustache(content, data) as rendered
from templates;
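For completeness, a quick usage example (the sample row is made up; chevron does the actual rendering inside the function):
insert into templates (content, data)
values ('Hello, {{name}}!', '{"name": "world"}');
select id, rendered from v_templates;
-- assuming a fresh table, this returns one row: 1 | Hello, world!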
This yields virtually the same result functionality-wise. This example is very basic, yet the idea is to use PL/Python (or others) to manipulate the data in a more advanced manner than PL/pgSQL allows for. That is, PL/pgSQL does not have the same wealth of libraries that any generic programming language provides today (in the example I am relying on an existing implementation of the Mustache templating system, which it would not be practical to reimplement in PL/pgSQL). I obviously would not use PL/Python for any sort of networking or other OS-level features, but for operations exclusively on data this seems like a decent approach (change my mind).
Points that I can observe so far:
PL/Python is an "untrusted" language, which I guess makes it by definition more dangerous to write a function in, since you have access to syscalls; at least it feels like the cost of messing up a PL/Python function is higher than that of a mistake in the application layer, since the former is executed in the context of the database
The database approach is more extensible, since I am working on the level that is closest to the data, i.e. I am not scattering the presentation logic across multiple "tiers" (ORM and DB in this case). This means that if I need some other external service to interact with the data, I can plug it directly into the database, bypassing the application layer.
Implementing this on the model level just seems much simpler in execution
Supporting the application-code variant seems easier as well, since there are fewer concepts to keep in mind
What are the other advantages and disadvantages of these two approaches? (e.g. performance, maintainability)
You are wondering whether or not to have application logic inside the database. This is to a great extent a matter of taste. In the days of yore, the approach of implementing application logic in database functions was more popular, but today it is usually frowned upon.
Extreme positions in this debate are
The application is implemented in the database to the extent that the database functions produce the HTML code that is sent to the client.
The database is just a dumb collection of tables with no triggers or constraints beyond a primary key, and the application tries to maintain data integrity.
The best solution is typically somewhere in the middle, but where is largely a matter of taste. You see that this is a typical opinion-based question. However, let me supply some arguments that help you make a decision.
Points speaking against application logic in the database:
It makes it more difficult to port to another database.
It is more complicated to develop and debug database functions than client code. For example, you won't have debugging tools that are as advanced.
The database machine has to perform not only the normal database workload, but also the application code workload. And databases are harder to scale than application servers (you can't just spin up a second database to handle part of the workload).
PostgreSQL-specific: all database functions run inside a single database transaction, so you cannot implement functionality that requires more complicated transaction management.
Points speaking for application logic in the database:
It becomes easier to port to another application server or client programming language.
Less data has to be transferred between client and server, which can make processing more efficient.
The software stack becomes shorter and the overall software architecture simpler.
My personal opinion is that anything that has to do with basic data integrity should be implemented in the database:
Have foreign keys and check constraints in the database. The application will of course also respect these rules (no point in triggering a database error), but it is good for data integrity to have a safety net.
If you have to keep redundant information in the database, use triggers to make sure that all copies of a datum are kept synchronized (a minimal trigger sketch follows below). This implicitly makes use of transactional atomicity.
Anything that is more complicated is best done in the application. Be wary of database functions that are very long or complicated. Exceptions can be made for performance reasons: perhaps some complicated report could not easily be written in pure SQL, and shipping all the raw data to the client is prohibitively expensive.
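To illustrate the trigger point above, here is a minimal sketch of keeping a redundant counter synchronized (the orders/order_items tables and their columns are assumptions, not part of the question):
create or replace function sync_item_count() returns trigger
language plpgsql
as $$
declare
    v_order_id integer;
begin
    if tg_op = 'DELETE' then
        v_order_id := old.order_id;
    else
        v_order_id := new.order_id;
    end if;
    update orders
       set item_count = (select count(*) from order_items where order_id = v_order_id)
     where id = v_order_id;
    return null;  -- the return value is ignored for AFTER triggers
end;
$$;
create trigger trg_sync_item_count
after insert or update or delete on order_items
for each row execute procedure sync_item_count();
(An UPDATE that moves an item to a different order would need to refresh both orders; the sketch keeps it simple.)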

SphinxQL with php mysqli/pdo and prepared statements

When querying Sphinx through SphinxQL would you gain the standard benefits of using mysqli/pdo in PHP?
In addition, is there any benefit to using prepared statements with SphinxQL? Are they even supported?
I don't think proper binary (i.e. in-protocol, server-side) prepared statements are supported. They would have to be emulated in software (client-side), which wouldn't bring much benefit.
In general, one of the main reasons (other than SQL injection protection) for prepared statements is to avoid the overhead of full SQL parsing on every command. The SQL dialect understood by Sphinx is much simpler than that of a full-blown database server, so parsing the incoming statements should in general be much quicker.
You may as well use mysqli, I would think, but PDO wouldn't bring much benefit.
But at the end of the day, use which is most familiar to you, rather than worrying about the tiny benefits each might bring :)

Oracle in the back, Access in the front?

I "inherited" an Access 2003 project. Now they've begun upgrading us to 2007. I'm low man on the totem pole (and rightly so), so I don't have access - ha, no pun intended - to the Big Mama Oracle db, only the dumps that have been saved as tables (and built into a multitude of queries) in Access.
So, some very basic questions in order to get my bearings.
I learned from this discussion that, owing to the complexity of the reports, I should be thinking in terms of Stored Procedures. OK, I like that idea. It's good programming.
Access 07 supports (apparently) something like stored procedures (doesn't it?). However, I've read scary things about it, and much of the rest of the department has yet to upgrade from '03. If I do my work in '07, their '03s will not know what to do with my beautiful stored procedures, right? FURTHERMORE, if it turns out that '07 is really NOT the right choice for this project (for whatever reason -- who knows, it's new to this operation), then all the time invested is instantly rendered obsolete.
Since Big Mama IS an Oracle DB, clearly that's got to be stable. So, why don't I just wrap my head around SPs in Oracle? It seems like it would result in the most robust application for all: I'm given to understand that I can teach both Access '03 and '07 how to call those Oracle SPs. Plus, my coding will be lower level and closer to the source, which promotes stability and efficiency.
Can I actually create an Oracle-centric SP in Access '07 (or '03)? I kinda doubt it.
If you're stuck using Access backed by an Oracle database, I reckon a reasonable path to follow would be to offload as much work to Oracle as possible.
That means: get Oracle to do all the heavy lifting with procedures and functions (preferably encapsulated in packages) and views. Then use ODBC to let Access just query and present the results.
This means learning SQL and PL/SQL, but I think it's worth it :)

Where can I get the ANSI or ISO standards for the RDBMS queries?

I want to write some queries which can work in almost all databases without any SQLExceptions. So, where can I get the ANSI standards to write the queries?
Not sure that'll help you.
Vendors are hit and miss as far as standards implementation goes, and often the standards themselves are imprecise enough that you could never write a query that would work with every implementation.
For example, SQL-92 defines the concatenation operator as ||, but neither MySQL nor MSSQL uses it by default (Oracle does). Vendor-independent string concatenation is therefore impossible.
Similarly, a standard escape character is not specified, so however you handle that might not work with all vendors.
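A quick illustration of the concatenation point above (employees is a made-up table):
-- SQL-92 concatenation operator; works in Oracle, PostgreSQL, DB2:
SELECT first_name || ' ' || last_name FROM employees;
-- MySQL treats || as a logical OR by default and uses CONCAT() instead:
-- SELECT CONCAT(first_name, ' ', last_name) FROM employees;
-- SQL Server uses + for string concatenation:
-- SELECT first_name + ' ' + last_name FROM employees;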
Having said that:
SQL 92:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Wiki article with links to SQL 99 ISO documents:
http://en.wikipedia.org/wiki/SQL:1999
From wikipedia:
The SQL standard is not freely available. The whole standard may be purchased from the ISO as ISO/IEC 9075(1-4,9-11,13,14):2008.
Nevertheless, I would not advise you to follow this strategy, because no database engine follows any SQL standard (SQL 99, 2003, etc.) to the letter. All of them take liberties in the way they handle instructions or define variables (for example, when comparing two strings, different engines handle case sensitivity differently). A method that is very efficient with one engine can be terribly inefficient for another.
A suggestion would be to develop a standard group of queries and develop different classes that contain the specific implementation of that query for a certain target RDBMS.
Hope this helped
Check out the BNF of the core SQL grammars available at http://savage.net.au/SQL/
This is part of the answer - the rest, as pointed out by Kiranu and MattMitchell, is that different vendors implement the standard differently. No DBMS adheres perfectly to even SQL-92, though most are pretty close.
One observation: the SQL standard says nothing about indexes - so there is no standard syntax for creating an index. It also says nothing about how to create a database; each vendor has their own mechanisms for doing that.
The SQL-92 standard is probably the one you want to target. I believe it's supported by most of the major RDBMSs.
Here is a less terse link. Sample content:
PostgreSQL - Has views. Breaks standard by not allowing updates to views...
DB2 - Conforms to at least SQL-92.
MSSQL - Conforms to at least SQL-92.
MySQL - Conforms to at least SQL-92.
Oracle - Conforms to at least SQL-92.
Informix - Conforms to at least SQL-92.
Something else you might consider, if you're using .NET, is to use the factory pattern in System.Data.Common which does a good job of abstracting provider specifics for a number of RDBMSs.
If you are trying to make a product that will work against multiple databases, I think trying to use only standard SQL is not the way to go, as the other answers have indicated, due to the different 'interpretations' of the standard. Instead you should, if possible, have some kind of data access layer in your application which has a different implementation specific to each database. Depending on what you are trying to do, there are tools such as Hibernate which will do a lot of the heavy lifting in this regard for you.

what are the advantages of using plpgsql in postgresql

Besides the syntactic sugar and expressive power, what are the differences in runtime efficiency? I mean, can plpgsql be faster than, let's say, plpythonu or pljava? Or are they all approximately equal?
We are using stored procedures for the task of detecting nearly-duplicate records of people in a moderately sized database (around 10M records)
plpgsql provides greater type safety, I believe; you have to perform explicit casts if you want to perform operations using two columns of similar but different types, like varchar and text or int4 and int8. This is important because, if you need your stored proc to use indexes, Postgres requires that the types match exactly in join conditions (edit: for equality checks too, I think).
There may be a facility for this in the other languages though, I haven't used them. In any case, I hope this gives you a better starting point for your investigation.
plpgsql is very well integrated with SQL - the source code should be very clean and readable. In languages like PL/Java or PL/Python, SQL statements have to be isolated - SQL isn't part of the language - so you have to write a little bit more code. If your procedure has a lot of SQL statements, the plpgsql procedure should be cleaner, shorter, and a little bit faster. When your procedure has no SQL statements, procedures in external languages can be faster - but external languages (interpreters) need some time for initialisation - so for simple tasks, procedures in the SQL or plpgsql language should be faster.
External languages are used when you need functionality like network access or filesystem access - http://www.postgres.cz/index.php/PL/Perlu_-_Untrusted_Perl_%28en%29
As far as I know, people usually use a combination of PL languages - (SQL, plpgsql, plperl) or (SQL, plpgsql, plpython).
Without doing actual testing, I would expect plpgsql to be somewhat more efficient than other languages, because it's small. Having said that, remember that SQL functions are likely to be even faster than plpgsql, if a function is simple enough that you can write it in just SQL.
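For the near-duplicate detection mentioned in the question, a plain SQL function is often enough. A minimal sketch, assuming the pg_trgm extension is available and a hypothetical people(id, full_name) table (named parameters in SQL function bodies need PostgreSQL 9.2 or later):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE OR REPLACE FUNCTION similar_people(p_name text, p_threshold real)
RETURNS SETOF people
LANGUAGE sql STABLE
AS $$
    SELECT *
      FROM people
     WHERE similarity(full_name, p_name) >= p_threshold;
$$;
-- Candidates that look like 'Jon Smith' with similarity of at least 0.4:
-- SELECT * FROM similar_people('Jon Smith', 0.4);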