Postgres read and write external files - postgresql

I would like to use my postgres server also to serve documents and images that I don't want to store in the database for several reasons.
There is an extension for that purpose https://github.com/darold/external_file external fileexternal file and changed the code a bit to serve my needs without changing the core (see below). I am using 9.5 as I expect this version to be final before I finish development ;-)
I encounter the following problems:
Writing works quick and seems to be reliable but big files lead to out of memory (1Gig and above).
Reading often hangs vor a very long time (select readEFile('aPath');) and is not reliable.
Both WAL and Database quickly grow in size although no database tables are involved.
My Questions:
What is wrong with the following code? How can I exclude all those operations from WAL? Has anyone alredy written something like that and would share his development?
CREATE OR REPLACE FUNCTION public.writeefile(
buffer bytea,
filename character varying)
RETURNS void AS
$BODY$
DECLARE
l_oid oid;
lfd integer;
lsize integer;
BEGIN
l_oid := lo_create(0);
lfd := lo_open(l_oid,131072); --0x00020000 write mode
lsize := lowrite(lfd,buffer);
PERFORM lo_close(lfd);
PERFORM lo_export(l_oid,filename);
PERFORM lo_unlink(l_oid);
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.writeefile(bytea, character varying)
OWNER TO itcms;
CREATE OR REPLACE FUNCTION public.readefile(filename character varying)
RETURNS bytea AS
$BODY$
DECLARE
l_oid oid;
r record;
buffer bytea;
BEGIN
buffer := '';
SELECT lo_import(filename) INTO l_oid;
FOR r IN ( SELECT data
FROM pg_largeobject
WHERE loid = l_oid
ORDER BY pageno ) LOOP
buffer = buffer || r.data;
END LOOP;
PERFORM lo_unlink(l_oid);
return buffer;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.readefile(character varying)
OWNER TO itcms;
To explain my need for the above: This will be part of a medical system that also serves and stores huge documents and images through unsecure connections. storing hundreds of GB in the database doesn't seem to be a good idea to me. Since they don't change and just new docs are added backup of files is much more easy. As the database already handles SSL connections it would be great not having to deploy an additional sftp server for serving those files!

Your concept is doomed to failure. You are using the database server as a cache for disk operations on large files . This is an obvious waste of time and resources because the server each time have to save the entire contents of the file to remove it for a moment.
In my opinion, the use of ftp server will be simpler, more natural and far more efficient solution.

Related

Is there a postgres function to mutably update a binary data structure?

I have written an AGGREGATE function that approximates a SELECT COUNT(DISTINCT ...) over a UUID column, a kind of poor man's HyperLogLog (and having different perf characteristics).
However, it is very slow because I am using set_bit on a BIT and that has copy-on-write semantics.
So my question is:
is there a way to inplace / mutably update a BIT or bytea?
failing that, are there any binary data structures that allow mutable/in-place set_bit edits?
A constraint is that I can't push C code or extensions to implement this. But I can use extensions that are available in AWS RDS postgres. If it's not faster than HLL then I'll just be using HLL. Note that HLL is optimised for pre-aggregated counts, it isn't terribly fast at doing adhoc count estimates over millions of rows (although still faster than a raw COUNT DISTINCT).
Below is the code for context, probably buggy too:
CREATE OR REPLACE FUNCTION uuid_approx_count_distinct_sfunc (BIT(83886080), UUID)
RETURNS BIT(83886080) AS $$
DECLARE
s BIT(83886080) := $1;
BEGIN
IF s IS NULL THEN s := '0' :: BIT(83886080); END IF;
RETURN set_bit(s, abs(mod(uuid_hash($2), 83886080)), 1);
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION uuid_approx_count_distinct_ffunc (BIT(83886080))
RETURNS INTEGER AS $$
DECLARE
i INTEGER := 83886079;
s INTEGER := 0;
BEGIN
LOOP
EXIT WHEN i < 0;
IF get_bit($1, i) = 1 THEN s := s + 1; END IF;
i := i - 1;
END LOOP;
RETURN s;
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE approx_count_distinct (UUID) (
SFUNC = uuid_approx_count_distinct_sfunc,
FINALFUNC = uuid_approx_count_distinct_ffunc,
STYPE = BIT(83886080),
COMBINEFUNC = bitor,
PARALLEL = SAFE
);
Yeah, SQL isn't actually that fast for raw computation. I might try a UDF, perhaps pljava or plv8 (JavaScript) which compile just-in-time to native and available on most major hosting providers. Of course for performance, use C (perhaps via LLVM) for maximum performance at maximum pain. Plv8 should take minutes to prototype, just pass an array constructed from array_agg(). Obviously keep the array size to millions of items, or find a way to roll-up your sketches ( bitwuse-AND ?)
https://plv8.github.io/#function-calls
https://www.postgresqltutorial.com/postgresql-aggregate-functions/postgresql-array_agg-function/
FYI HyperLogLog is available as an open source extension for PostgreSQL from Citus/Microsoft and of course available on Azure. https://www.google.com/search?q=hyperloglog+postgres
(You could crib from their coffee and just change the core algorithm, then test side by side). Citus is pretty easy to install, so this isn't a bad option.

From SQL Server to Postgres

I'm new to Postgresql; I have only used SQL Server before so I'm trying to migrate an ASP.NET MVC application to ASP.NET Core 3.0 and replace SQL Server with PostgreSQL in the process, and move to Ubuntu.
Can anyone tell what's wrong with my query here?
CREATE OR REPLACE PROCEDURE officekit.testsp (mode CHARACTER)
LANGUAGE plpgsql AS $plpgsql$
BEGIN
IF mode = 'test' THEN
SELECT a,b,c,d FROM testSchema.test;
END IF;
COMMIT;
END;
$plpgsql$;
In your code almost every line is not good.
First if you want to write stored procedures in Postgres, please start with the documentation. The concept of stored procedures in MSSQL is significantly different to that in Oracle, DB2, and certainly different to that in Postgres. PLpgSQL in Postgres is similar to Oracle's PL/SQL. The best thing to do, is to forget all that you know about stored procedures and start from scratch.
The errors:
procedures in Postgres cannot return data - only functions can do this - in your example you probably need a table function.
CREATE OR REPLACE FUNCTION officekit.testsp(mode text)
RETURNS TABLE(a text, b text, c text)
AS $$
BEGIN
IF mode = 'test' THEN
RETURN QUERY SELECT test.a, test.b, test.c FROM test;
END IF;
END;
$$ LANGUAGE plpgsql;
Personally I don't like this style of programming. Functions should not just wrap queries. A lot of significant performance problems and architecture issues arise from the bad use of this possibility, but it depends on context.
Why is commit there? This code doesn't make any change in the database. There is no reason to call commit.

Trigger to insert rows in remote database after deletion

I have created a trigger that works like this:
After deleting data from table flux_tresorerie_historique it insert this row in the table flux_tresorerie_historique that is located in another database archive
I use dblink to insert data in the remote database, the problem is that the creation of the query is too hard especially that the table contain more than 20 columns, and I want to create similar functions for 10 other tables.
Is there another rapid way to ensure this task?
Here an example that works fine:
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$BODY$
DECLARE date_rapprochement_flux TEXT;
DECLARE code_commission TEXT;
DECLARE reference_flux TEXT;
BEGIN
IF OLD.date_rapprochement_flux is null
THEN
date_rapprochement_flux = 'NULL';
ELSE
date_rapprochement_flux = ''''||to_char(OLD.date_rapprochement_flux, 'YYYY-MM-DD')||'''';
END IF;
IF OLD.code_commission is null
THEN
code_commission = 'NULL';
ELSE
code_commission = ''''||replace(OLD.code_commission,'''','''''')||'''';
END IF;
IF OLD.reference_flux is null
THEN
reference_flux = 'NULL';
ELSE
reference_flux = ''''||replace(OLD.reference_flux,'''','''''')||'''';
END IF;
perform dblink_connect('dbname=gtr_bd_archive user=postgres password=postgres');
perform dblink_exec('insert into flux_tresorerie_historique values('||OLD.id_flux_historique||','''||OLD.date_operation_flux||''','''||OLD.date_valeur_flux||''','||date_rapprochement_flux||','''||replace(OLD.libelle_flux,'''','''''')||''','||OLD.montant_flux||','||OLD.contre_valeur_dzd||','''||replace(OLD.rib_compte_bancaire,'''','''''')||''','||OLD.frais_flux||','''||replace(OLD.sens_flux,'''','''''')||''','''||replace(OLD.statut_flux,'''','''''')||''','''||replace(OLD.code_devise,'''','''''')||''','''||replace(OLD.code_mode_paiement,'''','''''')||''','''||replace(OLD.code_agence,'''','''''')||''','''||replace(OLD.code_compte,'''','''''')||''','''||replace(OLD.code_banque,'''','''''')||''','''||OLD.date_maj_flux||''','''||replace(OLD.statut_frais,'''','''''')||''','||reference_flux||','||code_commission||','||OLD.id_flux||');');
perform dblink_disconnect();
RETURN NULL;
END;
This is a limited application of replication. Requirements vary a lot, so there are a number of different established solutions, addressing different situations. Consider the overview in the manual.
Your hand-knit, trigger-based solution is one viable option for relatively few deletions. Opening and closing a separate connection for every row incurs quite an overhead. There are other various options.
While working with dblink I suggest some modifications. Most importantly:
Use format() to escape strings more elegantly.
Pass the whole row instead of passing and escaping every single column.
Don't place the password in every single trigger function.
Use a FOREIGN SERVER plus USER MAPPING. Detailed instructions here:
Persistent inserts in a UDF even if the function aborts
Basically, run once on the source server:
CREATE SERVER myserver FOREIGN DATA WRAPPER dblink_fdw
OPTIONS (hostaddr '127.0.0.1', dbname 'gtr_bd_archive');
CREATE USER MAPPING FOR role_source SERVER myserver
OPTIONS (user 'postgres', password 'secret');
Preferably, don't log in as superuser at the target server. Use a dedicated role with limited privileges to avoid privilege escalation.
And use a password file on the target server to allow password-less access. This way you don't even have to store the password in the USER MAPPING. Instructions in the last chapter of this related answer:
Run batch file with psql command without password
Then:
CREATE OR REPLACE FUNCTION pg_temp.flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$func$
BEGIN
PERFORM dblink_connect('myserver'); -- name of foreign server from above
PERFORM dblink_exec( format(
$$
INSERT INTO flux_tresorerie_historique -- provide target column list!
SELECT (r).id_flux_historique
, (r).date_operation_flux
, (r).date_valeur_flux
, (r).date_rapprochement_flux::date -- 'YYYY-MM-DD' is default ISO format anyway
, (r).libelle_flux
, (r).montant_flux
, (r).contre_valeur_dzd
, (r).rib_compte_bancaire
, (r).frais_flux
, (r).sens_flux
, (r).statut_flux
, (r).code_devise
, (r).code_mode_paiement
, (r).code_agence
, (r).code_compte
, (r).code_banque
, (r).date_maj_flux
, (r).statut_frais
, (r).reference_flux
, (r).code_commission
, (r).id_flux
FROM (SELECT %L::flux_tresorerie_historique) t(r)
$$, OLD::text)); -- cast whole row type
PERFORM dblink_disconnect();
RETURN NULL; -- only for AFTER trigger
END
$func$ LANGUAGE plpgsql;
You should spell out the list of columns for the target table if the row types don't match.
If you are serious about this:
insert this row in the table flux_tresorerie_historique
I.e., you insert the whole row and the target row type is identical (no extracting a date from a timestamp etc.), you can simplify much further passing the whole row.
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$func$
BEGIN
PERFORM dblink_connect('myserver'); -- name of foreign server
PERFORM dblink_exec( format(
$$
INSERT INTO flux_tresorerie_historique
SELECT (%L::flux_tresorerie_historique).*
$$
, OLD::text));
PERFORM dblink_disconnect();
RETURN NULL; -- only for AFTER trigger
END
$func$ LANGUAGE plpgsql;
Related:
How do I do large non-blocking updates in PostgreSQL?
You can use quote_nullable for this! Also, concat_ws comes very handy:
CREATE OR REPLACE FUNCTION flux_tresorerie_historique_backup_row()
RETURNS trigger AS
$BODY$
BEGIN
perform dblink_connect('dbname=gtr_bd_archive user=postgres password=postgres');
perform dblink_exec('insert into flux_tresorerie_historique values('||
concat_ws(', ', quote_nullable(OLD.id_flux_historique),
quote_nullable(OLD.date_operation_flux),
quote_nullable(OLD.date_valeur_flux),
quote_nullable(to_char(OLD.date_rapprochement_flux, 'YYYY-MM-DD')),
quote_nullable(OLD.libelle_flux),
quote_nullable(OLD.montant_flux),
quote_nullable(OLD.contre_valeur_dzd),
quote_nullable(OLD.rib_compte_bancaire),
quote_nullable(OLD.frais_flux),
quote_nullable(OLD.sens_flux),
quote_nullable(OLD.statut_flux),
quote_nullable(OLD.code_devise),
quote_nullable(OLD.code_mode_paiement),
quote_nullable(OLD.code_agence),
quote_nullable(OLD.code_compte),
quote_nullable(OLD.code_banque),
quote_nullable(OLD.date_maj_flux),
quote_nullable(OLD.statut_frais),
quote_nullable(OLD.reference_flux),
quote_nullable(OLD.code_commission),
quote_nullable(OLD.id_flux)
)||');');
perform dblink_disconnect();
RETURN NULL;
END;
Note that it is OK to place non-sting values between single quotes, since a quoted literal is for PostgreSQL just as good a literal value as one without the quotes, so it is convenient to place all of the columns processed by quote_nullable. Also note that quote_nullable will already output dates in YYYY-MM-DD format (e.g. select quote_nullable(now()::date) would result in '2016-05-04'), so you may want to simplify OLD.date_rapprochement_flux even further by removing the to_char.

Is this generic MERGE/UPSERT function for PostgreSQL safe?

I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.
The plan is to call the function like this:
-- hypothetical "settings" table, with a primary key of (user_id, setting):
SELECT merge(
$$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
$$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);
Here's the full code of the merge() function:
CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
max_iterations INTEGER := 10;
i INTEGER := 0;
num_updated INTEGER;
BEGIN
-- usually returns before re-entering the loop
LOOP
-- first try the update
EXECUTE update_sql;
GET DIAGNOSTICS num_updated = ROW_COUNT;
IF num_updated > 0 THEN
RETURN 'UPDATE';
END IF;
-- nothing was updated: try the insert, watching out for concurrent inserts
BEGIN
EXECUTE insert_sql;
RETURN 'INSERT';
EXCEPTION WHEN unique_violation THEN
-- nop; just loop and try again from the top
END;
-- emergency brake
i := i + 1;
IF i >= max_iterations THEN
RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
EXIT;
END IF;
END LOOP;
END;
$func$
LANGUAGE plpgsql;
It appears to work well enough in my tests, but I'm not certain if I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?
Among the resources I consulted for this function are:
UPDATE/INSERT example 40.2 in the PostgreSQL manual
Why is UPSERT so complicated?
SO: Insert, on duplicate update (postgresql)
(Edit: one of the goals was to avoid locking the target table.)
The answer to your question depends your the context of how your application(s) will access the database. There are many ways to solve this as nicely discussed in depesz's post you cited by yourself. In addition you might want to also consider using writeable CTEs see here. Also the [question]Insert, on duplicate update in PostgreSQL? has some interesting discussions for your decision making process.

postgresql privileges Ensuring inserts are only done through functions

Let's say I have a table persons which contains only a name(varchar) and a user client.
I'd like that the only way for client to insert to persons is through the function:
CREATE OR REPLACE FUNCTION add_a_person(a_name varying character)
RETURNS void AS
$BODY$
BEGIN
INSERT INTO persons VALUES(a_name);
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
So, I don't want to grant client insert privileges on persons and only give execute privilege for add_a_person.
But without doing so, I'd get a permission denied because of the use of insert inside the function.
I have not found a way to this in the postgres documentation about granting privileges.
Is there a way to do this?
You can define the function with SECURITY DEFINER. This will allow the function to run for the restricted user as if they had the higher privileges of the function's creator (which needs to be able to insert into the table).
The last line of the definition would look like this:
LANGUAGE plpgsql VOLATILE COST 100 SECURITY DEFINER;
This is a bit simplistic, but assuming are running 9.2 or later, this is an example of how to check for a single permitted function doing an insert:
CREATE TABLE my_table (col1 text, col2 integer, col3 timestamp);
CREATE FUNCTION my_table_insert_function(col1 text, col2 integer) RETURNS integer AS $$
BEGIN
INSERT INTO my_table VALUES (col1, col2, current_timestamp);
RETURN 1;
END $$ LANGUAGE plpgsql;
CREATE FUNCTION my_table_insert_trigger_function() RETURNS trigger AS $$
DECLARE
stack text;
fn integer;
BEGIN
RAISE EXCEPTION 'secured';
EXCEPTION WHEN OTHERS THEN
BEGIN
GET STACKED DIAGNOSTICS stack = PG_EXCEPTION_CONTEXT;
fn := position('my_table_insert_function' in stack);
IF (fn <= 0) THEN
RAISE EXCEPTION 'Expecting insert from my_table_insert_function'
USING HINT = 'Use function to insert data';
END IF;
RETURN new;
END;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER my_table_insert_trigger BEFORE INSERT ON my_table
FOR EACH ROW EXECUTE PROCEDURE my_table_insert_trigger_function();
And a quick example of usage:
INSERT INTO my_table VALUES ('test one', 1, current_timestamp); -- FAILS
SELECT my_table_insert_function('test one', 1); -- SUCCEEDS
You'll want to peek into the stack in more detail if you want your code to be more robust, secure, etc. Checks for multiple functions are possible, of course, but involve more work. Splitting the stack into multiple lines and parsing it can be fairly involved, so you'll probably want some helper functions if things get more complex.
This is just a proof of concept, but it does what it claims. I would expect this code to be fairly slow given the use of exception handling and stack inspection, so don't use it in performance-critical parts of your application. It's not likely to be suitable for cases where DML statements are frequent, but if security is more important than performance, go for it.
Matthew's answer is correct in that a SECURITY DEFINER will allow the function to run with the privileges of a different user. Documentation for this is at http://www.postgresql.org/docs/9.1/static/sql-createfunction.html
Why are you trying to implement security this way? If you want to enforce some logic on the inserts, then I would strongly recommend doing it with constraints. http://www.postgresql.org/docs/9.1/static/ddl-constraints.html
If you want substantially higher levels of logic than can be reasonably implemented in constraints, I would suggest looking into building a business logic layer between your presentation layer and the data storage layer. You will find that scalability demands this pretty much instantly.
If your goal is to defend against SQL injection then you have found a way that might work, but that will create a heck of a lot of work for you. Worse, it leads to huge volumes of really mindless code that all has to be kept in sync across schema changes. This is pretty rough if you're trying to do anything agile. Consider instead using a programming framework that takes advantage of PREPARE / EXECUTE, which is pretty much all of them at this point.
http://www.postgresql.org/docs/9.0/static/sql-prepare.html