How to list files in a folder from inside Postgres? - postgresql

Is there any way to list files from a folder?
Something like:
select * from pg_ls_dir('/home/christian')
I tried pg_ls_dir but, per documentation:
Only files within the database cluster directory and the log_directory
can be accessed. Use a relative path for files in the cluster
directory, and a path matching the log_directory configuration setting
for log files. Use of these functions is restricted to superusers.
I need to list files from a folder outside the postgres directories, similar to how it's done with COPY.

Using PostgreSQL 9.3 or later, it is possible to avoid the overhead of installing a language extension:
DROP TABLE IF EXISTS files;
CREATE TABLE files(filename text);
COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
SELECT * FROM files ORDER BY filename ASC;
This creates a table with 2,000+ rows, with filenames ranging from [ to zip.
Normally the COPY command requires superuser privileges. Since the path to the file system is hard-coded (i.e., not an unsanitized value from users), it doesn't pose a great security risk to define the function first using a superuser account (e.g., postgres) as follows:
CREATE OR REPLACE FUNCTION files()
RETURNS SETOF text AS
$BODY$
BEGIN
    SET client_min_messages TO WARNING;
    DROP TABLE IF EXISTS files;
    CREATE TEMP TABLE files(filename text);
    COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
    RETURN QUERY SELECT * FROM files ORDER BY filename ASC;
END;
$BODY$
LANGUAGE plpgsql SECURITY DEFINER;
Log in to PostgreSQL using a non-superuser account, then:
SELECT * FROM files();
The same list of results should be returned without any security violation errors.
The SECURITY DEFINER tells PostgreSQL to run the function under the role of the account that was used to create the function. Since it was created using a superuser role, it will execute with superuser permissions, regardless of the role that executes the command.
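If you want to control which roles may call such a definer function, ordinary privileges on the function are enough. For example (the role name reporting is an assumption, not part of the original answer):
REVOKE ALL ON FUNCTION files() FROM PUBLIC;      -- functions are executable by PUBLIC by default
GRANT EXECUTE ON FUNCTION files() TO reporting;  -- only this hypothetical role may list the directory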
The SET client_min_messages TO WARNING; tells PostgreSQL to suppress messages if the table cannot be dropped. It's okay to delete this line.
The CREATE TEMP TABLE is used to create a table that does not need to persist over time. If you need a permanent table, remove the TEMP modifier.
The 'find ...' command, which could also be written as /usr/bin/find, lists only regular files (-type f) and prints each filename without the leading path, one per line (-printf "%f\n"). Finally, -maxdepth 1 limits the search to the specified directory without descending into subdirectories. See find's man page for details.
One disadvantage to this approach is that there doesn't seem to be a way to parameterize the command to execute: PostgreSQL requires it to be a literal text string rather than an expression. Perhaps this is for the best, as it prevents arbitrary commands from being executed. What you see is what you execute.

Extended version of this answer, function ls_files_extended:
-- Unfortunately, that variant only allows a hardcoded path.
-- To accept a user-supplied path, we use a dynamic EXECUTE instead.
-- This version also returns the file size and allows filtering and sorting.
--
-- #param path text. Filesystem path to read.
-- #param filter text (default null, meaning return all). WHERE condition to filter files, e.g.: $$filename LIKE '0%'$$
-- #param sort text (default 'filename'). ORDER BY expression.
--
-- Examples of use:
-- 1) Simple call, return all files, sort by filename:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive')
-- 2) Return all, sort by filesize:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', null, 'size ASC')
-- 3) Use filtering and sorting:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', 'filename LIKE ''0%''', 'size ASC')
-- or use $-quoting for easy readability:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', $$filename LIKE '0%'$$, 'size ASC')
CREATE OR REPLACE FUNCTION ls_files_extended(path text, filter text default null, sort text default 'filename')
RETURNS TABLE(filename text, size bigint) AS
$BODY$
BEGIN
    SET client_min_messages TO WARNING;
    CREATE TEMP TABLE _files(filename text, size bigint) ON COMMIT DROP;
    EXECUTE format($$COPY _files FROM PROGRAM 'find %s -maxdepth 1 -type f -printf "%%f\t%%s\n"'$$, path);
    RETURN QUERY EXECUTE format($$SELECT * FROM _files WHERE %s ORDER BY %s $$, concat_ws(' AND ', 'true', filter), sort);
END;
$BODY$ LANGUAGE plpgsql SECURITY DEFINER;
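Keep in mind that path, filter and sort are interpolated verbatim into a shell command and a query, so only trusted callers should be granted EXECUTE on this SECURITY DEFINER function. As a rough sketch (the character whitelist is an assumption, not part of the original answer), a wrapper could reject suspicious paths before delegating:
CREATE OR REPLACE FUNCTION ls_files_checked(path text)
RETURNS TABLE(filename text, size bigint) AS
$BODY$
BEGIN
    -- Reject anything outside a conservative character whitelist (assumption; adjust as needed).
    IF path !~ '^[A-Za-z0-9_./-]+$' THEN
        RAISE EXCEPTION 'unsafe path: %', path;
    END IF;
    RETURN QUERY SELECT * FROM ls_files_extended(path);
END;
$BODY$ LANGUAGE plpgsql;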

Listing files in a directory is normally not useful for a SQL client.
Anyway, should you need to implement it, that's a typical use case for an untrusted procedural language like plperlu. Example:
CREATE FUNCTION nosecurity_ls(text) RETURNS setof text AS $$
  opendir(my $d, $_[0]) or die $!;           # open the directory given as the first argument
  while (defined(my $f = readdir($d))) {     # defined() so an entry named "0" doesn't stop the loop
    return_next($f);                         # emit one row per directory entry
  }
  return undef;
$$ language plperlu;
That's equivalent to the pg_ls_dir(text) function mentioned in System Administration Functions except for the restrictions.
=> select * from nosecurity_ls('/var/lib/postgresql/9.1/main') as ls;
ls
-----------------
pg_subtrans
pg_serial
pg_notify
pg_clog
pg_multixact
..
base
pg_twophase
etc...

Related

Get multiple result sets from a procedure in PostgreSQL

My requirement is to create a generic function to which I can pass any other function and its parameters, and which returns the appropriate output (it may be a table result, a single result, etc.), all within a single statement.
This is what I have found and tried, but I don't want to run multiple statements.
CREATE FUNCTION CustomerWithOrdersByState() RETURNS SETOF refcursor AS $$
DECLARE
ref1 refcursor; -- Declare cursor variables
ref2 refcursor;
BEGIN
OPEN ref1 FOR SELECT * FROM "table1" limit 10;
RETURN NEXT ref1;
OPEN ref2 FOR SELECT * FROM "table2" limit 10;
RETURN NEXT ref2;
END;
$$ LANGUAGE plpgsql;
==================================================================
begin;
select * from CustomerWithOrdersByState();
FETCH ALL FROM "<unnamed portal 31>";
-- FETCH ALL FROM "<unnamed portal 30>";
commit;
I am using Postgres 11.4.
I've had what I believe is a similar issue, where I wanted a way to execute a script with multiple result sets in a single batch.
As pointed out above, pgAdmin 4 (and many other clients I've tried) only seems to process a single command at a time, meaning that you have to select a statement, execute it, select the next, execute it, and so on.
One quick way I found which appears to work is to save the script as a single file, then execute it on the CLI via PSQL.
So, for an example, I created a file called myscript.sql as follows:
DROP TABLE IF EXISTS sampledata;
CREATE TABLE if not exists sampledata as select x,1 as c2,2 as c3, md5(random()::text) from generate_series(1,5) x;
CREATE OR REPLACE FUNCTION GET_RECORDS(ref refcursor) RETURNS REFCURSOR AS $$
BEGIN
OPEN ref FOR SELECT * FROM SAMPLEDATA; -- OPEN A CURSOR
RETURN ref; -- RETURN THE CURSOR TO THE CALLER
END;
$$ LANGUAGE PLPGSQL;
/*
In PGManage, you would need to execute these commands one at a time (i.e., 4 times).
*/
BEGIN;
SELECT get_records('r1');
FETCH ALL IN "r1";
COMMIT;
I then created a bash script (runscript.sh) which allowed for easy execution of different files.
#!/bin/bash
# Can be used to execute scripts.
# Like this: ./runscript.sh hello.sql
psql -U xuser -d postgres < "$1"
I set the script to be executable:
chmod a+x runscript.sh
And then execute as follows:
./runscript.sh myscript.sql
The script executes and I see the results in the CLI. I can iterate quickly on the file, save it and execute it in the shell.
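Alternatively, if you give the cursors fixed names inside the function from the question, both result sets can be fetched in one transaction without guessing the "<unnamed portal N>" names. A sketch (the portal names c_ref1 and c_ref2 are made up here):
CREATE OR REPLACE FUNCTION CustomerWithOrdersByState() RETURNS SETOF refcursor AS $$
DECLARE
    ref1 refcursor := 'c_ref1';  -- fixed portal name instead of an unnamed portal
    ref2 refcursor := 'c_ref2';
BEGIN
    OPEN ref1 FOR SELECT * FROM "table1" LIMIT 10;
    RETURN NEXT ref1;
    OPEN ref2 FOR SELECT * FROM "table2" LIMIT 10;
    RETURN NEXT ref2;
END;
$$ LANGUAGE plpgsql;

BEGIN;
SELECT * FROM CustomerWithOrdersByState();
FETCH ALL FROM c_ref1;
FETCH ALL FROM c_ref2;
COMMIT;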

Detect if source file is empty before using COPY in PL/pgSQL

I am running 64-bit Postgres 10.3 on Windows 2012 R2. Every week we get data in text files (44 separate files) and I use a LOOP and the COPY command in a PL/pgSQL function to import the data. I have come across an issue where an incorrect password was supplied when unzipping the data, so the source files were created but empty. The import function appeared to be running but would freeze at a different file each time I tried.
Is there any way (using PL/pgSQL) to detect if a file on disk is empty before trying to use COPY?
You can use a function for this. Something like:
CREATE OR REPLACE FUNCTION wwusp_filesize(file_path text)
RETURNS TEXT AS $BODY$
BEGIN
    DROP TABLE IF EXISTS tmp_file_size;
    CREATE TEMPORARY TABLE tmp_file_size (size BIGINT);
    EXECUTE 'COPY tmp_file_size (size) FROM PROGRAM ''wc -c < ' || file_path || '''';
    RETURN (SELECT pg_size_pretty(size) FROM tmp_file_size);
END;
$BODY$ LANGUAGE plpgsql;
This will work on Linux servers. If you wish to make it work also on Windows, just replace the wc -c < file with the corresponding program/syntax.
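For instance, an untested sketch of a Windows variant might shell out to PowerShell instead; inside the same function, the EXECUTE line could become something like:
-- Untested sketch for Windows (assumption, not from the original answer): PowerShell prints the file size in bytes.
EXECUTE format(
    $$COPY tmp_file_size (size) FROM PROGRAM 'powershell -NoProfile -Command "(Get-Item ''%s'').Length"'$$,
    file_path);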
# SELECT wwusp_filesize('/etc/postgresql/9.5/main/pg_hba.conf') AS filesize;
filesize
------------
4641 bytes
(1 row)
And with an empty file ..
# SELECT wwusp_filesize('/etc/postgresql/9.5/main/empty_file') AS filesize;
filesize
----------
0 bytes
(1 row)
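Back in the weekly import, you can then skip files that report 0 bytes before issuing COPY. A minimal sketch of the idea (the file list and the table name import_table are placeholders; in the real function this check would sit inside the existing LOOP):
DO $$
DECLARE
    v_path text;
BEGIN
    FOREACH v_path IN ARRAY ARRAY['/data/week/file1.txt', '/data/week/file2.txt'] LOOP
        IF wwusp_filesize(v_path) = '0 bytes' THEN
            RAISE NOTICE 'skipping empty file %', v_path;
        ELSE
            EXECUTE format('COPY import_table FROM %L', v_path);  -- import_table is a placeholder
        END IF;
    END LOOP;
END
$$;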

Absolute path not allowed on row 7, xml to postgres?

CREATE OR REPLACE FUNCTION j_f_sync_from_xml()
RETURNS boolean AS
$BODY$
DECLARE
myxml xml;
datafile text :=
'C:\Users\Polichetti\Documents\ABBATE_EMANUELE_Lvl1F2Va_20160418-1759.xml';
BEGIN
myxml := pg_read_file(datafile, 0, 100000000);
CREATE TABLE james AS
SELECT (xpath('//some_id/text()', x))[1]::text AS id
FROM unnest(xpath('/xml/path/to/datum', myxml)) x;
END;
$BODY$ language plpgsql;
SELECT * from james;
I get an error: absolute path not allowed, at row 7.
Probably I don't know which path I have to use.
https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE
Only files within the database cluster directory and the log_directory
can be accessed. Use a relative path for files in the cluster
directory, and a path matching the log_directory configuration setting
for log files.
You cannot access just any OS file. Run show data_directory and show log_directory to find out where you can read files with pg_read_file.
If you want to load XML into the database, I'd rather use a different approach, e.g.:
create table xml(body text);
copy xml from '/absolute/path/to/file.xml';
select string_agg(body,'')::xml from xml;
This is the simplest example; you can look on the web for more, e.g. using large object utilities.
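From there, the aggregated document can be fed into the xpath() extraction the original function was attempting, for example (the XPath expressions are taken from the question):
CREATE TABLE james AS
SELECT (xpath('//some_id/text()', x))[1]::text AS id
FROM (SELECT string_agg(body, '')::xml AS doc FROM xml) d,
     unnest(xpath('/xml/path/to/datum', d.doc)) AS x;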

How to run a sequence of SQL queries and save the results?

In other statistical programs, it's possible to create a log file that shows the output issued as a result of a command. Is it possible to do something similar in SQL?
In particular, I'd like to have a single .sql file with many queries and to then output each result to a text file.
I'm using PostgreSQL and Navicat.
plpgsql function and COPY
One way would be to put the SQL script into a plpgsql function, where you can write the individual results to files with COPY and compile a report from intermediary results just the way you need it.
This has additional effects that may or may not be desirable. For instance, you can grant or revoke permission on the whole function to arbitrary roles. Read about SECURITY DEFINER in the manual. Also, the syntax will be verified when you save the function - however, only superficially (there are plans to change that in the future). More details in this answer on dba.SE.
Basic example:
CREATE OR REPLACE FUNCTION func()
RETURNS void AS
$BODY$
BEGIN
COPY (SELECT * FROM tbl WHERE foo) TO '/path/to/my/file/tbl.csv';
COPY (SELECT * FROM tbl2 WHERE NOT bar) TO '/path/to/my/file/tbl2.csv';
END;
$BODY$
LANGUAGE plpgsql;
Of course, you need to have the necessary privileges in the database and in the file system.
Call it from the shell:
psql mydb -c 'SELECT func();'
psql switching between meta commands and SQL
#!/bin/sh
OUTDIR='/var/lib/postgresql/outfiles/'
echo "
\\o $OUTDIR/file1.txt \\\\\\\\ SELECT * FROM tbl1;
\\o $OUTDIR/file2.txt \\\\\\\\ SELECT * FROM tbl2;
\\o $OUTDIR/file3.txt \\\\\\\\ SELECT * FROM tbl3;" | psql event -p 5432 -t -A
That's right, 8 backslashes. This results from a double backslash that gets interpreted two times, so you have to double it two times.
I quote the manual about the meta-command \o:
Saves future query results to the file filename or ...
and \\:
command must be either a command string that is completely parsable by
the server (i.e., it contains no psql-specific features), or a single
backslash command. Thus you cannot mix SQL and psql meta-commands with
this option. To achieve that, you could pipe the string into psql,
like this: echo '\x \\ SELECT * FROM foo;' | psql. (\\ is the
separator meta-command.)
Don't know about Navicat, but you can do it with psql. Check the various --echo-X command-line options and the \o meta-command if you just want temporary output to a file.
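For example, to run a whole script and capture both the statements and their results in one log file (the file names here are just placeholders):
psql mydb --echo-queries -f queries.sql > results.txt 2>&1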

How to insert a text file into a field in PostgreSQL?

How to insert a text file into a field in PostgreSQL?
I'd like to insert a row with fields from a local or remote text file.
I'd expect a function like gettext() or geturl() in order to do the following:
% INSERT INTO collection(id, path, content) VALUES(1, '/etc/motd', gettext('/etc/motd'));
The easiest method would be to use one of the embeddable scripting languages. Here's an example using plpythonu:
CREATE FUNCTION gettext(url TEXT) RETURNS TEXT
AS $$
import urllib2
try:
    f = urllib2.urlopen(url)
    return ''.join(f.readlines())
except Exception:
    return ""
$$ LANGUAGE plpythonu;
One drawback to this example function is that its reliance on urllib2 means you have to use "file:///" URLs to access local files, like this:
select gettext('file:///etc/motd');
Thanks for the tips. I've found another answer with a built-in function.
You need to have superuser rights in order to execute that!
-- 1. Create a function to load a doc
-- DROP FUNCTION get_text_document(CHARACTER VARYING);
CREATE OR REPLACE FUNCTION get_text_document(p_filename CHARACTER VARYING)
RETURNS TEXT AS $$
-- Set the read length to some big number because we are too lazy to grab the length,
-- and it will cut off at EOF anyway
SELECT CAST(pg_read_file(E'mydocuments/' || $1, 0, 100000000) AS TEXT);
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;
ALTER FUNCTION get_text_document(CHARACTER VARYING) OWNER TO postgres;
-- 2. Determine the location of your cluster by running as super user:
SELECT name, setting FROM pg_settings WHERE name='data_directory';
-- 3. Copy the files you want to import into <data_directory>/mydocuments/
-- and test it:
SELECT get_text_document('file1.txt');
-- 4. Now do the import (HINT: File must be UTF-8)
INSERT INTO mytable(file, content)
VALUES ('file1.txt', get_text_document('file1.txt'));
Postgres's COPY command is exactly for this.
My advice is to upload it to a temporary table, and then transfer the data across to your main table when you're happy with the formatting. e.g.
CREATE TABLE text_data (text varchar);
COPY text_data FROM 'C:\mytempfolder\textdata.txt';
INSERT INTO main_table (value)
SELECT string_agg(text, chr(10)) FROM text_data;
DROP TABLE text_data;
Also see this question.
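If the order of lines matters, a slight variation of the same idea (a sketch, not from the original answer) adds a sequence column so the aggregation can be ordered deterministically:
CREATE TABLE text_data (line_no serial, line varchar);
COPY text_data (line) FROM 'C:\mytempfolder\textdata.txt';
INSERT INTO main_table (value)
SELECT string_agg(line, chr(10) ORDER BY line_no) FROM text_data;
DROP TABLE text_data;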
You can't. You need to write a program that will read the file contents (or URL contents) and store them into the desired field.
Use COPY instead of INSERT
reference: http://www.commandprompt.com/ppbook/x5504#AEN5631