CREATE OR REPLACE FUNCTION j_f_sync_from_xml()
RETURNS boolean AS
$BODY$
DECLARE
    myxml    xml;
    datafile text :=
        'C:\Users\Polichetti\Documents\ABBATE_EMANUELE_Lvl1F2Va_20160418-1759.xml';
BEGIN
    myxml := pg_read_file(datafile, 0, 100000000)::xml;
    CREATE TABLE james AS
    SELECT (xpath('//some_id/text()', x))[1]::text AS id
    FROM unnest(xpath('/xml/path/to/datum', myxml)) AS x;
    RETURN true;  -- declared RETURNS boolean, so it must return something
END;
$BODY$ LANGUAGE plpgsql;

SELECT * FROM james;
This fails with an "absolute path not allowed" error on the pg_read_file call. Probably I just don't know which path I'm supposed to use.
https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE
Only files within the database cluster directory and the log_directory
can be accessed. Use a relative path for files in the cluster
directory, and a path matching the log_directory configuration setting
for log files.
You can't access just any OS file. Run SHOW data_directory and SHOW log_directory to find out where pg_read_file is allowed to read from.
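For example (the paths in the comments are typical values and will vary per installation):
SHOW data_directory;  -- e.g. /var/lib/postgresql/9.5/main
SHOW log_directory;   -- e.g. pg_log, relative to the data directory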
If you want to load XML into the database, I'd rather use a different approach, e.g.:
create table xml(body text);
copy xml from '/absolute/path/to/file.xml';
select string_agg(body,'')::xml from xml;
This is the simplest example; you can look on the web for more, e.g. approaches using the large object utilities.
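From there, a minimal sketch of how the question's xpath extraction could be run against the loaded document (the element paths are the question's own placeholders):
CREATE TABLE james AS
SELECT (xpath('//some_id/text()', x))[1]::text AS id
FROM unnest(xpath('/xml/path/to/datum',
       (SELECT string_agg(body, '') FROM xml)::xml)) AS x;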
I am running 64-bit Postgres 10.3 on Windows 2012 R2. Every week we get data in text files (44 separate files), and I use a LOOP and the COPY command in a PL/pgSQL function to import the data. I have come across an issue where an incorrect password was supplied when unzipping the data, so the source files were created but empty. The import function appeared to be running but would freeze at a different file each time I tried.
Is there any way (using PL/pgSQL) to detect if a file on disk is empty before trying to use COPY?
You can use a function for this. Something like:
CREATE OR REPLACE FUNCTION wwusp_filesize(file_path text)
RETURNS TEXT AS $BODY$
BEGIN
    DROP TABLE IF EXISTS tmp_file_size;
    CREATE TEMPORARY TABLE tmp_file_size (size BIGINT);
    -- wc -c prints the file's byte count; the path is spliced in verbatim,
    -- so only pass trusted paths to this function
    EXECUTE format($$COPY tmp_file_size (size) FROM PROGRAM 'wc -c < %s'$$, file_path);
    RETURN (SELECT pg_size_pretty(size) FROM tmp_file_size);
END;
$BODY$ LANGUAGE plpgsql;
This will work on Linux servers. If you wish to make it work on Windows as well, just replace the wc -c < file call with the corresponding program/syntax.
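For instance, an untested sketch of a Windows variant, assuming powershell.exe is available to the server process (the function name wwusp_filesize_win is made up):
CREATE OR REPLACE FUNCTION wwusp_filesize_win(file_path text)
RETURNS TEXT AS $BODY$
BEGIN
    DROP TABLE IF EXISTS tmp_file_size;
    CREATE TEMPORARY TABLE tmp_file_size (size BIGINT);
    -- (Get-Item '<path>').Length prints the size in bytes; as above, the
    -- path is spliced in verbatim, so only pass trusted paths
    EXECUTE format(
        $$COPY tmp_file_size (size) FROM PROGRAM 'powershell -Command "(Get-Item ''%s'').Length"'$$,
        file_path);
    RETURN (SELECT pg_size_pretty(size) FROM tmp_file_size);
END;
$BODY$ LANGUAGE plpgsql;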
# SELECT wwusp_filesize('/etc/postgresql/9.5/main/pg_hba.conf') AS filesize;
filesize
------------
4641 bytes
(1 row)
And with an empty file ..
# SELECT wwusp_filesize('/etc/postgresql/9.5/main/empty_file') AS filesize;
filesize
----------
0 bytes
(1 row)
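To answer the original question directly, a minimal sketch of a pre-COPY guard built on the function above (the target table weekly_data is hypothetical; adapt it to the real import):
CREATE OR REPLACE FUNCTION import_if_not_empty(file_path text)
RETURNS boolean AS $BODY$
BEGIN
    IF wwusp_filesize(file_path) = '0 bytes' THEN
        RAISE WARNING 'skipping empty file %', file_path;
        RETURN false;  -- nothing to import
    END IF;
    EXECUTE format('COPY weekly_data FROM %L', file_path);
    RETURN true;
END;
$BODY$ LANGUAGE plpgsql;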
I would like to use my postgres server also to serve documents and images that I don't want to store in the database for several reasons.
There is an extension for that purpose, https://github.com/darold/external_file, and I changed its code a bit to serve my needs without changing the core (see below). I am using 9.5 as I expect this version to be final before I finish development ;-)
I encounter the following problems:
Writing is quick and seems reliable, but big files (1 GiB and above) lead to out-of-memory errors.
Reading (select readEFile('aPath');) often hangs for a very long time and is not reliable.
Both the WAL and the database quickly grow in size, although no regular tables are involved.
My Questions:
What is wrong with the following code? How can I exclude all these operations from WAL? Has anyone already written something like this and would share their work?
CREATE OR REPLACE FUNCTION public.writeefile(
buffer bytea,
filename character varying)
RETURNS void AS
$BODY$
DECLARE
l_oid oid;
lfd integer;
lsize integer;
BEGIN
l_oid := lo_create(0);
lfd := lo_open(l_oid,131072); --0x00020000 write mode
lsize := lowrite(lfd,buffer);
PERFORM lo_close(lfd);
PERFORM lo_export(l_oid,filename);
PERFORM lo_unlink(l_oid);
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.writeefile(bytea, character varying)
OWNER TO itcms;
CREATE OR REPLACE FUNCTION public.readefile(filename character varying)
RETURNS bytea AS
$BODY$
DECLARE
l_oid oid;
r record;
buffer bytea;
BEGIN
buffer := '';
SELECT lo_import(filename) INTO l_oid;
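-- reassemble the object page by page from the pg_largeobject catalog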
FOR r IN ( SELECT data
FROM pg_largeobject
WHERE loid = l_oid
ORDER BY pageno ) LOOP
buffer = buffer || r.data;
END LOOP;
PERFORM lo_unlink(l_oid);
return buffer;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.readefile(character varying)
OWNER TO itcms;
To explain my need for the above: this will be part of a medical system that also stores and serves huge documents and images over insecure connections. Storing hundreds of GB inside the database doesn't seem like a good idea to me. Since existing documents never change and only new ones are added, backing up plain files is much easier. As the database already handles SSL connections, it would be great not to have to deploy an additional SFTP server just for serving those files!
Your concept is doomed to failure. You are using the database server as a cache for disk operations on large files. This is an obvious waste of time and resources, because each call copies the entire file into pg_largeobject (which is WAL-logged) only to delete it a moment later; that is also why your WAL and database keep growing even though no regular tables are involved.
In my opinion, using an FTP server would be a simpler, more natural and far more efficient solution.
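That said, if the large-object detour is kept, on 9.4+ the read loop can at least be collapsed into the built-in lo_get(), which avoids the row-by-row concatenation. A sketch only; it does not fix the WAL traffic, since lo_import still writes the whole file into pg_largeobject:
CREATE OR REPLACE FUNCTION public.readefile2(filename character varying)
RETURNS bytea AS
$BODY$
DECLARE
    l_oid oid;
    buffer bytea;
BEGIN
    l_oid := lo_import(filename);  -- still copies the file into pg_largeobject
    buffer := lo_get(l_oid);       -- fetch the whole object in one call
    PERFORM lo_unlink(l_oid);
    RETURN buffer;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;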
Is there any way to list files from a folder?
Something like:
select * from pg_ls_dir('/home/christian')
I tried pg_ls_dir but, per documentation:
Only files within the database cluster directory and the log_directory
can be accessed. Use a relative path for files in the cluster
directory, and a path matching the log_directory configuration setting
for log files. Use of these functions is restricted to superusers.
I need to list files from a folder outside the postgres directories, similar to how it's done with COPY.
Using PostgreSQL 9.3, it is possible to avoid the overhead of installing a language extension:
DROP TABLE IF EXISTS files;
CREATE TABLE files(filename text);
COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
SELECT * FROM files ORDER BY filename ASC;
This creates a table with 2,000+ rows, with filenames ranging from [ to zip.
Normally the COPY command requires superuser privileges. Since the path to the file system is hard-coded (i.e., not an unsanitized value from users), it doesn't pose a great security risk to define the function first using a superuser account (e.g., postgres) as follows:
CREATE OR REPLACE FUNCTION files()
RETURNS SETOF text AS
$BODY$
BEGIN
SET client_min_messages TO WARNING;
DROP TABLE IF EXISTS files;
CREATE TEMP TABLE files(filename text);
COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
RETURN QUERY SELECT * FROM files ORDER BY filename ASC;
END;
$BODY$
LANGUAGE plpgsql SECURITY DEFINER;
Log in to PostgreSQL using a non-superuser account, then:
SELECT * FROM files();
The same list of results should be returned without any security violation errors.
The SECURITY DEFINER tells PostgreSQL to run the function under the role of the account that was used to create the function. Since it was created using a superuser role, it will execute with superuser permissions, regardless of the role that executes the command.
The SET client_min_messages TO WARNING; tells PostgreSQL to suppress messages if the table cannot be dropped. It's okay to delete this line.
The CREATE TEMP TABLE is used to create a table that does not need to persist over time. If you need a permanent table, remove the TEMP modifier.
The 'find...' command, which could also be /usr/bin/find, lists only regular files (-type f) and displays only the filename without the leading path, one filename per line (-printf "%f\n"). Finally, -maxdepth 1 limits the search to the specified directory without descending into subdirectories. See find's man page for details.
One disadvantage to this approach is that there doesn't seem to be a way to parameterize the command to execute: PostgreSQL requires it to be a literal string rather than an expression. Perhaps this is for the best, as it prevents arbitrary commands from being executed: what you see is what you execute.
Extended version of this answer, function ls_files_extended:
-- Unfortunately that variant only allows a hardcoded path.
-- To accept a user-supplied path, we use a dynamic EXECUTE.
-- Also returns the file size and allows filtering.
--
-- #param path text. Filesystem path to read from
-- #param filter text (default null, meaning return all). WHERE condition to filter files, e.g.: $$filename LIKE '0%'$$
-- #param sort text (default 'filename').
--
-- Examples of use:
-- 1) Simple call, return all files, sort by filename:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive')
-- 2) Return all, sort by filesize:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', null, 'size ASC')
-- 3) Use filtering and sorting:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', 'filename LIKE ''0%''', 'size ASC')
-- or use $-quoting for easy readability:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', $$filename LIKE '0%'$$, 'size ASC')
CREATE OR REPLACE FUNCTION ls_files_extended(path text, filter text default null, sort text default 'filename')
RETURNS TABLE(filename text, size bigint) AS
$BODY$
BEGIN
SET client_min_messages TO WARNING;
CREATE TEMP TABLE _files(filename text, size bigint) ON COMMIT DROP;
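-- NB: the user-supplied path is spliced into a shell command with %s, so
-- this SECURITY DEFINER function trusts its caller; don't expose it to
-- untrusted input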
EXECUTE format($$COPY _files FROM PROGRAM 'find %s -maxdepth 1 -type f -printf "%%f\t%%s\n"'$$, path);
RETURN QUERY EXECUTE format($$SELECT * FROM _files WHERE %s ORDER BY %s $$, concat_ws(' AND ', 'true', filter), sort);
END;
$BODY$ LANGUAGE plpgsql SECURITY DEFINER;
It's normally not something a SQL client needs. Anyway, should you need to implement it, that's a typical use case for a scripting language like plperlu. Example:
CREATE FUNCTION nosecurity_ls(text) RETURNS setof text AS $$
  opendir(my $d, $_[0]) or die $!;  # open the directory passed as argument
  while (my $f = readdir($d)) {
    return_next($f);                # emit one row per directory entry
  }
  return undef;
$$ language plperlu;
That's equivalent to the pg_ls_dir(text) function mentioned in System Administration Functions except for the restrictions.
=> select * from nosecurity_ls('/var/lib/postgresql/9.1/main') as ls;
ls
-----------------
pg_subtrans
pg_serial
pg_notify
pg_clog
pg_multixact
..
base
pg_twophase
etc...
Edit: After posting this I found Erwin Brandstetter's answer to a similar question. It sounds like in 9.2+ I could use the last option he listed, but none of the other alternatives sound workable for my situation. However, the suggestion from Jakub Kania, reiterated by Craig Ringer, that I use COPY, or \copy, in psql appears to solve my problem.
My goal is to get the results of executing a dynamically created query into a text file.
The names and number of columns are unknown; the query generated at run time is a 'pivot' one, and the names of columns in the SELECT list are taken from values stored in the database.
What I envision is being able, from the command line to run:
$ psql -o "myfile.txt" -c "EXECUTE mySQLGeneratingFunction(param1, param2)"
But what I'm finding is that I can't get results from an EXECUTEd query unless I know the number of columns and their types that are in the results of the query.
create or replace function carrier_eligibility.createSQL() returns varchar AS
$$
begin
    return 'SELECT * FROM carrier_eligibility.rule_result';
    -- the actual function builds a pivot query whose columns aren't known until run time
end
$$ language plpgsql;

create or replace function carrier_eligibility.RunSQL() returns setof record AS
$$
begin
    return query EXECUTE carrier_eligibility.createSQL();
end
$$ language plpgsql;

-- this works, but I want to be able to get the results into a text file
-- without knowing the number of columns
select * from carrier_eligibility.RunSQL() AS (id int, uh varchar, duh varchar, what varchar);
Using psql isn't a requirement. I just want to get the results of the query into a text file, with the column names in the first row.
What format of a text file do you want? Something like csv?
How about something like this:
CREATE OR REPLACE FUNCTION sql_to_csv(in_sql text) returns setof text
SECURITY INVOKER -- CRITICAL DO NOT CHANGE THIS TO SECURITY DEFINER
LANGUAGE PLPGSQL AS
$$
DECLARE t_row RECORD;
t_out text;
BEGIN
FOR t_row IN EXECUTE in_sql LOOP
t_out := t_row::text;
t_out := regexp_replace(regexp_replace(t_out, E'^\\(', ''), E'\\)$', '');
return next t_out;
END LOOP;
END;
$$;
This should create properly quoted CSV strings, without a header row. Embedded newlines may be a problem, but you could write a quick Perl script to connect and write the data out instead.
Note this presumes that the tuple structure (parenthesized csv) does not change with future versions, but it currently should work with 8.4 at least through 9.2.
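For example, to land the output in a file from the command line, roughly as the question envisioned (createSQL() is the question's own generator function; -t suppresses psql's header and footer):
$ psql -o "myfile.txt" -t -c "SELECT * FROM sql_to_csv(carrier_eligibility.createSQL())"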
How to insert a text file into a field in PostgreSQL?
I'd like to insert a row with fields from a local or remote text file.
I'd expect a function like gettext() or geturl() in order to do the following:
% INSERT INTO collection(id, path, content) VALUES(1, '/etc/motd', gettext('/etc/motd'));
-S.
The easiest method would be to use one of the embeddable scripting languages. Here's an example using plpythonu:
CREATE FUNCTION gettext(url TEXT) RETURNS TEXT
AS $$
import urllib2
try:
f = urllib2.urlopen(url)
return ''.join(f.readlines())
except Exception:
return ""
$$ LANGUAGE plpythonu;
One drawback to this example function is that its reliance on urllib2 means you have to use "file:///" URLs to access local files, like this:
select gettext('file:///etc/motd');
Thanks for the tips. I've found another answer with a built-in function.
You need to have superuser rights in order to execute it!
-- 1. Create a function to load a doc
-- DROP FUNCTION get_text_document(CHARACTER VARYING);
CREATE OR REPLACE FUNCTION get_text_document(p_filename CHARACTER VARYING)
RETURNS TEXT AS $$
-- Set the end read to some big number because we are too lazy to grab the length
-- and it will cut off at the EOF anyway
SELECT CAST(pg_read_file(E'mydocuments/' || $1 ,0, 100000000) AS TEXT);
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;
ALTER FUNCTION get_text_document(CHARACTER VARYING) OWNER TO postgres;
-- 2. Determine the location of your cluster by running as super user:
SELECT name, setting FROM pg_settings WHERE name='data_directory';
-- 3. Copy the files you want to import into <data_directory>/mydocuments/
-- and test it:
SELECT get_text_document('file1.txt');
-- 4. Now do the import (HINT: File must be UTF-8)
INSERT INTO mytable(file, content)
VALUES ('file1.txt', get_text_document('file1.txt'));
Postgres's COPY command is exactly for this.
My advice is to upload it to a temporary table, and then transfer the data across to your main table when you're happy with the formatting, e.g.:
CREATE TABLE text_data (text varchar);
COPY text_data FROM 'C:\mytempfolder\textdata.txt';
INSERT INTO main_table (value)
SELECT string_agg(text, chr(10)) FROM text_data;
DROP TABLE text_data;
Also see this question.
You can't. You need to write a program that will read the file's (or URL's) contents and store them in the desired field.
Use COPY instead of INSERT.
Reference: http://www.commandprompt.com/ppbook/x5504#AEN5631