In other statistical programs, it's possible to create a log file that shows the output issued as a result of a command. Is it possible to do something similar in SQL?
In particular, I'd like to have a single .sql file with many queries and to then output each result to a text file.
I'm using PostgreSQL and Navicat.
plpgsql function and COPY
One way would be to put the SQL script into a plpgsql function, where you can write the individual results to files with COPY and compile a report from intermediate results exactly the way you need it.
This has additional effects that may or may not be desirable. For instance, you can grant or revoke permission on the whole function to arbitrary roles. Read about SECURITY DEFINER in the manual. Also, the syntax will be verified when you save the function - however, only superficially (there are plans to change that in the future). More details in this answer on dba.SE.
Basic example:
CREATE OR REPLACE FUNCTION func()
  RETURNS void AS
$BODY$
BEGIN
   COPY (SELECT * FROM tbl  WHERE foo)     TO '/path/to/my/file/tbl.csv';
   COPY (SELECT * FROM tbl2 WHERE NOT bar) TO '/path/to/my/file/tbl2.csv';
END;
$BODY$
LANGUAGE plpgsql;
Of course, you need to have the necessary privileges in the database and in the file system.
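For example, to restrict execution of the report to a single role (reporter is a placeholder role name):
REVOKE ALL ON FUNCTION func() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION func() TO reporter;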
Call it from the shell:
psql mydb -c 'SELECT func();'
psql switching between meta commands and SQL
#!/bin/sh
OUTDIR='/var/lib/postgresql/outfiles'
echo "
\\o $OUTDIR/file1.txt \\\\\\\\ SELECT * FROM tbl1;
\\o $OUTDIR/file2.txt \\\\\\\\ SELECT * FROM tbl2;
\\o $OUTDIR/file3.txt \\\\\\\\ SELECT * FROM tbl3;" | psql event -p 5432 -t -A
That's right, 8 backslashes. The string is interpreted twice on its way to psql - once by the shell's quote removal and once by echo's escape processing - and each pass halves a run of backslashes, so the double backslash psql needs must itself be doubled twice.
I quote the manual about the meta-command \o:
Saves future query results to the file filename or ...
and \\:
command must be either a command string that is completely parsable by
the server (i.e., it contains no psql-specific features), or a single
backslash command. Thus you cannot mix SQL and psql meta-commands with
this option. To achieve that, you could pipe the string into psql,
like this: echo '\x \\ SELECT * FROM foo;' | psql. (\\ is the
separator meta-command.)
I don't know about Navicat, but you can do it with psql. Check the various --echo-X command-line options and the \o meta-command if you just want temporary output to a file.
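For example (myscript.sql and out.txt are placeholder names), you can echo every statement together with its results from the shell:
psql mydb --echo-queries -f myscript.sql > out.txt 2>&1
Or redirect results to a file temporarily inside an interactive session; \o without an argument switches output back to stdout:
\o out.txt
SELECT * FROM tbl1;
\o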
Related
I have an anonymous function containing a query within a FOR loop that executes 100 times, and I need to save the 100 result sets as 100 files on the remote client (not on the server).
It seems like the psql \copy meta-command should be the way to do this, but I'm at a loss. Something of this form, maybe?
\copy (anonymous_function_w/_FOR_loop_here) to 'filename.txt'
where filename.txt is built from the FOR loop variable's value in each iteration. That's important - the files on the remote client need to be named based on the FOR loop's variable.
Is there any way to pull this off? I suppose an alternative approach would be to UNION all 100 query results into one big result, with the FOR loop's variable value in one field, and then use bash scripting to split it into 100 appropriately named files. But my bash skills are pretty lame. If psql can do the job directly that would be great.
EDIT: I should add that here's what the FOR loop variable looks like:
FOR rec IN SELECT DISTINCT county FROM voter.counties
so the file name would be built from rec.county + '.txt'
The typical approach to this is to use a SQL statement that generates the necessary statements, spool the output into a script file, then run that file.
Something like:
-- switch to plain, unaligned output without headers or footers
\a
\t
-- spool the output into export.sql
\o export.sql
select format('\copy (select * from some_table where county = %L) to ''%s.txt''', county, county)
from (select distinct county from voter.counties) t;
-- turn spooling off
\o
-- run the generated file
\i export.sql
So for each county name in voter.counties, export.sql will contain:
\copy (select * from some_table where county = 'foobar') to 'foobar.txt'
In PostgreSQL 9.5 psql (for use within an interactive session), I would like to create an alias for a complete SQL statement, analogous to a shell alias. The objective is just to get the output printed on the screen.
If I could enable formatted server output (in Oracle terms) from within a stored procedure, it would look like this:
CREATE or replace FUNCTION print_my_table()
RETURNS void
AS $$
-- somehow enable output here
SELECT * from my_table;
$$ LANGUAGE SQL;
This would be invoked as print_my_table(); (as opposed to SELECT x FROM ...)
I know I can use 'RAISE NOTICE' to print from within a stored procedure, but to do that I would need to reimplement pretty-printing of a table.
Perhaps there is a completely different mechanism to do this?
(my_table stands for a complex SQL statement that collects server data accounting information, or a my_table() stored procedure returning a table)
EDIT
The solution provided by @Abelisto (using psql variables) enables the creation of aliases for arbitrary statements, beyond merely printing the result to the screen.
There are so-called internal variables in the psql utility, which are replaced by their content (except inside string constants):
postgres=# \set foo 'select 1;'
postgres=# :foo
?column?
----------
1
(1 row)
They can also be set via the command-line option -v:
psql -v foo='select 1;' -v bar='select 2;'
Create a text file like:
\set foo 'select 1;'
\set bar 'select 2;'
\set stringinside 'select $$abc$$;'
and load it using the \i command.
Finally, you can create the file ~/.psqlrc (its purpose is like that of the ~/.bashrc file); its contents will be executed automatically each time psql starts.
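Applied to the question, a single line like this in ~/.psqlrc (my_table as in the question) creates the desired alias:
\set print_my_table 'SELECT * FROM my_table;'
Typing :print_my_table in any session then prints the table.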
I am using psql with a PostgreSQL database and the following copy command:
\COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?
You cannot skip the errors without skipping the whole command up to and including Postgres 14. There is currently no more sophisticated error handling.
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will
already have received earlier rows in a COPY FROM. These rows will
not be visible or accessible, but they still occupy disk space. This
might amount to a considerable amount of wasted disk space if the
failure happened well into a large copy operation. You might wish to
invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains
more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0 but it was never committed.
Solution
Fix your input file instead.
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
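A minimal sketch of the staging approach for the isa example, assuming the input file carries one extra trailing column (the text types and the name of the extra column are assumptions):
CREATE TEMP TABLE isa_staging (np1 text, np2 text, sentence text, extra text);
\copy isa_staging FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
INSERT INTO isa (np1, np2, sentence)
SELECT np1, np2, sentence FROM isa_staging;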
Related answers with detailed instructions:
How to update selected rows with values from a CSV file in Postgres?
COPY command: copy only specific columns from csv
It is too bad that in 25 years Postgres hasn't gained an -ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for the project to fix every outlier.
I had to make a work-around this way:
Copy the original table and call it dummy_original_table
For the original table, create a trigger function like this (it will be attached to the dummy table below):
CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
DECLARE
  v_rec RECORD;
BEGIN
  -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
  SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
  IF v_rec IS NOT NULL THEN
    RETURN NULL;
  END IF;
  BEGIN
    INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
    ON CONFLICT DO NOTHING;
  EXCEPTION
    WHEN OTHERS THEN
      NULL;
  END;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
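For the copy trick in the next step to work, the function presumably needs to be attached as a BEFORE INSERT trigger on the dummy table (the trigger name here is a placeholder):
CREATE TRIGGER dummy_redirect_insert
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE PROCEDURE on_insert_in_original_table();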
Run a copy into the dummy table. No record will be inserted there, but all of them will be inserted into original_table:
psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' delimiter E'\t'"
Workaround: remove the reported errant line using sed and run \copy again
Later versions of Postgres (including Postgres 13) will report the line number of the error. You can then remove that line with sed and run \copy again, e.g.:
#!/bin/bash
bad_line_number=5 # assuming line 5 is the bad line
sed ${bad_line_number}d < input.csv > filtered.csv
[per the comment from @Botond_Balázs]
Here's one solution -- import the batch file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:
#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
cat $input_file | while read input_line; do
  echo "$input_line" > $tmp_file
  psql my_database \
    -c "COPY my_table FROM '$tmp_file' DELIMITER '|' CSV;"
done
Additionally, you could modify the script to capture the psql stdout/stderr and exit status, and if the exit status is non-zero, echo $input_line and the captured stdout/stderr to stdout and/or append them to a file.
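A sketch of that variation (failed_lines.log is a made-up file name; the other names are from the script above):
output=$(psql my_database -c "COPY my_table FROM '$tmp_file' DELIMITER '|' CSV;" 2>&1)
if [ $? -ne 0 ]; then
  echo "$input_line" >> failed_lines.log
  echo "$output" >> failed_lines.log
fi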
I have a file with a list of identifying attributes, one on each line. I'd like to use these as part of a conditional SELECT statement in psql. The query I'm thinking of is:
SELECT * FROM mytable where mykey IN ('contents of file');
I'd like to point the IN construct at the file (which is being generated by another database and script). Every entry in the file will be in mytable.
Is there a way to do this in psql from the command line? My intention is to run this as part of a bash script on our servers.
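One way to do it, sketched under the assumption that keys.txt holds one key per line and the keys are strings that need quoting, is to build the IN list in bash and splice it into the query:
#!/bin/bash
# wrap each line in single quotes and join with commas: 'a','b','c'
in_list=$(sed "s/.*/'&'/" keys.txt | paste -sd, -)
psql mydb -c "SELECT * FROM mytable WHERE mykey IN ($in_list);"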
Is there any way to list files from a folder?
Something like:
select * from pg_ls_dir('/home/christian')
I tried pg_ls_dir but, per documentation:
Only files within the database cluster directory and the log_directory
can be accessed. Use a relative path for files in the cluster
directory, and a path matching the log_directory configuration setting
for log files. Use of these functions is restricted to superusers.
I need to list files from a folder outside the postgres directories, similar to how it's done with COPY.
Using PostgreSQL 9.3, it is possible to avoid the overhead of installing a language extension:
DROP TABLE IF EXISTS files;
CREATE TABLE files(filename text);
COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
SELECT * FROM files ORDER BY filename ASC;
This creates a table with 2,000+ rows, holding the file names from [ to zip.
Normally the COPY command requires superuser privileges. Since the file system path is hard-coded (i.e., not an unsanitized value from users), it doesn't pose a great security risk to define the function first using a superuser account (e.g., postgres) as follows:
CREATE OR REPLACE FUNCTION files()
  RETURNS SETOF text AS
$BODY$
BEGIN
  SET client_min_messages TO WARNING;
  DROP TABLE IF EXISTS files;
  CREATE TEMP TABLE files(filename text);
  COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
  RETURN QUERY SELECT * FROM files ORDER BY filename ASC;
END;
$BODY$
LANGUAGE plpgsql SECURITY DEFINER;
Log in to PostgreSQL using a non-superuser account, then:
SELECT * FROM files();
The same list of results should be returned without any security violation errors.
The SECURITY DEFINER tells PostgreSQL to run the function under the role of the account that was used to create the function. Since it was created using a superuser role, it will execute with superuser permissions, regardless of the role that executes the command.
The SET client_min_messages TO WARNING; tells PostgreSQL to suppress messages if the table cannot be dropped. It's okay to delete this line.
The CREATE TEMP TABLE is used to create a table that does not need to persist over time. If you need a permanent table, remove the TEMP modifier.
The find command (which could also be given as /usr/bin/find) lists only files (-type f) and displays just the filename without the leading path, one filename per line (-printf "%f\n"). Finally, -maxdepth 1 limits the search to the specified directory without descending into subdirectories. See find's man page for details.
One disadvantage of this approach is that there doesn't seem to be a way to parameterize the command to execute: PostgreSQL requires it to be a literal text string rather than an expression. Perhaps this is for the best, as it prevents arbitrary commands from being executed. What you see is what you execute.
Extended version of this answer, function ls_files_extended:
-- Unfortunately, the variant above only allows a hardcoded path.
-- To accept a user-supplied parameter we use dynamic EXECUTE.
-- It also returns the file size and allows filtering.
--
-- @param path text. Filesystem path to read from
-- @param filter text (default null, meaning return all). WHERE condition to filter files. E.g.: $$filename LIKE '0%'$$
-- @param sort text (default filename).
--
-- Examples of use:
-- 1) Simple call, return all files, sort by filename:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive')
-- 2) Return all, sort by filesize:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', null, 'size ASC')
-- 3) Use filtering and sorting:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', 'filename LIKE ''0%''', 'size ASC')
-- or use $-quoting for easy readability:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', $$filename LIKE '0%'$$, 'size ASC')
CREATE OR REPLACE FUNCTION ls_files_extended(path text, filter text default null, sort text default 'filename')
  RETURNS TABLE(filename text, size bigint) AS
$BODY$
BEGIN
  SET client_min_messages TO WARNING;
  CREATE TEMP TABLE _files(filename text, size bigint) ON COMMIT DROP;
  EXECUTE format($$COPY _files FROM PROGRAM 'find %s -maxdepth 1 -type f -printf "%%f\t%%s\n"'$$, path);
  RETURN QUERY EXECUTE format($$SELECT * FROM _files WHERE %s ORDER BY %s$$, concat_ws(' AND ', 'true', filter), sort);
END;
$BODY$ LANGUAGE plpgsql SECURITY DEFINER;
It's normally not useful for a SQL client. Anyway, should you need to implement it, that's a typical use case for a scripting language like plperlu. Example:
CREATE FUNCTION nosecurity_ls(text) RETURNS setof text AS $$
opendir(my $d, $_[0]) or die $!;
while (my $f=readdir($d)) {
return_next($f);
}
return undef;
$$ language plperlu;
That's equivalent to the pg_ls_dir(text) function mentioned in System Administration Functions except for the restrictions.
=> select * from nosecurity_ls('/var/lib/postgresql/9.1/main') as ls;
ls
-----------------
pg_subtrans
pg_serial
pg_notify
pg_clog
pg_multixact
..
base
pg_twophase
etc...