Escaping Backslashes in PostgreSQL

I need to write a file to disk from Postgres that contains the character sequence of a backslash immediately followed by a forward slash: \/
Code similar to this has not worked:
drop table if exists test;
create temporary table test (linetext text);
insert into test values ('\/\/foo foo foo\/bar\/bar');
copy (select linetext from test) to '/filepath/postproductionscript.sh';
The above code yields \\/\\/foo foo foo\\/bar\\/bar ... it inserts an extra backslash.
When I view the temp table, the string correctly shows as \/\/, so I am not sure where or when the text is changed into \\/\\/.
I've tried doubling the \, variations of E before the string, and quote_literal() without luck.
I have not found a solution here: Postgres Manual.
Running Postgres 9.2, encoded UTF-8.

The problem is that COPY is not intended to write out plain-text files. It is intended to write out files that can be read back by COPY. And the semi-internal encoding that it uses does some backslash escaping.
For what you want to do, you need to write some custom code. Either use a normal client library to read the query results and write them to a file, or, if you want to do it in-server, use something like PL/Perl or PL/Python.
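For example, here is a minimal in-server sketch using PL/Python (an illustration only: it assumes the plpython3u extension is installed and that the server user may write to /tmp; the table and column names are taken from the question):
create extension if not exists plpython3u;
create or replace function write_script(path text) returns void as $$
# Read the rows verbatim and write them out with no escaping at all.
rows = plpy.execute("select linetext from test")
with open(path, "w") as f:
    for row in rows:
        f.write(row["linetext"] + "\n")
$$ language plpython3u;
select write_script('/tmp/postproductionscript.sh');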

The \ escaping is only recognised if the string literal is prefixed with E; otherwise the standard_conforming_strings setting (or the like) is respected (ANSI SQL has a different way of string escaping, probably stemming from COBOL ;-).
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( E'\/\/foo foo foo\/bar\/bar');
copy (select linetext from test) to '/tmp/postproductionscript.sh';
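As an aside, you can check how your session currently treats backslashes in plain string literals:
show standard_conforming_strings;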
UPDATE: an ugly hack is to use .csv format and still use \t as the delimiter.
The #!/bin/sh shebang header line should be considered a feature.
-- without a header line
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( '\/\/foo foo foo\/bar\/bar');
copy (select linetext AS "#linetext" from test) to '/tmp/postproductionscript_c.sh'
WITH CSV
DELIMITER E'\t'
;
-- with a shebang header line
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( '\/\/foo foo foo\/bar\/bar');
copy (select linetext AS "#/bin/sh" from test) to '/tmp/postproductionscript_h.sh'
WITH CSV
HEADER
DELIMITER E'\t'
;

Related

PostgreSQL: load a CSV file with newline characters

I'm new to this very interesting blog. This is my problem: I have to load a CSV file with three columns (field1, field2 and field3) into a PostgreSQL table.
The strings in the field1 column contain newline characters.
I use sql statements:
COPY test (regexp_replace(field1, E'[\\n\\r]+', '', 'g'),
           field2, field3)
FROM 'D:\zzz\aaa20.csv' WITH DELIMITER '|';
but it reports an error.
How can I remove new line characters?
If the newlines are properly escaped by quoting the value, this should not be a problem.
If your data are corrupted CSV files with unescaped newlines, you will have to do some pre-processing. If you are willing to give the database user permission to execute programs on the database server, you could use
COPY mytable FROM PROGRAM 'demangle D:\zzz\aaa20.csv' (FORMAT 'csv');
Here, demangle is a program or script that reads the file, fixes the data and outputs them to standard output. Since you are on Windows, you probably don't have access to tools like sed and awk that can be used for such purposes, and you may have to write your own.
So, this is the syntax of the COPY command:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
You can only add an optional list of column names, not function calls (regexp_replace in your case) or other complex constructs.
You can create a temporary table, import the data into it, and then copy the data into your table using an ordinary INSERT ... SELECT query.
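A minimal sketch of that approach, reusing the names from the question (test_raw is an illustrative name, and this still assumes each physical line of the file is a complete record once the function call is moved out of the COPY column list):
create temp table test_raw (field1 text, field2 text, field3 text);
copy test_raw from 'D:\zzz\aaa20.csv' with delimiter '|';
insert into test (field1, field2, field3)
select regexp_replace(field1, E'[\\n\\r]+', '', 'g'), field2, field3
from test_raw;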

Using ASCII 31 field separator character as Postgresql COPY delimiter

We are exporting data from Postgres 9.3 into a text file for ingestion by Spark.
We would like to use the ASCII 31 field separator character as a delimiter instead of \t so that we don't have to worry about escaping issues.
We can do so in a shell script like this:
#!/bin/bash
DELIMITER=$'\x1F'
echo "copy ( select * from table limit 1) to STDOUT WITH DELIMITER '${DELIMITER}'" | (psql ...) > /tmp/ascii31
But we're wondering, is it possible to specify a non-printable glyph as a delimiter in "pure" postgres?
edit: we attempted to use the postgres escaping convention per http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html
warehouse=> copy ( select * from table limit 1) to STDOUT WITH DELIMITER '\x1f';
and received
ERROR: COPY delimiter must be a single one-byte character
Try prepending E before the sequence you're trying to use as a delimiter, for example E'\x1f' instead of '\x1f'. Without the E, PostgreSQL reads '\x1f' as four separate characters rather than a hexadecimal escape sequence, hence the error message.
See the PostgreSQL manual on "String Constants with C-style Escapes" for more information.
From my testing, both of the following work:
echo "copy (select 1 a, 2 b) to stdout with delimiter u&'\\001f'"| psql;
echo "copy (select 1 a, 2 b) to stdout with delimiter e'\\x1f'"| psql;
I've extracted a small file from Actian Matrix (a fork of Amazon Redshift, both derivatives of postgres), using this notation for ASCII character code 30, "Record Separator".
unload ('SELECT btrim(class_cd) as class_cd, btrim(class_desc) as class_desc
FROM transport.stg.us_fmcsa_carrier_classes')
to '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter as '\036' leader;
This is an example of how this file looks in VI:
C^^Private Property
D^^Private Passenger Business
E^^Private Passenger Non-Business
I then moved this file over to a machine hosting PostgreSQL 9.5 via sftp, and used the following copy command, which seems to work well:
copy fmcsa.carrier_classes
from '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter u&'\001E';
Each derivative of postgres, and postgres itself seems to prefer a slightly different notation. Too bad we don't have a single standard!

PG COPY error: invalid input syntax for integer

Running COPY results in an ERROR: invalid input syntax for integer: "" message for me. What am I missing?
My /tmp/people.csv file:
"age","first_name","last_name"
"23","Ivan","Poupkine"
"","Eugene","Pirogov"
My /tmp/csv_test.sql file:
CREATE TABLE people (
age integer,
first_name varchar(20),
last_name varchar(20)
);
COPY people
FROM '/tmp/people.csv'
WITH (
FORMAT CSV,
HEADER true,
NULL ''
);
DROP TABLE people;
Output:
$ psql postgres -f /tmp/csv_test.sql
CREATE TABLE
psql:csv_test.sql:13: ERROR: invalid input syntax for integer: ""
CONTEXT: COPY people, line 3, column age: ""
DROP TABLE
Trivia:
PostgreSQL 9.2.4
ERROR: invalid input syntax for integer: ""
"" isn't a valid integer. PostgreSQL accepts unquoted blank fields as null by default in CSV, but "" would be like writing:
SELECT ''::integer;
and fails for the same reason.
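For illustration (a psql sketch, not from the original answer), the same record loads fine when the empty field is left unquoted, because CSV mode reads an unquoted empty field as NULL:
copy people from stdin with (format csv);
,Eugene,Pirogov
\.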
If you want to deal with CSV that has things like quoted empty strings for null integers, you'll need to feed it to PostgreSQL via a pre-processor that can neaten it up a bit. PostgreSQL's CSV input doesn't understand all the weird and wonderful possible abuses of CSV.
Options include:
Loading it in a spreadsheet and exporting sane CSV;
Using the Python csv module, Perl Text::CSV, etc to pre-process it;
Using Perl/Python/whatever to load the CSV and insert it directly into the DB;
Using an ETL tool like CloverETL, Talend Studio, or Pentaho Kettle.
I think it's better to change your CSV file like this:
"age","first_name","last_name"
23,Ivan,Poupkine
,Eugene,Pirogov
It's also possible to define your table like this:
CREATE TABLE people (
age varchar(20),
first_name varchar(20),
last_name varchar(20)
);
and after the copy, you can convert the empty strings:
select nullif(age, '')::int as age, first_name, last_name
from people
Just came across this while looking for a solution and wanted to add that I was able to solve the issue by adding the "null" parameter to the copy_from call:
cur.copy_from(f, tablename, sep=',', null='')
I got this error when loading '|' separated CSV file although there were no '"' characters in my input file. It turned out that I forgot to specify FORMAT:
COPY ... FROM ... WITH (FORMAT CSV, DELIMITER '|').
Use the command below to copy data from a CSV in a single line, without casting or changing your data type.
Replace "NULL" with the string that is causing the error in your copied data:
copy table_name from 'path to csv file' (format csv, null "NULL", DELIMITER ',', HEADER);
I had this same error on a postgres .sql file with a COPY statement, but my file was tab-separated instead of comma-separated and quoted.
My mistake was that I eagerly copy/pasted the file contents from github, but in that process all the tabs were converted to spaces, hence the error. I had to download and save the raw file to get a good copy.
CREATE TABLE people (
first_name varchar(20),
age integer,
last_name varchar(20)
);
"first_name","age","last_name"
Ivan,23,Poupkine
Eugene,,Pirogov
copy people from 'file.csv' with (delimiter ';', null '');
select * from people;
Just in first column.....
Ended up doing this using csvfix:
csvfix map -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
In case you know for sure which columns were meant to be integer or float, you can specify just them:
csvfix map -f 1 -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
Without specifying the exact columns, one may experience an obvious side-effect, where a blank string will be turned into a string with a 0 character.
This ought to work without modifying the source CSV file:
alter table people alter column age type text;
copy people from '/tmp/people.csv' with csv;
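A possible follow-up step (my addition, not part of the original answer) is to convert the column back once the data is in, mapping empty strings to NULL:
alter table people alter column age type integer using nullif(age, '')::integer;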
There is a way to treat "", the quoted null string, as null in an integer column: use the FORCE_NULL option:
\copy table_name FROM 'file.csv' with (FORMAT CSV, FORCE_NULL(column_name));
See the PostgreSQL documentation: https://www.postgresql.org/docs/current/static/sql-copy.html
All in Python (using psycopg2): create the empty table first, then use copy_expert to load the CSV into it. It should handle empty values.
import psycopg2

conn = psycopg2.connect(host="hosturl", database="db_name", user="username", password="password")
cur = conn.cursor()
cur.execute("CREATE TABLE schema.destination_table ("
            "age integer, "
            "first_name varchar(20), "
            "last_name varchar(20)"
            ");")
with open(r'C:/tmp/people.csv', 'r') as f:
    next(f)  # Skip the header row. Or remove this line if the csv has no header.
    cur.copy_expert("""COPY schema.destination_table FROM STDIN WITH (FORMAT CSV)""", f)
conn.commit()
Incredibly, my solution to the same error was to just rearrange the columns, for anyone else trying the above solutions and still not getting past the error.
I apparently had to arrange the columns in my CSV file to match the sequence of the table definition in pgAdmin.

How to treat a comma as text in output?

There's one column that contains commas. When I output my query to CSV, these commas break the CSV format. What I've been doing to avoid this is a simple
replace(A."Sales Rep",',','')
Is there a better way of doing this so that I can actually get the commas in the final output without breaking the csv file?
Thanks!
You can use the COPY command to get PostgreSQL to build the CSV for you:
COPY -- copy data between a file and a table
Something like one of these:
copy your_table to 'filename' csv
copy your_table to 'filename' csv force quote *
copy your_table to stdout csv force quote *
copy your_table to stdout csv force quote * header
...
You have to be the super user to copy to a filename though. If you're inside psql, you can use the \copy command:
Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system.
The syntax is pretty much the same:
\copy your_table to 'filename.csv' csv force quote * header
...
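For example (illustrative names), force quote * quotes every non-null field, so embedded commas survive intact:
copy (select 'Smith, John' as sales_rep, 3 as deals) to stdout with (format csv, force_quote *, header);
which prints:
sales_rep,deals
"Smith, John","3"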
Quote the fields with "
a,this has a , in it,b
would become
a,"this has a, in it",b
and if the fields have BOTH a , and a ", double the quotes:
a,this has a " and , in it,b
becomes
a,"this has a "" and , in it",b

ERROR: COPY delimiter must be a single one-byte character

I want to load data from a flat file with the delimiter "~,~" into a PostgreSQL table. I have tried it as below, but it looks like there is a restriction on the delimiter. If the COPY statement doesn't allow multiple characters for the delimiter, is there any alternative?
metadb=# \COPY public.CME_DATA_STAGE_TRANS FROM 'E:\Infor\Outbound_Marketing\7.2.1\EM\metadata\pgtrans.log' WITH DELIMITER AS '~,~'
ERROR: COPY delimiter must be a single one-byte character
\copy: ERROR: COPY delimiter must be a single one-byte character
If you are using Vertica, you could use E'\t' or U&'\0009':
To indicate a non-printing delimiter character (such as a tab),
specify the character in extended string syntax (E'...'). If your
database has StandardConformingStrings enabled, use a Unicode string
literal (U&'...'). For example, use either E'\t' or U&'\0009' to
specify tab as the delimiter.
Unfortunately there is no way to load a flat file with the multi-character delimiter ~,~ in Postgres, unless you want to modify the source code (and recompile, of course) yourself in some (terrifying) way:
/* Only single-byte delimiter strings are supported. */
if (strlen(cstate->delim) != 1)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("COPY delimiter must be a single one-byte character")));
What you want is to preprocess your input file with some external tool; for example, sed might be the best companion on a GNU/Linux platform:
sed s/~,~/\\t/g inputFile
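On PostgreSQL 9.3 or later you could even combine this with COPY ... FROM PROGRAM, so no intermediate file is needed (a sketch: it requires superuser rights, assumes GNU sed on the database server, and the path is illustrative). The default text-format delimiter is tab, so no DELIMITER option is needed:
COPY public.CME_DATA_STAGE_TRANS FROM PROGRAM 'sed ''s/~,~/\t/g'' /path/to/pgtrans.log';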
The obvious thing to do is what all the other answers advise: edit the import file. I would do that, too.
However, as a proof of concept, here are two ways to accomplish this without additional tools.
1) General solution
CREATE OR REPLACE FUNCTION f_import_file(OUT my_count integer)
  RETURNS integer AS
$BODY$
DECLARE
   myfile   text;                         -- read the file into this var
   datafile text := '\path\to\file.txt';  -- !pg_read_file only accepts a relative path in the database dir!
BEGIN
   myfile := pg_read_file(datafile, 0, 100000000);  -- arbitrary 100 MB max.
   INSERT INTO public.my_tbl
   SELECT ('(' || regexp_split_to_table(replace(myfile, '~,~', ','), E'\n') || ')')::public.my_tbl;
   -- !depending on the file format, you might need additional quotes to create a valid row literal.
   GET DIAGNOSTICS my_count = ROW_COUNT;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
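A call then returns the number of rows inserted (names as defined above):
select * from f_import_file();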
This uses a number of pretty advanced features. If anybody is actually interested and needs an explanation, leave a comment to this post and I will elaborate.
2) Special case
If you can guarantee that '~' is only present in the delimiter '~,~', then you can go ahead with a plain COPY in this special case. Just treat the ',' in '~,~' as an additional column.
Say, your table looks like this:
CREATE TABLE foo (a int, b int, c int);
Then you can (in one transaction):
CREATE TEMP TABLE foo_tmp (
   a int, tmp1 "char"
 , b int, tmp2 "char"
 , c int
) ON COMMIT DROP;
COPY foo_tmp FROM '\path\to\file.txt' WITH DELIMITER AS '~';
ALTER TABLE foo_tmp DROP COLUMN tmp1;
ALTER TABLE foo_tmp DROP COLUMN tmp2;
INSERT INTO foo SELECT * FROM foo_tmp;
Not quite sure if you're looking for a postgresql solution or just a general one.
If it were me, I would open up a copy of vim (or gvim) and run the command :%s/~,~/~/g
That replaces all "~,~" with "~".
You can use a single-character delimiter instead: open Notepad, press Ctrl+H, and replace ~,~ with something that will not interfere, like |.
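After that replacement, the original command works with a single-character delimiter:
\copy public.CME_DATA_STAGE_TRANS FROM 'E:\Infor\Outbound_Marketing\7.2.1\EM\metadata\pgtrans.log' WITH DELIMITER AS '|'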