I want to load the data from a flat file with delimiter "~,~" into a PostgreSQL table. I have tried it as below but looks like there is a restriction for the delimiter. If COPY statement doesn't allow multiple chars for delimiter, is there any alternative to do this?
metadb=# \COPY public.CME_DATA_STAGE_TRANS FROM 'E:\Infor\Outbound_Marketing\7.2.1\EM\metadata\pgtrans.log' WITH DELIMITER AS '~,~'
ERROR: COPY delimiter must be a single one-byte character
\copy: ERROR: COPY delimiter must be a single one-byte character
If you are using Vertica, you could use E'\t'or U&'\0009'
To indicate a non-printing delimiter character (such as a tab),
specify the character in extended string syntax (E'...'). If your
database has StandardConformingStrings enabled, use a Unicode string
literal (U&'...'). For example, use either E'\t' or U&'\0009' to
specify tab as the delimiter.
Unfortunatelly there is no way to load flat file with multiple characters delimiter ~,~ in Postgres unless you want to modify source code (and recompile of course) by yourself in some (terrific) way:
/* Only single-byte delimiter strings are supported. */
if (strlen(cstate->delim) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY delimiter must be a single one-byte character")));
What you want is to preprocess your input file with some external tool, for example sed might to be best companion on GNU/Linux platfom, for example:
sed s/~,~/\\t/g inputFile
The obvious thing to do is what all other answers advised. Edit import file. I would do that, too.
However, as a proof of concept, here are two ways to accomplish this without additional tools.
1) General solution
CREATE OR REPLACE FUNCTION f_import_file(OUT my_count integer)
RETURNS integer AS
$BODY$
DECLARE
myfile text; -- read xml file into that var.
datafile text := '\path\to\file.txt'; -- !pg_read_file only accepts relative path in database dir!
BEGIN
myfile := pg_read_file(datafile, 0, 100000000); -- arbitrary 100 MB max.
INSERT INTO public.my_tbl
SELECT ('(' || regexp_split_to_table(replace(myfile, '~,~', ','), E'\n') || ')')::public.my_tbl;
-- !depending on file format, you might need additional quotes to create a valid format.
GET DIAGNOSTICS my_count = ROW_COUNT;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
This uses a number of pretty advanced features. If anybody is actually interested and needs an explanation, leave a comment to this post and I will elaborate.
2) Special case
If you can guarantee that '~' is only present in the delimiter '~,~', then you can go ahead with a plain COPY in this special case. Just treat ',' in '~,~' as an additional columns.
Say, your table looks like this:
CREATE TABLE foo (a int, b int, c int);
Then you can (in one transaction):
CREATE TEMP TABLE foo_tmp ON COMMIT DROP (
a int, tmp1 "char"
,b int, tmp2 "char"
,c int);
COPY foo_tmp FROM '\path\to\file.txt' WITH DELIMITER AS '~';
ALTER TABLE foo_tmp DROP COLUMN tmp1;
ALTER TABLE foo_tmp DROP COLUMN tmp2;
INSERT INTO foo SELECT * FROM foo_tmp;
Not quite sure if you're looking for a postgresql solution or just a general one.
If it were me, I would open up a copy of vim (or gvim) and run the commend :%s/~,~/~/g
That replaces all "~,~" with "~".
you can use a single character delimiter, open notepad press ctrl+h replace ~,~ with something will not interfere. like |
Related
I am trying to insert some text data into a table in SQL Server 9.
The text includes a single quote '.
How do I escape that?
I tried using two single quotes, but it threw me some errors.
eg. insert into my_table values('hi, my name''s tim.');
Single quotes are escaped by doubling them up, just as you've shown us in your example. The following SQL illustrates this functionality. I tested it on SQL Server 2008:
DECLARE #my_table TABLE (
[value] VARCHAR(200)
)
INSERT INTO #my_table VALUES ('hi, my name''s tim.')
SELECT * FROM #my_table
Results
value
==================
hi, my name's tim.
If escaping your single quote with another single quote isn't working for you (like it didn't for one of my recent REPLACE() queries), you can use SET QUOTED_IDENTIFIER OFF before your query, then SET QUOTED_IDENTIFIER ON after your query.
For example
SET QUOTED_IDENTIFIER OFF;
UPDATE TABLE SET NAME = REPLACE(NAME, "'S", "S");
SET QUOTED_IDENTIFIER ON;
-- set OFF then ON again
How about:
insert into my_table values('hi, my name' + char(39) + 's tim.')
Many of us know that the Popular Method of Escaping Single Quotes is by Doubling them up easily like below.
PRINT 'It''s me, Arul.';
we are going to look on some other alternate ways of escaping the single quotes.
1. UNICODE Characters
39 is the UNICODE character of Single Quote. So we can use it like below.
PRINT 'Hi,it'+CHAR(39)+'s Arul.';
PRINT 'Helo,it'+NCHAR(39)+'s Arul.';
2. QUOTED_IDENTIFIER
Another simple and best alternate solution is to use QUOTED_IDENTIFIER.
When QUOTED_IDENTIFIER is set to OFF, the strings can be enclosed in double quotes.
In this scenario, we don’t need to escape single quotes.
So,this way would be very helpful while using lot of string values with single quotes.
It will be very much helpful while using so many lines of INSERT/UPDATE scripts where column values having single quotes.
SET QUOTED_IDENTIFIER OFF;
PRINT "It's Arul."
SET QUOTED_IDENTIFIER ON;
CONCLUSION
The above mentioned methods are applicable to both AZURE and On Premises .
2 ways to work around this:
for ' you can simply double it in the string, e.g.
select 'I''m happpy' -- will get: I'm happy
For any charactor you are not sure of: in sql server you can get any char's unicode by select unicode(':') (you keep the number)
So this case you can also select 'I'+nchar(39)+'m happpy'
The doubling up of the quote should have worked, so it's peculiar that it didn't work for you; however, an alternative is using double quote characters, instead of single ones, around the string. I.e.,
insert into my_table values("hi, my name's tim.");
Also another thing to be careful of is whether or not it is really stored as a classic ASCII ' (ASCII 27) or Unicode 2019 (which looks similar, but not the same). This isn't a big deal on inserts, but it can mean the world on selects and updates. If it's the unicode value then escaping the ' in a WHERE clause (e.g where blah = 'Workers''s Comp') will return like the value you are searching for isn't there if the ' in "Worker's Comp" is actually the unicode value.If your client application supports free-key, as well as copy and paste based input, it could be Unicode in some rows, and ASCII in others!
A simple way to confirm this is by doing some kind of open ended query that will bring back the value you are searching for, and then copy and paste that into notepad++ or some other unicode supporting editor. The differing appearance between the ascii value and the unicode one should be obvious to the eyes, but if you lean towards the anal, it will show up as 27 (ascii) or 92 (unicode) in a hex editor.
The following syntax will escape you ONLY ONE quotation mark:
SELECT ''''
The result will be a single quote. Might be very helpful for creating dynamic SQL :).
Double quotes option helped me
SET QUOTED_IDENTIFIER OFF;
insert into my_table values("hi, my name's tim.");
SET QUOTED_IDENTIFIER ON;
This should work
DECLARE #singleQuote CHAR
SET #singleQuote = CHAR(39)
insert into my_table values('hi, my name'+ #singleQuote +'s tim.')
Just insert a ' before anything to be inserted. It will be like a escape character in sqlServer
Example:
When you have a field as, I'm fine.
you can do:
UPDATE my_table SET row ='I''m fine.';
I had the same problem, but mine was not based of static data in the SQL code itself, but from values in the data.
This code lists all the columns names and data types in my database:
SELECT DISTINCT QUOTENAME(COLUMN_NAME),DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS
But some column names actually have a single-quote embedded in the name of the column!, such as ...
[MyTable].[LEOS'DATACOLUMN]
To process these, I had to use the REPLACE function along with the suggested QUOTED_IDENTIFIER setting. Otherwise it would be a syntax error, when the column is used in a dynamic SQL.
SET QUOTED_IDENTIFIER OFF;
SET #sql = 'SELECT DISTINCT ''' + #TableName + ''',''' + REPLACE(#ColumnName,"'","''") + ...etc
SET QUOTED_IDENTIFIER ON;
The STRING_ESCAPE funtion can be used on newer versions of SQL Server
This should work: use a back slash and put a double quote
"UPDATE my_table SET row =\"hi, my name's tim.\";
I inserted a bunch of rows with a text field like content='...\n...\n...'.
I didn't use e in front, like conent=e'...\n...\n..., so now \n is not actually displayed as a newline - it's printed as text.
How do I fix this, i.e. how to change every row's content field from '...' to e'...'?
The syntax variant E'string' makes Postgres interpret the given string as Posix escape string. \n encoding a newline is only one of many interpreted escape sequences (even if the most common one). See:
Insert text with single quotes in PostgreSQL
To "re-evaluate" your Posix escape string, you could use a simple function with dynamic SQL like this:
CREATE OR REPLACE FUNCTION f_eval_posix_escapes(INOUT _string text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT E''' || _string || '''' INTO _string;
END
$func$;
WARNING 1: This is inherently unsafe! We have to evaluate input strings dynamically without quoting and escaping, which allows SQL injection. Only use this in a safe environment.
WARNING 2: Don't apply repeatedly. Or it will misinterpret your actual string with genuine \ characters, etc.
WARNING 3: This simple function is imperfect as it cannot cope with nested single quotes properly. If you have some of those, consider instead:
Unescape a string with escaped newlines and carriage returns
Apply:
UPDATE tbl
SET content = f_eval_posix_escapes(content)
WHERE content IS DISTINCT FROM f_eval_posix_escapes(content);
db<>fiddle here
Note the added WHERE clause to skip updates that would not change anything. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Use REPLACE in an update query. Something like this: (I'm on mobile so please ignore any typo or syntax erro)
UPDATE table
SET
column = REPLACE(column, '\n', e'\n')
We are exporting data from Postgres 9.3 into a text file for ingestion by Spark.
We would like to use the ASCII 31 field separator character as a delimiter instead of \t so that we don't have to worry about escaping issues.
We can do so in a shell script like this:
#!/bin/bash
DELIMITER=$'\x1F'
echo "copy ( select * from table limit 1) to STDOUT WITH DELIMITER '${DELIMITER}'" | (psql ...) > /tmp/ascii31
But we're wondering, is it possible to specify a non-printable glyph as a delimiter in "pure" postgres?
edit: we attempted to use the postgres escaping convention per http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html
warehouse=> copy ( select * from table limit 1) to STDOUT WITH DELIMITER '\x1f';
and received
ERROR: COPY delimiter must be a single one-byte character
Try prepending E before the sequence you're trying to use as a delimter. For example E'\x1f' instead of '\x1f'. Without the E PostgreSQL will read '\x1f' as four separate characters and not a hexadecimal escape sequence, hence the error message.
See the PostgreSQL manual on "String Constants with C-style Escapes" for more information.
From my testing, both of the following work:
echo "copy (select 1 a, 2 b) to stdout with delimiter u&'\\001f'"| psql;
echo "copy (select 1 a, 2 b) to stdout with delimiter e'\\x1f'"| psql;
I've extracted a small file from Actian Matrix (a fork of Amazon Redshift, both derivatives of postgres), using this notation for ASCII character code 30, "Record Separator".
unload ('SELECT btrim(class_cd) as class_cd, btrim(class_desc) as class_desc
FROM transport.stg.us_fmcsa_carrier_classes')
to '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter as '\036' leader;
This is an example of how this file looks in VI:
C^^Private Property
D^^Private Passenger Business
E^^Private Passenger Non-Business
I then moved this file over to a machine hosting PostgreSQL 9.5 via sftp, and used the following copy command, which seems to work well:
copy fmcsa.carrier_classes
from '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter u&'\001E';
Each derivative of postgres, and postgres itself seems to prefer a slightly different notation. Too bad we don't have a single standard!
I need to write a file to disk from postgres that has character string of a backslash immediately followed by a forward slash \/
Code similar to this has not worked:
drop table if exists test;
create temporary table test (linetext text);
insert into test values ('\/\/foo foo foo\/bar\/bar');
copy (select linetext from test) to '/filepath/postproductionscript.sh';
The above code yields \\/\\/foo foo foo\\/bar\\/bar ... it inserts an extra backslash.
When you view the temp table, the string is correctly viewed as \/\/, so I am not sure where or when the text is changed into \\/\\/
I've tried doubling the \, variations of E before the string, and quote_literal() without luck.
I have note found a solution here Postgres Manual
Running Postgres 9.2, encoded UTF-8.
The problem is that COPY is not intended to write out plain-text files. It is intended to write out files that can be read back by COPY. And the semi-internal encoding that it uses does some backslash escaping.
For what you want to do, you need to write some custom code. Either use a normal client library to read the query results and write them to a file, or, if you want to do it in-server, use something like PL/Perl or PL/Python.
The \ excaping is only recognised if the stringliteral is prefixed with E , otherwise the standard_conforming_strings setting (or the like) is respected (ANSI-SQL has a different way of string escaping, probably stemming from COBOL;-).
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( E'\/\/foo foo foo\/bar\/bar');
copy (select linetext from test) to '/tmp/postproductionscript.sh';
UPATE: an ugly hack is to use .csv format and still use \t as delimter.
The #!/bin/sh as a shebang headerline should be consdered a feature
-- without a header line
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( '\/\/foo foo foo\/bar\/bar');
copy (select linetext AS "#linetext" from test) to '/tmp/postproductionscript_c.sh'
WITH CSV
DELIMITER E'\t'
;
-- with a shebang header line
drop table if exists test;
create temporary table test (linetext text);
insert into test values ( '\/\/foo foo foo\/bar\/bar');
copy (select linetext AS "#/bin/sh" from test) to '/tmp/postproductionscript_h.sh'
WITH CSV
HEADER
DELIMITER E'\t'
;
I have a column url encoded with urlencode in php. I wish to make a select like this
SELECT some_mix_of_functions(...) AS Decoded FROM table
Replace is not a good solution because I will have to add all the decoding by hand. Any other solution to get the desire result ?
Yes you can:
CREATE OR REPLACE FUNCTION decode_url_part(p varchar) RETURNS varchar AS $$
SELECT convert_from(CAST(E'\\x' || string_agg(CASE WHEN length(r.m[1]) = 1 THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex') ELSE substring(r.m[1] from 2 for 2) END, '') AS bytea), 'UTF8')
FROM regexp_matches($1, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m);
$$ LANGUAGE SQL IMMUTABLE STRICT;
This creates a function decode_url_part, then you can use it like that:
SELECT decode_url_part('your%20urlencoded%20string')
Or you can just use the mix of functions and subqueries from the body of the above function.
This doesn't handle '+' characters (representing whitespace), but I guess adding this is quite easy (if you ever need it).
Also, this assumes utf-8 encoding for non-ascii characters, but you can replace 'UTF8' with your own encoding if you want.
It should be noted that the above code relies on undocumented postgresql feature, namely that the results of regexp_matches function are processed in the order they occur in the original string (which is natural, but not specified in docs).
As Pablo Santa Cruz notes, string_agg is a PostgreSQL 9.0 aggregate function. The equivalent code below doesn't use it (I hope it works for 8.x):
SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
SELECT CASE WHEN length(r.m[1]) = 1 THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex') ELSE substring(r.m[1] from 2 for 2) END
FROM regexp_matches($1, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
Not out of the box. But you could create a pl/perl function that wraps the perl equivalent. (Or a pl/php function).