Uploading data to Redshift using COPY


I am trying to upload data to Redshift using the COPY command.
On this row:
4072462|10013868|default|2015-10-14 21:23:18.0|0|'A=0
I am getting this error:
Delimited value missing end quote
This is the COPY command:
copy test
from 's3://test/test.gz'
credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx' removequotes escape gzip

First, here is why you are getting this error: you have a single quote in one of the column values. For the removequotes option, the Redshift documentation clearly says:
If a string has a beginning single or double quotation mark but no corresponding ending mark, the COPY command fails to load that row and returns an error.
One thing is certain: removequotes is not what you are looking for.
Second, what are your options?
If you control the preprocessing of the S3 file, consider using the escape option. Per the documentation,
When this parameter is specified, the backslash character (\) in input data is treated as an escape character.
So your input row in S3 should change to something like:
4072462|10013868|default|2015-10-14 21:23:18.0|0|\'A=0
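If the file is generated by your own code before it goes to S3, the escaping step can be done per field. The following is a minimal sketch, not an official tool: `escape_field` and `build_row` are hypothetical helpers, and the set of escaped characters (backslash, the pipe delimiter, and both quote marks, which need escaping when ESCAPE is combined with REMOVEQUOTES) is an assumption based on the Redshift ESCAPE documentation. Embedded newlines and carriage returns are not handled here.

```python
from typing import List

# Characters this sketch prefixes with a backslash for Redshift's ESCAPE option.
# Assumption: pipe delimiter, and quotes escaped because REMOVEQUOTES is also used.
SPECIAL = {"\\", "|", "'", '"'}


def escape_field(value: str) -> str:
    """Prefix each special character in one field value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


def build_row(fields: List[str]) -> str:
    """Escape every field, then join the fields with the pipe delimiter."""
    return "|".join(escape_field(f) for f in fields)


row = ["4072462", "10013868", "default", "2015-10-14 21:23:18.0", "0", "'A=0"]
print(build_row(row))  # 4072462|10013868|default|2015-10-14 21:23:18.0|0|\'A=0
```

Escaping each field before joining (rather than the finished line) matters: it keeps real delimiters unescaped while escaping any pipe that is part of the data.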
Alternatively, see whether the CSV format with DELIMITER '|' works for you; check the Redshift COPY documentation for details.
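With CSV, the quote character is the double quote, so a lone single quote is ordinary data and needs no escaping. As a sketch, assuming the file is produced with Python's standard `csv` module (`to_pipe_csv` is a hypothetical helper), the writer only quotes fields that contain the delimiter or a double quote and doubles any embedded double quotes. Note that, as far as I know, Redshift does not allow CSV together with ESCAPE or REMOVEQUOTES, so those options would be dropped from the COPY command in this approach.

```python
import csv
import io


def to_pipe_csv(rows) -> str:
    """Serialise rows as pipe-delimited CSV, double-quoting only when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", quotechar='"', lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["4072462", "10013868", "default", "2015-10-14 21:23:18.0", "0", "'A=0"],
    ["1", 'say "hi"', "a|b"],
]
print(to_pipe_csv(rows), end="")
# 4072462|10013868|default|2015-10-14 21:23:18.0|0|'A=0
# 1|"say ""hi"""|"a|b"
```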

Related

CR/LF as line terminator in Synapse data flow using delimited text as a sink

I am using a data flow in Synapse, and the sink is delimited text. I have to provide the output to a system that expects CR/LF (\r\n) as the row terminator.
The default row delimiter setting (\r, \n, or \r\n) produces only \n (LF) as the row terminator in all of my tests. Has anyone had this requirement and found a workaround?

In mapping data flows, the delimited text format uses certain default row delimiters:
When reading from a file: "\r\n", "\r", or "\n"
When writing to a file: "\n"
As a workaround, try adding the following manually to the data flow script and then running the pipeline:
rowDelimiter: '\r\n'

[pratiklad](https://stackoverflow.com/users/18043699/pratiklad) has it right. As odd as it sounds, CR/LF is not supported as the row delimiter in a data flow sink. It seems the best I can do, if I use a data flow, is to output to my storage account and then use a copy activity to open the file and rewrite it with CR/LF (\r\n).
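The rewrite step itself is a simple LF to CRLF conversion. The sketch below does it with a plain Python script instead of a copy activity (a swapped-in technique, shown only to illustrate the transformation); `lf_to_crlf` and `convert_file` are hypothetical names. It first normalises existing CRLF so no row ends up with CR CR LF, and it would also convert newlines that sit inside quoted fields.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Turn every LF into CRLF, without doubling CRs that are already there."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Read the data flow output from src and write a CRLF copy to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(lf_to_crlf(data))


print(lf_to_crlf(b"a,b\nc,d\n"))  # b'a,b\r\nc,d\r\n'
```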

You can concatenate '\r' onto the last column of your dataset; because the sink ends each row with \n, the text or CSV files will then contain \r\n.
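Why this works can be shown with a small simulation: if the writer always appends a bare LF, a CR placed at the end of the last value lands immediately before it. This is a sketch only; `sink_write` is a hypothetical stand-in for the Synapse sink, not its real behaviour in every configuration. In particular, if the sink's quote settings cause values containing \r to be quoted, the CR would end up inside quotes and the trick would not produce a clean CRLF.

```python
from typing import List

Row = List[str]


def sink_write(rows: List[Row]) -> str:
    """Stand-in for a sink that always ends each row with a bare LF."""
    return "".join(",".join(row) + "\n" for row in rows)


def append_cr_to_last_column(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with a CR concatenated onto the last value."""
    return [row[:-1] + [row[-1] + "\r"] for row in rows]


rows = [["a", "b"], ["c", "d"]]
print(repr(sink_write(append_cr_to_last_column(rows))))  # 'a,b\r\nc,d\r\n'
```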

What options would load an escape character into Redshift?

I'm having a tough time with Redshift's COPY options when loading a field that has an escape character immediately followed by the delimiter ('|'). The data looks like this:
00b9e290000f8350b9c780832a210000|MY DATA\|AB
That line has 3 fields that I'm trying to load. When I run with just ESCAPE, Redshift seems to handle the \ as an escape character, but then the pipe delimiter gets ignored, so Redshift ends up trying to load all of MY DATA|AB into the second field. The error message says the delimiter was not found, since that text is read as the second field with no following delimiter.
I've tried running COPY with just the ESCAPE option, with the CSV + ESCAPE options, and with a few others, with no luck. Is there anything else I should try? Or should I add a preprocessing step to double-escape the backslash (turn \ into \\)?
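The double-escape preprocessing step can be sketched as below, assuming every backslash in the raw file is meant as data and none is meant as an escape. `split_escaped` is a hypothetical, simplified model of how a backslash-escaping parser splits a pipe-delimited line; it is not Redshift's actual parser, but it shows why the raw line yields two fields and the double-escaped line yields the intended three.

```python
def double_escape(line: str) -> str:
    """Turn every backslash into two, so it is loaded as a literal backslash."""
    return line.replace("\\", "\\\\")


def split_escaped(line: str, delim: str = "|"):
    """Split on delim; a backslash makes the next character literal."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped char, even a delimiter
            i += 2
            continue
        if ch == delim:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


raw = r"00b9e290000f8350b9c780832a210000|MY DATA\|AB"
print(split_escaped(raw))                 # 2 fields: the pipe was escaped
print(split_escaped(double_escape(raw)))  # 3 fields: 'MY DATA\' and 'AB'
```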

postgresql - pgloader - quotes handling

I am new to PostgreSQL and just starting to use it. I am trying to load a file into a table and am facing some issues.
Sample data: the file file1.RPT contains data in the format below:
"Bharath"|Kumar|Krishnan
abc"|def|ghi
qwerty|asdfgh|lkjhg
Below is the load script I am using:
LOAD CSV
INTO table1
....
WITH truncate,
fields optionally enclosed by '"',
fields escaped by '"'
fields terminated by '|'
....
However, the above script is not working and is not loading any data into the table. I am not sure what the issue is. My understanding is that the first row should load successfully (since I have given optionally enclosed by) and that the second row should also load (since I am trying to escape the double quote).
Any help getting this fixed would be appreciated.
Thank you.

You cannot use the same character both to escape and to optionally enclose fields. If the double quote is part of the data, you can have it treated as ordinary data with the fields not enclosed option. The default is fields optionally enclosed by double quote.

You're not escaping the quote in the second row. You must either put a backslash (or another escape character) before it:
abc\"|def|ghi
or enclose the entire value in quotes.
Another alternative is to accept quotes as part of the first field; in that case, use the following
fields not enclosed
in your load script.
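As a rough analogy only (this is Python's standard `csv` module, not pgloader, and the two parsers differ in details), the two suggested fixes correspond to two reader configurations: `QUOTE_NONE` behaves like fields not enclosed, keeping every quote as data, while a backslash `escapechar` lets a quote be escaped inside a field. `read_not_enclosed` and `read_backslash_escaped` are hypothetical names.

```python
import csv

LINES = ['"Bharath"|Kumar|Krishnan', 'abc"|def|ghi', "qwerty|asdfgh|lkjhg"]


def read_not_enclosed(lines):
    """Quotes are never special: every character is kept as data."""
    return list(csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE))


def read_backslash_escaped(lines):
    """Quotes may enclose fields; a backslash escapes a literal quote."""
    return list(csv.reader(lines, delimiter="|", quotechar='"', escapechar="\\"))


print(read_not_enclosed(LINES))
print(read_backslash_escaped(['"Bharath"|Kumar|Krishnan', 'abc\\"|def|ghi']))
```

With fields not enclosed, the first field of row 1 keeps its quotes ('"Bharath"'); with the backslash approach, row 1 is unquoted to 'Bharath' and row 2 must be written as abc\" in the file.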

Single quotes stored in a Postgres database

I've been working on an Express app that has a form designed to hold lines and quotes.
Some of the lines will have single quotes ('), but overall the app is able to store the info, and I'm able to back it up and store it without any problems. Now, when I run pg_dump to put the database into an SQL file, the quotes seem to make some things appear a bit wonky in my text editor.
Would I have to create a method to change all the single quotation marks into double ones, or can I leave them as they are and still load the dump back into the database without major issues? I know people will continue to enter lines that contain either single or double quotation marks, so any solution or answer would help greatly.

Single quotes in character data types are no problem at all. You just need to escape them properly in string literals.
To write data with INSERT you need to quote all string literals according to SQL syntax rules. There are tools to do that for you ...
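Quoting a string literal means wrapping it in single quotes and doubling every embedded single quote (this assumes standard_conforming_strings is on, the PostgreSQL default since 9.1). The sketch below uses SQLite from the Python standard library as a stand-in database, since it follows the same literal rule; `sql_literal` is a hypothetical helper. The parameterized INSERT is the safer choice in practice: the driver binds the value, so no manual escaping is needed (in an Express app, node-postgres `$1` placeholders play the same role).

```python
import sqlite3


def sql_literal(text: str) -> str:
    """Quote text as an SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.execute("CREATE TABLE lines (body TEXT)")
quote = "It's a 'quoted' line"

# Preferred: a parameterized query, where the driver handles quoting.
conn.execute("INSERT INTO lines (body) VALUES (?)", (quote,))
# Manual literal, shown only to illustrate the quoting rule.
conn.execute("INSERT INTO lines (body) VALUES (" + sql_literal(quote) + ")")

stored = [r[0] for r in conn.execute("SELECT body FROM lines")]
print(stored)  # both rows hold the original text, quotes intact
```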
Insert text with single quotes in PostgreSQL
However, pg_dump takes care of escaping automatically. The default mode produces text output to be re-imported with COPY (much faster than INSERT), and single quotes have no special meaning there. And in COPY's (non-default) CSV mode, the default quote character is the double quote (") and it is configurable. The manual:
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
The format is defined by rules for COPY and not by SQL syntax rules.
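To illustrate why single quotes are harmless in COPY's text format, here is a sketch of how one row could be encoded: only backslash, tab (the default delimiter), newline, and carriage return are escaped, and NULL is written as \N. `copy_text_field` and `copy_text_row` are hypothetical helpers, not part of pg_dump, and they cover just these basic cases.

```python
from typing import Iterable, Optional


def copy_text_field(value: Optional[str]) -> str:
    """Encode one value for COPY text format; quotes pass through unchanged."""
    if value is None:
        return "\\N"  # default NULL marker
    return (value.replace("\\", "\\\\")  # backslash first, so it isn't re-escaped
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def copy_text_row(values: Iterable[Optional[str]]) -> str:
    """Join encoded values with tabs and terminate the row with a newline."""
    return "\t".join(copy_text_field(v) for v in values) + "\n"


print(repr(copy_text_row(["It's \"fine\"", None, "a\\b"])))
```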

Postgres FOREIGN TABLE with data that includes "\"

My text file looks like this:
\home\stanley:123456789
c:/kobe:213
\tej\home\ant:222312
I created the FOREIGN TABLE with:
CREATE FOREIGN TABLE file_check(txt text) SERVER file_server OPTIONS (format 'text', filename '/home/stanley/check.txt');
After querying it with select * from file_check, my console shows:
homestanley:123456789
c:/kobe:213
ejhomeant:222312
Can anyone help me?

The file foreign data wrapper (file_fdw) uses the same rules as COPY (presumably because it's the same code underneath). You have to keep in mind that backslash is an escape character...
http://www.postgresql.org/docs/9.2/static/sql-copy.html
Any other backslashed character that is not mentioned in the above table will be taken to represent itself. However, beware of adding backslashes unnecessarily, since that might accidentally produce a string matching the end-of-data marker (\.) or the null string (\N by default). These strings will be recognized before any other backslash processing is done.
So you'll need to either double up the backslashes or perhaps try it as a single-column CSV file and see if that helps.
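The output in the question follows from those rules, which a simplified decoder can reproduce: \h, \s and \a are not in the escape table, so they become h, s and a, while \t is a real tab, which explains the missing "t" in the last line. This is a sketch, not PostgreSQL's implementation: `decode_copy_text` and `double_backslashes` are hypothetical names, and octal/hex escapes and the \N and \. markers are deliberately not handled.

```python
# Single-character escapes from the COPY text-format table (octal/hex omitted).
SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
                  "t": "\t", "v": "\v", "\\": "\\"}


def decode_copy_text(field: str) -> str:
    """Apply backslash processing; unknown escapes represent themselves."""
    out, i = [], 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def double_backslashes(line: str) -> str:
    """Preprocess a raw line so every backslash survives decoding."""
    return line.replace("\\", "\\\\")


for line in [r"\home\stanley:123456789", "c:/kobe:213", r"\tej\home\ant:222312"]:
    print(repr(decode_copy_text(line)), repr(decode_copy_text(double_backslashes(line))))
```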

We Keep Coding
