What options would load an escape character into Redshift? - amazon-redshift

Having a tough time playing with Redshift's COPY options to load a field that has an escape character immediately followed by a delimiter ('|'). Data looks like this:
00b9e290000f8350b9c780832a210000|MY DATA\|AB
That row has 3 fields that I'm trying to load. When I run with just ESCAPE, Redshift seems to properly add \ to double-escape, but then the pipe delimiter gets ignored. So Redshift ends up trying to load all of the following into the second field: MY DATA|AB. The error message is that the delimiter was not found, since that's read as the second field with no following delimiter.
I've tried running COPY with just the ESCAPE option, with the CSV + ESCAPE options, and with a few others, with no luck. Is there anything else I should try? Or should I be adding a pre-processing step to double-escape the backslashes?
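The pre-processing step mentioned above could look roughly like the following Python sketch. It assumes the file is loaded with COPY ... ESCAPE, where a doubled backslash is read as one literal backslash, so the pipe that follows remains a delimiter. The function name `double_backslashes` is made up for illustration and is not part of any Redshift tooling.

```python
def double_backslashes(line: str) -> str:
    """Double every backslash so that, under ESCAPE, it loads as a literal backslash."""
    return line.replace("\\", "\\\\")


# The sample row from the question: the second field ends with a backslash.
raw = "00b9e290000f8350b9c780832a210000|MY DATA\\|AB"
print(double_backslashes(raw))
```

This would run over each line of the file before it is uploaded to S3. Any backslashes that are already intended as escapes would be doubled too, so it only fits data where every backslash is literal.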

Related

CR/LF as line terminator in Synapse data flow using delimited text as a sink

I am using a data flow in Synapse, the sink is Delimited text. I have to provide the output to a system that expects CR/LF (\r\n) as the row terminator.
The row delimiter setting "Default (\r,\n, or \r\n)" returns \n (LF) only as the row terminator in all of my tests. Has anyone had this requirement and found a workaround?
In DataFlow mapping for separating rows in Delimited text format there are certain default values:
To Read from file: "\r\n", "\r", or "\n"
To Write in file: "\n"
As a workaround, try adding the row delimiter manually in the data flow script and then running the pipeline:
rowDelimiter: '\r\n'
[pratiklad](https://stackoverflow.com/users/18043699/pratiklad) has it right. As odd as it sounds, CR/LF is not supported in a data flow sink. It seems the best I can do, if I am using a data flow, is to output to my storage account, then use a copy activity to open the file and rewrite it, adding the CR/LF (\r\n).
You can concatenate a '\r' onto the last column of your dataset; together with the '\n' that the sink writes, each row in the text or CSV file then ends with \r\n.
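If the file ends up being rewritten outside the data flow, the LF to CR/LF conversion itself is small. Here is a minimal Python sketch, assuming the whole file fits in memory and that no field contains an embedded newline that must stay as LF. The name `to_crlf` is hypothetical.

```python
def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending converted to CR/LF."""
    # Collapse existing CR/LF pairs first so they are not turned into \r\r\n.
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


print(to_crlf(b"a,b\nc,d\n"))
```

Working on bytes rather than text avoids any newline translation by the file layer, so the output contains exactly the terminators written here.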

Trouble rendering CSV data as an interactive table in GitHub

When viewed, any .csv file committed to a GitHub repository automatically renders as an interactive table, complete with headers and row numbering. By default, the first row is your header row. The tables were supposed to look nice as below:
However, there's an error happening in my tabular data, and despite indicating the error, I can't fix it:
I'm using a .csv file with a semicolon separator. Does anyone have an idea of what's happening?
According to the docs, GitHub can only do its layout thing with .csv (comma-separated) and .tsv (tab-separated) files.
Using a semicolon as a separator isn't supported, at least not officially, and a spurious comma in a semicolon-separated file could well throw the algorithm off.
You could try replacing all semicolons with tabs and see how you fare.
If that doesn't work, try using commas as separators and enclose all text table cell data with quotes, like:
"Liver fibrosis, sclerosis, and cirrhosis","c370800","102922","Cystic fibrosis related cirrhosis","Diagnosis of liver fibrosis, sclerosis, and cirrhosis"
Note: no spaces after the commas. Also, if you have quotes in the text fields, you will have to escape those to "" (two quotes), or the algorithm will get confused.
You may get away with using quotes only for the offending text data, but that could well be more difficult to generate than just putting the quotes around all fields.
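The conversion described above (commas as separators, every field quoted, embedded quotes doubled) can be done with Python's standard `csv` module. This is a sketch only. It assumes the input is semicolon-separated text already held in a string, and the function name `semicolons_to_commas` is invented for the example.

```python
import csv
import io


def semicolons_to_commas(text: str) -> str:
    """Rewrite semicolon-separated text as comma-separated with all fields quoted."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    # QUOTE_ALL wraps every field in quotes; embedded quotes are doubled automatically.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(semicolons_to_commas("Liver fibrosis, sclerosis, and cirrhosis;c370800;102922\n"))
```

Quoting every field is the simpler of the two options mentioned, since the writer does not have to decide which fields contain commas.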

postgresql - pgloader - quotes handling

I am new to postgresql and just starting to use it. I am trying to load a file into a table and facing some issues.
Sample data - the file file1.RPT contains data in the below format
"Bharath"|Kumar|Krishnan
abc"|def|ghi
qwerty|asdfgh|lkjhg
Below is the load script that is used
LOAD CSV
INTO table1
....
WITH truncate,
fields optionally enclosed by '"',
fields escaped by '"'
fields terminated by '|'
....
However, the above script is not working and is not loading any data into the table. I am not sure what the issue is here. My understanding is that the first row has to be loaded successfully (since I have specified optionally enclosed by) and the second row must also be loaded (since I am trying to escape the double quote).
Request help in getting the same rectified.
Thank you.
The same character cannot be used both as the escape character and as the optional quote character. If the double quote will be part of the data, it can be ignored by using the fields not enclosed option. The default option is fields optionally enclosed by double quote.
Apparently, you're not escaping the quote in the second row. Either you must put a backslash (or another escape character) before it:
abc\"|def|ghi
or you should enclose the entire line in quotes.
Another alternative is to accept having quotes in the first field; in that case, use the following in your load script:
fields not enclosed
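To see what treating quotes as plain data means for the sample rows, here is a small Python analogy. It is not pgloader itself: it uses Python's `csv` module with quoting disabled, which is assumed here to behave like the fields not enclosed option. The function name `parse_not_enclosed` is made up.

```python
import csv
import io

# The three sample rows from the question.
sample = '"Bharath"|Kumar|Krishnan\nabc"|def|ghi\nqwerty|asdfgh|lkjhg\n'


def parse_not_enclosed(text: str) -> list[list[str]]:
    """Split on '|' and keep every double quote as ordinary data."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


for row in parse_not_enclosed(sample):
    print(row)
```

With quoting off, the stray quote in the second row no longer opens an unterminated quoted field, and the first field of the first row keeps its surrounding quotes.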

Single quotes stored in a Postgres database

I've been working on an Express app that has a form designed to hold lines and quotes.
Some of the lines will have single quotes ('), but overall it's able to store the info, and I'm able to back it up without any problems. Now, when I run pg_dump to put the database into an SQL file, the quotes seem to make some things appear a bit wonky in my text editor.
Would I have to create a method to change all the single quotation marks into double ones, or can I leave it as is and upload it back to the database without causing major issues? I know people will continue to enter lines that contain either single or double quotation marks, so any solution or answer would help greatly.
Single quotes in character data types are no problem at all. You just need to escape them properly in string literals.
To write data with INSERT you need to quote all string literals according to SQL syntax rules. There are tools to do that for you ...
Insert text with single quotes in PostgreSQL
However, pg_dump takes care of escaping automatically. The default mode produces text output to be re-imported with COPY (much faster than INSERT), and single quotes have no special meaning there. And in (non-default) csv mode, the default quote character is double-quote (") and configurable. The manual:
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
The format is defined by rules for COPY and not by SQL syntax rules.
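For the INSERT case mentioned above, the SQL rule is that a single quote inside a string literal is written as two single quotes. Here is a minimal Python sketch of that rule, assuming PostgreSQL's default `standard_conforming_strings = on`, so that backslashes need no special treatment. `quote_literal` here is an illustrative helper, not a driver API, and in application code parameterized queries are the usual way to avoid hand-quoting.

```python
def quote_literal(value: str) -> str:
    """Wrap value in single quotes, doubling any single quotes inside it."""
    return "'" + value.replace("'", "''") + "'"


print("INSERT INTO lines (body) VALUES (" + quote_literal("It's a quote") + ");")
```

The table and column names in the printed statement are placeholders for the example.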

Uploading data to RedShift using COPY

I am trying to upload data to RedShift using COPY command.
On this row:
4072462|10013868|default|2015-10-14 21:23:18.0|0|'A=0
I am getting this error:
Delimited value missing end quote
This is the COPY command:
copy test
from 's3://test/test.gz'
credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx' removequotes escape gzip
First, I hope you know why you are getting the mentioned error: you have a single quote in one of the column values. For the removequotes option, the Redshift documentation clearly says:
If a string has a beginning single or double quotation mark but no corresponding ending mark, the COPY command fails to load that row and returns an error.
One thing is certain: removequotes is not what you are looking for.
Second, so what are your options?
If preprocessing the S3 file is in your control, consider using the escape option. Per the documentation,
When this parameter is specified, the backslash character (\) in input data is treated as an escape character.
So your input row in S3 should change to something like:
4072462|10013868|default|2015-10-14 21:23:18.0|0|\'A=0
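If the file is preprocessed before upload, adding those backslashes might be sketched in Python like this. It assumes every single or double quote in the data is literal and should be kept, and that the file is loaded with the escape option. The function name `escape_quotes` is hypothetical.

```python
def escape_quotes(line: str) -> str:
    """Put a backslash before each single or double quote in the line."""
    return line.replace("'", "\\'").replace('"', '\\"')


row = "4072462|10013868|default|2015-10-14 21:23:18.0|0|'A=0"
print(escape_quotes(row))
```

Backslashes already present in the data are left untouched here; if they are meant to load literally, they would need doubling in an earlier pass.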
See if CSV DELIMITER '|' works for you. The Redshift COPY documentation has the details.