How to export postgres data containing newlines to CSV without breaking records on several lines - postgresql

I am trying to export data from PostgreSQL to CSV files, but when text in the database contains newlines, the exported records are broken across several lines. This makes the CSV file much harder to read, not to mention that most applications will fail to load it properly.
Here is how I export the data now:
PRESQL="\pset format unaligned
\pset fieldsep \",\"
\pset footer off
\o 'out.csv'
"
cat <(echo "$PRESQL") "$QUERYFILE" | psql …
So far, so good, unless you have newlines in the text fields. Is there any hack that would allow me to generate a CSV file that is very simple to parse (with one record per line)?

It was a mistake to assume that a CSV file can be forced to have one line per row. RFC 4180 states clearly that fields containing newlines are to be enclosed in double quotes.
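To illustrate the point, here is a minimal Python sketch (the sample data is made up, and Python's csv module simply stands in for any CSV-aware consumer). It parses a CSV whose quoted field contains a newline and shows that the record comes back as a single row, even though it occupies two physical lines.

```python
import csv
import io

# Hypothetical export: the second field of the first data record contains a newline.
data = 'id,note\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# newline="" leaves line endings untouched so the csv module can handle
# line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(data, newline="")))

physical_lines = data.count("\n")  # line breaks in the raw text
logical_records = len(rows)        # records seen by the parser, header included

print(physical_lines, logical_records)
print(rows[1])
```

A tool that splits on line breaks without honouring quotes would see one more "record" than the parser does, which is exactly the breakage described in the question.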

You can try the replace() or regexp_replace() functions.
The answer to the following SO question should give you an idea: How to remove carriage returns and new lines in Postgresql?

Related

Escape \n when exporting CSV using COPY command

I need to export various tables in CSV for AWS Glue Catalog and I just noticed a major showstopper:
COPY command does not escape new line inputs in columns, only quotes them.
What confuses me even more is that I can switch to TEXT and get the format right - escape the special characters - but I cannot have HEADER in that format!
COPY (%s) TO STDOUT DELIMITER ',' NULL ''
Is there a way to get both HEADER and to escape the new line through COPY command?
I'm hoping it's an oversight on my part, as the code is obviously there.
The text format does not produce CSV, which is why HEADER is not accepted with it (PostgreSQL 15 and later do allow HEADER in text format). It is the “internal” tab-separated format of PostgreSQL.
There are no provisions to replace newlines with \n in a CSV file, and indeed that would produce invalid CSV (according to what most people think; there is no standard).
You'll have to post-process the file.
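One possible way to do that post-processing is a short script outside the database. The sketch below assumes Python and its csv module (any CSV-aware tool would do): it reads CSV as produced by COPY ... WITH (FORMAT CSV, HEADER) and rewrites line breaks inside fields as the two-character sequence \n, so each record ends up on one physical line. The function name `flatten_newlines`, the sample data, and the decision to double existing backslashes are illustrative choices, and whatever consumes the file has to agree to undo the escaping.

```python
import csv
import io


def flatten_newlines(src, dst):
    """Copy CSV from src to dst, escaping line breaks inside fields."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [
                field.replace("\\", "\\\\")  # keep literal backslashes unambiguous
                .replace("\r\n", "\\n")
                .replace("\n", "\\n")
                .replace("\r", "\\n")
                for field in row
            ]
        )


# Hypothetical sample standing in for the exported file.
sample = 'id,body\n1,"line one\nline two"\n2,"plain, with comma"\n'
out = io.StringIO()
flatten_newlines(io.StringIO(sample, newline=""), out)
result = out.getvalue()
print(result)
```

For real files, open both the input and the output with newline="" so that Python does not translate line endings behind the csv module's back.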

Copy Command to insert CSV file - Escape Special Characters

I am trying to do a bulk insert into a Postgres database using the copy command from a CSV file. All the columns in the table are of type character varying(1024). The copy command is failing on certain values that are in double quotes.
For example:
"TODD'S JAMES RENO PHCY,INC."
My copy command looks like below:
\copy file_tmp FROM /srv/data0/transfer/data_2.csv USING DELIMITERS ','
Could you please help in how to escape these special characters and get this working?
Although you have specified a delimiter, you have not specified a format, so it is still using "text". In "text" format, things are escaped by backslashes, not quotes.
Also, 'USING DELIMITERS' is an extremely obsolete syntax.
You probably want something like:
\copy file_tmp FROM /srv/data0/transfer/data_2.csv WITH (FORMAT CSV)
You don't need to specify the delimiter, because it defaults to ',' when using CSV format.
Of course this still might fail on parts of the data you haven't shown us.

Remove some rows with " in front

I have a CSV file that is causing me serious headaches going into Tableau. Some of the rows in the CSV are wrapped in a " " and some not. I would like them all to be imported without this (i.e. ignore it on rows that have it).
Some data:
"1;2;Red;3"
1;2;Green;3
1;2;Blue;3
"1;2;Hello;3"
Do you have any suggestions?
If you have a bash prompt hanging around...
You can use cat to output the file contents so you can make sure you're working with the right data:
cat filename.csv
Then, pipe it through sed so you can visually check that the quotes were deleted:
cat filename.csv | sed 's/"//g'
If the output looks good, use the -i flag to edit the file in place:
sed -i 's/"//g' filename.csv
All quotes should now be missing from filename.csv.
If your data has quotes in it, and you want to only strip the quotes that appear at the beginning and end of each line, you can use this instead:
sed -i 's/^"\(.*\)"$/\1/' filename.csv
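If sed is not available, the same idea (strip a pair of quotes only when it wraps the whole line) can be sketched in Python. This is an illustrative equivalent rather than part of the original answer; the helper name `strip_wrapping_quotes` is made up, and the rows are the sample data from the question.

```python
def strip_wrapping_quotes(line):
    """Remove one pair of double quotes only if it wraps the whole line."""
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        return line[1:-1]
    return line


# Sample rows from the question.
rows = ['"1;2;Red;3"', "1;2;Green;3", "1;2;Blue;3", '"1;2;Hello;3"']
cleaned = [strip_wrapping_quotes(row) for row in rows]

for row in cleaned:
    print(row)
```

Quotes that appear in the middle of a line are left alone, just as with the anchored sed expression.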
It's not the most elegant way to do it, but if you cannot remove the quotes in the source file, you could create a calculated field in Tableau for the first and last column that strips the quotation marks.
Right-click the field for the first column and choose Create/Calculated Field.
Use this formula: INT(REPLACE([FirstColumn],'"',''))
Name the column accordingly.
Do the same for the last column.
This assumes that the sample you provided is representative of your real data and that these fields are integers (hence the INT() call). If they are string fields, make sure that you don't remove quotation marks that belong to the field value.

Trying to import a CSV file into postgres with comma as delimiter

I am trying to import a CSV file into Postgres that has a comma as delimiter. I do:
\COPY products(title, department) from 'toyd.csv' with (DELIMITER ',');
All super cool.
However, title and department are both strings. Some values in these columns contain commas that I don't want to be interpreted as delimiters, so I pass the strings in quotes. But this doesn't work: Postgres still treats those commas as delimiters. What am I missing?
Here is a snippet from the CSV that causes the problem:
"Light","Reading, Writing & Spelling"
Any ideas?
You aren't using CSV format there, just a comma-delimited one.
Tell it you want FORMAT CSV and it should default to quoted text - you could also change the quoting character if necessary.
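The difference between "comma-delimited" and "CSV" can be seen outside the database too. The Python sketch below is only an analogy (it does not reproduce PostgreSQL's parser): splitting the problem line on commas cuts the quoted field in two, while a CSV-aware parser keeps it whole.

```python
import csv
import io

# The problematic line from the question.
line = '"Light","Reading, Writing & Spelling"\n'

# Delimiter-only splitting: quotes get no special treatment.
naive = line.rstrip("\n").split(",")

# CSV-aware parsing honours the double quotes around each field.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields three pieces for a two-column table, which matches the kind of failure the question describes.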

Ignore quotation marks when importing a CSV file into PostgreSQL?

I'm trying to import a tab-delimited file into my PostgreSQL database. One of the fields in my file is a "title" field, which occasionally contains actual quotation marks. For example, my tsv might look like:
id title
5 Hello/Bleah" Foo
(Yeah, there's just that one quotation mark in the title.)
When I try importing the file into my database:
copy articles from 'articles.tsv' with delimiter E'\t' csv header;
I get this error, referencing that line:
ERROR: unterminated CSV quoted field
How do I fix this? Quotation marks are never used to surround entire fields in the file. I tried copy articles from 'articles.tsv' with delimiter E'\t' escape E'\\' csv header; but I get the same error on the same line.
Assuming the file never actually tries to quote its fields:
The option you want is "with quote", see http://www.postgresql.org/docs/8.2/static/sql-copy.html
Unfortunately, I'm not sure how to turn off quote processing altogether; one kludge would be to specify a quote character that does not appear in your file at all.
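Before relying on that kludge, it helps to check which characters really are absent from the file. Here is a minimal Python sketch; the helper name `unused_quote_candidates`, the candidate list, and the sample text are all assumptions made for illustration. Keep in mind that COPY requires the quote to be a single one-byte character.

```python
def unused_quote_candidates(text, candidates=('"', "'", "|", "\b")):
    """Return the candidate characters that never occur in text."""
    present = set(text)
    return [ch for ch in candidates if ch not in present]


# Hypothetical stand-in for the contents of articles.tsv.
sample = 'id\ttitle\n5\tHello/Bleah" Foo\n'
free = unused_quote_candidates(sample)

# repr() makes control characters such as backspace visible.
print([repr(ch) for ch in free])
```

For a real file, read its contents (for example with open(path, encoding="utf-8").read()) and pass them in place of the sample string.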
Tab-separated text is the default format for COPY statements. Treating such a file as CSV is just silly. (Did you take this path just to skip the header?)
copy articles from 'articles.tsv';
does exactly what you want.
I struggled with the same error and a few more. After gathering knowledge from a few SO questions, I finally came up with the following setup, which makes COPY TO/FROM work even for quite sophisticated JSON columns:
COPY "your_schema_name"."your_table_name" (your, column_names, here)
FROM STDIN WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE E'\\';
--here rows data
\.
The most important parts:
QUOTE E'\b' - quote with backspace; the E prefix is needed so that \b is read as a single character (thanks a lot @grautur!)
DELIMITER E'\t' - delimit with tabs
ESCAPE E'\\' - and escape with a backslash