PostgreSQL: how to have COPY interpret formatted numeric fields automatically?

I have an input CSV file containing something like:
SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012
I tried to import it with COPY into a PostgreSQL table (varchar, float8, date) and ran into an error:
# copy foo from '/tmp/foo.csv' with header csv;
ERROR: invalid input syntax for type double precision: "1,000.00"
Time: 1.251 ms
Aside from preprocessing the input file, is there some setting in PG that will have it read a file like the one above and convert to numeric form in COPY? Something other than COPY?
If preprocessing is required, can it be set as part of the COPY command (not psql's \copy)?
Thanks a lot.

The alternative to preprocessing is to first COPY the file into a temporary table whose numeric column is declared as text. From there, insert into the final table, converting the value with the to_number function:
select to_number('1,000.00', 'FM000,009.99')::double precision;

It's an odd CSV file that surrounds numeric values with double quotes, but leaves values like SD-32MM-1001 unquoted. In fact, I'm not sure I've ever seen a CSV file like that.
If I were in your shoes, I'd try copy against a file formatted like this.
"SD-32MM-1001",100.00,4/11/2012
"SD-32MM-1001",1000.00,4/12/2012
Note that numbers have no commas. I was able to import that file successfully with
copy test from '/fullpath/test.dat' with csv
I think your best bet is to get better formatted output from your source.
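If the source cannot be changed, the file can be rewritten before COPY. Below is a minimal Python sketch of that preprocessing step, not something from the original answers. The function name strip_thousands and the column index are assumptions for illustration, and it only handles ',' as a thousands separator in one known column.

```python
import csv
import io


def strip_thousands(src, dst, column):
    """Copy CSV rows from src to dst, removing ',' from one numeric column.

    Assumes ',' appears in that column only as a thousands separator.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        row[column] = row[column].replace(",", "")
        writer.writerow(row)


# The two sample rows from the question.
sample = 'SD-32MM-1001,"100.00",4/11/2012\nSD-32MM-1001,"1,000.00",4/12/2012\n'
out = io.StringIO()
strip_thousands(io.StringIO(sample), out, column=1)
cleaned = out.getvalue()
print(cleaned)
```

The csv module parses the quoted "1,000.00" as a single field, so the embedded comma is not mistaken for a delimiter. The rewritten file can then be loaded with a plain COPY ... WITH CSV.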

Related

PostgreSQL how to read csv file with decimal comma?

I am trying to read a CSV file containing real numbers that use a comma as the decimal separator. I read the file with \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
psql does not recognize the comma as a decimal point:
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with some encoding but I couldn't find whether encoding governs the decimal point (I got the impression it didn't).
Is there really no solution other than changing the input data?
COPY into a staging table where the number goes into a varchar column. Then do something like this in psql:
--Temporarily change numeric formatting to one that uses ',' as
--decimal separator.
set lc_numeric = "de_DE.UTF-8";
--Below is just an example. In your case the select would be part of
--insert into the target table. Also the first part of to_number
--would be the field from your staging table.
select to_number('9669,84', '99999D999');
9669.84
You might need to change the format string to match all the numbers. For the available patterns, see "Template Patterns for Numeric Formatting" (Table 9.28 in the data formatting section of the PostgreSQL documentation).
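For completeness, the "change the decimal comma into a decimal point" route can be scripted rather than done by hand. This is a rough Python sketch under some assumptions: the file is ';'-delimited with a header row, the numeric columns are known by index, and the numbers contain no thousands separators. The function name and the sample column names are made up for illustration.

```python
import csv
import io


def decimal_comma_to_point(src, dst, columns, delimiter=";"):
    """Rewrite a delimited file, turning '9669,84' into '9669.84'
    in the given column indexes. The header row is copied unchanged."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    writer.writerow(next(reader))  # header
    for row in reader:
        for i in columns:
            row[i] = row[i].replace(",", ".")
        writer.writerow(row)


# Hypothetical sample; only the value 9669,84 comes from the question.
sample = "id;name;value\n1;a;9669,84\n2;b;12,5\n"
out = io.StringIO()
decimal_comma_to_point(io.StringIO(sample), out, columns=[2])
converted = out.getvalue()
print(converted)
```

The staging-table approach with to_number keeps everything inside the database. The script is only worth it if the file has to be touched anyway.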

How can I import a jsonb column from a csv file using the COPY command?

I am trying to import the following csv file into YugaByte DB YSQL. Note that the second entry in each row is a JSON object.
"15-06-2018","{\"file_name\": \"myfile1\", \"remote_ip\": \"X.X.X.X\"}"
"15-06-2018","{\"file_name\": \"myfile2\", \"remote_ip\": \"Y.Y.Y.Y\"}"
My table schema is:
postgres=# create table downloads_raw (request_date text, payload jsonb);
I want the JSON snippet in the imported file to become a JSONB value.
I tried doing the following:
postgres=# COPY downloads_raw FROM 'data.csv';
Hitting the following error:
ERROR: 22P04: missing data for column "payload"
CONTEXT: COPY downloads_raw, line 1: ""15-06-2018","{\"file_name\": \"myfile1\", \"remote_ip\": \"X.X.X.X\"}""
LOCATION: NextCopyFrom, copy.c:3443
Time: 2.439 ms
You need to specify FORMAT csv and ESCAPE '\'. Also, the format and escape options need to be enclosed in parentheses. This should work:
COPY downloads_raw FROM 'data.csv' WITH (FORMAT csv, ESCAPE '\');
List of supported options for COPY command can be found here:
https://docs.yugabyte.com/latest/api/ysql/commands/cmd_copy/
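To see what the escape character does to the sample line, here is a small Python illustration. It uses Python's csv module as a stand-in, not the COPY parser itself, so treat it as an analogy: with the backslash declared as the escape character, the \" sequences inside the quoted field become plain double quotes and the field is valid JSON.

```python
import csv
import io
import json

# First data line from the question, kept verbatim in a raw string.
line = r'"15-06-2018","{\"file_name\": \"myfile1\", \"remote_ip\": \"X.X.X.X\"}"'

# quotechar and escapechar mirror the CSV quote and ESCAPE '\' settings.
reader = csv.reader(io.StringIO(line), quotechar='"', escapechar="\\")
request_date, payload = next(reader)

parsed = json.loads(payload)  # raises ValueError if the field is not valid JSON
print(request_date, parsed["file_name"])
```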

Remove quotes for String in Clickhouse while exporting

I'm trying to export data to csv from clickhouse cli.
I have a field which is string and when exported to CSV this field has quotes around it.
I want to export without the quotes but couldn't find any setting that can be set.
I went through https://clickhouse.yandex/docs/en/interfaces/formats but the Values section mentions
Strings, dates, and dates with times are output in quotes
While for JSON they have a flag that is to be set for removing quotes around Int64 and UInt64
For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter output_format_json_quote_64bit_integers to 0.
I was wondering if there is such kind of flag for strings in CSV as well.
I'm exporting using the below command
clickhouse client --multiquery --host="localhost" --port="9000" --query="SELECT field1, field2 from tableName format CSV" > /data/content.csv
I want to try removing the quotes from the shell as the last thing if nothing works.
Any help on the way I can remove the quotes while the CSV is generated would be appreciated.
Nope, there isn't. However you can easily achieve this by arrayStringConcat.
SELECT arrayStringConcat([toString(field1), toString(field2)], ',') from tableName format TSV;
Edit
To make a Nullable column come out as an empty string, you might need the if function.
if(isNull(field1), '', assumeNotNull(field1))
This works for any type, while assumeNotNull alone only works for String.

How to import a CSV file into Postgres with empty values?

I am trying to import one csv file into Postgres which does contain age values, however there are also some empty values, since not all ages are known.
I would like to import the columns as real, since they contain ages with decimals like 98.45. The empty values, for people whose age is not known, are apparently treated as strings, but I would still like to import the age values as numbers. So how can I import real values when some cells in the CSV are empty?
To create the table I used the following code, since I am dealing with decimal values:
Create table psychosocial.age (
respnr integer Primary key,
fage real,
gage real,
hage real);
When importing the CSV file, I get the following error:
ERROR: invalid input syntax for integer: "11455, , , "
CONTEXT: COPY age, line 2, column respnr: "11455, , , "
One problem is that you're trying to import whitespace into numeric fields. So you first have to pre-process your CSV file before importing it.
Below is an example of how you can solve it using awk. From your console execute the following command:
$ cat file.csv | awk '{sub(/^ +/,""); gsub(/, /,",")}1' | psql db -c "COPY psychosocial.age FROM STDIN WITH CSV HEADER"
In case you're wondering how to pipe commands, take a look at these answers. Here a more detailed example on how to use COPY and the STDIN.
You also have to take into account that wrapping a whole line in quotation marks is a problem, e.g.:
"11455, , , "
This results in an error, because Postgres parses the entire quoted string as a single value and tries to store it in the integer column respnr, which fails. Instead, remove the quotes so that the line looks like this (the spaces still have to be stripped before COPY, as in the first command):
11455, , ,
or even
11455,,,
You can achieve this also using awk from your console:
$ awk '{gsub(/\"/,"")};1' file.csv
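If awk is not available, the same whitespace clean-up can be sketched in Python. This is only an illustration under the assumption that the file is comma-delimited and the lines are not wrapped in quotes. The function name strip_fields and the second sample row are made up.

```python
import csv
import io


def strip_fields(src, dst):
    """Copy CSV rows, trimming surrounding whitespace from every field.

    A field that held only spaces becomes empty, which COPY in CSV
    mode reads as NULL by default.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([field.strip() for field in row])


sample = "respnr,fage,gage,hage\n11455, , , \n11456,98.45, 71.2,\n"
out = io.StringIO()
strip_fields(io.StringIO(sample), out)
trimmed = out.getvalue()
print(trimmed)
```

The output can be saved to a file or piped to psql in the same way as the awk version.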

Postgresql - import from CSV null values wrapped in double quotes

So I am trying to import some data into postgresql using the COPY command.
Here is a sample of what the data looks like:
"UNIQ_ID","SP_grd1","SACN_grd1","BIOME_grd1","Meso_grd1","DM_grd1","VEG_grd1","lcov90_alb","WMA_grd1"
"G01_00000002","199058001.00000","1.00000","6.00000","24889.00000","2.00000","381.00000","33.00000","9.00000"
"G01_00000008","*********************","1.00000","*********************","24889.00000","2.00000","*********************","34.00000","*********************"
The issue I am having is with the double quotes wrapping the ********************* strings, which are the null values.
I am using the following in order to create the data table and copy the data:
CREATE TABLE bravo.G01(UNIQ_ID character varying(18), SP_grd1 double precision ,SACN_grd1 numeric,BIOME_grd1 numeric,Meso_grd1 double precision,DM_grd1 numeric,VEG_grd1 numeric,lcov90_alb numeric,WMA_grd1 numeric);
COPY bravo.g01(UNIQ_ID,SP_grd1,SACN_grd1,BIOME_grd1,Meso_grd1,DM_grd1,VEG_grd1,lcov90_alb,WMA_grd1) FROM 'F:\GreenBook-Backup\LUdatacube_20171206\CSV_Data_bravo\G01.csv' DELIMITER ',' NUll AS '*********************' CSV HEADER ;
The CREATE TABLE command works fine, but I encounter an error with the NULL AS clause. If I edit the text file and remove the double quotes, the import works fine.
I assume that as CSVs with double quotes and null values are very common there must be a work around here that I am missing. I certainly don't want to go and edit each of my CSVs so that it doesn't have double quotes!
You might want to try adding the FORCE_NULL ( column_name [, ...] ) option.
As the documentation states for FORCE_NULL:
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
The option is available from Postgres 9.4: https://www.postgresql.org/docs/10/static/sql-copy.html
If you're on a Unix-like platform, you could use sed to replace the null strings with something PostgreSQL will recognize automatically as null. On Windows, PowerShell exposes similar functionality.
This approach is more general if you need to perform other types of clean up on the data before loading.
The regex pattern to match your null-string is "[\*]*"
cleaning the file with sed:
[unix]>sed 's/"[\*]*"//g' test.csv > test2.csv
cleaning the file with windows powershell:
[windows-powershell]>cat test.csv | %{$_ -replace '"[\*]*"', ""} > test2.csv
Loading into PostgreSQL can then be shorter:
psql>\copy bravo.g01 FROM 'test2.csv' WITH CSV HEADER;
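The same clean-up can also be done in Python, which runs on both platforms. This is a minimal sketch and not part of the original answer. The helper name is hypothetical, and the pattern is slightly stricter than the one above: it requires at least one asterisk, so quoted empty strings are left alone.

```python
import re

# A quoted run of one or more asterisks, e.g. "*********************".
NULL_MARKER = re.compile(r'"\*+"')


def blank_null_markers(line):
    """Replace each quoted asterisk run with an unquoted empty field,
    which COPY in CSV mode reads as NULL by default."""
    return NULL_MARKER.sub("", line)


line = '"G01_00000008","*********************","1.00000","*********************"'
cleaned = blank_null_markers(line)
print(cleaned)
```

Applied line by line to test.csv, this produces the same kind of test2.csv as the sed command.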