Postgres: Error when using COPY from a CSV with timestamptz type

I am using Postgres 9.5.3 (on Ubuntu 16.04) and I have a table with some timestamptz fields:
...
datetime_received timestamptz NULL,
datetime_manufactured timestamptz NULL,
...
I used the following SQL command to generate a CSV file:
COPY (select * from tmp_table limit 100000) TO '/tmp/aa.csv' DELIMITER ';' CSV HEADER;
and used:
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV ENCODING 'UTF-8';
to import into the table.
An example row from the CSV file:
CM0030;;INV_AVAILABLE;2016-07-30 14:50:42.141+07;;2016-08-06 00:00:000+07;FAHCM00001;;123;;;;;1.000000;1.000000;;;;;;;;80000.000000;;;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;
But I encounter the following error when running the second command:
ERROR: invalid input syntax for type timestamp with time zone: "datetime_received"
CONTEXT: COPY inventory_item, line 1, column datetime_received: "datetime_received"
My database's timezone is:
show timezone;
TimeZone
-----------
localtime(GMT+7)
(1 row)
Is there any missing step or wrong configuration?
Any suggestions are appreciated!

The error you're seeing means that Postgres is trying (and failing) to convert the string 'datetime_received' to a timestamp value.
This is happening because COPY is trying to insert the header row into your table. You need to include a HEADER clause on the COPY FROM command, just like you did for the COPY TO.
More generally, when using COPY to move data around, you should make sure that the TO and FROM commands are using exactly the same options. Specifying ENCODING for one command and not the other can lead to errors, or silently corrupt data, if your client encoding is not UTF8.
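With the two commands made symmetric, the round trip works. A minimal fixed pair, assuming the same table and file as above:
COPY (select * from tmp_table limit 100000) TO '/tmp/aa.csv' DELIMITER ';' CSV HEADER ENCODING 'UTF-8';
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV HEADER ENCODING 'UTF-8';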

Related

copy_expert of Postgres hook in Airflow does not write anything to table without errors

I want to write a CSV file to a table in Postgres via Airflow.
I came across the Airflow documentation noting that the hook already has a built-in function for CSV export, and used a related thread to learn how to use it.
I have a Python operator whose python_callable is as follows:
from datetime import date
from airflow.providers.postgres.hooks.postgres import PostgresHook

def copy_expert_csv():
    hook = PostgresHook(postgres_conn_id='warehouse', host='data-warehouse',
                        database='datalake',
                        user='root',
                        password='root',
                        port=9999)
    with hook.get_conn() as connection:
        hook.copy_expert("""COPY datalake.public.wcc_users FROM stdin WITH CSV HEADER
                         DELIMITER as ',' """,
                         'includes/cleaned_data/wwc/' + str(date.today()) + '_wwc_cleaned ')
        connection.commit()
The task finishes successfully.
And there are no error logs on my database side either:
materials-data-warehouse-1 | 2022-04-29 17:43:01.942 UTC [198] STATEMENT: COPY datalake.public.wcc_users FROM STDIN WITH (FORMAT CSV) HEADER
My file has around 1000 rows. However, when I select from my table, there are 0 rows inserted.
The column names in the table are different from those in the file, and 2 columns have date and timestamp data types rather than text. Can that be the cause? Then why are no errors thrown?
It seems the table definition was incorrect. This throws no errors but inserts nothing either.
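One way to surface a mismatch like that is to list the target columns explicitly in the COPY statement, so the order and count have to line up with the file. A sketch (the column names here are hypothetical, since the real schema isn't shown):
COPY datalake.public.wcc_users (user_id, signup_date, created_at) -- hypothetical column list
FROM stdin WITH CSV HEADER DELIMITER as ','
Comparing that explicit list against the CSV's header line is a quick way to catch a wrong table definition.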

Postgres: INVALID input syntax for type numeric: "2021-02-14" ... but it's in datetime format?

I'm very confused about this error I'm getting in the Query Tool in pgAdmin. I've been working on this for days and cannot find a fix for this error when attempting to upload this CSV file to my Postgres table.
ERROR: invalid input syntax for type numeric: "2021-02-14"
CONTEXT: COPY CardData, line 2, column sold_price: "2021-02-14"
SQL state: 22P02
Here is the code I am running in the Query Tool:
CREATE TABLE Public."CardData"(Title text, Sold_Price decimal, Bids int, Sold_Date date, Card_Link text, Image_Link text);
select * from Public."CardData";
COPY Public."CardData" FROM 'W:\Python_Projects\cardscrapper_project\ebay_api\card_data_test.csv' DELIMITER ',' CSV HEADER;
Here are the header row and first data row of my CSV file:
Title,Sold_Date,Sold_Price,Bids,Card_Link,Image_Link
2018 Contenders Optic Sam Darnold #103 Red Rookie #/99 PSA 8 NM-MT AUTO 10,2021-02-14,104.5,26,https://www.ebay.com/itm/2018-Contenders-Optic-Sam-Darnold-103-Red-Rookie-99-PSA-8-NM-MT-AUTO-10/143935698791?hash=item21833c7767%3Ag%3AjewAAOSwNb9gGEvi&LH_Auction=1,https://i.ebayimg.com/thumbs/images/g/jewAAOSwNb9gGEvi/s-l225.jpg
The "Sold_Date" column is in the correct datetime format that is easy for Postgres to understand, but the error is calling on the "Sold-Price" column?
I'm very confused. Any help is greatly appreciated.
Notice that the columns are not in the same order in the csv file and in the table.
You would have to specify the proper column order:
COPY Public."CardData" (Title,Sold_Date,Sold_Price,Bids,Card_Link,Image_Link)
FROM 'W:\Python_Projects\cardscrapper_project\ebay_api\card_data_test.csv'
DELIMITER ',' CSV HEADER ;
You have created the table with sold_price as the second column, so the COPY command will expect a price/number to be the second column in your CSV file. Your CSV file, however, has sold_date as the second column, which leads to the data type mismatch error that you see.
Either re-define your CREATE TABLE statement with sold_date as the second column and sold_price as the third, or specify the column parsing order in your COPY statement as COPY public."CardData" (<column order>).
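For reference, a table definition re-ordered to match the CSV would look like this:
CREATE TABLE Public."CardData"(Title text, Sold_Date date, Sold_Price decimal, Bids int, Card_Link text, Image_Link text);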
Another option is to open up the CSV file in Excel and re-order the columns and do a Save As...

stl_load_errors returning invalid timestamp format I can't figure out

I'm trying to use the copy function to load a table in Redshift. I've set up the particular field that keeps failing as a standard timestamp in my schema, because I don't see why it would be anything else. But when I run this statement:
copy sample_table
from 's3://aws-bucket/data_push_2018-10-05.txt'
credentials 'aws_access_key_id=XXXXXXXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/XXX'
dateformat 'auto'
ignoreheader 1;
It keeps returning this error: Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SS]
raw_field_value: "2018-08-29 15:04:52"
raw_line: 12039752|311525|"67daf211abbe11e8b0010a28385dd2bc"|98953|"2018-08-20"|"2018-11-30"|"active"|"risk"|||||||"sample"|15750|0|"2018-08-29 15:04:52"|"2018-08-29 16:05:01"
There is a very similar table in our database (that I did not make) which has the aforementioned error value as a timestamp, with values for that field identical to 2018-08-29 15:04:52, so what's happening when I run it that causes the issue?
Your copy command seems OK, but you are missing the FORMAT AS CSV, QUOTE AS '"' and DELIMITER AS '|' parameters; with those it should work.
I'm using some sample data and a command here to prove my case. To keep it simple I made the table small, though it covers all your data points.
create table sample_table(
salesid integer not null,
category varchar(100),
created_at timestamp,
update_at timestamp );
Here goes your sample data, test_file.csv:
12039752|"67daf211abbe11e8b0010a28385dd2bc"|"2018-08-29 11:04:52"|"2018-08-29 14:05:01"
12039754|"67daf211abbe11e8b0010a2838cccddbc"|"2018-08-29 15:04:52"|"2018-08-29 16:05:01"
12039755|"67daf211abbe11e8b0010a28385ff2bc"|"2018-08-29 12:04:52"|"2018-08-29 13:05:01"
12039756|"67daf211abbe11e8b0010a28385bb2bc"|"2018-08-29 10:04:52"|"2018-08-29 15:05:01"
Here goes your copy command:
COPY sample_table FROM 's3://path/to/csv/test_file.csv' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX' FORMAT as CSV QUOTE AS '"' DELIMITER AS '|';
It returns:
INFO: Load into table 'sample_table' completed, 4 record(s) loaded successfully.
COPY
This command works fine, but if there are more issues with your data you could try the MAXERROR option as well.
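For example, a sketch with an arbitrarily chosen threshold of 10:
COPY sample_table FROM 's3://path/to/csv/test_file.csv' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX' FORMAT as CSV QUOTE AS '"' DELIMITER AS '|' MAXERROR AS 10;
Rows that fail to parse are then skipped (and logged in stl_load_errors) instead of aborting the whole load.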
Hope it answers your question.

SQL server Openquery equivalent to PostgresQL

Is there a query equivalent to SQL Server's OPENQUERY or OPENROWSET to use in PostgreSQL to query from Excel or CSV?
You can use PostgreSQL's COPY.
As per the doc:
COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.
COPY works like this:
Importing a table from CSV
Assuming you already have a table in place with the right columns, the command is as follows:
COPY tblemployee FROM '~/empsource.csv' DELIMITERS ',' CSV;
Exporting a CSV from a table.
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITERS ',' CSV;
It's important to mention here that if your data is in Unicode or needs strict encoding, you should always set client_encoding before running any of the above commands.
To set the client_encoding parameter in PostgreSQL:
set client_encoding to 'UTF8'
or
set client_encoding to 'latin1'
Another thing to guard against is nulls: when exporting, if some fields are null, PostgreSQL will write '\N' to represent a null field. This is fine, but may cause issues if you are trying to import that data into, say, SQL Server.
A quick fix is to modify the export command, specifying what you would prefer as a null placeholder in the exported CSV:
COPY (select * from tblemployee ) TO '~/exp_tblemployee.csv' DELIMITERS ',' NULL as E'';
Another common requirement is importing or exporting with a header.
Import a CSV into a table, with a header for the columns present in the first row of the CSV file:
COPY tblemployee FROM '~/empsource.csv' DELIMITERS ',' CSV HEADER
Export a table to CSV with headers present in the first row:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITERS ',' CSV HEADER

Data correction exporting CSV file to Postgres

I am importing a csv file into postgres, and would like to know how to import the correct data type while using the COPY command. For instance, I have a column column_1 integer; and want to insert the value 6 into it from my csv file.
I run the command copy "Table" from 'path/to/csv' DELIMITERS ',' CSV; and every time I try this I get the error ERROR: invalid input syntax for integer: "column_1". I figured out that it's because it is automatically importing every piece of data from the CSV file as a string or text. If I change the column type to text then it works, but this defeats the purpose of using a number, as I need it for various calculations. Is there a way to preserve the data type when transferring? Is there something I need to change in the CSV file? Or is there another data type to assign to column_1? Hope this makes sense. Thanks in advance!
I did this and it worked flawlessly:
I put the plain number in stack.csv (the file has only the one value, 6):
# create table stack(i int);
# \copy stack from 'stack.csv' with (format csv);
I read in your comment that you have 25 columns in your CSV file. You need to have at least 25 columns in your table, and all of them need to be mapped from the CSV. If you have more than 25 columns in the table, you need to map only the columns that come from the CSV.
That's why it works when the column is a text field: all the data can go into the cell as text.
If you have more columns than fields in your CSV file, then the format is like this:
\copy stack(column1, column2, ..., column25) from 'stack.csv' with (format csv);
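One more thing worth checking, since the error message quotes the string "column_1" (a column name, not a value): that is the same symptom as in the first question above, where COPY reads the CSV's header line as data. If your file starts with a header row, tell COPY to skip it:
copy "Table" from 'path/to/csv' with (format csv, header);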