COPY only some columns from an input CSV? (PostgreSQL)

I have created a table named 'con' in my database, with two columns named 'date' and 'kgs'. I am trying to extract data from the file 'hi.rpt' at 'H:Sir\data\reporting\hi.rpt' and store the values in the 'con' table.
I have tried this code in pgAdmin. When I run:
COPY con (date,kgs)
FROM 'H:Sir\data\reporting\hi.rpt'
WITH DELIMITER ','
CSV HEADER
date AS 'Datum/Uhrzeit'
kgs AS 'Summe'
I get the error:
ERROR: syntax error at or near "date"
LINE 5: date AS 'Datum/Uhrzeit'
^
********** Error **********
ERROR: syntax error at or near "date"
SQL state: 42601
Character: 113
"hi.rpt" file from which i am reading the data look like this:
Datum/Uhrzeit,Sta.,Bez.,Unit,TBId,Batch,OrderNr,Mat1,Total1,Mat2,Total2,Mat3,Total3,Mat4,Total4,Mat5,Total5,Mat6,Total6,Summe
41521.512369(04.09.13 12:17:48),TB01,TB01,005,300,9553,,2,27010.47,0,0.00,0,0.00,3,1749.19,0,0.00,0,0.00,28759.66
41521.547592(04.09.13 13:08:31),TB01,TB01,005,300,9570,,2,27057.32,0,0.00,0,0.00,3,1753.34,0,0.00,0,0.00,28810.66
Is it possible to extract only two values from the 20 different columns in this 'hi.rpt' file, or is there just a mistake in the syntax I have written? What is the correct way to write it?

I don't know where you got that syntax, but COPY doesn't take a list of column aliases like that. See the help:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
(AS isn't one of the listed options; to see the full syntax run \h copy in psql, or look at the manual page for the COPY command online).
There is no mapping facility in COPY that lets you read only some columns of the input CSV. It'd be really useful, but nobody's had the time/interest/funding to implement it yet. It's really only one of many data transform/filtering tasks people want anyway.
PostgreSQL expects the column-list given in COPY to be in the same order, left-to-right, as what's in the CSV file, and have the same number of entries as the CSV file has columns. So if you write:
COPY con (date,kgs)
then PostgreSQL will expect an input CSV with exactly two columns. It'll use the first CSV column for the "date" table column and the second CSV column for the "kgs" table column. It doesn't care what the CSV headers are; the header row is ignored if you specify WITH (FORMAT CSV, HEADER), and treated as a normal data row if you don't specify HEADER.
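In other words, the syntactically valid form of your command is something like this sketch, but it would only load correctly if the file itself contained just those two columns:
COPY con (date, kgs)
FROM 'H:Sir\data\reporting\hi.rpt'
WITH (FORMAT csv, HEADER);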
PostgreSQL 9.3 added FROM PROGRAM to COPY, so you can run a shell command to read the file and filter it. A simple Python or Perl script would do the job.
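For example, a sketch assuming a Unix-like environment where cut is on the server's PATH (the path here is illustrative; Datum/Uhrzeit and Summe are fields 1 and 20):
COPY con (date, kgs)
FROM PROGRAM 'cut -d, -f1,20 /path/to/hi.rpt'
WITH (FORMAT csv, HEADER);
-- Note: Datum/Uhrzeit values like 41521.512369(04.09.13 12:17:48) would
-- still need cleanup before they parse into a date column.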
If it's a small file, just open a copy of it as CSV in the spreadsheet of your choice, delete the unwanted columns, and save it, so that only the date and kgs columns remain.
Alternatively, COPY into a staging table that has the same columns as the CSV, then do an INSERT INTO ... SELECT to transfer just the wanted data into the real target table.
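A sketch of that staging-table approach for this file (the staging table and its column names are made up from the CSV header, and the date extraction assumes the parenthesised timestamp format shown above):
CREATE UNLOGGED TABLE con_staging (
    datum_uhrzeit text, sta text, bez text, unit text, tbid text,
    batch text, ordernr text,
    mat1 text, total1 text, mat2 text, total2 text, mat3 text, total3 text,
    mat4 text, total4 text, mat5 text, total5 text, mat6 text, total6 text,
    summe text
);

COPY con_staging FROM 'H:Sir\data\reporting\hi.rpt' WITH (FORMAT csv, HEADER);

-- Pull the timestamp out of the parentheses and keep only the two wanted columns
INSERT INTO con (date, kgs)
SELECT to_timestamp(substring(datum_uhrzeit from '\((.*)\)'),
                    'DD.MM.YY HH24:MI:SS')::date,
       summe::numeric
FROM con_staging;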

Related

Postgres: INVALID input syntax for type numeric: "2021-02-14" ... but it's in datetime format?

I'm very confused about this error I'm getting in the Query Tool in pgAdmin. I've been working on this for days and cannot find a way to fix it when attempting to upload this CSV file to my Postgres table.
ERROR: invalid input syntax for type numeric: "2021-02-14"
CONTEXT: COPY CardData, line 2, column sold_price: "2021-02-14"
SQL state: 22P02
Here is the code that I am running in the Query Tool:
CREATE TABLE Public."CardData"(Title text, Sold_Price decimal, Bids int, Sold_Date date, Card_Link text, Image_Link text)
select * from Public."CardData"
COPY Public."CardData" FROM 'W:\Python_Projects\cardscrapper_project\ebay_api\card_data_test.csv' DELIMITER ',' CSV HEADER ;
Here is the header and first data row of my csv file:
Title,Sold_Date,Sold_Price,Bids,Card_Link,Image_Link
2018 Contenders Optic Sam Darnold #103 Red Rookie #/99 PSA 8 NM-MT AUTO 10,2021-02-14,104.5,26,https://www.ebay.com/itm/2018-Contenders-Optic-Sam-Darnold-103-Red-Rookie-99-PSA-8-NM-MT-AUTO-10/143935698791?hash=item21833c7767%3Ag%3AjewAAOSwNb9gGEvi&LH_Auction=1,https://i.ebayimg.com/thumbs/images/g/jewAAOSwNb9gGEvi/s-l225.jpg
The "Sold_Date" column is in the correct datetime format that is easy for Postgres to understand, but the error is calling on the "Sold-Price" column?
I'm very confused. Any help is greatly appreciated.
Notice that the columns are not in the same order in the CSV file and in the table. You would have to specify the proper column order:
COPY Public."CardData" (Title,Sold_Date,Sold_Price,Bids,Card_Link,Image_Link)
FROM 'W:\Python_Projects\cardscrapper_project\ebay_api\card_data_test.csv'
DELIMITER ',' CSV HEADER ;
You have created the table with sold_price as the second column, so the COPY command will expect a price/number in the second column of your CSV file. Your CSV file, however, has sold_date as the second column, which leads to the data type mismatch error that you see.
Either re-define your CREATE TABLE statement with sold_date as the second column and sold_price as the third, or specify the column parsing order in your COPY statement as COPY Public."CardData" (<column order>).
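The first variant would look like this sketch (re-ordering the table's columns to match the CSV header):
CREATE TABLE Public."CardData"(
    Title text,
    Sold_Date date,
    Sold_Price decimal,
    Bids int,
    Card_Link text,
    Image_Link text
);

COPY Public."CardData" FROM 'W:\Python_Projects\cardscrapper_project\ebay_api\card_data_test.csv' DELIMITER ',' CSV HEADER;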
Another option is to open the CSV file in Excel, re-order the columns, and do a Save As...

copy columns of a csv file into postgresql table

I have a CSV file whose lines have 12, 11, 10 or 5 columns.
After creating a PostgreSQL table with 12 columns, I want to copy this CSV into the table.
I use this query:
COPY absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
FROM 'C:\temp\absence\absence.csv'
DELIMITER '\'
CSV
My CSV file contains 80,000 lines. For example:
20\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\21/02/2020\1471\03\54\Matin
21\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\\8130\7H48\Formation avec repas\
30\05\ 191\MARKEY CLAUDIE\GA0\51430\39H00\\167H42\
22\9993\Temps de déplacement\98\37
When I execute the query, I get a message indicating that there is missing data for the lines with fewer than 12 fields.
Is there a trick?
copy is extremely fast and efficient, but less flexible because of that. Specifically, it can't cope with files that have a different number of "columns" on each line.
You can either use a different import tool, or, if you want to stick to built-in tools, copy the file into a staging table that has only a single column, then use Postgres string functions to split the lines into the columns:
create unlogged table absence_import
(
line text
);
\copy absence_import(line) FROM 'C:\temp\absence\absence.csv' DELIMITER E'\b' CSV
E'\b' is the "backspace" character, which can't realistically appear in a text file, so no column splitting takes place.
Once you have imported the file, you can split each line using string_to_array() and then insert that into the real table:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1], line[2], line[3], line[4], line[5], line[6],
       line[7], line[8], line[9], line[10], line[11], line[12]
from (
  select string_to_array(line, '\') as line
  from absence_import
) t;
If there are non-text columns, you might need to cast the values to the target data type explicitly, e.g. line[3]::int.
You can add additional expressions to deal with missing columns, e.g. something like coalesce(line[10], 'default value').
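For example, a sketch for the last three columns, which are absent on the short lines (the fallback values here are invented placeholders):
select coalesce(line[10], '0') as heure_absence,
       coalesce(line[11], '0') as minute_absence,
       coalesce(line[12], '')  as periode_absence
from (
  select string_to_array(line, '\') as line
  from absence_import
) t;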

Why does my postgresql csv export have more rows than the table?

I am trying to copy a table in a PostgreSQL database (version 10.12) via psql. One of the columns contains strings representing XML data. When I query the database for a row count with this query, I get a count of about 50,000:
select count(column) from table;
But when I try to export the data to a csv file the output has more than 1,000,000 rows! I don't understand how a csv export could have a different number of rows than the table!
This is the copy command:
\copy (select column from table) to 'directory/output.csv' with csv;
It doesn't seem to matter if I change the delimiter or quote either. I've tried using | as a delimiter and ` as a quote and the number of rows in the csv was the same. Why is the row count different in the csv export?
The row count is not different: the CSV output simply has linefeeds (LF, ASCII code 10) embedded in fields, which is expected in XML.
If you want one line per row with COPY, don't use CSV; use the text format, that is, just omit with csv. Then newlines are encoded as \n instead of being output verbatim.
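For example, using the placeholder names from the question (text format is the default when CSV is omitted):
\copy (select column from table) to 'directory/output.txt'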

Which delimiter to use when loading CSV data into Postgres?

I've come across a problem with loading some CSV files into my Postgres tables. I have data that looks like this:
ID,IS_ALIVE,BODY_TEXT
123,true,Hi Joe, I am looking for a new vehicle, can you help me out?
Now, the problem here is that the text in what is supposed to be the BODY_TEXT column is unstructured email data and can contain any sort of characters, and when I run the following COPY command it's failing because there are multiple , characters within the BODY_TEXT.
COPY sent from ('my_file.csv') DELIMITER ',' CSV;
How can I resolve this so that everything in the BODY_TEXT column gets loaded as-is without the load command potentially using characters within it as separators?
In addition to fixing the source file format, you can handle it in PostgreSQL itself.
Load all lines from the file into a temporary table:
create temporary table t (x text);
copy t from 'foo.csv';
Then you can split each string using a regexp like:
select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') from t;
regexp_matches
---------------------------------------------------------------------------
{123,true,"Hi Joe, I am looking for a new vehicle, can you help me out?"}
{456,false,"Hello, honey, there is what I want to ask you."}
(2 rows)
You can use this query to load the data into your destination table (lines that don't match the pattern produce no row from regexp_matches and are silently skipped):
insert into sent(id, is_alive, body_text)
select x[1]::int, x[2]::boolean, x[3]  -- casts assume id/is_alive are integer/boolean
from (
  select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') as x
  from t) t;

Data correction exporting CSV file to Postgres

I am importing a csv file into postgres, and would like to know how to import the correct data type while using the COPY command. For instance, I have a column column_1 integer; and want to insert the value 6 into it from my csv file.
I run the command copy "Table" from 'path/to/csv' DELIMITERS ',' CSV; and every time I try to do this I get the error ERROR: invalid input syntax for integer: "column_1". I figured out that it's because it is automatically importing every piece of data from the csv file as a string or text. If I change the column type to text then it works successfully, but this defeats the purpose of using a number as I need it for various calculations. Is there a way to conserve the data type when transferring? Is there something I need to change in the csv file? Or is there another datatype to assign to column_1? Hope this makes sense. Thanks in advance!
I did this and it worked flawlessly:
I put the plain number in stack.csv (the file contains only the single value 6):
# create table stack(i int);
# \copy stack from 'stack.csv' with (format csv);
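As an aside: the error text invalid input syntax for integer: "column_1" suggests that COPY was reading your CSV header line as data. If your real file starts with a header row, tell COPY to skip it, e.g. (a sketch):
# \copy stack from 'stack.csv' with (format csv, header true);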
I read in your comment that you have 25 columns in your CSV file. Your table needs to have at least 25 columns, and all columns from the CSV need to be mapped. If your table has more than 25 columns, you need to map only the columns that come from the CSV.
That's why it works with a text field: all the data can be loaded as text.
If your table has more columns than there are fields in your CSV file, the format is like this:
\copy stack(column1, column2, ..., column25) from 'stack.csv' with (format csv);