Copy CSV with fewer columns than PostgreSQL table? - postgresql

Let's say I have a table like the following:
CREATE TABLE table_a
(
    cola text,
    colb text,
    colc text,
    cold text,
    cole text
);
Currently, I am loading the table with the following:
\copy table_a from PATH/TO/CSV/CSV_FILE.csv DELIMITER ',' CSV HEADER;
where CSV_FILE.csv also has all 5 columns: cola, colb, colc, cold, cole.
But what if I have CSV_FILE_2.csv that only has cola, colb, colc?
I want to be able to do something like:
\copy table_a from PATH/TO/CSV/CSV_FILE_2.csv DELIMITER ',' CSV HEADER;
to insert new rows from CSV_FILE_2.csv but leave cold and cole null.
But when I try to do the above, I get a
ERROR: missing data for column "cold"
Is there an efficient way to use the copy command to just add the new rows from CSV_FILE_2.csv?
One workaround I thought of is to insert the rows into a temporary table, then insert the rows from the temporary table into table_a, and finally drop the temporary table, but that seems cumbersome.

The manual page for the \copy command only gives a brief overview of the syntax, and doesn't separate the "copy from" and "copy to" options, but does say this:
The syntax of this command is similar to that of the SQL COPY command. All options other than the data source/destination are as specified for COPY.
If we look up the manual page for COPY instead, we see a clearer synopsis of the COPY FROM version:
COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | PROGRAM 'command' | STDIN } [ [ WITH ] ( option [, ...] ) ] [ WHERE condition ]
Notably, this includes a column list, which is explained below:
For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.
This sounds like exactly what you're looking for (the default value for a nullable column being null, if not otherwise specified). So in your example, you would run:
\copy table_a (cola, colb, colc) from 'PATH/TO/CSV/CSV_FILE_2.csv' DELIMITER ',' CSV HEADER;
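For example, given a hypothetical CSV_FILE_2.csv like this:
cola,colb,colc
a1,b1,c1
the command above loads the row with cold and cole left null:
SELECT * FROM table_a;
 cola | colb | colc | cold | cole
------+------+------+------+------
 a1   | b1   | c1   |      |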

Related

Discarding rows containing empty string in CSV from uploading through SQL Loader control file

I am trying to upload a CSV which may or may not contain an empty value for a column in a row.
I want to discard the rows that contain an empty value rather than upload them to the DB through SQL Loader.
How can this be handled in the control file? I have tried the conditions below in the ctl file:
when String_Value is not null
when String_Value <> ''
but the rows are still getting inserted
This worked for me using either '<>' or '!='. I suspect the order of the clauses was incorrect for you. Note that colc (also the third column in the data file) matches the column name in the table.
load data
infile 'c:\temp\x_test.dat'
TRUNCATE
into table x_test
when colc <> ''
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
cola char,
colb char,
colc char,
cold integer external
)
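For illustration, a hypothetical c:\temp\x_test.dat where the second row has an empty third field (colc) and is therefore rejected by the WHEN clause:
a1,b1,c1,1
a2,b2,,2
a3,b3,c3,3
Only the first and third rows are loaded; rows failing the WHEN clause go to the discard file if one is configured.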

COPY column order

I'm trying to use COPY with the HEADER option, but the header line in my file is in a different order than the column order specified in the database.
Is the column name order in my file necessary?
My code is below:
COPY table_name (
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
)
FROM 'file.csv'
WITH DELIMITER ',' CSV HEADER;
My database table has a different column order from file.csv, and I wanted to select the column order from the table and copy the data from the csv into the table.
You can't issue an SQL query in copy from. You can only list the columns.
Note that the HEADER option does not map columns by name: on input it simply skips the first line of the file, so the column list has to match the file's actual column order.
If the CSV columns are in the order b, a, c, then list exactly that in the copy from command:
copy target_table (b, a, c)
from 'file.csv'
with (delimiter ',', format csv, header)
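A minimal sketch of how this lines up, assuming target_table(a, b, c) and a hypothetical file.csv:
b,a,c
b1,a1,c1
After the copy above, the row lands as a='a1', b='b1', c='c1': the header line is skipped, and each data field goes to the column at the same position in the list.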
Assuming the column order we need is that of the table we are copying from, the next logical step is to simulate a sub-query through a Bash script.
psql schema_origin -c 'COPY table_origin TO stdout' | \
psql schema_destination -c \
"$(echo 'COPY table_destination (' \
$(psql schema_origin -t -c "select string_agg(column_name, ',') \
from information_schema.columns where table_name = 'table_origin'") \
') FROM stdin')"
StackOverflow answer on COPY command
StackExchange answer on fetching column names
StackOverflow answer on fetching results as tuples
I came up with the following setup for making COPY TO/FROM succeed even for quite sophisticated JSON columns. Since a COPY column list cannot be a subquery (see above), first generate a safely quoted column list:
SELECT string_agg(
    quote_ident(column_name),
    ','
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table_name'
AND TABLE_SCHEMA = 'your_schema_name';
Then paste the result into the COPY command:
COPY "your_schema_name"."your_table_name" (
-- paste the generated column list here
) FROM STDIN WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
-- here rows of data
\.
The most important parts:
Be explicit when filtering information_schema.columns and also use table_schema. Otherwise, you may end up with unexpected columns when the same table name occurs in multiple schemas.
Use quote_ident to make sure the command does not crash if someone named table columns after reserved Postgres keywords like user or unique. Thanks to quote_ident, such names come back wrapped in double quotes, which makes them safe for importing.
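For example, a quick illustration of quote_ident's behavior:
SELECT quote_ident('user') AS reserved, quote_ident('plain_name') AS plain;
 reserved | plain
----------+------------
 "user"   | plain_name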
I also found the following settings:
QUOTE '\b' - quote with backspace, a character that practically never appears in real data, which effectively disables quoting
DELIMITER E'\t' - delimit with tabs
ESCAPE '\' - escape with a backslash
to make both COPY TO and COPY FROM most reliable, also when dealing with sophisticated/nested JSON columns.

Importing variable number of columns into SQLite database

I have a list of synonyms in a csv file format: word,meaning1,meaning2,meaning3....
Different words have different numbers of synonyms, which means that rows are likely to have a variable number of columns. I am trying to import the csv file into an sqlite database like so:
sqlite3 synonyms
sqlite> create table list(word text, meaning0 text, meaning1 text, meaning2 text, meaning3 text, meaning4 text, meaning5 text, meaning6 text, meaning7 text, meaning8 text, meaning9 text);
sqlite> .mode list
sqlite> .separator ,
sqlite> .import ./csv/synonyms.csv list
To be on the safe side, I assumed a maximum of 10 synonym columns per word. For words with fewer than 10 synonyms, the remaining columns should be null. The error I get on executing the import command is:
Error: ./csv/synonyms.csv line 1: expected 11 columns of data but found 3
My question(s):
1. In case the number of columns is less than 10, how can I tell SQLite to substitute the missing ones with null?
2. Is there some way of specifying that I want 10 columns after word instead of typing them manually?
You can do the following:
Import all data into single column;
Update table splitting column contents into other columns.
Sample:
-- Create a table with only one column;
CREATE TABLE table_name(first);
-- Choose a separator which doesn't exist within file
.separator ~
-- Import data
.import file.csv table_name
-- Add another column to split data
ALTER TABLE table_name ADD COLUMN second;
-- Split data between first and second column
UPDATE table_name
SET first = SUBSTR(first, 1, INSTR(first, ',') - 1),
    second = SUBSTR(first, INSTR(first, ',') + 1)
WHERE INSTR(first, ',') > 0;
-- Repeat to next column
ALTER TABLE table_name ADD COLUMN third;
-- Split data between second and third column
UPDATE table_name
SET second = SUBSTR(second, 1, INSTR(second, ',') - 1),
    third = SUBSTR(second, INSTR(second, ',') + 1)
WHERE INSTR(second, ',') > 0;
-- And so on...
ALTER TABLE table_name ADD COLUMN fourth;
UPDATE table_name
SET third = SUBSTR(third, 1, INSTR(third, ',') - 1),
    fourth = SUBSTR(third, INSTR(third, ',') + 1)
WHERE INSTR(third, ',') > 0;
-- Many times as needed...
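To see how the chain of UPDATEs behaves, here is a hypothetical walk-through for a row imported as 'cat,feline,kitty':
-- after UPDATE 1: first='cat', second='feline,kitty'
-- after UPDATE 2: first='cat', second='feline', third='kitty'
-- UPDATE 3 changes nothing: third contains no comma, so its WHERE clause skips the row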
While not an optimal method, SQLite's performance should make it fast enough.

COPY command: copy only specific columns from csv

I have a question about the COPY command in PostgreSQL. I have a CSV file, and I only want to copy some of its column values into my PostgreSQL table.
Is it possible to do this? I am familiar with using the COPY command to copy all of the data from a CSV into a table, using the header to map to the column names, but how is this possible when I only want some of the columns?
Either pre-process the CSV file, or (what I probably would do) import into a temporary copy of the target table and INSERT only selected columns in a second step:
CREATE TEMP TABLE tmp AS SELECT * FROM target_table LIMIT 0;
ALTER TABLE tmp ADD COLUMN extra_column1 text
, ADD COLUMN extra_column2 text; -- add excess columns
COPY tmp FROM '/path/to/file.csv';
INSERT INTO target_table (col1, col2, col3)
SELECT col1, col2, col3 FROM tmp -- only relevant columns
WHERE ... -- optional, to also filter rows
A temporary table is dropped automatically at the end of the session. If the processing takes longer, use a regular table.
Alternatively, in Postgres 9.3 or later, you can pre-process the file on the fly with COPY FROM PROGRAM (assuming the file has a header line):
COPY target_table FROM PROGRAM 'cut -f1,2,3 -d, /path/to/file.csv' WITH (FORMAT csv, HEADER);
Note that cut splits on every comma, so this simple variant only works if no field contains an embedded, quoted comma.

Dump file from Sqlite3 to PostgreSQL: why do I always get errors when import it?

I have many tables in an Sqlite3 db, and now I want to export them to PostgreSQL, but I always get errors.
I've used different techniques to dump from sqlite:
.mode csv
.header on
.out ddd.sql
select * from my_table
and
.mode insert
.out ddd.sql
select * from my_table
And when I try to import it through phpPgAdmin, I get errors like this:
ERROR: column "1" of relation "my_table" does not exist
LINE 1: INSERT INTO "public"."my_table" ("1", "1", "Vitas", "a#i.ua", "..
How to avoid this error?
Thanks in advance!
Rant
You get this "column ... does not exist" error with INSERT INTO "public"."my_table" ("1", ... - because quotes around the "1" mean this is an identifier, not literal.
Even if you fix this, the query still will give error, because of missing VAULES keyword, as Jan noticed in other answer.
The correct form would be:
INSERT INTO "public"."my_table" VALUES ('1', ...
If this SQL was autogenerated by sqlite, bad for sqlite.
This great chapter about SQL syntax is only about 20 pages in print. My advice to whoever generated this INSERT, is: read it :-) it will pay off.
Real solution
Now, to the point... To transfer a table from sqlite to postgres, you should use COPY, because it's way faster than INSERT.
Use CSV format, as it's understood on both sides.
In sqlite3:
create table tbl1(one varchar(20), two smallint);
insert into tbl1 values('hello',10);
insert into tbl1 values('with,comma', 20);
insert into tbl1 values('with "quotes"', 30);
insert into tbl1 values('with
enter', 40);
.mode csv
.header on
.out tbl1.csv
select * from tbl1;
In PostgreSQL (psql client):
create table tbl1(one varchar(20), two smallint);
\copy tbl1 from 'tbl1.csv' with csv header delimiter ','
select * from tbl1;
See http://wiki.postgresql.org/wiki/COPY.
It seems the "VALUES" keyword is missing:
INSERT INTO "public"."my_table" VALUES (...)
But! - You have to insert values with appropriate quoting: single quotes for text and no quotes for numbers.
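For the row from the error message above, a correctly quoted statement would look like this (guessing that the first two values are numbers and the rest are text):
INSERT INTO "public"."my_table" VALUES (1, 1, 'Vitas', 'a#i.ua', ...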