COPY command: copy only specific columns from csv - postgresql

I have a question about the COPY command in PostgreSQL. I have a CSV file, and I only want to copy some of its columns into my PostgreSQL table.
Is it possible to do this? I am familiar with using COPY to load all of the data from a CSV into a table, using the header to map to the column names, but how can I do it when I only want some of the columns?

Either pre-process the CSV file, or (what I probably would do) import into a temporary copy of the target table and INSERT only selected columns in a second step:
CREATE TEMP TABLE tmp AS SELECT * FROM target_table LIMIT 0;
ALTER TABLE tmp ADD COLUMN extra_column1 text
, ADD COLUMN extra_column2 text; -- add excess columns
COPY tmp FROM '/path/to/file.csv' (FORMAT csv); -- add HEADER if the file has a header line
INSERT INTO target_table (col1, col2, col3)
SELECT col1, col2, col3 FROM tmp -- only relevant columns
WHERE ... -- optional, to also filter rows
A temporary table is dropped automatically at the end of the session. If the processing spans more than one session, use a regular table instead.

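Alternatively, let COPY read from a program that trims the columns first (note that cut splits on every comma, including commas inside quoted fields):
COPY target_table FROM PROGRAM 'cut -f1,2,3 -d, /path/to/file.csv' (FORMAT csv);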
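The "pre-process the CSV file" option can be sketched with Python's csv module, which, unlike a plain split on commas, understands quoted fields. This is only a minimal sketch: the function name keep_columns and the column names in the demo are made up for illustration, and the trimmed output would still have to be loaded with COPY afterwards.

```python
import csv
import io


def keep_columns(src, dst, wanted):
    """Copy only the `wanted` header columns from CSV stream `src` to `dst`."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed in `wanted`.
    writer = csv.DictWriter(
        dst, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Demo on in-memory data (hypothetical column names); real code would
# open the source and destination files with newline="" instead.
raw = 'col1,col2,junk,col3\n1,"a, quoted",x,3\n'
out = io.StringIO()
keep_columns(io.StringIO(raw), out, ["col1", "col2", "col3"])
trimmed = out.getvalue()
print(trimmed)
```

The quoted value containing a comma survives intact, which is the case where a byte-level tool such as cut would split the field in the wrong place.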

Related

Copy CSV with fewer columns than PostgreSQL table?

Let's say I have a table like the following:
CREATE TABLE table_a
(
cola text,
colb text,
colc text,
cold text,
cole text
)
Currently, I am loading the table with the following:
\copy table_a from PATH/TO/CSV/CSV_FILE.csv DELIMITER ',' CSV HEADER;
where CSV_FILE.csv also has all 5 columns: cola, colb, colc, cold, cole.
But what if I have CSV_FILE_2.csv that only has cola, colb, colc?
I want to be able to do something like:
\copy table_a from PATH/TO/CSV/CSV_FILE_2.csv DELIMITER ',' CSV HEADER;
to insert new rows from CSV_FILE_2.csv but leave cold and cole null.
But when I try to do the above, I get a
ERROR: missing data for column "cold"
Is there an efficient way to use the copy command to just add the new rows from CSV_FILE_2.csv?
One workaround I thought of is to insert the rows into a temporary table, insert the rows from the temporary table into table_a, and then drop the temporary table, but that seems cumbersome.
The manual page for the \copy command only gives a brief overview of the syntax, and doesn't separate the "copy from" and "copy to" options, but does say this:
The syntax of this command is similar to that of the SQL COPY command. All options other than the data source/destination are as specified for COPY.
If we look up the manual page for COPY instead, we see a clearer synopsis of the COPY FROM version:
COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | PROGRAM 'command' | STDIN } [ [ WITH ] ( option [, ...] ) ] [ WHERE condition ]
Notably, this includes a column list, which is explained below:
For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.
This sounds like exactly what you're looking for (the default value for a nullable column being null, if not otherwise specified). So in your example, you would run:
\copy table_a (cola, colb, colc) from 'PATH/TO/CSV/CSV_FILE_2.csv' DELIMITER ',' CSV HEADER;
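The rule quoted above (columns left out of the column list receive their defaults) is the same one a plain INSERT with a column list follows. The sketch below shows that behaviour using Python's sqlite3 module as a stand-in, so it is not PostgreSQL's COPY itself; the helper name load_three_columns and the sample rows are assumptions made for the example.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (cola text, colb text, colc text, cold text, cole text)"
)


def load_three_columns(conn, csv_stream):
    """Insert a three-column CSV; cold and cole are not listed, so they stay NULL."""
    reader = csv.reader(csv_stream)
    next(reader)  # skip the header line, like the HEADER option does
    conn.executemany(
        "INSERT INTO table_a (cola, colb, colc) VALUES (?, ?, ?)", reader
    )


# Hypothetical contents standing in for CSV_FILE_2.csv.
sample = io.StringIO("cola,colb,colc\na1,b1,c1\na2,b2,c2\n")
load_three_columns(conn, sample)
rows = conn.execute("SELECT * FROM table_a ORDER BY cola").fetchall()
print(rows)
```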

How does SELECT INTO work in SAS

I'm new to SAS and I'm trying to port my code from Access VBA to SAS.
In Access I often use the SELECT INTO function, but it seems this function does not exist in SAS.
I have two tables and I get new data each day, and I want to update my table with the new rows. So I need to check whether any new rows have appeared and, if so, insert them into the old table.
I tried some code from Stack Overflow and other results from Google, but I didn't find anything that works.
INSERT INTO OLD_TABLE T
VALUES (GRVID = VTGONR)
FROM NEW_TABLE V
WHERE not exists (SELECT V.VTGONR FROM NEW_TABLE V WHERE T.GRVID = V.VTGONR);
Not sure what the purpose of using the VALUES keyword is in your example. PROC SQL uses VALUES() to list static values. Like:
VALUES (100)
To insert rows that come from another table, SAS just uses normal SQL syntax. See for example: https://www.techonthenet.com/sql/insert.php
To specify the observations to insert just use SELECT. You can add a WHERE clause as part of the select to limit the rows that you select to insert. To tell INSERT which columns to insert into list them inside () after the table name. Otherwise it will expect the order that the columns are listed in the select statement to match the order of the columns in the target table.
insert into old_table(GRVID)
select VTGONR from new_table
where VTGONR not in (select GRVID from old_table)
;
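To see what that statement does, here is a small sketch that runs the same INSERT ... SELECT ... WHERE ... NOT IN against an in-memory SQLite database through Python's sqlite3 module. It is not SAS PROC SQL, and the sample key values are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE old_table (GRVID integer);
    CREATE TABLE new_table (VTGONR integer);
    INSERT INTO old_table VALUES (1), (2);       -- rows already known
    INSERT INTO new_table VALUES (2), (3), (4);  -- today's delivery
    """
)

# Insert only the keys that old_table does not contain yet.
conn.execute(
    """
    INSERT INTO old_table (GRVID)
    SELECT VTGONR FROM new_table
    WHERE VTGONR NOT IN (SELECT GRVID FROM old_table)
    """
)

ids = [r[0] for r in conn.execute("SELECT GRVID FROM old_table ORDER BY GRVID")]
print(ids)
```

Key 2 exists in both tables and is not inserted a second time. One caveat that applies to NOT IN in general: if the subquery can return NULL, no rows will qualify, so a NOT EXISTS form may be safer when GRVID is nullable.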

How can I remove extra characters from a column?

I have a table with Customer/Phone/City/State/Zip/etc..
Occasionally, I'll be importing the info from a .csv file, and sometimes the zipcode is formatted like this: xxxxx-xxxx and I only need it to be a general, 5 digit zip code.
How can I delete the last 5 characters without having to do it from Excel, cell by cell (which is what I'm doing now)?
Thanks
EDIT: This is what I used after Craig's suggestion and it worked. However, some of the zip entries are Canadian postal codes, which are often formatted x1x-x2x. Running this deletes the last character in the field.
How could I remedy this?
You'll need to do one of these three things:
Use an ETL tool to filter the data during insert;
COPY into a TEMPORARY or UNLOGGED table, then do an INSERT INTO real_table SELECT ... that transforms the data with a suitable substring(...) call; or
Write a simple Perl/Python/whatever script that reads the CSV, transforms it as desired, and inserts the results into PostgreSQL. I'd use Python with the csv module and psycopg2's copy_from.
Such an insert into ... select might look like:
INSERT INTO real_table(col1, col2, zip)
SELECT
col1,
col2,
substring(zip from 1 for 5)
FROM temp_table;
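For the script option, and for the Canadian codes mentioned in the question's edit, a pattern match is safer than cutting at a fixed length. The following is a minimal sketch using only Python's csv and re modules. The names normalize_zip and transform and the column name "zip" are assumptions, and the step that loads the result into PostgreSQL (for example with psycopg2) is left out.

```python
import csv
import io
import re

# Matches only the US ZIP+4 form, e.g. 12345-6789.
US_ZIP_PLUS4 = re.compile(r"^(\d{5})-\d{4}$")


def normalize_zip(value):
    """Reduce ZIP+4 to five digits; leave every other value untouched."""
    value = value.strip()
    match = US_ZIP_PLUS4.match(value)
    return match.group(1) if match else value


def transform(src, dst, zip_field="zip"):
    """Copy CSV stream `src` to `dst`, normalizing the zip column on the way."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[zip_field] = normalize_zip(row[zip_field])
        writer.writerow(row)


# Demo with invented customer rows.
out = io.StringIO()
transform(io.StringIO("customer,zip\nAnn,12345-6789\nBob,K1A-0B1\n"), out)
cleaned = out.getvalue()
print(cleaned)
```

Because only values of the exact form five digits, hyphen, four digits are shortened, a code like K1A-0B1 passes through unchanged instead of losing its last character.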

Create a temporary table from a selection or insert if table already exists

How to create a temporary table, if it does not already exist, and add the selected rows to it?
CREATE TABLE AS
is the simplest and fastest way:
CREATE TEMP TABLE tbl AS
SELECT * FROM tbl WHERE ... ;
Do not use SELECT INTO. See:
Combine two tables into a new one so that select rows from the other one are ignored
Not sure whether table already exists
CREATE TABLE IF NOT EXISTS ... was introduced in Postgres 9.1.
For older versions, use the function provided in this related answer:
PostgreSQL create table if not exists
Then:
INSERT INTO tbl (col1, col2, ...)
SELECT col1, col2, ...
Chances are, something is going wrong in your code if the temp table already exists. Make sure you don't duplicate data in the table or something. Or consider the following paragraph ...
Unique names
Temporary tables are only visible within your current session (not to be confused with transaction!). So the table name cannot conflict with other sessions. If you need unique names within your session, you could use dynamic SQL and utilize a SEQUENCE:
Create once:
CREATE SEQUENCE tablename_helper_seq;
You could use a DO statement (or a plpgsql function):
DO
$do$
BEGIN
EXECUTE
'CREATE TEMP TABLE tbl' || nextval('tablename_helper_seq'::regclass) || ' AS
SELECT * FROM tbl WHERE ... ';
RAISE NOTICE 'Temporary table created: "tbl%"', lastval();
END
$do$;
lastval() and currval(regclass) are instrumental to return the dynamically created table name.
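The simpler case from the "Not sure whether table already exists" part (create the temp table if it is missing, then insert into it) can be tried out with the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in, since SQLite also accepts CREATE TEMP TABLE IF NOT EXISTS; the table names src and picked and the helper add_selected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (id integer, val text);
    INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """
)


def add_selected(conn, min_id):
    """Create the temp table on first use, then append the selected rows."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS picked (id integer, val text)")
    conn.execute(
        "INSERT INTO picked (id, val) SELECT id, val FROM src WHERE id >= ?",
        (min_id,),
    )


add_selected(conn, 3)  # creates picked and inserts id 3
add_selected(conn, 2)  # table already exists; inserts ids 2 and 3
picked = conn.execute("SELECT id FROM picked ORDER BY id").fetchall()
print(picked)
```

The second call inserts id 3 again, which is the kind of duplication the answer warns about when the temp table turns out to exist already.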

Importing variable number of columns into SQLite database

I have a list of synonyms in a csv file format: word,meaning1,meaning2,meaning3....
Different words have different number of synonyms which means that rows are likely to have a variable number of columns. I am trying to import the csv file into an sqlite database like so:
sqlite3 synonyms
sqlite> create table list(word text, meaning0 text, meaning1 text, meaning2 text, meaning3 text, meaning4 text, meaning5 text, meaning6 text, meaning7 text, meaning8 text, meaning9 text);
sqlite> .mode list
sqlite> .separator ,
sqlite> .import ./csv/synonyms.csv list
To be on the safe side, I assumed a maximum of 10 synonym columns for each word. For words with fewer than 10 synonyms, the remaining columns should be null. The error I get on executing the import command is:
Error: ./csv/synonyms.csv line 1: expected 11 columns of data but found 3
My question(s):
1. In case the number of columns is less than 10, how can I tell SQLite to substitute it with null?
2. Is there some way of specifying that I want 10 columns after word instead of typing them all out by hand?
You can do the following:
Import all the data into a single column;
Update the table, splitting the column contents into the other columns.
Sample:
-- Create a table with only one column;
CREATE TABLE table_name(first);
-- Choose a separator which doesn't exist within file
.separator ~
-- Import data
.import file.csv table_name
-- Add another column to split data
ALTER TABLE table_name ADD COLUMN second;
-- Split data between first and second column
UPDATE table_name SET first=SUBSTR(first, 1, INSTR(first, ',')-1), second=SUBSTR(first, INSTR(first, ',')+1) WHERE INSTR(first, ',')>0;
-- Repeat to next column
ALTER TABLE table_name ADD COLUMN third;
-- Split data between second and third column
UPDATE table_name SET second=SUBSTR(second, 1, INSTR(second, ',')-1), third=SUBSTR(second, INSTR(second, ',')+1) WHERE INSTR(second, ',')>0;
-- And so on...
ALTER TABLE table_name ADD COLUMN fourth;
UPDATE table_name SET third=SUBSTR(third, 1, INSTR(third, ',')-1), fourth=SUBSTR(third, INSTR(third, ',')+1) WHERE INSTR(third, ',')>0;
-- Many times as needed...
This is not an optimal method, but SQLite is fast enough that it should perform acceptably.
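A different route, swapped in here instead of the .import approach above, is to do the import from a short script that pads every row to the full width before inserting. The sketch below uses Python's csv and sqlite3 modules. It also generates the ten meaning columns rather than typing them out, which touches on the second question. The names pad and import_synonyms and the sample words are assumptions.

```python
import csv
import io
import sqlite3

MAX_MEANINGS = 10
# word, meaning0 ... meaning9, generated instead of written by hand
COLUMNS = ["word"] + [f"meaning{i}" for i in range(MAX_MEANINGS)]


def pad(row):
    """Pad a short row with None (stored as NULL); extra fields are dropped."""
    row = row[: len(COLUMNS)]
    return row + [None] * (len(COLUMNS) - len(row))


def import_synonyms(conn, csv_stream):
    """Create the table if needed and load rows of varying width."""
    col_defs = ", ".join(f"{name} text" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS list ({col_defs})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    rows = (pad(row) for row in csv.reader(csv_stream) if row)  # skip blank lines
    conn.executemany(f"INSERT INTO list VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")
import_synonyms(conn, io.StringIO("happy,glad,joyful\nbig,large\n"))
result = conn.execute(
    "SELECT word, meaning0, meaning1, meaning2 FROM list ORDER BY word"
).fetchall()
print(result)
```

Rows with more than ten synonyms are silently truncated in this sketch; raising MAX_MEANINGS or rejecting such rows would be a design decision for the real import.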