Autogenerating ID column when doing COPY from CSV


I have a simple table (4 text columns, and an ID column). I am trying to import my CSV file which has no ID column.
In Postico I have the schema set up like this:
DROP TABLE changes;
CREATE TABLE changes(
id SERIAL PRIMARY KEY,
commit_id TEXT,
additions INTEGER,
deletions INTEGER,
file_id TEXT
);
CREATE TEMP TABLE tmp_x AS SELECT * FROM changes LIMIT 0;
COPY tmp_x(commit_id,additions,deletions,file_id) FROM '/Users/George/git-parser/change_file' (format csv, delimiter E'\t');
INSERT INTO changes SELECT * FROM tmp_x
ON CONFLICT DO NOTHING;
DROP TABLE tmp_x;
But I am getting the error ERROR: null value in column "id" violates not-null constraint

You need to specify the columns:
COPY tmp_x (commit_id, additions, deletions, file_id)
FROM '/Users/George/git-parser/change_file' (format csv, delimiter E'\t');
The order of columns specified in the copy statement must obviously match the order of the columns in the input file.
You need to change your insert statement as well.
INSERT INTO changes SELECT * FROM tmp_x
will insert all columns from tmp_x into the target table. Because tmp_x was created with CREATE TABLE ... AS SELECT, its id column is a plain integer without the serial default, so COPY left it NULL, and your insert statement just copies those NULL values into changes.
You need to skip the id column in the insert statement:
INSERT INTO changes (commit_id,additions,deletions,file_id)
SELECT commit_id,additions,deletions,file_id
FROM tmp_x
ON CONFLICT DO NOTHING;
You can actually remove the id column from tmp_x altogether.
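As a rough sketch of that last suggestion, the staging table can be declared with only the four data columns, so COPY needs no column list and the id values are generated when rows reach changes. The table name, columns, and file path are taken from the question; the explicit CREATE TEMP TABLE in place of CREATE TABLE ... AS is an assumption made for illustration.

```sql
-- Staging table without an id column: it mirrors the CSV layout exactly.
CREATE TEMP TABLE tmp_x (
    commit_id TEXT,
    additions INTEGER,
    deletions INTEGER,
    file_id   TEXT
);

-- No column list is needed because the table matches the file.
COPY tmp_x FROM '/Users/George/git-parser/change_file' (format csv, delimiter E'\t');

-- id is omitted, so the serial default on changes.id supplies the value.
INSERT INTO changes (commit_id, additions, deletions, file_id)
SELECT commit_id, additions, deletions, file_id
FROM tmp_x
ON CONFLICT DO NOTHING;

DROP TABLE tmp_x;
```

If the ON CONFLICT handling is not needed, running COPY with the same four-column list directly against changes should also work, since omitted columns are filled from their defaults.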

Related

Insert Into: Returns inserted records

I'm trying to insert some new records into a table and return the inserted records.
DECLARE #newRecords TABLE
(
Id INT NULL,
Col1 VARCHAR(MAX) NOT NULL,
Col2 VARCHAR(MAX) NOT NULL,
ForeignKey VARCHAR(MAX) NOT NULL
)
INSERT INTO #newRecords
SELECT ColA AS Col1, ColB AS Col2, Key AS ForeignKey
FROM SomeTable;
INSERT INTO PrimaryTable(Col1, Col2)
OUTPUT inserted.* INTO #newRecords
SELECT Col1, Col2
FROM #newRecords;
The problem is that outputting the inserted records via the OUTPUT statement causes an error because the column ForeignKey does not exist in PrimaryTable. I am worried about outputting the records to a second temp table because I have no good means of joining the data from PrimaryTable to the #newRecords table.
Ideally what I would like to see happen is:
The new records are populated into the #newRecords temp table from SomeOtherTable
The appropriate columns from the #newRecords temp table are then inserted into the PrimaryTable
Finally the primary keys that were assigned to the newly inserted records in the PrimaryTable are updated into the #newRecords.Id column so that I can go on and do some additional record creations
I've tried messing around with the MERGE statement as well as seeing if I could add some additional statements to the OUTPUT statement but either it's not possible or (more likely) I am not using the syntax correctly.
Any ideas would be greatly appreciated.

Insert Row into Postgresql Table with Only Default Values

Question: Is there a way to insert a row in PostgreSQL table using the default values for all columns without specifying any column name?
Background: If I have a table with two columns that both have default values (or simply accept NULL), I can omit the column name when I insert a row if I wish the value for that column to be the default value. For instance:
CREATE TABLE test_table ( column1 TEXT, column2 TEXT );
I can insert into the table by only specifying a value for column1 or column2 and the missing column will be populated with the default value (NULL in this case):
INSERT INTO test_table (column1) VALUES ('foo');
INSERT INTO test_table (column2) VALUES ('bar');
The above will result in two rows: [('foo', NULL), (NULL, 'bar')]. However, if I want to use the default value for both columns, it seems that I have to specify at least one column name and explicitly give it the default value. The following commands are all legal:
INSERT INTO test_table (column1) VALUES (DEFAULT);
INSERT INTO test_table (column2) VALUES (DEFAULT);
INSERT INTO test_table (column1, column2) VALUES (DEFAULT, DEFAULT);
I was unable to create a valid command that allowed me to omit all column names. The following attempts are all illegal:
INSERT INTO test_table;
INSERT INTO test_table () VALUES ();
Is there a way to do this or is it explicitly forbidden? I wasn't able to find any documentation for a case like this. Thanks!

I found that there is special syntax for this exact use-case:
INSERT INTO test_table DEFAULT VALUES;

COPY if column exists

I am writing a PL/pgSQL function that needs to import files into a table.
I have created a temporary table with 4 columns
CREATE TEMP TABLE IF NOT EXISTS tmp_ID_Customer (
ID int4 NULL,
Name varchar(2000) NULL,
CodeEx varchar(256) NULL,
AccountID varchar(256) NULL
)ON COMMIT DROP;
I am then trying to copy a file into this table, with the following
EXECUTE format('COPY tmp_ID_Customer FROM %L (FORMAT CSV, HEADER TRUE, DELIMITER(''|''))', _fileName);
The issue I have is some of these files only contain the first 3 columns.
So I am receiving an error saying
extra data after last expected column
I've tried specifying the columns, but since the final column doesn't always exist, I get an error.

Specify the columns you are copying:
COPY tmp_ID_Customer(id, name, codeex) FROM ...
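Since the column count varies per file, one possible approach (a sketch, not necessarily what the answer intended) is to attempt the four-column COPY first and fall back to the three-column list when PostgreSQL reports a column-count mismatch. The file path below is a hypothetical placeholder, and the sketch assumes that both "missing data for column" and "extra data after last expected column" are raised under the bad_copy_file_format condition (SQLSTATE 22P04).

```sql
DO $$
DECLARE
    _fileName text := '/path/to/customers.csv';  -- hypothetical path, for illustration only
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS tmp_ID_Customer (
        ID        int4          NULL,
        Name      varchar(2000) NULL,
        CodeEx    varchar(256)  NULL,
        AccountID varchar(256)  NULL
    ) ON COMMIT DROP;

    BEGIN
        -- First attempt: the file has all four columns.
        EXECUTE format(
            'COPY tmp_ID_Customer FROM %L (FORMAT CSV, HEADER TRUE, DELIMITER ''|'')',
            _fileName);
    EXCEPTION WHEN bad_copy_file_format THEN
        -- Fallback: the file has only the first three columns.
        -- The failed COPY is rolled back with the inner block, so no partial rows remain.
        EXECUTE format(
            'COPY tmp_ID_Customer (ID, Name, CodeEx) FROM %L (FORMAT CSV, HEADER TRUE, DELIMITER ''|'')',
            _fileName);
    END;
END
$$;
```

In the fallback case AccountID is simply left NULL. The trade-off is that a genuinely malformed file is read twice before the error finally surfaces.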

Copying rows violates not-null constraint in PostgreSQL

I am trying to do what is described in this solution and also here. That means I would like to copy rows with many columns while changing only a few values. So my query looks like this:
CREATE TEMPORARY TABLE temp_table AS
SELECT * FROM original_table WHERE <conditions>;
UPDATE temp_table
SET <auto_inc_field>=NULL,
<fieldx>=<valuex>,
<fieldy>=<valuey>;
INSERT INTO original_table SELECT * FROM temp_table;
However, the <auto_inc_field>=NULL part is not working on my PostgreSQL 9.4 database:
Exception: null value in column "auto_inc_field" violates not-null constraint
The <auto_inc_field> column is defined as BIGINT, SERIAL, and has a primary key constraint.
What do I need to pass, if NULL is not working? Is there an alternative method?

I understand that the primary key is a serial. List all columns except the primary key in the insert command, and list the corresponding columns and values in the select command:
insert into original_table (col_1, col_2, col_3)
select col_1, value_2, value_3
from original_table
where the_conditions;

Add a serial column based on a sorted column

I have a table that has one column with unordered values. I want to order this column descending and add a column to record its order. My SQL code is:
select *
into newtable
from oldtable
order by column_name desc;
alter table newtable add column id serial;
Would this implement my goal? I know that rows in PostgreSQL have no fixed order. So I am not sure about this.

Rather than (ab)using a SERIAL via ALTER TABLE, generate it at insert-time.
CREATE TABLE newtable (id serial unique not null, LIKE oldtable INCLUDING ALL);
INSERT INTO newtable
SELECT nextval('newtable_id_seq'), *
FROM oldtable
ORDER BY column_name desc;
This avoids a table rewrite, and unlike your prior approach, is guaranteed to produce the correct ordering.
(If you want it to be the PK, and the prior table had no PK, change unique not null to primary key. If the prior table had a PK you'll need to use a LIKE variant that excludes constraints).

You can first create a new table, sorted based on the column you want to use:
CREATE TABLE newtable AS
SELECT * FROM oldtable
ORDER BY column_name desc;
Afterwards, since you want to order from the largest to the smallest, you can add a new column to your table:
ALTER TABLE newtable ADD COLUMN id serial unique;
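Because a table has no inherent row order, the numbers handed out by a serial column added afterwards are not guaranteed to follow the ORDER BY used when the table was created. An alternative sketch, assuming oldtable has no existing column named id, computes the rank explicitly with the row_number() window function:

```sql
-- id is computed from the sort itself, so it does not depend on physical row order.
CREATE TABLE newtable AS
SELECT row_number() OVER (ORDER BY column_name DESC) AS id, *
FROM oldtable;
```

Here id is a plain bigint rather than a serial, so later inserts will not get a value automatically; a sequence or default would have to be attached separately if that is required.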