I’d like to update some parameters for a table, such as the dist and sort keys. To do so, I’ve renamed the old version of the table and recreated it with the new parameters (these cannot be changed once a table has been created).
I need to preserve the id field from the old table, which is an IDENTITY field. If I try the following query, however, I get an error:
insert into edw.my_table_new select * from edw.my_table_old;
ERROR: cannot set an identity column to a value [SQL State=0A000]
How can I keep the same id from the old table?
You can't INSERT data that sets IDENTITY columns, but you can load such data from S3 using the COPY command.
First you will need to create a dump of the source table with UNLOAD.
Then simply use COPY with the EXPLICIT_IDS parameter, as described in Loading default column values (a sketch follows the quoted documentation):
If an IDENTITY column is included in the column list, the EXPLICIT_IDS
option must also be specified in the COPY command, or the COPY command
will fail. Similarly, if an IDENTITY column is omitted from the column
list, and the EXPLICIT_IDS option is specified, the COPY operation
will fail.
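For example, the UNLOAD and COPY pair might look like the following sketch; the S3 prefix and IAM role ARN are placeholders you would replace with your own:

-- Dump the old table to S3 (placeholder bucket prefix and IAM role ARN)
UNLOAD ('select * from edw.my_table_old')
TO 's3://my-bucket/my_table_old_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
DELIMITER '|' ALLOWOVERWRITE;

-- Load it into the new table; EXPLICIT_IDS keeps the dumped id values
COPY edw.my_table_new
FROM 's3://my-bucket/my_table_old_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
DELIMITER '|' EXPLICIT_IDS;

Because no column list is given, all columns (including the IDENTITY column) are loaded, which is why EXPLICIT_IDS is required here.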
You can explicitly specify the columns, and ignore the identity column:
insert into existing_table (col1, col2) select col1, col2 from another_table;
Use ALTER TABLE APPEND twice: the first time with IGNOREEXTRA and the second time with FILLTARGET (a sketch of the command follows below).
If the target table contains columns that don't exist in the source
table, include FILLTARGET. The command fills the extra columns in the
source table with either the default column value or IDENTITY value,
if one was defined, or NULL.
It moves the data from one table to the other extremely quickly; it took me 4 seconds for a 1 GB table on a dc1.large node.
Appends rows to a target table by moving data from an existing source
table.
...
ALTER TABLE APPEND is usually much faster than a similar CREATE TABLE
AS or INSERT INTO operation because data is moved, not duplicated.
Faster and simpler than UNLOAD + COPY with EXPLICIT_IDS.
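A sketch of the command itself, using the table names from the question (IGNOREEXTRA and FILLTARGET are mutually exclusive, so each run uses one of them):

-- When the target has columns the source lacks (e.g. an IDENTITY column),
-- FILLTARGET fills them with the default, the IDENTITY value, or NULL:
ALTER TABLE edw.my_table_new APPEND FROM edw.my_table_old FILLTARGET;

-- When the source has extra columns the target lacks, use IGNOREEXTRA instead:
-- ALTER TABLE edw.my_table_new APPEND FROM edw.my_table_old IGNOREEXTRA;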
I have declared a temporary table successfully.
DECLARE GLOBAL TEMPORARY TABLE SESSION.MY_TEMP_TABLE
LIKE MYTABLES.AN_EXISTING_TABLE
INCLUDING IDENTITY
ON COMMIT PRESERVE ROWS
WITH REPLACE;
I then use the following to merge two tables and output this into my temporary table:
INSERT INTO SESSION.MY_TEMP_TABLE
SELECT a.*
FROM (SELECT * FROM MYTABLES.TABLE_A) as a
LEFT JOIN
(SELECT * FROM MYTABLES.TABLE_B) as b
ON a.KEY=b.KEY;
Now, all of the above works.
ISSUE: I now want to merge in two new variables from a further table (MYTABLES.TABLE_C); however, it will not let me, because I declared the temporary table with a certain number of columns and I am now trying to add further columns. I did a Google search and it seems ALTER TABLE will not work with declared temporary tables. Any help, please?
Session tables (DGTTs) need to be declared with all the required columns, as you cannot use ALTER TABLE to add additional columns to a session table.
A way around this limitation is to use session tables in a different manner, specifically to create a new session table on demand with whatever additional columns you need (possibly also including the data from other tables). This can be very fast when you use the NOT LOGGED option. It also works well if your session table uses DISTRIBUTE BY HASH on environments that support that feature.
Here is an example that shows 3 session tables, the third of which has all columns from the first two tables:
declare global temporary table session.m1 like emp including identity on commit preserve rows with replace not logged;
declare global temporary table session.m2 like org including identity on commit preserve rows with replace not logged;
declare global temporary table session.m3 as (select * from session.m1, session.m2) with data with replace not logged;
If you do not want to populate the session table at the time of declaration, you can use DEFINITION ONLY instead of WITH DATA (or use WITH NO DATA) and populate the table later via INSERT or MERGE.
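A sketch of that approach for the question above, redeclaring the session table with the extra columns up front (NEW_COL1 and NEW_COL2 are hypothetical columns from MYTABLES.TABLE_C):

DECLARE GLOBAL TEMPORARY TABLE SESSION.MY_TEMP_TABLE AS
  (SELECT A.*, C.NEW_COL1, C.NEW_COL2
   FROM MYTABLES.TABLE_A A
   LEFT JOIN MYTABLES.TABLE_C C ON A.KEY = C.KEY)
  DEFINITION ONLY
  ON COMMIT PRESERVE ROWS
  WITH REPLACE NOT LOGGED;

INSERT INTO SESSION.MY_TEMP_TABLE
SELECT A.*, C.NEW_COL1, C.NEW_COL2
FROM MYTABLES.TABLE_A A
LEFT JOIN MYTABLES.TABLE_C C ON A.KEY = C.KEY;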
When trying to do a query like below:
INSERT INTO employee_channels (employee_id, channels)
VALUES ('46356699-bed1-4ec4-9ac1-76f124b32184', '{a159d680-2f2e-4ba7-9498-484271ad0834}')
ON CONFLICT (employee_id)
DO UPDATE SET channels = array_append(channels, 'a159d680-2f2e-4ba7-9498-484271ad0834')
WHERE employee_id = '46356699-bed1-4ec4-9ac1-76f124b32184'
AND NOT lower(channels::text)::text[] #> ARRAY['a159d680-2f2e-4ba7-9498-484271ad0834'];
I get the following error
[42702] ERROR: column reference "channels" is ambiguous Position: 245
The specific reference to channels it is complaining about is the channels inside array_append.
channels is a CITEXT[] data type
You may need to specify the EXCLUDED table in your set statement.
SET channels = array_append(EXCLUDED.channels, 'a159d680-2f2e-4ba7-9498-484271ad0834')
When using the ON CONFLICT DO UPDATE clause, the values that aren't inserted because of the conflict are stored in the EXCLUDED table. It's an ephemeral table you don't have to actually create, the way NEW and OLD are in triggers.
From the PostgreSQL Manual:
conflict_action specifies an alternative ON CONFLICT action. It can be either DO NOTHING, or a DO UPDATE clause specifying the exact
details of the UPDATE action to be performed in case of a conflict.
The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to rows
proposed for insertion using the special excluded table. SELECT
privilege is required on any column in the target table where
corresponding excluded columns are read.
Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed
to the row being excluded from insertion.
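Applied to the statement above, a sketch with the references qualified; EXCLUDED.channels is the value proposed for insertion, while employee_channels.channels is the array already stored (use whichever you actually want to append to). I have also assumed the array-containment operator @> where the question had #>:

INSERT INTO employee_channels (employee_id, channels)
VALUES ('46356699-bed1-4ec4-9ac1-76f124b32184', '{a159d680-2f2e-4ba7-9498-484271ad0834}')
ON CONFLICT (employee_id)
DO UPDATE SET channels = array_append(EXCLUDED.channels, 'a159d680-2f2e-4ba7-9498-484271ad0834')
WHERE employee_channels.employee_id = '46356699-bed1-4ec4-9ac1-76f124b32184'
  AND NOT lower(employee_channels.channels::text)::text[] @> ARRAY['a159d680-2f2e-4ba7-9498-484271ad0834'];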
I am new to PostgreSQL and I am working with this database.
I got a file which I imported, and I am trying to get rows with a certain ID. But the ID is not defined, as you can see in this picture:
So how do I access this ID? I want to use an SQL command like this:
SELECT * from table_name WHERE ID = 1;
If any order of rows is ok for you, just add a row number according to the current arbitrary sort order:
CREATE SEQUENCE tbl_tbl_id_seq;
ALTER TABLE tbl ADD COLUMN tbl_id integer DEFAULT nextval('tbl_tbl_id_seq');
The new default value is filled in automatically in the process. You might want to run VACUUM FULL ANALYZE tbl to remove bloat and update statistics for the query planner afterwards. And possibly make the column your new PRIMARY KEY ...
To make it a fully fledged serial column:
ALTER SEQUENCE tbl_tbl_id_seq OWNED BY tbl.tbl_id;
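And if you do make the column your new PRIMARY KEY as mentioned above, a minimal sketch; adding the constraint also marks the column NOT NULL and will fail if any NULLs slipped in:

ALTER TABLE tbl ADD PRIMARY KEY (tbl_id);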
See:
Creating a PostgreSQL sequence to a field (which is not the ID of the record)
What you see are just row numbers that pgAdmin displays, they are not really stored in the database.
If you want an artificial numeric primary key for the table, you'll have to create it explicitly.
For example:
CREATE TABLE mydata (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
obec text NOT NULL,
datum timestamp with time zone NOT NULL,
...
);
Then to copy the data from a CSV file, you would run
COPY mydata (obec, datum, ...) FROM '/path/to/csvfile' (FORMAT 'csv');
Then the id column is automatically filled.
We had a few tables whose data was deleted.
Now I have a copy of the database running, in the same server instance, where we were able to restore the lost data.
However, the id columns are now out of sync, as some rows have obviously been deleted over time; for example, row 98 in the new table might be row 101 in the old table.
Is it possible to copy all the data across from the table, including the id? Or can I turn identity off for the column, insert the data, and then turn it back on?
I have tried the following code:
Truncate Table DEV_DB.dbo.UserNames
Go
Set Identity_Insert DEV_DB.dbo.UserNames On
Insert Into DEV_DB.dbo.UserNames
Select * From LIVE_DB.dbo.UserNames
Set Identity_Insert DEV_DB.dbo.UserNames Off
However, I get the error
Msg 8101, Level 16, State 1, Line 6
An explicit value for the identity column in table 'DEV_DB.dbo.UserNames' can only be specified when a column list is used and IDENTITY_INSERT is ON.
You need to read the error more carefully:
Msg 8101, Level 16, State 1, Line 6
An explicit value for the identity column in table 'DEV_DB.dbo.UserNames' can only be specified when a column list is used and IDENTITY_INSERT is ON.
Notice that it says “can only be specified when a column list is used”. You are not using a column list in your INSERT. If you specify a column list, it should succeed. This would look something like:
SET IDENTITY_INSERT DEV_DB.dbo.UserNames ON;
INSERT INTO DEV_DB.dbo.UserNames (
Id
,Column1
,Column2
) SELECT *
FROM LIVE_DB.dbo.UserNames;
SET IDENTITY_INSERT DEV_DB.dbo.UserNames OFF;
It can be tedious to write that, however. You might want to generate the SQL statement with other code.
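A sketch of one way to generate it from catalog metadata (assumes SQL Server 2017+ for STRING_AGG and that you run it in the DEV_DB database):

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build a comma-separated, quoted column list in column order
SELECT @cols = STRING_AGG(QUOTENAME(name), ', ') WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.UserNames');

-- Assemble the full INSERT wrapped in IDENTITY_INSERT
SET @sql = N'SET IDENTITY_INSERT dbo.UserNames ON; '
         + N'INSERT INTO dbo.UserNames (' + @cols + N') '
         + N'SELECT ' + @cols + N' FROM LIVE_DB.dbo.UserNames; '
         + N'SET IDENTITY_INSERT dbo.UserNames OFF;';

PRINT @sql;   -- review the statement, then run it with: EXEC sp_executesql @sql;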
What I do to avoid this: in the new table you are copying to, do not make id an auto-increment column.
Copy the data over via a loop and the id will stay the same.
Create a separate table with one column (not auto-increment) and put the last record number in there.
Then, when adding new records, use a script to get the last number, increase it by one, and use that number for the new record.
You never have the issue of losing an id number then.
I have a lot of records that are originally from MySQL. I massaged the data so it will be successfully inserted into PostgreSQL using ActiveRecord. This I can easily do with insertions on a row basis, i.e. one row at a time. But this is very slow, so I want to do a bulk insert; however, that fails if any of the rows contains invalid data. Is there any way I can achieve a bulk insert with only the invalid rows failing instead of the whole batch?
COPY
When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.
If data itself (not decorators) is violating your table definition, there are ways to make this a lot more tolerant though. For instance: create a temporary staging table with all columns of type text. COPY to it, then fix offending rows with SQL commands before converting to the actual data type and inserting into the actual target table.
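A minimal sketch of that staging route, with hypothetical table and column names:

-- Staging table: everything as text, no constraints
CREATE TEMP TABLE staging (col1 text, col2 text);
COPY staging FROM '/path/to/dump.csv' (FORMAT 'csv');

-- Repair offending rows, e.g. missing or non-numeric values in col2
UPDATE staging SET col2 = '0' WHERE col2 IS NULL OR col2 !~ '^[0-9]+$';

-- Convert to the real types and insert into the actual target table
INSERT INTO tbl (col1, col2)
SELECT col1, col2::int FROM staging;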
Consider this related answer:
How to bulk insert only new rows in PostreSQL
Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY
If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:
ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;
Or you just fix the source file. COPY tells you the number of the offending line. Use an editor of your preference and fix it, then retry. I like to use vim for that.
INSERT
For an INSERT (as mentioned in the comments), the check for NULL values is trivial:
To skip a row with a NULL value:
INSERT INTO tbl (col1, ...
SELECT col1, ...
WHERE col1 IS NOT NULL
To insert something else instead of a NULL value (an empty string in my example):
INSERT INTO tbl (col1, ...
SELECT COALESCE(col1, ''), ...
A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where data in the input is sufficiently bogus, text-typed columns.
You can then do INSERT INTO ... SELECT queries against the data to populate the real table with a big query that cleans up the data during import. You can use a lot of CASE statements for this. The idea is to transform the data in one pass.
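A sketch of such a cleanup query, assuming a hypothetical staging table raw_rows with text columns:

INSERT INTO employees (name, hired_on, salary)
SELECT
  trim(name),
  -- keep only values that look like ISO dates, otherwise NULL
  CASE WHEN hired_on ~ '^\d{4}-\d{2}-\d{2}$' THEN hired_on::date ELSE NULL END,
  -- non-numeric salaries become 0
  CASE WHEN salary ~ '^\d+$' THEN salary::int ELSE 0 END
FROM raw_rows;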
You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem; see e.g. https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .
For more complicated cases, look at Pentaho Kettle or Talend Studio ETL tools.