PostgreSQL, CREATE TABLE AS with predefined column(s) - postgresql

For the first time I have found a very handy way of importing "last year data" into "this year data".
This works well:
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable AS
SELECT col1, col2, col3, col4
FROM dblink('host=localhost port=xxxx user=xxxx password=xxxx dbname=mylastyeardb',
'SELECT col1, col2, col3, col4
FROM mytable
WHERE TRIM(col1)<>'''' ')
AS x(col1 text, col2 text, col3 text, col4 text);
ALTER TABLE mytable ADD COLUMN cols_id SERIAL PRIMARY KEY;
Since 'cols_id' from the old table is not appropriate for the new table, maybe some experienced users know how to set up the table in CREATE TABLE AS so that it has 'cols_id' as a (serial) primary key, nicely ordered and as the first column. Maybe that way I can avoid the second (ALTER) command?
Any other advice for the situation shown above is welcome too.

You either create the table, defining its structure (with all handy shortcuts and options in one statement), or you create the table as a select, "inheriting" [partially] the structure. Thus if you want a primary key, you will need ALTER TABLE anyway...
To put the id as the first column in one statement, you can simply use a dummy value, e.g. a sequential number:
t=# create table s as select row_number() over() as id,chr(n) from generate_series(197,200) n;
SELECT 4
t=# select * from s;
id | chr
----+-----
1 | Å
2 | Æ
3 | Ç
4 | È
(4 rows)
Of course after that you still need to create a sequence, assign its value as the default for the id column, and add a primary key on it. Which makes it even more statements than you have ATM...
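For completeness, a minimal sketch of those follow-up statements, assuming the table s and column id from the example above (the sequence name is illustrative):
-- create a sequence and tie its lifetime to the column
CREATE SEQUENCE s_id_seq OWNED BY s.id;
-- start it after the highest existing id
SELECT setval('s_id_seq', (SELECT max(id) FROM s));
-- use it as the default for new rows, then add the primary key
ALTER TABLE s ALTER COLUMN id SET DEFAULT nextval('s_id_seq');
ALTER TABLE s ADD PRIMARY KEY (id);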

Related

DB2 - REPLACE INTO SELECT from table

Is there a way in db2 where I can replace the entire table with just selected rows from the same table ?
Something like REPLACE into tableName select * from tableName where col1='a';
(I can export the selected rows, delete the entire table and load/import again, but I want to avoid these steps and use a single query).
Original table
col1 col2
a 0 <-- replace all rows and replace with just col1 = 'a'
a 1 <-- col1='a'
b 2
c 3
Desired resultant table
col1 col2
a 0
a 1
Any help appreciated !
Thanks.
This is a duplicate of my answer to your duplicate question:
You can't do this in a single step. The locking required to truncate the table precludes you querying the table at the same time.
The best option you would have is to declare a global temporary table (DGTT) and insert the rows you want into it, truncate the source table, and then insert the rows from the DGTT back into the source table. Something like:
declare global temporary table t1
as (select * from schema.tableName where ...)
with no data
on commit preserve rows
not logged;
insert into session.t1 select * from schema.tableName where ...;
truncate table schema.tableName immediate;
insert into schema.tableName select * from session.t1;
I know of no way to do what you're asking in one step...
You'd have to select out to a temporary table then copy back.
But I don't understand why you'd need to do this in the first place. Let's assume there was a REPLACE TABLE command...
REPLACE TABLE mytbl WITH (
SELECT * FROM mytbl
WHERE col1 = 'a' AND <...>
)
Why not simply delete the inverse set of rows...
DELETE FROM mytbl
WHERE NOT (col1 = 'a' AND <...>)
Note the comparisons done in the WHERE clause are the exact same. You just wrap them in a NOT ( ) to delete the ones you don't want to keep.
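Applied to the sample data in the question, a minimal sketch (same table and column names as above):
-- keep only the rows where col1 = 'a'; delete everything else
DELETE FROM tableName
WHERE NOT (col1 = 'a');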

SELECT * except nth column

Is it possible to SELECT * but without the n-th column, for example the 2nd?
I have some views that have 4 and 5 columns (each has different column names, except for the 2nd column), but I do not want to show the second column.
SELECT * -- how to prevent 2nd column to be selected?
FROM view4
WHERE col2 = 'foo';
SELECT * -- how to prevent 2nd column to be selected?
FROM view5
WHERE col2 = 'foo';
without having to list all the columns (since they all have different column names).
The real answer is that you just cannot, practically speaking (see LINK). This has been a requested feature for decades and the developers refuse to implement it. The best practice is to list the column names instead of using *. Using * is in itself a source of performance penalties anyway.
However, in case you really need it, you might select the column names directly from the schema -> check LINK. Or, as in the example below, use two PostgreSQL built-in functions: ARRAY and ARRAY_TO_STRING. The first one transforms a query result into an array, and the second one concatenates the array components into a string. The list separator can be specified with the second parameter of the ARRAY_TO_STRING function;
SELECT 'SELECT ' ||
ARRAY_TO_STRING(ARRAY(SELECT COLUMN_NAME::VARCHAR(50)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='view4' AND
COLUMN_NAME NOT IN ('col2')
ORDER BY ORDINAL_POSITION
), ', ') || ' FROM view4';
where strings are concatenated with the standard operator ||. The COLUMN_NAME data type is information_schema.sql_identifier. This data type requires explicit conversion to CHAR/VARCHAR data type.
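For a view with columns col1 through col4, the statement above produces a string along these lines, which you would then execute as a second step (a sketch; the exact text depends on the view definition):
SELECT col1, col3, col4 FROM view4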
But that is not recommended either. What if you add more columns in the long run, but they are not necessarily required for that query?
You would start pulling more columns than you need.
What if the select is part of an insert as in
Insert into tableA (col1, col2, col3.. coln) Select everything but 2 columns FROM tableB
The column match will be wrong and your insert will fail.
It's possible, but I still recommend writing out every needed column for every SELECT, even if nearly every column is required.
Conclusion:
Since you are already using a VIEW, the simplest and most reliable way is to alter your view and list the column names, excluding your 2nd column.
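A minimal sketch of that approach, assuming view4 is defined over a table t4 (an illustrative name) and col2 is the column to hide:
-- CREATE OR REPLACE VIEW cannot remove a column, so drop and recreate the view
DROP VIEW view4;
CREATE VIEW view4 AS
SELECT col1, col3, col4 -- col2 deliberately omitted
FROM t4;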
-- my table with 2 rows and 4 columns
DROP TABLE IF EXISTS t_target_table;
CREATE TEMP TABLE t_target_table as
SELECT 1 as id, 1 as v1 ,2 as v2,3 as v3,4 as v4
UNION ALL
SELECT 2 as id, 5 as v1 ,-6 as v2,7 as v3,8 as v4
;
-- my computation and stuff that I have to measure; any logic could be done here!
DROP TABLE IF EXISTS t_processing;
CREATE TEMP TABLE t_processing as
SELECT *, md5(t_target_table::text) as row_hash, case when v2 < 0 THEN true else false end as has_negative_value_in_v2
FROM t_target_table
;
-- now we want to insert that stuff into the t_target_table
-- this is standard
-- INSERT INTO t_target_table (id, v1, v2, v3, v4) SELECT id, v1, v2, v3, v4 FROM t_processing;
-- this is advanced ;-)
INSERT INTO t_target_table
-- the following row selects only the columns that are present in the target table, and ignores the others.
SELECT r.* FROM (SELECT to_jsonb(t_processing) as d FROM t_processing) t JOIN LATERAL jsonb_populate_record(NULL::t_target_table, d) as r ON TRUE
;
-- WARNING : you need a object that represent the target structure, an exclusion of a single column is not possible
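To verify the result, one might run (a sketch):
SELECT * FROM t_target_table ORDER BY id;
-- the table now holds the two original rows plus the two re-inserted copies (4 rows in total)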
For columns col1, col2, col3 and col4 you will need to request
SELECT col1, col3, col4 FROM...
to omit the second column. Requesting
SELECT *
will get you all the columns

select where not exists excluding identity column

I am inserting into a live table only new records from a "dump" table, i.e. records that do not already exist in the live table. My issue is that there is an identity column that I don't want to insert into the live table; I want the live table's identity column to take care of incrementing the value. But I am getting an insert error: "Insert Error: Column name or number of supplied values does not match table definition." Is there a way around this, or is the only fix to remove the identity column altogether?
Thanks,
Sam
You need to list all the needed columns in your query, excluding the identity column.
One more reason why you should never use SELECT *.
INSERT liveTable
(col1, col2, col3)
SELECT col1, col2, col3
FROM dumpTable dt
WHERE NOT EXISTS
(
SELECT 1
FROM liveTable lt
WHERE lt.Id = dt.Id
)
Pro tip: You can also achieve the above by using an OUTER JOIN between the dump and live tables and filtering with WHERE liveTable.col1 IS NULL (you will probably need to qualify the column names selected with the dump table alias).
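A minimal sketch of that variant, using the same table and column names as above:
INSERT INTO liveTable (col1, col2, col3)
SELECT dt.col1, dt.col2, dt.col3
FROM dumpTable dt
LEFT OUTER JOIN liveTable lt
    ON lt.Id = dt.Id
WHERE lt.Id IS NULL -- no matching row exists yet in liveTable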
I figured out the issue.... my live table didn't have the ID field set as an identity, somehow when I created it that field wasn't set up correctly.
You can leave that column out of your insert statement, like this:
insert into destination (col2, col3, col4)
select col2, col3, col4 from source
Don't do just
insert into destination
select * from source

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' id are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. This may not be good practice, but whether it is good or bad is not the concern of this question. In particular I want to renumber those very long IDs like "61789238" and "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support ID renumbering? Or is there any plugin or maintenance utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (The OP mentions references and constraints.) Such constraints are checked after each row, so you will most likely get unique violation errors when trying this. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (inserted rows in desired order just a moment earlier), which is almost never the case in real world applications and completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.
Assuming your ids are generated from a bignum sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
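For the FK caveat mentioned above (ON UPDATE CASCADE), a minimal sketch of such a constraint, with illustrative table and column names:
ALTER TABLE child_tbl
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_tbl (id)  -- parent_tbl.id must be a primary key or unique column
    ON UPDATE CASCADE;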
Another approach: add a new id column and Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note since the update is only a permutation of the
-- existing values, we don't need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
*This script will work for PostgreSQL.
This is a generic solution that works for all cases.
This query finds the description of the fields of all tables of any database.
WITH description_bd AS (
    SELECT colum.schemaname, coalesce(table_name, relname) AS table_name,
           column_name, ordinal_position, column_default, data_type,
           is_nullable, character_maximum_length, is_updatable, description
    FROM (
        SELECT columns.table_schema AS schemaname, columns.table_name, columns.column_name,
               columns.ordinal_position, columns.column_default, columns.data_type,
               columns.is_nullable, columns.character_maximum_length,
               columns.character_octet_length, columns.is_updatable, columns.udt_name
        FROM information_schema.columns
    ) colum
    FULL JOIN (
        SELECT schemaname, relid, relname, objoid, objsubid, description
        FROM pg_statio_all_tables, pg_description
        WHERE pg_statio_all_tables.relid = pg_description.objoid
    ) descre
      ON descre.relname = colum.table_name
     AND descre.objsubid = colum.ordinal_position
     AND descre.schemaname = colum.schemaname
)
This query proposes a solution to fix the sequences of all the database tables (it generates, in the req field, a query which fixes the sequence of each table).
It finds the maximum value of the column and increments it by one.
SELECT table_name, column_name, ordinal_position,column_default,
data_type, is_nullable, character_maximum_length, is_updatable,
description,'SELECT setval('''||schemaname||'.'|| replace(replace(column_default,'''::regclass)',''),'nextval(''','')||''', (select max( '||column_name ||')+1 from '|| table_name ||' ), true);' as req
FROM description_bd where column_default like '%nextva%'
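For a table public.foo with a serial column id (illustrative names), the generated req column contains a statement roughly like this:
SELECT setval('public.foo_id_seq', (select max( id )+1 from foo ), true);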
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this :
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
Takes 3 parameters
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course, you'd better have ON UPDATE CASCADE on those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
_old BIGINT;_new BIGINT := 0;
BEGIN
FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
_new=_new+1;
EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
END LOOP;
RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;

sql: sort from two tables and order by date

I met a problem in my iPhone app - I created two tables with sqlite3:
create table A (Name varchar(50), Added datetime);
create table B (UserID varchar(50), Username varchar(50), Created datetime);
I need to get all the values of the two tables ordered by time, which is like:
Alen 2011-06-25 17:56:00
12 Fire 2011-06-26 17:56:00
Bale 2011-07-01 17:56:00
As you see there is no relationship between the two tables, and I have no idea how to do this.
The app is already under way, and it's difficult to redesign the DB.
I'd like to know a solution based on the current DB schema (this is also my boss's requirement).
SELECT NULL AS Col1, Name AS Col2, Added AS Col3 -- NULL pads table A so both branches have the same column count
FROM A
UNION ALL
SELECT UserID AS Col1, Username AS Col2, Created AS Col3
FROM B
ORDER BY 3 -- sort the combined result by the third column (the date)