postgres: temporary column default that is unique and not nullable, without relying on a sequence? - postgresql

Hi, I want to add a unique, non-nullable column to a table. It already has data, so I would like to instantly populate the new column with unique values, e.g. 'ABC123', 'ABC124', 'ABC125', etc. The data will eventually be wiped and replaced with proper data, so I don't want to introduce a sequence just to populate the default value.
Is it possible to generate a default value for the existing rows, based on something like row_number()? I realise the use case is ridiculous, but is it possible to achieve... and if so, how?
...
foo text not null unique default 'ABC' || row_number() -- or something similar?
...

Can generate_series be applied?
select 'ABC' || generate_series(123,130)::text;
ABC123
ABC124
ABC125
ABC126
ABC127
ABC128
ABC129
ABC130
Variant 2: add the column, then make it UNIQUE and NOT NULL
begin;
alter table test_table add column foo text not null default 'ABC';
with s as (
    select id, (row_number() over (order by id))::text as t
    from test_table
)
update test_table
set foo = foo || s.t
from s
where test_table.id = s.id;
alter table test_table add CONSTRAINT unique_foo1 UNIQUE(foo);
commit;
Results:
select * from test_table;
id | foo
----+------
1 | ABC1
2 | ABC2
3 | ABC3
4 | ABC4
5 | ABC5
6 | ABC6
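Since the 'ABC…' values are only placeholders, a small follow-up (not part of the answer above) is to drop the temporary default once the real data arrives, so it cannot leak into newly inserted rows; assuming the same test_table:
alter table test_table alter column foo drop default;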

Related

Postgresql track serial by ID in another column

I am trying to create a database that can store videos for products, and I intend to add a few million of them, so obviously I want the performance to be as good as possible.
I wanted to achieve the following:
BIGINT     | SMALLSERIAL | VARCHAR(30)
product_id | video_id    | video_hash
1          | 1           | Dkfjoie124
1          | 2           | POoieqlgkQ
1          | 3           | Xd2t9dakcx
2          | 1           | Df2459Afdw
However, when I insert a new video for a product:
INSERT INTO TABLE (product_id, video_hash) VALUES (2, 'DSpewirncS');
I want the following to happen:
BIGINT     | SMALLSERIAL | VARCHAR(30)
product_id | video_id    | video_hash
1          | 1           | Dkfjoie124
1          | 2           | POoieqlgkQ
1          | 3           | Xd2t9dakcx
2          | 1           | Df2459Afdw
2          | 2           | DSpewirncS
Will this happen when I set the column type for video_id to SMALLSERIAL? Because I am afraid that it will insert a different value (the highest in the entire column), which I do not want.
Thanks.
No, a serial is bound to a sequence and that doesn't reset without telling it to do so.
But if you want an ordinal for the videos per products you can query the table to produce it using the row_number() window function.
SELECT product_id,
row_number() OVER (PARTITION BY product_id
ORDER BY video_id) video_ordinal,
video_hash
FROM table;
You could also create a view for this query for convenience, so that you can query the view instead of the table and the view would look like you want it.
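For example, a minimal sketch of such a view, assuming the table is called product_videos (the real table name isn't given in the question):
CREATE VIEW product_videos_ordered AS
SELECT product_id,
       row_number() OVER (PARTITION BY product_id
                          ORDER BY video_id) AS video_ordinal,
       video_hash
FROM product_videos;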

How to remove bulk rows of duplicated chars from string in Postgres or SQLAlchemy?

I have a table with a column named "ids" of type String. Could someone tell me how to remove the duplicated values in each row?
Example, table is:
--------------------------------------------------
primary_key | ids
--------------------------------------------------
1 | {23,40,23}
--------------------------------------------------
2 | {78,40,13,78}
--------------------------------------------------
3 | {20,13,20}
--------------------------------------------------
4 | {7,2,7}
--------------------------------------------------
and I want to update it into:
--------------------------------------------------
primary_key | ids
--------------------------------------------------
1 | {23,40}
--------------------------------------------------
2 | {78,40,13}
--------------------------------------------------
3 | {20,13}
--------------------------------------------------
4 | {7,2}
--------------------------------------------------
In postgres I wrote:
UPDATE table_name
SET ids = (SELECT DISTINCT UNNEST(
(SELECT ids FROM table_name)::text[]))
In sqlalchemy I wrote:
session.query(table_name.ids).\
update({table_name.ids: func.unnest(table_name.ids,String).alias('data_view')},
synchronize_session=False)
None of these are working, so please help me, thanks in advance!
I think you could improve the design by storing these ids in another table, one id per row, with a foreign key referencing table_name.primary_key.
Also, storing array data as a text string seems strange.
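A rough sketch of that redesign (all names here are made up for illustration, and I am assuming primary_key is an integer):
CREATE TABLE table_name_ids (
    primary_key integer NOT NULL REFERENCES table_name (primary_key),
    id          integer NOT NULL,
    UNIQUE (primary_key, id)   -- duplicates become impossible by design
);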
Anyway, here is one way to do it: I wrapped the set returned by UNNEST in an inner subselect so that the aggregate function needed to concatenate the values back together can be applied.
UPDATE table_name
SET ids = new_ids
FROM LATERAL (
    SELECT primary_key, array_agg(elem)::text AS new_ids
    FROM (SELECT DISTINCT primary_key, UNNEST(ids::text[]) AS elem
          FROM table_name) t_inner
    GROUP BY primary_key) t_sub
WHERE t_sub.primary_key = table_name.primary_key;
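As an alternative sketch (untested, and note that neither version guarantees the original element order), the same thing can be written with a correlated scalar subquery, assuming ids always holds a valid array literal:
UPDATE table_name t
SET ids = (SELECT array_agg(DISTINCT elem)::text
           FROM unnest(t.ids::text[]) AS elem);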

unique index violation during update

I have run into a unique index violation in a bigger db. The original problem occurs in a stored pl/pgsql function.
I have simplified everything to show my problem. I can reproduce it in a rather simple table:
CREATE TABLE public.test
(
id integer NOT NULL DEFAULT nextval('test_id_seq'::regclass),
pos integer,
text text,
CONSTRAINT text_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.test
OWNER TO root;
GRANT ALL ON TABLE public.test TO root;
I define a unique index on 'pos':
CREATE UNIQUE INDEX test_idx_pos
ON public.test
USING btree
(pos);
Before the UPDATE the data in the table looks like this:
testdb=# SELECT * FROM test;
id | pos | text
----+-----+----------
2 | 1 | testpos1
3 | 2 | testpos2
1 | 5 | testpos4
4 | 4 | testpos3
(4 Zeilen)
tr: (4 rows)
Now I want to decrement by 1 all 'pos' values that are bigger than 2, and I get an error ('tr:' marks my translations from German to English):
testdb=# UPDATE test SET pos = pos - 1 WHERE pos > 2;
FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »test_idx_pos«
tr: ERROR: duplicate key violates unique constraint »test_idx_pos«
DETAIL: Schlüssel »(pos)=(4)« existiert bereits.
tr: key »(pos)=(4)« already exists.
If the UPDATE had run to completion, the table would look like this and be unique again:
testdb=# SELECT * FROM test;
id | pos | text
----+-----+----------
2 | 1 | testpos1
3 | 2 | testpos2
1 | 4 | testpos4
4 | 3 | testpos3
(4 Zeilen)
tr: (4 rows)
How can I avoid such a situation? I learned that stored pl/pgsql functions run inside transactions, so I thought this problem shouldn't appear.
Unique indexes are evaluated per row, not per statement (which is different from e.g. Oracle's implementation).
The solution to this problem is to use a unique constraint, which can be deferred and is thus evaluated at the end of the transaction.
So instead of the unique index, define a constraint:
alter table test add constraint test_idx_pos unique (pos)
deferrable initially deferred;
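Putting it together for the example above (a sketch, assuming the index test_idx_pos from the question is still in place):
begin;
drop index test_idx_pos;                        -- remove the per-row-checked index
alter table test add constraint test_idx_pos
    unique (pos) deferrable initially deferred; -- now checked at commit time
update test set pos = pos - 1 where pos > 2;    -- no longer fails mid-statement
commit;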

Postgresql - `serial` column & inheritance (sequence sharing policy)

In PostgreSQL, when a serial column is inherited from a parent table, the sequence is shared by the parent and the child table.
Is it possible to inherit the serial column while letting the two tables have separate sequence values, e.g. so that both tables' columns could contain the value 1?
Is this possible and reasonable, and if yes, how can it be done?
Update
The reasons that I want to avoid sequence sharing are:
Sharing a single int range between multiple tables might use up MAX_INT; using bigint would improve this, but it also takes more space.
There is a kind of resource locking when multiple tables insert concurrently, so I guess it is also a performance issue.
Ids jumping from 1 to 5 and then maybe to 1000 don't look as clean as they could.
Summary
Solutions:
If you want the child table to have its own sequence while still keeping the global sequence shared between parent and child table (as described in #wildplasser's answer):
Then you can add a sub_id serial column to each child table.
If you want the child table to have its own sequence and don't need a global sequence shared between parent and child table, there are 2 ways:
Use int instead of serial (as described in #lsilva's answer). Steps:
define the type as int or bigint in the parent table,
for each parent and child table, create an individual sequence,
specify the default value of the int column for each table using nextval of its own sequence,
don't forget to maintain/reset the sequence when re-creating a table.
Or define id serial directly in the child table, and not in the parent table (a minimal sketch of this follows below).
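A minimal sketch of that last variant (table names are made up; the child's column must keep the same data type as the parent's, so the parent declares a plain integer and only the child attaches a serial default):
CREATE TABLE parent
( id      integer NOT NULL   -- no default here: parent rows get no sequence
, payload integer NOT NULL
);
CREATE TABLE child_a
( id serial NOT NULL         -- merged with the inherited id; the child gets its own sequence
) INHERITS (parent);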
DROP schema tmp CASCADE;
CREATE schema tmp;
set search_path = tmp, pg_catalog;
CREATE TABLE common
( seq SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE one
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
CREATE TABLE two
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
/**
\d common
\d one
\d two
\q
***/
INSERT INTO one(payload)
SELECT gs FROM generate_series(1,5) gs
;
INSERT INTO two(payload)
SELECT gs FROM generate_series(101,105) gs
;
SELECT * FROM common;
SELECT * FROM one;
SELECT * FROM two;
Results:
NOTICE: drop cascades to table tmp.common
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 5
seq
-----
1
2
3
4
5
6
7
8
9
10
(10 rows)
seq | subseq | payload
-----+--------+---------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
(5 rows)
seq | subseq | payload
-----+--------+---------
6 | 1 | 101
7 | 2 | 102
8 | 3 | 103
9 | 4 | 104
10 | 5 | 105
(5 rows)
But: in fact you don't need the subseq columns, since you can always enumerate them by means of row_number():
CREATE VIEW vw_one AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM one;
CREATE VIEW vw_two AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM two;
[results are identical]
And you could add UNIQUE and PRIMARY KEY constraints to the child tables, like:
CREATE TABLE one
( subseq SERIAL NOT NULL UNIQUE
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
ALTER TABLE one ADD PRIMARY KEY (seq);
[similar for table two]
I use this :
Parent table definition:
CREATE TABLE parent_table (
id bigint NOT NULL,
Child table definition:
CREATE TABLE child_schema.child_table
(
id bigint NOT NULL DEFAULT nextval('child_schema.child_table_id_seq'::regclass),
I am emulating the serial by using a sequence number as a default.
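Spelled out a little more (a sketch with illustrative names, since the answer only shows fragments, and assuming the schema already exists): create the per-table sequence first, reference it in the default, and optionally tie its lifetime to the column:
CREATE SEQUENCE child_schema.child_table_id_seq;
CREATE TABLE child_schema.child_table
( id bigint NOT NULL DEFAULT nextval('child_schema.child_table_id_seq'::regclass)
);
ALTER SEQUENCE child_schema.child_table_id_seq
    OWNED BY child_schema.child_table.id;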

PostgreSQL: Can't use DISTINCT for some data types

I have a table called _sample_table_delme_data_files which contains some duplicates. I want to copy its records, without duplicates, into data_files:
INSERT INTO data_files (SELECT distinct * FROM _sample_table_delme_data_files);
ERROR: could not identify an ordering operator for type box3d
HINT: Use an explicit ordering operator or modify the query.
The problem is that PostgreSQL cannot compare (or order) box3d types. How do I supply such an ordering operator so I can get only the distinct rows into my destination table?
Thanks in advance,
Adam
If you don't add the operator, you could try translating the box3d data to text using its output function, something like:
INSERT INTO data_files (SELECT distinct othercols,box3dout(box3dcol) FROM _sample_table_delme_data_files);
Edit: the next step is to cast it back to box3d (note that the subquery needs an alias):
INSERT INTO data_files SELECT othercols, box3din(b) FROM (SELECT distinct othercols, box3dout(box3dcol) AS b FROM _sample_table_delme_data_files) AS sub;
(I don't have box3d on my system so it's untested.)
The datatype box3d doesn't have an operator for the DISTINCT operation. You have to create the operator, or ask the PostGIS project; maybe somebody has already fixed this problem.
Finally, this was solved by a colleague.
First, let's see how many rows there are:
SELECT COUNT(*) FROM _sample_table_delme_data_files ;
count
-------
12728
(1 row)
Now, we shall add another column to the source table to help us differentiate similar rows:
ALTER TABLE _sample_table_delme_data_files ADD COLUMN id2 serial;
We can now see the dups:
SELECT id, id2 FROM _sample_table_delme_data_files ORDER BY id LIMIT 10;
id | id2
--------+------
198748 | 6449
198748 | 85
198801 | 166
198801 | 6530
198829 | 87
198829 | 6451
198926 | 88
198926 | 6452
199062 | 6532
199062 | 168
(10 rows)
And remove them:
DELETE FROM _sample_table_delme_data_files
WHERE id2 IN (SELECT max(id2) FROM _sample_table_delme_data_files
GROUP BY id
HAVING COUNT(*)>1);
Let's check that it worked:
SELECT id FROM _sample_table_delme_data_files GROUP BY id HAVING COUNT(*)>1;
id
----
(0 rows)
Remove the auxiliary column:
ALTER TABLE _sample_table_delme_data_files DROP COLUMN id2;
ALTER TABLE
Insert the remaining rows into the destination table:
INSERT INTO data_files (SELECT * FROM _sample_table_delme_data_files);
INSERT 0 6364
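As an aside, the same cleanup can be done without the auxiliary column, assuming (as the solution above already does) that an equal id means a duplicate row; this variant also copes with ids that occur more than twice, which the max(id2) trick would miss. A sketch using the system column ctid:
DELETE FROM _sample_table_delme_data_files a
USING _sample_table_delme_data_files b
WHERE a.id = b.id
  AND a.ctid > b.ctid;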