I am trying to create a table that stores videos for products, and I intend to add a few million of them, so obviously I want performance to be as good as possible.
I wanted to achieve the following:
BIGINT     | SMALLSERIAL | VARCHAR(30)
product_id | video_id    | video_hash
1          | 1           | Dkfjoie124
1          | 2           | POoieqlgkQ
1          | 3           | Xd2t9dakcx
2          | 1           | Df2459Afdw
However, when I insert a new video for a product:
INSERT INTO videos (product_id, video_hash) VALUES (2, 'DSpewirncS');
I want the following to happen:
BIGINT     | SMALLSERIAL | VARCHAR(30)
product_id | video_id    | video_hash
1          | 1           | Dkfjoie124
1          | 2           | POoieqlgkQ
1          | 3           | Xd2t9dakcx
2          | 1           | Df2459Afdw
2          | 2           | DSpewirncS
Will this happen when I set the column type for video_id to SMALLSERIAL? I am afraid that it will instead insert the next value of a single sequence spanning the entire column (here, 5), which I do not want.
Thanks.
No. A serial is bound to a sequence, and the sequence doesn't reset without being told to.
But if you want an ordinal for the videos per product, you can query the table and produce it with the row_number() window function:
SELECT product_id,
row_number() OVER (PARTITION BY product_id
ORDER BY video_id) video_ordinal,
video_hash
FROM videos;
You could also create a view over this query for convenience, so that you can query the view instead of the table and it will look the way you want.
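For example, a minimal sketch of such a view (videos and videos_ordinal are placeholder names):
CREATE VIEW videos_ordinal AS
SELECT product_id,
       row_number() OVER (PARTITION BY product_id
                          ORDER BY video_id) AS video_ordinal,
       video_hash
FROM videos;
-- query the view just like the table:
SELECT * FROM videos_ordinal WHERE product_id = 2;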
Hi, I want to add a unique, non-nullable column to a table. It already has data, so I would like to instantly populate the new column with unique values, e.g. 'ABC123', 'ABC124', 'ABC125', etc. The data will eventually be wiped and replaced with proper data, so I don't want to introduce a sequence just to populate the default value.
Is it possible to generate a default value for the existing rows, based on something like row_number()? I realise the use case is ridiculous, but is it possible to achieve, and if so, how?
...
foo text not null unique default 'ABC' || row_number() -- or something similar?
...
You can apply generate_series:
select 'ABC' || generate_series(123,130)::text;
ABC123
ABC124
ABC125
ABC126
ABC127
ABC128
ABC129
ABC130
Variant 2: add the column as UNIQUE and NOT NULL
begin;
alter table test_table add column foo text not null default 'ABC';
with s as (
    select id, (row_number() over (order by id))::text as t
    from test_table
)
update test_table
set foo = foo || s.t
from s
where test_table.id = s.id;
alter table test_table add constraint unique_foo1 unique (foo);
commit;
Results:
select * from test_table;
id | foo
----+------
1 | ABC1
2 | ABC2
3 | ABC3
4 | ABC4
5 | ABC5
6 | ABC6
I have a table with 10 million records: about 1 million have ids from 1 to 1 million, and about 9 million have NULL ids. How can I set the id for the NULL rows using a sequence that continues from the existing ids?
Try this in a test area to see how long it takes to populate your table. We'll use a short example here.
create table test (id int, fullname text);
insert into test values (1, 'john');
insert into test values (2, 'john');
insert into test values (NULL, 'john');
insert into test values (NULL, 'john');
This simulation shows that records 1 and 2 have an ID and 3 and 4 don't have an ID, yet.
Create a sequence that we will use to populate the ID in records 3 and 4.
create sequence populate_test start 3;
Now, let's populate:
update test set id = nextval('populate_test') where id is null;
Result:
select * from test;
id | fullname
----+----------
1 | john
2 | john
3 | john
4 | john
In your case, you could also use the cache option of create sequence, like so: create sequence populate_test start 3 cache 1000000; to cache one million numbers at a time. On your real table, start the sequence just past your highest existing id.
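Applied to your table, a sketch might look like this (your_table and populate_ids are placeholders, and the start value assumes your existing ids end at 1,000,000):
create sequence populate_ids start 1000001 cache 1000000;
update your_table set id = nextval('populate_ids') where id is null;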
I want to filter out the bookings in Table 1 that fall within the date ranges of Table 2.
Table 1:
Booking_ID | starts | ends
Table 2:
ID | holiday_starts | holiday_ends
I know there is an OVERLAPS function in Redshift that can be used, but I am not able to figure out how to pass all the date values of Table 2 into it.
I want to do something like
select Booking_ID
from table1
where (table1.starts, table1.ends) overlaps (select holiday_starts, holiday_ends
from table2)
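One way this could be expressed, assuming OVERLAPS behaves in Redshift as it does in PostgreSQL, is an EXISTS subquery over Table 2 (a sketch using the column names above, not tested on Redshift):
select Booking_ID
from table1
where exists (select 1
              from table2
              where (table1.starts, table1.ends)
                    overlaps (table2.holiday_starts, table2.holiday_ends));
This keeps the bookings that overlap a holiday; use not exists instead if you want to exclude them.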
In PostgreSQL, when a serial column is inherited from a parent table, the sequence is shared by the parent and child tables.
Is it possible to inherit the serial column while letting the two tables have separate sequence values, e.g. so that both tables' columns could contain the value 1?
Is this possible and reasonable, and if so, how can it be done?
Update
The reasons I want to avoid sequence sharing are:
Sharing a single int range among multiple tables might use up MAX_INT; using bigint would improve this, but it takes more space too.
There is a kind of resource locking when multiple tables insert concurrently, so I guess it is a performance issue.
Ids that jump from 1 to 5 and then maybe to 1000 don't look as clean as they could.
Summary
Solutions:
If you want each child table to have its own sequence while still keeping the global sequence shared by parent and child tables (as described in @wildplasser's answer):
add a sub_id serial column to each child table.
If you want each child table to have its own sequence and don't need a global sequence shared by parent and child tables, there are two ways:
Use int instead of serial (as described in @lsilva's answer). Steps:
define the type as int or bigint in the parent table,
create an individual sequence for each parent and child table,
set the default of the int column in each table using nextval of its own sequence,
don't forget to maintain/reset the sequence when you re-create a table.
Define id serial directly in the child table, and not in the parent table.
DROP schema tmp CASCADE;
CREATE schema tmp;
set search_path = tmp, pg_catalog;
CREATE TABLE common
( seq SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE one
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
CREATE TABLE two
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
/**
\d common
\d one
\d two
\q
***/
INSERT INTO one(payload)
SELECT gs FROM generate_series(1,5) gs
;
INSERT INTO two(payload)
SELECT gs FROM generate_series(101,105) gs
;
SELECT * FROM common;
SELECT * FROM one;
SELECT * FROM two;
Results:
NOTICE: drop cascades to table tmp.common
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 5
seq
-----
1
2
3
4
5
6
7
8
9
10
(10 rows)
seq | subseq | payload
-----+--------+---------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
(5 rows)
seq | subseq | payload
-----+--------+---------
6 | 1 | 101
7 | 2 | 102
8 | 3 | 103
9 | 4 | 104
10 | 5 | 105
(5 rows)
But in fact you don't need the subseq columns, since you can always enumerate them with row_number():
CREATE VIEW vw_one AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM one;
CREATE VIEW vw_two AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM two;
[results are identical]
And you could add UNIQUE and PRIMARY KEY constraints to the child tables, like:
CREATE TABLE one
( subseq SERIAL NOT NULL UNIQUE
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
ALTER TABLE one ADD PRIMARY KEY (seq);
[similar for table two]
I use this:
Parent table definition:
CREATE TABLE parent_table (
id bigint NOT NULL,
Child table definition:
CREATE TABLE cild_schema.child_table
(
id bigint NOT NULL DEFAULT nextval('child_schema.child_table_id_seq'::regclass),
I am emulating the serial by using a sequence number as a default.
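Note that the sequence has to exist before the default can reference it; a minimal sketch of the whole pattern (a sketch assuming the names above, with the inheritance from the question):
CREATE SEQUENCE child_schema.child_table_id_seq;
CREATE TABLE child_schema.child_table
(
    id bigint NOT NULL DEFAULT nextval('child_schema.child_table_id_seq'::regclass)
    -- remaining columns as in parent_table
) INHERITS (parent_table);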
Postgres:
create table stock(item_id int primary key, balance float);
insert into stock values(10,2200);
insert into stock values(20,1900);
select * from stock;
create table buy(item_id int primary key, volume float);
insert into buy values(10,1000);
insert into buy values(30,300);
select * from buy;
Results:
item_id | balance
---------+---------
10 | 2200
20 | 1900
(2 rows)
item_id | volume
---------+--------
10 | 1000
30 | 300
(2 rows)
Now I want another table that includes both tables' data.
The new table should have 3 rows of data, with item_id (10, 20, 30) and no duplication.
I need a query for this, using either a merge or a join.
I'm guessing:
that you really want a view rather than a table
that the values in the 'buy' table are supposed to be deducted from the 'stock'
so here's what I think you are after:
create view v_current_stock as
select item_id, sum(balance) as balance
from ( select item_id, balance from stock
union all
select item_id, -volume from buy ) t
group by item_id;
EDIT: seems like my guesswork was a bit off (see comments). Perhaps you are looking for a full join:
create view v as
select * from stock full join buy using (item_id);
select * from v;
item_id | balance | volume
---------+---------+--------
10 | 2200 | 1000
20 | 1900 |
30 | | 300
You can use an insert into ... select syntax:
create table mytable(item_id int primary key, balance float, volume float);
insert into mytable
select distinct stock.item_id, balance, volume
from stock
inner join buy on buy.item_id = stock.item_id;
You can use a different type of join if needed (left join or full join). In your case, I think you need a full join, but since I'm not sure I'll stick with the inner join in the example.
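If it does turn out you need the full join, a sketch might look like this (using (item_id) merges the join column, so it is non-null no matter which table the row comes from):
insert into mytable
select item_id, balance, volume
from stock
full join buy using (item_id);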