Given these tables:
Foo
  id (PK)
  name
  updated
Bar
  foo_id (FK)
  name
  updated
And this query:
SELECT *
FROM Foo as f
JOIN Bar as b
ON f.id=b.foo_id
WHERE b.name = 'Baz' AND f.name = 'Baz'
ORDER BY f.updated ASC, f.id ASC
LIMIT 10
OFFSET 10
Are these appropriate indexes to add? In MySQL InnoDB, the primary key columns are automatically appended to the end of every secondary index. What is the case with Postgres?
CREATE INDEX foo_name_id_idx ON foo (name, id);
CREATE INDEX bar_name_id_idx ON bar (name, id);
PostgreSQL does not distinguish between primary and secondary indexes; the primary key index is no different from any other index. So the primary key columns are not added to other indexes, and there is no point in doing that yourself unless you have a special reason for it.
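If you do want non-key columns carried inside an index, for example to enable index-only scans, PostgreSQL 11 and later lets you add them explicitly with INCLUDE. A sketch, reusing the foo table from the question:

```sql
-- "name" is the only search column; "updated" and "id" are stored in
-- the index leaf pages so matching queries can use an index-only scan.
-- INCLUDE columns cannot be searched or sorted on via the index.
CREATE INDEX foo_name_covering_idx ON foo (name) INCLUDE (updated, id);
```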
Depending on which of the conditions is selective, there are three possible strategies:
If the condition on bar.name is selective, use bar as the driving site:
CREATE INDEX ON bar (name);
-- foo.id is already indexed
If the condition on foo.name is selective:
CREATE INDEX ON foo (name);
CREATE INDEX ON bar(foo_id); -- for a nested loop join
If none of the conditions are selective:
/* here the "id" is actually at the end of the index,
but that is just because it appears in ORDER BY */
CREATE INDEX ON foo (name, updated, id); -- for the ORDER BY
CREATE INDEX ON bar (foo_id); -- for a nested loop join
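Whichever set of indexes you try, you can check which strategy the planner actually chooses. A sketch using the query from the question (note that EXPLAIN ANALYZE executes the statement, so run it against test data):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM Foo AS f
JOIN Bar AS b ON f.id = b.foo_id
WHERE b.name = 'Baz' AND f.name = 'Baz'
ORDER BY f.updated ASC, f.id ASC
LIMIT 10 OFFSET 10;
```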
Related
I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
(A, B, C, (
select C - cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 1
), now())
on conflict do nothing
Quite often (with a probability of about 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive SELECT subquery would not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the SELECT subquery to produce the dyn value and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 3 = 5 - 2). Each time I insert a new row it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (so I take the last inserted row for the user according to the timestamp and subtract the values).
Another solution I've found is to use RETURNING uid, lid, ts on the insert to get the rows that were actually inserted - that is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
select max(cnt) - min(cnt)
from (
select cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 2) last_two
)
where uid = A and lid = B and ts = TS
But that is not a fast approach either, as it scans the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.
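For the "two last inserted rows per user" lookup described above, a composite index matching the WHERE and ORDER BY clauses would let PostgreSQL read just the top two index entries instead of scanning all of a user's rows. A sketch, assuming the column names from the question:

```sql
-- Matches WHERE uid = A AND lid = B ORDER BY ts DESC LIMIT 2,
-- so the index scan can stop after reading two entries.
CREATE INDEX cusers_uid_lid_ts_idx ON cusers (uid, lid, ts DESC);
```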
I have the following table
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
  FK_T1 integer,  -- foreign key to T1 (Table1)
  FK_DATE date,   -- foreign key to T1 (Table1)
  T2_DATE date,   -- user input field
  T2_MAX_DIFF COMPUTED BY ( (SELECT DATEDIFF (day, MAX(T2_DATE), CURRENT_DATE) FROM T2 GROUP BY FK_T1) )
);
I want T2_MAX_DIFF to display the number of days since the last input across all entries with a common FK_T1.
It does work, but as soon as a second FK_T1 value is added to the table, I get an error about "multiple rows in singleton select".
I'm assuming that I need some sort of WHERE FK_T1 = FK_T1 of the corresponding row. Is it possible to add this? I'm using Firebird 3.0.7 with FlameRobin.
The error "multiple rows in singleton select" means that a query that should provide a single scalar value produced multiple rows. And that is not unexpected for a query with GROUP BY FK_T1, as it will produce a row per FK_T1 value.
To fix this, you need to use a correlated sub-query by doing the following:
Alias the table in the subquery to disambiguate it from the table itself
Add a where clause, making sure to use the aliased table (e.g. src, and src.FK_T1), and explicitly reference the table itself for the other side of the comparison (e.g. T2.FK_T1)
(optional) remove the GROUP BY clause because it is not necessary given the WHERE clause. However, leaving the GROUP BY in place may uncover certain types of errors.
The resulting subquery then becomes:
(SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1)
Notice the alias src for the table referenced in the subquery, the use of src.FK_T1 in the condition, and the explicit use of the table name in T2.FK_T1 to reference the column of the current row of the table itself. If you used src.FK_T1 = FK_T1, it would compare the column with itself (as if you had written src.FK_T1 = src.FK_T1), so the condition would always be true.
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
FK_T1 integer,
FK_DATE date,
T2_DATE date,
T2_MAX_DIFF COMPUTED BY ( (
SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1) )
);
I have a number of records that are common to all schemas. I place these records in a table in a shared schema, and would like each child schema to inherit the rows from this shared parent table.
Suppose I have the following schemas:
CREATE SCHEMA parent;
CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE parent.component (product_id serial PRIMARY KEY, title text);
CREATE TABLE a.component () INHERITS (parent.component);
CREATE TABLE b.component () INHERITS (parent.component);
INSERT INTO parent.component(title) VALUES ('parent');
INSERT INTO a.component(title) VALUES ('a_test') ,('a_test2') ;
INSERT INTO b.component(title) VALUES ('b_test') ,('b_test2');
Is there a way to select the union of rows from the parent and either a.component or b.component when I issue a SELECT on a or b?
So for example:
SELECT * FROM a.component;
returns rows:
 id | title
----+---------
  1 | parent
  2 | a_test
  3 | a_test2
PostgreSQL has multiple inheritance, so a table can be the child of many tables. You could try to inherit in the other direction!
But maybe a simple UNION ALL query is a simpler and better solution.
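A sketch of the UNION ALL approach, using the tables from the question. ONLY restricts parent.component to its own rows, so the child rows (which are otherwise also visible through the parent) are not counted twice:

```sql
-- Parent's own rows plus the rows of one child table.
SELECT product_id, title FROM ONLY parent.component
UNION ALL
SELECT product_id, title FROM a.component;
```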
My query
delete from test.t1 where t2_id = 1;
My main table is t1 (it contains around 1M rows, and I need to delete about 100k of them):
CREATE TABLE test.t1
(
id bigserial NOT NULL,
t2_id bigint,
... other fields
CONSTRAINT pk_t1 PRIMARY KEY (id),
CONSTRAINT fk_t1_t2 FOREIGN KEY (t2_id)
REFERENCES test.t2 (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I have index on t2_id and 3 other indexes on plain string fields.
CREATE INDEX t1_t2_idx ON test.t1 USING btree (t2_id);
There are multiple (around 50) tables that reference test.t1. I have an index on t1_id for every table that references it.
CREATE TABLE test.t7
(
id bigserial NOT NULL,
t1_id bigint,
... other fields
CONSTRAINT pk_objekt PRIMARY KEY (id),
CONSTRAINT fk_t7_t1 FOREIGN KEY (t1_id)
REFERENCES test.t1 (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE INDEX t7_t1_idx ON test.t7 USING btree (t1_id);
-- no other indexes here
Contents of t7 are deleted before t1, and that is super fast compared to deleting from t1. The ratio of rows to delete is the same (~10%), but the total number of rows is considerably smaller (around 100K).
I have not been able to reduce the time to a reasonable length:
I've tried removing all indexes (cancelled after 24 hours)
Kept only t1_t2_idx and the t7_t1_idx ... t50_t1_idx indexes (cancelled after 24 hours)
Kept all indexes (cancelled after 24 hours)
Also, VACUUM ANALYZE is performed before the deletion, and there should not be any locks (it is the only active query in the db).
I have not tried copying to a temp table and truncating t1, but this does not seem reasonable, since t1 can grow up to 10M rows, of which 1M need to be deleted at some point.
Any ideas how to improve the removal?
EDIT
Quite sure there are no locks, because pg_stat_activity shows only 2 active queries (the delete and pg_stat_activity itself):
"Delete on test.t1 (cost=0.43..6713.66 rows=107552 width=6)"
" -> Index Scan using t1_t2_idx on test.t1 (cost=0.43..6713.66 rows=107552 width=6)"
" Output: ctid"
"        Index Cond: (t1.t2_id = 1)"
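One thing worth checking: a plain EXPLAIN does not show the time spent in the foreign-key triggers that check the ~50 referencing tables for each deleted row, which is a common cause of slow deletes. EXPLAIN ANALYZE reports that time per constraint. A sketch (it actually executes the DELETE, so wrap it in a transaction and roll back):

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM test.t1 WHERE t2_id = 1;
-- Look for "Trigger for constraint fk_t7_t1: time=..." lines in the output;
-- a large time there points to a referencing table without a usable index.
ROLLBACK;
```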
Here is what I have so far:
INSERT INTO Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT CURRENT_DATE, NULL, NewRentPayments.Rent, NewRentPayments.LeaseTenantSSN, FALSE from NewRentPayments
WHERE NOT EXISTS (SELECT * FROM Tenants, NewRentPayments WHERE NewRentPayments.HouseID = Tenants.HouseID AND
NewRentPayments.ApartmentNumber = Tenants.ApartmentNumber)
So, HouseID and ApartmentNumber together make up the primary key. If there is a tuple in table B (NewRentPayments) that doesn't exist in table A (Tenants) based on the primary key, then it needs to be inserted into Tenants.
The problem is, when I run my query, it doesn't insert anything (I know for a fact there should be 1 tuple inserted). I'm at a loss, because it looks like it should work.
Thanks.
Your subquery was not correlated - it was just a non-correlated join query, so the NOT EXISTS condition was false for every outer row as soon as the join produced any match at all.
As per the description of your problem, you don't need that join.
Try this:
insert into Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
where not exists (
select *
from Tenants t
where p.HouseID = t.HouseID
and p.ApartmentNumber = t.ApartmentNumber
)
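If the key columns are also meant to be inserted (the question compares HouseID and ApartmentNumber but omits them from the column list), the duplicate check can alternatively be delegated to the primary key itself with ON CONFLICT (PostgreSQL 9.5+). A sketch:

```sql
-- Relies on the primary key (HouseID, ApartmentNumber) on Tenants
-- to reject duplicates instead of the NOT EXISTS check.
INSERT INTO Tenants (HouseID, ApartmentNumber, LeaseStartDate,
                     LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT p.HouseID, p.ApartmentNumber, current_date, NULL,
       p.Rent, p.LeaseTenantSSN, FALSE
FROM NewRentPayments p
ON CONFLICT (HouseID, ApartmentNumber) DO NOTHING;
```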