I'm seeing some strange behavior in my program, and maybe you can shed some light on it.
Today I started testing some code and realized that a specific query was really slow (it took about 2 minutes).
Here is the select:
select distinct table1.someName
from table1
INNER JOIN table2 ON table2.id = table1.t2_id
INNER JOIN table3 ON table1.id = table3.t1_id
INNER JOIN table4 ON Table3.id = table4.t3_id
INNER JOIN table5 ON table5.id = table4.t5_id
INNER JOIN table6 ON table4.id = table6.t4_id
where t4_name = 'whatever'
and t2_name = 'moarWhatever'
and timestamp_till is null
order by someName
So the thing is, the result is about 120 records, and the INNER JOINs reduce the number of checks for timestamp_till is null to about 20 records for each record.
What bugs me most is that I tested copying the whole of table6 into a newly created table, renaming timestamp_till to ende. On that table the select completes in about 0.1 seconds ...
Is timestamp_till some sort of reserved name in SQLite3? Could this be a bug in the SQLite engine? Or is it my fault?
Edit: added the EXPLAIN QUERY PLAN output.
When querying with the and timestamp_till is null condition, it gives:
0|0|4|SEARCH TABLE table5 USING COVERING INDEX sqlite_autoindex_table5 (t4_name=?) (~1 rows)
0|1|3|SEARCH TABLE table4 USING INDEX table4.fk_table4_1_idx (t5_id=?) (~10 rows)
0|2|2|SEARCH TABLE table3 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|3|0|SEARCH TABLE table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|1|SEARCH TABLE table2 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|5|5|SEARCH TABLE table6 USING INDEX table6.fk_table6_ts_till (timestamp_till=?) (~2 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR DISTINCT
and the fast one is:
select distinct table1.someName
from table1
INNER JOIN table2 ON table2.id = table1.t2_id
INNER JOIN table3 ON table1.id = table3.t1_id
INNER JOIN table4 ON Table3.id = table4.t3_id
INNER JOIN table5 ON table5.id = table4.t5_id
INNER JOIN table6 ON table4.id = table6.t4_id
where t4_name = 'whatever'
and t2_name = 'moarWhatever'
order by someName
and its result:
0|0|4|SEARCH TABLE table5 USING COVERING INDEX sqlite_autoindex_table5_1 (t4name=?) (~1 rows)
0|1|3|SEARCH TABLE table4 USING INDEX table4.fk_table4_1_idx (t5_id=?) (~10 rows)
0|2|2|SEARCH TABLE table3 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|3|0|SEARCH TABLE table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|1|SEARCH TABLE table2 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|5|5|SEARCH TABLE table6 USING COVERING INDEX sqlite_autoindex_table6_1 (id=?) (~10 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR DISTINCT
With the test table that is a copy of table6:
0|0|4|SEARCH TABLE table5 USING COVERING INDEX sqlite_autoindex_table5_1 (name=?) (~1 rows)
0|1|3|SEARCH TABLE table4 USING INDEX table4.fk_t5_idx (t5_id=?) (~10 rows)
0|2|2|SEARCH TABLE table3 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|3|0|SEARCH TABLE table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|1|SEARCH TABLE table2 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|5|5|SEARCH TABLE test USING INDEX test.fk_test__idx (id=?) (~2 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR DISTINCT
Create script for the test table:
CREATE TABLE "test"(
"id" INTEGER NOT NULL,
"t12_id" INTEGER NOT NULL,
"value" DECIMAL NOT NULL,
"anfang" INTEGER NOT NULL,
"ende" INTEGER DEFAULT NULL,
PRIMARY KEY("id","t12_id","anfang"),
CONSTRAINT "fk_test_t12_id"
FOREIGN KEY("t12_id")
REFERENCES "table12"("id"),
CONSTRAINT "fk_test_id"
FOREIGN KEY("id")
REFERENCES "id_col"("id"),
CONSTRAINT "fk_test_anfang"
FOREIGN KEY("anfang")
REFERENCES "ts_col"("id"),
CONSTRAINT "fk_test_ende"
FOREIGN KEY("ende")
REFERENCES "ts_col"("id")
);
CREATE INDEX "test.fk_test_idx_t12_id" ON "test"("t12_id");
CREATE INDEX "test.fk_test_idx_id" ON "test"("id");
CREATE INDEX "test.fk_test_anfang" ON "test"("anfang");
CREATE INDEX "test.fk_test_ende" ON "test"("ende");
So long, zai
A first note: SQLite will use at most one index per table in a query, never more (with the current version).
Thus, here is what SQLite is doing:
Slow query: it uses the index on timestamp_till.
Fast query (no timestamp_till): it uses the (auto) index on table6.id.
I see two workarounds.
You could use a subquery:
select distinct SomeName FROM
(
select table1.someName as "SomeName", timestamp_till
from table1
INNER JOIN table2 ON table2.id = table1.t2_id
INNER JOIN table3 ON table1.id = table3.t1_id
INNER JOIN table4 ON Table3.id = table4.t3_id
INNER JOIN table5 ON table5.id = table4.t5_id
INNER JOIN table6 ON table4.id = table6.t4_id
where t4_name = 'whatever'
and t2_name = 'moarWhatever'
) Q
where timestamp_till is null
order by SomeName;
Or you can drop your index on timestamp_till, if you don't need it elsewhere.
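If you go that route, a minimal sketch, assuming the index name shown in your query plan above (table6.fk_table6_ts_till) is the one to drop:
DROP INDEX "table6.fk_table6_ts_till";
Alternatively, SQLite lets you keep the index but prevent it from being used for that one term by prefixing the column with a unary +:
and +timestamp_till is null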
There are perhaps also some speed gains to be made by reordering your joins. Usually putting the smallest table first is faster, but this can vary greatly.
I have two tables:
table1:
"table1_pkey" PRIMARY KEY, btree (id)
"table1_user_id" btree (user_id)
"table1_active" btree (bool_to_int(active))
...
Referenced by:
TABLE "table2" CONSTRAINT "my_id" FOREIGN KEY (my_id) REFERENCES table1(id) DEFERRABLE INITIALLY DEFERRED
table2:
"table2_pkey" PRIMARY KEY, btree (id)
"table2_my_id" btree (my_id)
...
"my_id" FOREIGN KEY (my_id) REFERENCES table1(id)
DEFERRABLE INITIALLY DEFERRED
I'm trying to perform the following statement:
delete from table2 using table1 where user_id = 3
and my_id = any(array[1, 2, 3])
What I expect:
table2 and table1 are joined using the foreign key my_id <-> id.
Rows with user_id = 3 from table1 and all the my_ids from the given array from table2 are selected for deletion.
Those rows are deleted.
However, what happens is that this statement deletes all the rows that match the my_id = any(array[1, 2, 3]) part of the condition, completely ignoring the user_id = 3 part, which deletes records belonging to other users.
So the question is: why is the delete with the using clause not working as expected? What am I missing?
I already figured out a simpler solution to this problem, which seems to work as expected:
delete from table2 where my_id in (select my_id from table2 join table1 on table1.id = my_id where user_id = 3 and my_id = any(array[1, 2, 3]))
But the original question still remains.
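For reference, a minimal sketch of the USING form with an explicit join condition (assuming table2.my_id references table1.id, as the constraints above indicate) would presumably look like this:
delete from table2
using table1
where table1.id = table2.my_id
and table1.user_id = 3
and table2.my_id = any(array[1, 2, 3]);
Without the table1.id = table2.my_id condition, the two tables form a cross join, so the user_id = 3 filter never actually constrains which table2 rows are deleted.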
I'm trying to select the orders that are part of a trip with multiple orders.
I have tried many approaches but can't find a way to make the query performant.
To reproduce the problem, here is the setup (100 000 rows here, but in reality it's more like 1 000 000 rows, which is enough to hit the timeout on db-fiddle).
Schema (PostgreSQL v14)
create table trips (id bigint primary key);
create table orders (id bigint primary key, trip_id bigint);
create index trips_idx on trips (id);
create index orders_idx on orders (id);
create index orders_trip_idx on orders (trip_id);
insert into trips (id) select seq from generate_series(1,100000) seq;
insert into orders (id, trip_id) select seq, floor(random() * 100000 + 1) from generate_series(1,100000) seq;
Query #1
explain analyze select orders.id
from orders
inner join trips on trips.id = orders.trip_id
inner join orders trips_orders on trips_orders.trip_id = trips.id
group by orders.id, trips.id
having count(trips_orders) > 1
limit 50
;
Do you actually need the join on trips? You could try
SELECT shared.id
FROM orders shared
WHERE EXISTS (SELECT * FROM orders other
WHERE other.trip_id = shared.trip_id
AND other.id != shared.id
)
;
to replace the group by with a hash join, or
SELECT unnest(array_agg(orders.id))
FROM orders
GROUP BY trip_id
HAVING count(*) > 1
;
to hopefully get Postgres to just use the trip_id index.
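If, as in the original query, only the first 50 ids are needed, a LIMIT should still apply to either rewrite; for example, based on the EXISTS variant above:
SELECT shared.id
FROM orders shared
WHERE EXISTS (SELECT * FROM orders other
WHERE other.trip_id = shared.trip_id
AND other.id != shared.id
)
LIMIT 50;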
I am trying to query a database using pgAdmin3 and I need to join two tables. I am using the following code:
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
But I keep running into this error:
an out of memory error
So I've tried to insert my result into a new table, as follows:
CREATE TABLE new_table AS
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
And still got an error:
ERROR: could not extend file "base/17675/43101.15": No space left on device
SQL state: 53100
Hint: Check free disk space.
I am very new at this (it is the first time I have had to deal with PostgreSQL), and I guess I can do something to optimize this query and avoid this type of error. I have no privileges on the database. Can anyone help?
Thanks in advance!
Updated:
Table 1 description
-- Table: table1
-- DROP TABLE table1;
CREATE TABLE table1
(
species character varying(100),
trait character varying(50),
value double precision,
units character varying(50)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table1
OWNER TO postgres;
GRANT ALL ON TABLE table1 TO postgres;
GRANT SELECT ON TABLE table1 TO banco;
-- Index: speciestable1_idx
-- DROP INDEX speciestable1_idx;
CREATE INDEX speciestable1_idx
ON table1
USING btree
(species COLLATE pg_catalog."default");
-- Index: traittype_idx
-- DROP INDEX traittype_idx;
CREATE INDEX traittype_idx
ON table1
USING btree
(trait COLLATE pg_catalog."default");
and table2 as:
-- Table: table2
-- DROP TABLE table2;
CREATE TABLE table2
(
id integer NOT NULL,
family character varying(40),
species character varying(100),
plotarea real,
latitude double precision,
longitude double precision,
source integer,
latlon geometry,
CONSTRAINT table2_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table2
OWNER TO postgres;
GRANT ALL ON TABLE table2 TO postgres;
GRANT SELECT ON TABLE table2 TO banco;
-- Index: latlon_gist
-- DROP INDEX latlon_gist;
CREATE INDEX latlon_gist
ON table2
USING gist
(latlon);
-- Index: species_idx
-- DROP INDEX species_idx;
CREATE INDEX species_idx
ON table2
USING btree
(species COLLATE pg_catalog."default");
You're performing a join between two tables on the column species.
Not sure what's in your data, but if species is a column with significantly fewer distinct values than the number of records (e.g. species is "elephant" or "giraffe" and you're analyzing all animals in Africa), this join will match every elephant row with every other elephant row, which is why the result blows up in size.
When joining two tables, most of the time you try to join on a unique or close-to-unique attribute, like id (not sure what id means in your case, but it could be a candidate).
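A quick way to check the fan-out before joining (a sketch, using the column name from your tables) is to count how many rows each species has in each table, e.g. for table2:
SELECT species, count(*) AS n
FROM table2
GROUP BY species
ORDER BY n DESC
LIMIT 10;
If the most common species has thousands of rows in table1 and thousands in table2, the join produces (rows in table1) x (rows in table2) output rows for that species alone, which easily exhausts memory or disk.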
I am using Postgresql and need to query two tables like this:
Table1
ID Bill
A 1
B 2
B 3
C 4
Table2
ID
A
B
I want a table with all the columns of Table1, but keeping only the records whose IDs are present in Table2 (A and B in this case). Also, Table2's ID is unique.
ID Bill
A 1
B 2
B 3
Which join should I use, or can I use a WHERE clause instead?
Thanks!
SELECT Table1.*
FROM Table1
INNER JOIN Table2 USING (ID);
or
SELECT *
FROM Table1
WHERE ID IN (SELECT ID FROM Table2);
but the first one is better for performance reasons.
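If performance matters for your data, the two forms can be compared directly with EXPLAIN ANALYZE (a sketch; the plans depend on your tables and indexes):
EXPLAIN ANALYZE SELECT Table1.* FROM Table1 INNER JOIN Table2 USING (ID);
EXPLAIN ANALYZE SELECT * FROM Table1 WHERE ID IN (SELECT ID FROM Table2);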
SELECT *
FROM Table1
WHERE EXISTS (
SELECT 1 FROM Table2 WHERE Table2.ID = Table1.ID LIMIT 1
)
I have two tables that have the same column name, id (I cannot change the way the tables are designed), and I'm trying to query table2's id. How would I do this when they are joined?
create table table1(
id integer, -- PG: serial
description MediumString not null,
primary key (id)
);
create table table2 (
id integer, -- PG: serial
tid references table1(id),
primary key (id)
);
So basically, when they're joined, two columns will have the same name, "id", if I do the following query:
select * from table1
join table2 on table1.id = table2.tid;
Alias the columns if you want both "id"s
SELECT table1.id AS id1, table2.id AS id2
FROM table1...
If you want to query all columns (*) on both tables but still be able to reference a specific id, you can do that too. You will end up with duplicate id columns that you probably won't use, but in some situations, if you really need all the data, it's worth it.
select table1.*, table2.*, table1.id as "table1.id", table2.id as "table2.id"
from ...
You cannot select it unambiguously using select *.
Try this:
select table1.id, table1.description, table2.id, table2.tid
from table1
inner join table2
on table1.id = table2.tid
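Putting the join and explicit aliases together (a sketch using the table and column names from the question), so that both ids stay distinguishable in the result:
select table1.id as table1_id,
table1.description,
table2.id as table2_id
from table1
inner join table2 on table1.id = table2.tid;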