I am trying to delete one record from a parent table (TAB_A, primary key COL_A) in DB2 10.5. This table has many child tables, and the data has already been deleted from the child tables. Due to the foreign key constraint, the index scan on one of the child tables (TAB_B, index IDX1_TAB_B) carries a high cost. TAB_B has the composite index IDX1_TAB_B on (COL_C, COL_A). The execution plan is below:
Optimizer Plan (condensed from the original tree output; estimated rows, cost, and the object each operator touches):

Operator (ID)     Est. rows   Cost        Object
FILTER (2)        0.04        1450.45
  DELETE (3)      1           22.7049     Table: GEXPDBA.TAB_A (6.91522e+06 rows)
    IXSCAN (4)    1           15.1415     Index: GEXPDBA.IDX_TAB_A
  IXSCAN (5)      1.66864     22.7038     Index: GEXPUSRT.IX6_XXXXXXXXXXXX (6.56409e+06 entries)
  IXSCAN (6)      1.07095     1344.45     Index: GEXPDBA.IDX1_TAB_B (75669 entries)   <-- high cost
  FETCH (7)       0           0.0107876   Table: GEXPDBA.TZZZZZZZZZZZZZZZZZZZZ (0 rows)
    IXSCAN (8)    0           0.0105474   Index: GEXPDBA.IDX_ZZZZZZZZZZZZZZZ
  IXSCAN (9)      6.30251     15.1445     Index: GEXPUSRT.IX1_LLLLLLLLLLLLLLL (1.01318e+06 entries)
  IXSCAN (10)     1.26543     22.7036     Index: GEXPUSRT.IX1_LOOOOOOOOOOOOOOOO (4.86182e+06 entries)
  IXSCAN (11)     25.0657     7.58684     Index: GEXPDBA.IX1_LOCCCCCCCCCCCCCCCCCCCC (213 entries)
  IXSCAN (12)     1           15.1433     Index: GEXPDBA.IDX_GE (1.66563e+06 entries)
An index whose FK columns do not come first in the index column list (i.e. (..., FKcol1 [, FKcol2, ...])) is often less efficient than an index with those FK columns leading (i.e. (FKcol1 [, FKcol2, ...])) when the child table has to be joined with its parent table, as happens in this question's foreign key check.
So the solution was simply to create such an index, which improves the performance of the DELETE statement on the parent table.
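A minimal sketch of that fix, assuming the foreign key from TAB_B to TAB_A is on COL_A (the index name IDX2_TAB_B is hypothetical):

CREATE INDEX IDX2_TAB_B ON TAB_B (COL_A, COL_C);

With COL_A leading, DB2 can probe the index directly for the deleted parent key instead of paying the high-cost scan of IDX1_TAB_B, whose leading column is COL_C.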
Related
I have a large (~110 million rows) table on PostgreSQL 12.3 whose relevant fields can be described by the following DDL:
CREATE TABLE tbl
(
item1_id integer,
item2_id integer,
item3_id integer,
item4_id integer,
type_id integer
)
One of the queries we execute often is:
SELECT type_id, item1_id, item2_id, item3_id, item4_id
FROM tbl
WHERE
type_id IS NOT NULL
AND item1_id IN (1, 2, 3)
AND (
item2_id IN (4, 5, 6)
OR item2_id IS NULL
)
AND (
item3_id IN (7, 8, 9)
OR item3_id IS NULL
)
AND (
item4_id IN (10, 11, 12)
OR item4_id IS NULL
)
Although we have indexes for each of the individual columns, the query is still relatively slow (a couple of seconds). Hoping to optimize this, I created the following index:
CREATE INDEX tbl_item_ids
ON public.tbl USING btree
(item1_id ASC, item2_id ASC, item3_id ASC, item4_id ASC)
WHERE type_id IS NOT NULL;
Unfortunately the query performance barely improved - EXPLAIN tells me this is because although an index scan is done with this newly created index, only item1_id is used as an Index Cond, whereas all the other filters are applied at table level (i.e. plain Filter).
I'm not sure why the index is not used in its entirety (or at least for more than the item1_id column). Is there an obvious reason for this? Is there a way I can restructure the index or the query itself to help with performance?
A multi-column index can be used for columns beyond the first one only if the condition on the first column is an equality comparison (=). IN or = ANY does not qualify.
So you will be better off with individual indexes on each column, which PostgreSQL can combine with a bitmap OR.
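A sketch of those per-column indexes (the names are hypothetical; an index on item1_id may already exist):

CREATE INDEX tbl_item2_id_idx ON tbl (item2_id);
CREATE INDEX tbl_item3_id_idx ON tbl (item3_id);
CREATE INDEX tbl_item4_id_idx ON tbl (item4_id);

The planner can then answer each OR'ed condition (IN list or IS NULL) with a BitmapOr of bitmap index scans and combine the per-column results with a BitmapAnd.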
You should try to avoid OR in the WHERE condition, perhaps with
WHERE coalesce(item2_id, -1) IN (-1, 4, 5, 6)
where -1 is a value that doesn't occur. Then you could use an index on the coalesce expression.
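For example (a sketch, assuming -1 never occurs in item2_id):

CREATE INDEX tbl_item2_coalesce_idx ON tbl ((coalesce(item2_id, -1)));

SELECT type_id, item1_id, item2_id, item3_id, item4_id
FROM tbl
WHERE coalesce(item2_id, -1) IN (-1, 4, 5, 6);

The WHERE clause has to use the exact expression coalesce(item2_id, -1) for the planner to match it against the index; the same rewrite applies to item3_id and item4_id.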
I am using a compound index on a table with more than 13 million records.
The index order is (center_code, created_on, status). center_code and status are both varchar(100) NOT NULL, and created_on is timestamp without time zone.
I read somewhere that column order matters in a compound index: check the number of unique values per column and put the column with the highest number of unique values first.
The center_code can have 4000 distinct values.
The status can have 5 distinct values.
The min value of created_on is 2017-12-12 02:00:49.465317+00.
The question is: what is the number of unique values for created_on?
Should I put it first in the compound index?
Does indexing on a timestamp column work on a per-day, per-hour, or per-second basis?
The problem is:
A simple SELECT query that uses just this compound index and nothing else takes more than 500 ms.
Indexes on table:
Indexes:
"pa_key" PRIMARY KEY, btree (id)
"pa_uniq" UNIQUE CONSTRAINT, btree (wbill)
"pa_center_code_created_on_status_idx_new" btree (center_code, created_on, status)
The query is:
EXPLAIN ANALYSE
SELECT "pa"."wbill"
FROM "pa"
WHERE ("pa"."center_code" = 'IND110030AAC'
AND "pa"."status" IN ('Scheduled')
AND "pa"."created_on" >= '2018-10-10T00:00:00+05:30'::timestamptz);
Query Plan:
Index Scan using pa_center_code_created_on_status_idx_new on pa (cost=0.69..3769.18 rows=38 width=13) (actual time=5.592..15.526 rows=78 loops=1)
Index Cond: (((center_code)::text = 'IND110030AAC'::text) AND (created_on >= '2018-10-09 18:30:00+00'::timestamp with time zone) AND ((status)::text = 'Scheduled'::text))
Planning time: 1.156 ms
Execution time: 519.367 ms
Any help would be highly appreciated.
The index scan condition reads
(((center_code)::text = 'IND110030AAC'::text) AND
(created_on >= '2018-10-09 18:30:00+00'::timestamp with time zone) AND
((status)::text = 'Scheduled'::text))
but the index scan itself is only over (center_code, created_on), while the condition on status is applied as a filter.
Unfortunately this is not visible from the execution plan, but it follows from the following rule:
An index scan will only use conditions if the rows satisfying the conditions are next to each other in the index.
Let's consider this example (in index order):
center_code | created_on | status
--------------+---------------------+-----------
IND110030AAC | 2018-10-09 00:00:00 | Scheduled
IND110030AAC | 2018-10-09 00:00:00 | Xtra
IND110030AAC | 2018-10-10 00:00:00 | New
IND110030AAC | 2018-10-10 00:00:00 | Scheduled
IND110030AAC | 2018-10-11 00:00:00 | New
IND110030AAC | 2018-10-11 00:00:00 | Scheduled
You will see that the query needs the 4th and 6th row.
PostgreSQL cannot scan the index with all three conditions, because the required rows are not next to each other. It will have to scan only with the first two conditions, because all rows satisfying those are right next to each other.
Your rule for multi-column indexes is wrong. The leftmost columns of the index have to be the ones that are compared with = in the conditions.
The perfect index would be one on (center_code, status, created_on).
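A sketch of that index (the name is hypothetical):

CREATE INDEX pa_center_code_status_created_on_idx
    ON pa (center_code, status, created_on);

With the two equality columns first, all matching rows are adjacent in the index, and the range condition on created_on selects one contiguous slice of them.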
One of the tips I have learned at work: when you create a compound index, the columns compared with = should come first, and columns with other conditions (>, <, >=, <=, IN) should follow after.
My query
delete from test.t1 where t2_id = 1;
My main table is t1 (around 1M rows, of which about 100k need to be deleted):
CREATE TABLE test.t1
(
id bigserial NOT NULL,
t2_id bigint,
... other fields
CONSTRAINT pk_t1 PRIMARY KEY (id),
CONSTRAINT fk_t1_t2 FOREIGN KEY (t2_id)
REFERENCES test.t2 (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I have index on t2_id and 3 other indexes on plain string fields.
CREATE INDEX t1_t2_idx ON test.t1 USING btree (t2_id);
There are multiple (around 50) tables that reference test.t1. I have an index on t1_id for every table that references it.
CREATE TABLE test.t7
(
id bigserial NOT NULL,
t1_id bigint,
... other fields
CONSTRAINT pk_objekt PRIMARY KEY (id),
CONSTRAINT fk_t7_t1 FOREIGN KEY (t1_id)
REFERENCES test.t1 (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE INDEX t7_t1_idx ON test.t7 USING btree (t1_id);
-- No other indexes here
Contents of t7 are deleted before t1, and that is super fast compared to deleting from t1. The ratio of rows to delete is the same (~10%), but the total number of rows is considerably smaller (around 100K).
I have not been able to reduce the run time to a reasonable length:
I've tried removing all indexes (cancelled after 24 hours)
Kept only t1_t2_idx and the t7_t1_idx ... t50_t1_idx indexes (cancelled after 24 hours)
Kept all indexes (cancelled after 24 hours)
Also, VACUUM ANALYZE is performed before the deletion, and there should not be any locks (it is the only active query in the database).
I have not tried copying the surviving rows to a temp table and truncating t1, but that does not seem reasonable, since t1 can grow up to 10M rows, of which 1M will need to be deleted at some point.
Any ideas how to speed up the delete?
EDIT
Quite sure there are no locks, because pg_stat_activity shows only 2 active queries (the delete and the pg_stat_activity query itself):
"Delete on test.t1 (cost=0.43..6713.66 rows=107552 width=6)"
" -> Index Scan using t1_t2_idx on test.t1 (cost=0.43..6713.66 rows=107552 width=6)"
" Output: ctid"
" Index Cond: (t1.t1_id = 1)"
I would like to add database-side validation that allows only one category per order ID, using SQL constraints or a check constraint.
Table: order_line_table
Example: inserting or updating rows with the same category for an order id is allowed:
Id Order_id Categ_id
1 1 4
2 1 4
3 1 4
4 2 5
5 2 5
Example: inserting or updating rows with different categories for the same order id is not allowed:
Id Order_id Categ_id
6 3 4
7 3 5
I tried the code below, and it works server-side, but the validation does not fire when records come in through the XML-RPC web service.
@api.one
@api.constrains('order_line')
def _check_category(self):
    list_categ = []
    if self.order_line:
        for line in self.order_line:
            if isinstance(line, dict):
                # line values arriving as a dict (e.g. from an XML-RPC call)
                list_categ.append(line['categ_id'])
            else:
                list_categ.append(line.categ_id and line.categ_id.id or False)
        filter_categ = list(set(list_categ))
        if len(filter_categ) > 1:
            raise UserError(_('Only one product category is allowed!'))
At first I misunderstood your question, so I'm updating the answer.
To achieve your goal you could use an EXCLUDE constraint in PostgreSQL:
CREATE TABLE order_line_table
(
Id SERIAL PRIMARY KEY,
Order_id INT,
Categ_id INT,
EXCLUDE USING GIST
(
Order_id WITH =,
Categ_id WITH <>
)
);
To support a GiST index over the <> operator, you first have to install the additional PostgreSQL extension btree_gist:
CREATE EXTENSION btree_gist;
Demo:
# INSERT INTO order_line_table (Order_id, Categ_id) VALUES (1, 2);
INSERT 0 1
# INSERT INTO order_line_table (Order_id, Categ_id) VALUES (1, 2);
INSERT 0 1
# INSERT INTO order_line_table (Order_id, Categ_id) VALUES (1, 3);
ERROR: conflicting key value violates exclusion constraint "orders_order_id_category_id_excl"
DETAIL: Key (Order_id, Categ_id)=(1, 3) conflicts with existing key (Order_id, Categ_id)=(1, 2).
When I run the following script with Postgres 9.3 (with enable_seqscan set to off), I expect the final query to make use of the "forms_string" partial index, but instead uses the "forms_int" index, which doesn't make sense.
When I've been testing this with actual code, with JSON functions and indexes for more types, it consistently seems to use whichever index was created last, for every query.
Adding more unrelated rows so that the rows relevant to the partial index are only a small percentage of total rows in the table results in a "bitmap heap scan", but still mentions the same incorrect index after that.
Any idea how I can get it to use the correct index?
CREATE EXTENSION IF NOT EXISTS plv8;
CREATE OR REPLACE FUNCTION
json_string(data json, key text) RETURNS TEXT AS $$
var ret = data,
keys = key.split('.'),
len = keys.length;
for (var i = 0; i < len; ++i) {
if (ret) {
ret = ret[keys[i]]
};
}
if (typeof ret === "undefined") {
ret = null;
} else if (ret) {
ret = ret.toString();
}
return ret;
$$ LANGUAGE plv8 IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION
json_int(data json, key text) RETURNS INT AS $$
var ret = data,
keys = key.split('.'),
len = keys.length;
for (var i = 0; i < len; ++i) {
if (ret) {
ret = ret[keys[i]]
}
}
if (typeof ret === "undefined") {
ret = null;
} else {
ret = parseInt(ret, 10);
if (isNaN(ret)) {
ret = null;
}
}
return ret;
$$ LANGUAGE plv8 IMMUTABLE STRICT;
CREATE TABLE form_types (
id SERIAL NOT NULL,
name VARCHAR(200),
PRIMARY KEY (id)
);
CREATE TABLE tenants (
id SERIAL NOT NULL,
name VARCHAR(200),
PRIMARY KEY (id)
);
CREATE TABLE forms (
id SERIAL NOT NULL,
tenant_id INTEGER,
type_id INTEGER,
data JSON,
PRIMARY KEY (id),
FOREIGN KEY(tenant_id) REFERENCES tenants (id),
FOREIGN KEY(type_id) REFERENCES form_types (id)
);
CREATE INDEX ix_forms_type_id ON forms (type_id);
CREATE INDEX ix_forms_tenant_id ON forms (tenant_id);
INSERT INTO tenants (name) VALUES ('mike'), ('bob');
INSERT INTO form_types (name) VALUES ('type 1'), ('type 2');
INSERT INTO forms (tenant_id, type_id, data) VALUES
(1, 1, '{"string": "unicorns", "int": 1}'),
(1, 1, '{"string": "pythons", "int": 2}'),
(1, 1, '{"string": "pythons", "int": 8}'),
(1, 1, '{"string": "penguins"}');
CREATE OR REPLACE VIEW foo AS
SELECT forms.id AS forms_id,
json_string(forms.data, 'string') AS "data.string",
json_int(forms.data, 'int') AS "data.int"
FROM forms
WHERE forms.tenant_id = 1 AND forms.type_id = 1;
CREATE INDEX "forms_string" ON forms (json_string(data, 'string'))
WHERE tenant_id = 1 AND type_id = 1;
CREATE INDEX "forms_int" ON forms (json_int(data, 'int'))
WHERE tenant_id = 1 AND type_id = 1;
EXPLAIN ANALYZE VERBOSE SELECT "data.string" from foo;
Outputs:
Index Scan using forms_int on public.forms
(cost=0.13..8.40 rows=1 width=32) (actual time=0.085..0.239 rows=20 loops=1)
Output: json_string(forms.data, 'string'::text)
Total runtime: 0.282 ms
Without enable_seqscan=off:
Seq Scan on public.forms (cost=0.00..1.31 rows=1 width=32) (actual time=0.080..0.277 rows=28 loops=1)
Output: json_string(forms.data, 'string'::text)
Filter: ((forms.tenant_id = 1) AND (forms.type_id = 1))
Total runtime: 0.327 ms
\d forms prints
Table "public.forms"
Column | Type | Modifiers
-----------+---------+----------------------------------------------------
id | integer | not null default nextval('forms_id_seq'::regclass)
tenant_id | integer |
type_id | integer |
data | json |
Indexes:
"forms_pkey" PRIMARY KEY, btree (id)
"forms_int" btree (json_int(data, 'int'::text)) WHERE tenant_id = 1 AND type_id = 1
"forms_string" btree (json_string(data, 'string'::text)) WHERE tenant_id = 1 AND type_id = 1
"ix_forms_tenant_id" btree (tenant_id)
"ix_forms_type_id" btree (type_id)
Foreign-key constraints:
"forms_tenant_id_fkey" FOREIGN KEY (tenant_id) REFERENCES tenants(id)
"forms_type_id_fkey" FOREIGN KEY (type_id) REFERENCES form_types(id)
Index vs seqscan, costs
Looks like your random_page_cost is too high compared to the real performance of your machine. Random I/O is faster (costs less) than Pg thinks it does, so it's choosing a slightly less ideal plan.
That's why the cost estimate for the indexscan is (cost=0.13..8.40 rows=1 width=32) and for the seqscan it's slightly lower at (cost=0.00..1.31 rows=1 width=32).
Lower random_page_cost: try SET random_page_cost = 2, then re-run the query.
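For example, in the current session only (a sketch):

SET random_page_cost = 2;
EXPLAIN ANALYZE VERBOSE SELECT "data.string" FROM foo;

If the plans improve across the whole workload, the setting can be made permanent in postgresql.conf.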
To learn more, read the documentation on PostgreSQL query planning, parameters, and tuning, and the relevant wiki pages.
Index selection
PostgreSQL appears to be picking an index scan on forms_int instead of forms_string because it'll be a more compact, smaller index, and both indexes exactly match the search criteria for the view: tenant_id = 1 AND type_id = 1.
If you disable or drop forms_int it'll probably use forms_string and go slightly slower.
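One way to check that without permanently losing the index (a sketch; DROP INDEX is transactional in PostgreSQL, though it holds an exclusive lock on the table until the transaction ends):

BEGIN;
DROP INDEX forms_int;
EXPLAIN ANALYZE VERBOSE SELECT "data.string" FROM foo;
ROLLBACK;  -- restores forms_int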
The key thing to understand is that while the index does contain the value of interest, PostgreSQL isn't actually using it. It's scanning the index without an index condition, since every tuple in the index matches, to get tuples from the heap. It's then extracting the value from those heap tuples and outputting them.
This can be demonstrated with an expression-index on a constant:
CREATE INDEX "forms_novalue" ON forms((true)) WHERE tenant_id = 1 AND type_id = 1;
PostgreSQL is quite likely to select this index for the query:
regress=# EXPLAIN ANALYZE VERBOSE SELECT "data.string" from foo;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using forms_novalue on public.forms (cost=0.13..13.21 rows=4 width=32) (actual time=0.190..0.310 rows=4 loops=1)
Output: json_string(forms.data, 'string'::text)
Total runtime: 0.346 ms
(3 rows)
All the indexes are the same size because they're all so tiny they fit in the minimum allocation:
regress=# SELECT x.idxname, pg_relation_size(x.idxname) FROM (VALUES ('forms_novalue'),('forms_int'),('forms_string')) x(idxname);
idxname | pg_relation_size
---------------+------------------
forms_novalue | 16384
forms_int | 16384
forms_string | 16384
(3 rows)
but the stats for novalue will be somewhat more attractive due to a narrower row width.
Index scan vs index-only scan
It sounds like what you really expect is an index-only scan, where Pg never touches the table's heap and uses only the tuples in the index itself.
I would expect that this query's requirements could be satisfied with forms_string, but I can't get Pg to pick an index-only scan plan for it.
It's not immediately clear to me why Pg is not using an index-only scan here, as it should be a candidate, but it doesn't seem to be able to plan one. If I force enable_indexscan = off, it'll pick an inferior bitmap index scan plan instead, and if I also disable enable_bitmapscan it'll fall back to a max-cost-estimate seqscan. This is true even after a VACUUM of the table(s) of interest.
That means it must not be generated as a candidate path in the query planner: Pg either doesn't know how to use an index-only scan for this query, or thinks it cannot do so for some reason.
It isn't an issue with view introspection, as the expanded view query behaves the same.
Your table has insufficient data in it. In short, Postgres won't use an index when the table fits in a single disk page. Ever. When your table contains a few hundred or thousand rows, it'll become too big to fit, and then you'll see Postgres begin to use index scans when relevant.
Another point to consider is that you need to ANALYZE your tables after a large import. Without accurate statistics on your actual data, Postgres may end up dismissing some index scans as too expensive, when in fact they'd be cheap.
Lastly, there are cases when it is cheaper to not use an index. In essence, whenever Postgres is about to visit most disk pages repeatedly and in a random order to retrieve a large number of rows, it'll seriously consider the cost of visiting most (bitmap index) or all (seq scan) disk pages once sequentially and filtering invalid rows out. The latter wins if you're selecting enough rows.
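A quick way to watch the behaviour flip (a sketch; the filler rows deliberately fall outside the partial indexes' tenant_id = 1 AND type_id = 1 predicate, matching the "unrelated rows" experiment above):

INSERT INTO forms (tenant_id, type_id, data)
SELECT 2, 2, '{"string": "filler"}'::json
FROM generate_series(1, 10000);
ANALYZE forms;
EXPLAIN ANALYZE VERBOSE SELECT "data.string" FROM foo;

Once the table spans enough pages and the statistics are fresh, the planner should start preferring an index scan over the seqscan on its own.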