How to create index on table which is partitioned? - postgresql

How to create an index on a partitioned table in PostgreSQL 11.2?
My table is:
CREATE TABLE sometablename
(
column1 character varying(255) COLLATE pg_catalog."default" NOT NULL,
column2 integer NOT NULL,
column3 character varying(255) COLLATE pg_catalog."default" NOT NULL,
"timestamp" timestamp without time zone NOT NULL,
avg_val double precision,
max_val double precision,
min_val double precision,
p95_val double precision,
sample_count double precision,
sum_val double precision,
unit character varying(255) COLLATE pg_catalog."default",
user_id bigint NOT NULL,
CONSTRAINT testtable_pkey PRIMARY KEY (column1, column2, column3, "timestamp", user_id)
)
PARTITION BY HASH (user_id)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE UNIQUE INDEX testtable_unique_pkey
ON sometablename USING btree (column1 COLLATE pg_catalog."default", column2
COLLATE pg_catalog."default", "timestamp", user_id)
TABLESPACE pg_default;
As you can see testtable_unique_pkey is my index.
but when I run:
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename = 'sometablename'
I can't see my index.
I checked the explain analysis on my queries which is also not using the index.

The index for the base table is never really created, so it doesn't show up in pg_indexes:
CREATE TABLE base_table
(
column1 varchar(255) NOT NULL,
column2 integer NOT NULL,
user_id bigint NOT NULL
)
PARTITION BY HASH (user_id);
CREATE UNIQUE INDEX idx_one ON base_table (column1, column2, user_id);
So the following returns nothing:
select *
from pg_indexes
where tablename = 'base_table';
It is however stored in pg_class:
select i.relname as indexname, t.relname as tablename
from pg_class i
join pg_index idx on idx.indexrelid = i.oid
join pg_class t on t.oid = idx.indrelid
where i.relkind = 'I'
and t.relname = 'base_table';
returns:
indexname | tablename
----------+-----------
idx_one | base_table
But for each partition the index will show up in pg_indexes:
create table st_p1 partition of base_table for values with (modulus 4, remainder 0);
create table st_p2 partition of base_table for values with (modulus 4, remainder 1);
create table st_p3 partition of base_table for values with (modulus 4, remainder 2);
create table st_p4 partition of base_table for values with (modulus 4, remainder 3);
And then:
select tablename, indexname
from pg_indexes
where tablename in ('st_p1', 'st_p2', 'st_p3', 'st_p4');
returns:
tablename | indexname
----------+----------------------------------
st_p1 | st_p1_column1_column2_user_id_idx
st_p2 | st_p2_column1_column2_user_id_idx
st_p3 | st_p3_column1_column2_user_id_idx
st_p4 | st_p4_column1_column2_user_id_idx
Update 2020-06-26:
The fact that the index did not show up in pg_indexes was acknowledged as a bug by the Postgres team and was fixed in Postgres 12.
So the above explanation is only valid for Postgres 10 and 11. Starting with Postgres 12, the index on base_table will be shown in `pg_indexes.

Related

Is it possible to find duplicating records in two columns simultaneously in PostgreSQL?

I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;
create table partners
(
partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
name varchar(255) default NULL::character varying,
company_id varchar(20) default NULL::character varying,
vat_id varchar(50) default NULL::character varying,
is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to figure out how to return test1, test2 (because the company_id column value duplicates vertically) and test3 (because the vat_id column value duplicates vertically as well).
To put it in other words - I need to find duplicating company_id and vat_id records and group them together, so that test1, test2 and test3 would be together, because they duplicate by company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
SELECT *, LEAD(row, 1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER (w) AS row
FROM partners
WHERE is_deleted = false
AND ((company_id != '' AND company_id IS NOT null) OR (vat_id != '' AND vat_id IS NOT NULL))
WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
) x
) y
WHERE (row > 1 OR nextrow > 1)
AND is_deleted = false
This successfully shows all company_id duplicates, but does not appear to show vat_id ones - test3 row is missing. Is this possible to be done within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.
You can do this with recursion, but depending on the size of your data you may want to iterate, instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
partner_id integer generated always as identity primary key,
name text,
company_id text,
vat_id text,
is_deleted boolean not null default false
);
insert into partners (name, company_id, vat_id) values
('test1','1010109191191', 'BG1010109191192'),
('test2','1010109191191', 'BG1010109191192'),
('test3','3214567890102', 'BG1010109191192'),
('test4','9999999999999', 'GE9999999999999'),
('test5','3214567890102', 'BG8888888888888'),
('test6','2983489023408', 'BG8888888888888')
;
I added a couple of test cases and left in the lone partner.
with recursive keys as (
select partner_id,
array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
array[partner_id] as matchlist,
1 as size
from partners
), matchers as (
select *
from keys
union all
select p.partner_id, c.matcher,
p.matchlist||c.partner_id as matchlist,
p.size + 1
from matchers p
join keys c
on c.matcher && p.matcher
and not p.matchlist #> array[c.partner_id]
), largest as (
select distinct sort(matchlist) as matchlist
from matchers m
where not exists (select 1
from matchers
where matchlist #> m.matchlist
and size > m.size)
-- and size > 1
)
select *
from largest
;
matchlist
{1,2,3,5,6}
{4}
fiddle
EDIT UPDATE
Since recursion did not perform, here is an iterative example in plpgsql that uses a temporary table:
create temporary table match1 (
partner_id int not null,
group_id int not null,
matchkey uuid not null
);
create index on match1 (matchkey);
create index on match1 (group_id);
insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare _cnt bigint;
begin
loop
with consolidate as (
select group_id,
min(group_id) over (partition by matchkey) as new_group_id
from match1
), minimize as (
select group_id, min(new_group_id) as new_group_id
from consolidate
group by group_id
), doupdate as (
update match1
set group_id = m.new_group_id
from minimize m
where m.group_id = match1.group_id
and m.new_group_id != match1.group_id
returning *
)
select count(*) into _cnt from doupdate;
if _cnt = 0 then
exit;
end if;
end loop;
end;
$$;
updated fiddle

query a jsonb array in a select of another query

I have a user table with a column favorites that is a jsonb
favorites:
[
{
"id_doc:": 9,
"type": "post"
},
{
"id_doc": 10,
"type": "post"
}
]
And I have another table posts where I want to make a query by id and this id must be in the fields id_doc in the favorites user
select * from posts where id in (select favorites -> id_doc from users )
This is the schema
CREATE TABLE dev.users
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
firstname character varying COLLATE pg_catalog."default" NOT NULL,
lastname character varying COLLATE pg_catalog."default" NOT NULL,
email character varying COLLATE pg_catalog."default" NOT NULL,
password character varying COLLATE pg_catalog."default" NOT NULL,
favorites jsonb[],
CONSTRAINT users_pkey PRIMARY KEY (id),
CONSTRAINT email_key UNIQUE (email)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE dev.users
OWNER to postgres;
CREATE TABLE dev.posts
(
id integer NOT NULL DEFAULT nextval('dev.posts_id_seq'::regclass),
title character varying COLLATE pg_catalog."default" NOT NULL,
userid integer NOT NULL,
description character varying COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT posts_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE dev.posts
OWNER to postgres;
How can I do this?
Thank you
There are other ways to accomplish this, but I prefer using CTEs for clarity. Please let me know in the comments if you have questions about what this does.
with elements as (
select jsonb_array_elements(favorites) as favitem
from users
), fav_ids as (
select distinct (favitem->>'id_doc')::int as id_doc
from elements
)
select p.*
from posts p
join fav_ids f on f.id_doc = p.id
;
Update
Since the column is defined as jsonb[] rather than json, we need to unnest() instead of jsonb_array_elements():
with elements as (
select unnest(favorites) as favitem
from users
), fav_ids as (
select distinct (favitem->>'id_doc')::int as id_doc
from elements
)
select p.*
from posts p
join fav_ids f on f.id_doc = p.id
;

postgres inner JOIN query out of memory

I am trying to consult a database using pgAdmin3 and I need to join to tables. I am using the following code:
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
But I keep running this error:
an out of memory error
So I've tried to insert my result in a new table, as follow:
CREATE TABLE new_table AS
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
And still got an error:
ERROR: could not extend file "base/17675/43101.15": No space left on device
SQL state: 53100
Hint: Check free disk space.
I am very very new at this (is the first time I have to deal with PostgreSQL) and I guess I can do something to optimize this query and avoid this type of error. I have no privileges in the database. Can anyone help??
Thanks in advance!
Updated:
Table 1 description
-- Table: table1
-- DROP TABLE table1;
CREATE TABLE table1
(
species character varying(100),
trait character varying(50),
value double precision,
units character varying(50)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table1
OWNER TO postgres;
GRANT ALL ON TABLE table1 TO postgres;
GRANT SELECT ON TABLE table1 TO banco;
-- Index: speciestable1_idx
-- DROP INDEX speciestable1_idx;
CREATE INDEX speciestable1_idx
ON table1
USING btree
(species COLLATE pg_catalog."default");
-- Index: traittype_idx
-- DROP INDEX traittype_idx;
CREATE INDEX traittype_idx
ON table1
USING btree
(trait COLLATE pg_catalog."default");
and table2 as:
-- Table: table2
-- DROP TABLE table2;
CREATE TABLE table2
(
id integer NOT NULL,
family character varying(40),
species character varying(100),
plotarea real,
latitude double precision,
longitude double precision,
source integer,
latlon geometry,
CONSTRAINT table2_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table2
OWNER TO postgres;
GRANT ALL ON TABLE table2 TO postgres;
GRANT SELECT ON TABLE table2 TO banco;
-- Index: latlon_gist
-- DROP INDEX latlon_gist;
CREATE INDEX latlon_gist
ON table2
USING gist
(latlon);
-- Index: species_idx
-- DROP INDEX species_idx;
CREATE INDEX species_idx
ON table2
USING btree
(species COLLATE pg_catalog."default");
You're performing a join between two tables on the column species.
Not sure what's in your data, but if species is a column with significantly fewer values than the number of records (e.g. if species is "elephant", "giraffe" and you're analyzing all animals in Africa), this join will match every elephant with every elephant.
When joining two tables most of the time you try to use a unique or close to unique attribute, like id (not sure what id means in your case, but could be it).

Query on Postgres totaly holds down server

Now, I'am moving our database from Microsoft SQL Server to PostgreSQL 9.1.
There are a simple query, to calculate some summary of our store:
SELECT DISTINCT p.part_name_id,
(SELECT SUM(p1.quantity)
FROM parts.spareparts p1
WHERE p1.part_name_id = p.part_name_id) AS AllQuantity,
(SELECT SUM(p2.price * p2.quantity)
FROM parts.spareparts p2
WHERE p2.part_name_id = p.part_name_id) AS AllPrice
FROM parts.spareparts p
It working very fast on MSSQL, less than one second, there are about 150 000 records in spareparts table.
In PostgreSQL I waited for 200,000 milliseconds and not wait for the result.
Where I was wrong?
P.S.: table definitions:
-- Table: parts.spareparts
-- DROP TABLE parts.spareparts;
CREATE TABLE parts.spareparts
(
id serial NOT NULL,
big_id bigint NOT NULL,
part_unique integer NOT NULL,
store_address integer,
brand_id integer,
model_id integer,
category_id integer,
part_name_id integer,
price money,
quantity integer,
description character varying(250),
private_info character varying(600),
manager_id integer,
company_id integer,
part_type smallint,
box_number integer,
com_person character varying(200),
com_phone character varying(200),
vendor_id integer,
is_publish boolean DEFAULT true,
is_comission boolean DEFAULT false,
is_new boolean DEFAULT false,
is_warning boolean DEFAULT false,
catalog_no character varying(200),
disc_id integer,
is_set boolean,
w_height numeric(3,2),
w_width numeric(3,2),
w_diam numeric(3,2),
w_type integer,
page_url character varying(150),
last_edit_manager_id integer,
CONSTRAINT spareparts_pk PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE parts.spareparts
OWNER TO asap;
-- Index: parts.sparepart_part_unique_idx
-- DROP INDEX parts.sparepart_part_unique_idx;
CREATE INDEX sparepart_part_unique_idx
ON parts.spareparts
USING btree
(part_unique, company_id);
-- Index: parts.spareparts_4param_idx
-- DROP INDEX parts.spareparts_4param_idx;
CREATE INDEX spareparts_4param_idx
ON parts.spareparts
USING btree
(brand_id, model_id, category_id, part_name_id);
-- Index: parts.spareparts_bigid_idx
-- DROP INDEX parts.spareparts_bigid_idx;
CREATE INDEX spareparts_bigid_idx
ON parts.spareparts
USING btree
(big_id);
-- Index: parts.spareparts_brand_id_part_id_quantity_idx
-- DROP INDEX parts.spareparts_brand_id_part_id_quantity_idx;
CREATE INDEX spareparts_brand_id_part_id_quantity_idx
ON parts.spareparts
USING btree
(brand_id, part_name_id, quantity);
-- Index: parts.spareparts_brand_id_quantity_idx
-- DROP INDEX parts.spareparts_brand_id_quantity_idx;
CREATE INDEX spareparts_brand_id_quantity_idx
ON parts.spareparts
USING btree
(brand_id, quantity);
-- Index: parts.spareparts_company_id_part_unique_idx
-- DROP INDEX parts.spareparts_company_id_part_unique_idx;
CREATE INDEX spareparts_company_id_part_unique_idx
ON parts.spareparts
USING btree
(company_id, part_unique);
-- Index: parts.spareparts_model_id_company_id
-- DROP INDEX parts.spareparts_model_id_company_id;
CREATE INDEX spareparts_model_id_company_id
ON parts.spareparts
USING btree
(model_id, company_id);
COMMENT ON INDEX parts.spareparts_model_id_company_id
IS 'Для frmFilter';
-- Index: parts.spareparts_url_idx
-- DROP INDEX parts.spareparts_url_idx;
CREATE INDEX spareparts_url_idx
ON parts.spareparts
USING btree
(page_url COLLATE pg_catalog."default");
-- Trigger: spareparts_delete_trigger on parts.spareparts
-- DROP TRIGGER spareparts_delete_trigger ON parts.spareparts;
CREATE TRIGGER spareparts_delete_trigger
AFTER DELETE
ON parts.spareparts
FOR EACH ROW
EXECUTE PROCEDURE parts.spareparts_delete_fn();
-- Trigger: spareparts_update_trigger on parts.spareparts
-- DROP TRIGGER spareparts_update_trigger ON parts.spareparts;
CREATE TRIGGER spareparts_update_trigger
AFTER INSERT OR UPDATE
ON parts.spareparts
FOR EACH ROW
EXECUTE PROCEDURE parts.spareparts_update_fn();
I think you can rewrite the query without the need of the nested selects:
SELECT p.part_name_id,
SUM(p.quantity) AS AllQuantity,
SUM(p.price * p.quantity) AS AllPrice
FROM parts.spareparts p
group by p.part_name_id
I don't think you actually need the subqueries; you can write simply:
SELECT part_name_id,
SUM(quantity) AS AllQuantity,
SUM(price * quantity) AS AllPrice
FROM parts.spare_parts
GROUP
BY part_name_id
;
which should be much more efficient.

in T-SQL, is it possible to find names of columns containing NULL in a given row (without knowing all column names)?

Is it possible in T-SQL to write a proper query reflecting this pseudo-code:
SELECT {primary_key}, {column_name}
FROM {table}
WHERE {any column_name value} is NULL
i.e. without referencing each column-name explicitly.
Sounds simple enough but I've searched pretty extensively and found nothing.
You have to use dynamic sql to solve that problem. I have demonstrated how it could be done.
With this sql you can pick a table and check the row with id = 1 for columns being null and primary keys. I included a test table at the bottom of the script. Code will not display anything if there is not primary keys and no columns being null.
DECLARE #table_name VARCHAR(20)
DECLARE #chosencolumn VARCHAR(20)
DECLARE #sqlstring VARCHAR(MAX)
DECLARE #sqlstring2 varchar(100)
DECLARE #text VARCHAR(8000)
DECLARE #t TABLE (col1 VARCHAR(30), dummy INT)
SET #table_name = 'test_table' -- replace with your tablename if you want
SET #chosencolumn = 'ID=1' -- replace with criteria for selected row
SELECT #sqlstring = COALESCE(#sqlstring, '') + 'UNION ALL SELECT '',''''NULL '''' '' + '''+t1.column_name+''', 1000 ordinal_position FROM ['+#table_name+'] WHERE [' +t1.column_name+ '] is null and ' +#chosencolumn+ ' '
FROM INFORMATION_SCHEMA.COLUMNS t1
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE t2
ON t1.column_name = t2.column_name
AND t1.table_name = t2.table_name
AND t1.table_schema = t2.table_schema
WHERE t1.table_name = #table_name
AND t2.column_name is null
SET #sqlstring = stuff('UNION ALL SELECT '',''''PRIMARY KEY'''' ''+ column_name + '' '' col1, ordinal_position
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE table_name = ''' + #table_name+ '''' + #sqlstring, 1, 10, '') + 'order by 2'
INSERT #t
EXEC( #sqlstring)
SELECT #text = COALESCE(#text, '') + col1
FROM #t
SET #sqlstring2 ='select '+stuff(#text,1,1,'')
EXEC( #sqlstring2)
Result:
id host_id date col1
PRIMARY KEY PRIMARY KEY PRIMARY KEY NULL
Test table
CREATE TABLE [dbo].[test_table](
[id] int not null,
[host_id] [int] NOT NULL,
[date] [datetime] NOT NULL,
[col1] [varchar](20) NULL,
[col2] [varchar](20) NULL,
CONSTRAINT [PK_test_table] PRIMARY KEY CLUSTERED
(
[id] ASC,
[host_id] ASC,
[date] ASC
))
Test data
INSERT test_table VALUES (1, 1, getdate(), null, 'somevalue')