Advice on creating an index for a combination of int range + int | Postgres

I have a question on the index for the following table.
create table ascertain_telephonenumbersmodel
(
id serial not null
constraint ascertain_telephonenumbersmodel_pkey
primary key,
abc_or_def smallint not null
constraint ascertain_telephonenumbersmodel_abc_or_def_check
check (abc_or_def >= 0),
numbers_range int4range not null,
volume smallint not null
constraint ascertain_telephonenumbersmodel_volume_check
check (volume >= 0),
operator varchar(50) not null,
region varchar(100) not null,
update_date timestamp with time zone not null
);
The only type of query this table handles is:
select
*
from
ascertain_telephonenumbersmodel
where
abc_or_def=`some integer` and numbers_range @> `some integer`
-- example: abc_or_def=900 and numbers_range @> 2685856
The question is: what is the best way to create an index for this condition?
DB: PostgreSQL 13
Number of rows: ~400,000
Current execution time: ~80-110 ms
Thank you!

Use a compound GiST index. You will need the btree_gist extension to include the smallint column in the index.
create extension btree_gist;
create index on ascertain_telephonenumbersmodel using gist (abc_or_def, numbers_range);
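To check whether the planner actually uses the new index, the query can be run under EXPLAIN. This is a minimal sketch with the example values from the question, assuming the intended operator is the range containment operator @>. The cast to smallint is a precaution: as far as I know, the btree_gist operator classes only cover same-type comparisons, so an integer literal compared with a smallint column might not match the index.

```sql
-- Refresh planner statistics after building the index.
ANALYZE ascertain_telephonenumbersmodel;

-- Show the chosen plan together with real timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM ascertain_telephonenumbersmodel
WHERE abc_or_def = 900::smallint     -- cast is an assumption, see note above
  AND numbers_range @> 2685856;      -- range contains this number
```

If the plan still shows a sequential scan, comparing timings with and without the cast is a quick way to see whether the type mismatch is the cause.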

Related

Postgres SQL Table Partitioning by Range Timestamp not Unique key Collision

I have an issue when trying to modify an existing PostgreSQL (version 13.3) table to support partitioning. It gets stuck when inserting the data from the old table, because the inserted timestamp is not always unique, so the insert fails.
Partitioning seems to force me to make the range (timestamp) column the primary key. You can see the new table definition below:
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites_CreationDate" PRIMARY KEY ("CreationDate")
) partition by range ("CreationDate");
The original table had no constraint requiring the timestamp to be unique or a primary key, and we would not particularly want one, but it seems to be a requirement of partitioning. I am looking for alternatives or good ideas to solve the issue.
You can see the full code below:
alter table "UserFavorites" rename to "UserFavorites_old";
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites_CreationDate" PRIMARY KEY ("CreationDate")
) partition by range ("CreationDate");
-- From reference: https://stackoverflow.com/a/53600145/1190540
create or replace function createPartitionIfNotExists(forDate timestamp) returns void
as $body$
declare yearStart date := date_trunc('year', forDate);
declare yearEndExclusive date := yearStart + interval '1 year';
declare tableName text := 'UserFavorites_Partition_' || to_char(forDate, 'YYYY');
begin
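-- quote the name so the mixed-case identifier is looked up as written, not folded to lower case
if to_regclass(quote_ident(tableName)) is null then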
execute format('create table %I partition of "UserFavorites_master" for values from (%L) to (%L)', tableName, yearStart, yearEndExclusive);
-- Unfortunately Postgres forces us to define the index for each table individually:
--execute format('create unique index on %I (%I)', tableName, 'UserId'::text);
end if;
end;
$body$ language plpgsql;
do
$$
begin
-- the integer loop variable rec is declared implicitly by the for loop
for rec in 2015..2030 loop
-- ... and create a partition for each year
perform createPartitionIfNotExists(to_date(rec::varchar,'yyyy'));
end loop;
end
$$;
create or replace view "UserFavorites" as select * from "UserFavorites_master";
insert into "UserFavorites" ("Id", "UserId", "CardId", "CreationDate") select * from "UserFavorites_old";
It fails on the last line with the following error:
SQL Error [23505]: ERROR: duplicate key value violates unique constraint "UserFavorites_Partition_2020_pkey"
Detail: Key ("CreationDate")=(2020-11-02 09:38:54.997) already exists.
ERROR: duplicate key value violates unique constraint "UserFavorites_Partition_2020_pkey"
Detail: Key ("CreationDate")=(2020-11-02 09:38:54.997) already exists.
ERROR: duplicate key value violates unique constraint "UserFavorites_Partition_2020_pkey"
Detail: Key ("CreationDate")=(2020-11-02 09:38:54.997) already exists.
No, partitioning doesn't force you to create a primary key. Just omit that line, and your example should work.
However, you should always have a primary key on your tables. Otherwise, you can end up with identical rows, which is a major headache in a relational database. You might have to clean up your data.
@Laurenz Albe is correct. It seems I can also specify a key over multiple columns, though it may affect performance, as referenced here: Multiple Keys Performance. Even indexing the creation date of the partition seemed to make performance worse.
You can see an example of a multi-column key below; your mileage may vary.
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites" PRIMARY KEY ("Id", "CreationDate")
) partition by range ("CreationDate");
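As a rough check that a key over ("Id", "CreationDate") accepts rows with identical timestamps, one could run something like the following. The table and partition names ending in _demo are hypothetical and exist only for this sketch; the timestamp is the one from the error message above.

```sql
-- Hypothetical demo table mirroring the definition above.
CREATE TABLE "UserFavorites_demo" (
    "Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    "UserId" int4 NOT NULL,
    "CardId" int4 NOT NULL,
    "CreationDate" timestamp NOT NULL,
    CONSTRAINT "PK_UserFavorites_demo" PRIMARY KEY ("Id", "CreationDate")
) PARTITION BY RANGE ("CreationDate");

-- One partition covering the year 2020.
CREATE TABLE "UserFavorites_demo_2020" PARTITION OF "UserFavorites_demo"
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Two rows share a timestamp; the generated "Id" keeps the key unique.
INSERT INTO "UserFavorites_demo" ("UserId", "CardId", "CreationDate") VALUES
    (1, 10, '2020-11-02 09:38:54.997'),
    (2, 20, '2020-11-02 09:38:54.997');

SELECT "Id", "UserId", "CreationDate" FROM "UserFavorites_demo";
```

The primary key has to contain the partition column, which is why "CreationDate" appears in it next to "Id".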

PostgreSQL: some troubles to insert from select with on conflict

I have some Postgres tables:
CREATE TABLE source_redshift.staticprompts (
id INT,
projectid BIGINT,
scriptid INT,
promptnum INT,
prompttype VARCHAR(20),
inputs VARCHAR(2000),
attributes VARCHAR(2000),
text VARCHAR(2000),
corpuscode VARCHAR(2000),
comment VARCHAR(2000),
created TIMESTAMP,
modified TIMESTAMP
);
and
CREATE TABLE target_redshift.user_input_conf (
collect_project_id BIGINT NOT NULL UNIQUE,
prompt_type VARCHAR(20),
prompt_input_desc VARCHAR(300),
prompt_input_name VARCHAR(100),
no_of_prompt_count BIGINT,
prompt_input_value VARCHAR(100) UNIQUE,
prompt_input_value_id BIGSERIAL PRIMARY KEY,
script_id BIGINT,
corpuscode VARCHAR(20),
min_recordings VARCHAR(2000),
max_recordings VARCHAR(2000),
recordings_count VARCHAR(2000),
lease_duration VARCHAR(2000),
date_created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_different UNIQUE (prompt_input_value,collect_project_id)
);
I need to copy data from staticprompts to user_input_conf with these rules:
Primary Key : prompt_input_value_id
Unique Values : collect_project_id, prompt_input_value
Data Load Logic :
Insert only when new prompt input value is found for given collect project from source. Inputs column stores the values in JSON format in staticprompts table.
Insert :
Generate unique sequence number for each of the new prompt input value for a collect project id from source and store in prompt_input_value_id.
Update :
If prompt value already exists for a collect project and if there are any value changes on prompt_input_desc or prompt input name or prompt input value then update only those columns.
prompt_input_value_id - Generate unique sequence number for the combination of each prompt_input_value and collect_project_id
prompt_input_value - Inputs.value is stored in the inputs column as JSON text. Create a unique record for each inputs.value. Look at the example below this table.
I tried to use this query:
INSERT INTO target_redshift.user_input_conf AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
s.inputs::jsonb#>>'{inputs,0,desc}' AS desc,
s.inputs::jsonb#>>'{inputs,0,name}' AS name,
s.inputs::jsonb#>>'{inputs,0,values}' AS values,
s.scriptid,
s.corpuscode
FROM source_redshift.staticprompts AS s
ON CONFLICT (collect_project_id, prompt_input_value)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, prompt_input_value, date_updated) =
(EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, EXCLUDED.prompt_input_value, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
OR t.prompt_input_value != EXCLUDED.prompt_input_value;
But I get an error:
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "user_input_conf_collect_project_id_key"
DETAIL: Key (collect_project_id)=(1) already exists.
I think there is a misunderstanding. A unique constraint over two columns does not mean that each of the columns is unique, but that the combination of the two columns is unique.
So your must_be_different is different (and weaker) than the unique constraints on prompt_input_value and collect_project_id. For example, if you have the three rows
collect_project_id | prompt_input_value
--------------------+--------------------
1 | a
1 | b
2 | b
they will create a conflict with both single-column unique constraints, but not with must_be_different.
I guess that the underlying problem is that you want to use INSERT ... ON CONFLICT with multiple unique constraints. That cannot be done; see this question for a discussion and potential solutions.
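The difference between the two-column constraint and the single-column constraints can be tried out with the three example rows. This is a small sketch; the temporary table names pair_unique and single_unique are made up for the illustration.

```sql
-- Only the combination of the two columns must be unique.
CREATE TEMP TABLE pair_unique (
    collect_project_id bigint,
    prompt_input_value text,
    UNIQUE (prompt_input_value, collect_project_id)
);

-- Succeeds: no (value, project) pair occurs twice.
INSERT INTO pair_unique VALUES (1, 'a'), (1, 'b'), (2, 'b');

-- Each column must be unique on its own.
CREATE TEMP TABLE single_unique (
    collect_project_id bigint UNIQUE,
    prompt_input_value text UNIQUE
);

-- Fails with a duplicate key error: project id 1 (and value 'b') repeat.
INSERT INTO single_unique VALUES (1, 'a'), (1, 'b'), (2, 'b');
```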

PostgreSQL query does not use index

Table definition is as follows:
CREATE TABLE public.the_table
(
id integer NOT NULL DEFAULT nextval('the_table_id_seq'::regclass),
report_timestamp timestamp without time zone NOT NULL,
value_id integer NOT NULL,
text_value character varying(255),
numeric_value double precision,
bool_value boolean,
dt_value timestamp with time zone,
exported boolean NOT NULL DEFAULT false,
CONSTRAINT the_table_fkey_valdef FOREIGN KEY (value_id)
REFERENCES public.value_defs (value_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.the_table
OWNER TO postgres;
Indices:
CREATE INDEX the_table_idx_id ON public.the_table USING brin (id);
CREATE INDEX the_table_idx_timestamp ON public.the_table USING btree (report_timestamp);
CREATE INDEX the_table_idx_tsvid ON public.the_table USING brin (report_timestamp, value_id);
CREATE INDEX the_table_idx_valueid ON public.the_table USING btree (value_id);
The query is:
SELECT * FROM the_table r WHERE r.value_id = 1064 ORDER BY r.report_timestamp desc LIMIT 1;
While running the query PostgreSQL does not use the_table_idx_valueid index.
Why?
If anything, this index will help:
CREATE INDEX ON the_table (value_id, report_timestamp);
Depending on the selectivity of the condition and the number of rows in the table, PostgreSQL may correctly deduce that a sequential scan and a sort is faster than an index scan.
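To see which plan the planner picks once the two-column index exists, the query can be run under EXPLAIN. A sketch, assuming the index above has already been created:

```sql
-- Refresh statistics so the planner's row estimates are current.
ANALYZE public.the_table;

-- With the (value_id, report_timestamp) index, the planner can read
-- the entries for value_id = 1064 backward and stop after one row.
EXPLAIN (ANALYZE)
SELECT *
FROM the_table r
WHERE r.value_id = 1064
ORDER BY r.report_timestamp DESC
LIMIT 1;
```

If the output still shows a sequential scan followed by a sort, the planner has estimated that plan to be cheaper for the current data.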

How to select last value insert in column ( like function LAST() for OracleDB)

I'm currently a student, and I'm setting up a PostgreSQL database for my airsoft shop along with some queries on it. I need a function similar to SELECT LAST(xx) FROM yy, which I think is available in SQL Server and Oracle DB, to return the last value inserted into the column targeted by LAST().
I have this table :
CREATE TABLE munition.suivi_ammo (
type_ammo integer NOT NULL,
calibre integer NOT NULL,
event integer NOT NULL,
date_event date NOT NULL,
entrance integer NOT NULL,
exit integer NOT NULL,
inventory integer NOT NULL,
FOREIGN KEY (calibre) REFERENCES munition.index(numero),
FOREIGN KEY (event) REFERENCES munition.index(numero),
FOREIGN KEY (type_ammo) REFERENCES munition.index(numero)
);
and an index table that maps each numeric id to its definition:
CREATE TABLE munition.index (
numero integer NOT NULL,
definition text NOT NULL,
PRIMARY KEY (numero)
);
I want to select the last inventory inserted into the table and calculate the current stock from the inflows and outflows recorded after that inventory.
It works when I run this kind of query with specific dates, so that only the last inventory is included, but I do not want to have to do that:
SELECT index.definition,
Sum(suivi_ammo.inventory) + Sum(suivi_ammo.entrance) - Sum(suivi_ammo.exit) AS Stock
FROM munition.suivi_ammo
INNER JOIN munition.index ON suivi_ammo.type_ammo = index.numero
WHERE date_event < '03/05/2019' AND date_event >= '2019-04-10'
GROUP BY index.definition;
I also tried the last_value() window function, but it doesn't work.
Thanks!

Average MySQL in new table

I have a database about weather that updates every second.
It contains temperature and wind speed.
This is my database:
CREATE TABLE `new_table`.`test` (
`id` INT(10) NOT NULL,
`date` DATETIME() NOT NULL,
`temperature` VARCHAR(25) NOT NULL,
`wind_speed` INT(10) NOT NULL,
`humidity` FLOAT NOT NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_bin;
I need to find the average temperature every hour.
This is my code:
SELECT AVG( temperature ), date
FROM new_table
GROUP BY HOUR ( date )
My code works, but I want to move the average value and its date to another table.
This is the table:
CREATE TABLE `new_table`.`table1` (
`idsea_state` INT(10) NOT NULL,
`dateavg` DATETIME() NOT NULL,
`avg_temperature` VARCHAR(25) NOT NULL,
PRIMARY KEY (`idsea_state`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_bin;
Is it possible? Could you show me the code?
To insert rows into one table based on data from another table, write an INSERT query that targets the destination table and follow it with a sub-query that pulls the data from the source table. The result set returned by the sub-query provides the values for the INSERT.
Here is the basic structure, note that the VALUES keyword is not used:
INSERT INTO `table1`
(`dateavg`, `avg_temperature`)
SELECT `date` , avg(`temperature`)
FROM `test`;
It's also important to note that the columns returned by the result set are matched by position to the columns listed in the INSERT of the outer query.
E.g. if you had the query
INSERT INTO table1 (`foo`, `bar`, `baz`)
SELECT `a`, `y`, `g` FROM table2
a would be inserted into foo
y would go into bar
g would go into baz
due to their respective positions.
I have made a working demo - http://www.sqlfiddle.com/#!9/ff740/4
I made the below changes to simplify the example and just demonstrate the concept involved.
Here are the DDL changes I made to your original code:
CREATE TABLE `test` (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`date` DATETIME NOT NULL,
`temperature` FLOAT NOT NULL,
`wind_speed` INT(10),
`humidity` FLOAT ,
PRIMARY KEY (`id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_bin;
CREATE TABLE `table1` (
`idsea_state` INT(10) NOT NULL AUTO_INCREMENT,
`dateavg` VARCHAR(55),
`avg_temperature` VARCHAR(25),
PRIMARY KEY (`idsea_state`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_bin;
INSERT INTO `test`
(`date`, `temperature`) VALUES
('2013-05-03', 7.5),
('2013-06-12', 17.5),
('2013-10-12', 37.5);
INSERT INTO `table1`
(`dateavg`, `avg_temperature`)
SELECT `date` , avg(`temperature`)
FROM `test`;
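The demo above computes a single average over all rows. For the hourly averages asked about in the question, one possible variation groups the rows by the start of each hour. This is only a sketch against the simplified demo tables; the alias hour_start is made up for the example.

```sql
-- One row per calendar hour: truncate each timestamp to the start
-- of its hour, then average the temperatures within that hour.
INSERT INTO `table1` (`dateavg`, `avg_temperature`)
SELECT DATE_FORMAT(`date`, '%Y-%m-%d %H:00:00') AS hour_start,
       AVG(`temperature`)
FROM `test`
GROUP BY hour_start;
```

Grouping by the truncated timestamp rather than by HOUR(`date`) keeps the same hour on different days apart, which HOUR() alone would merge into one group.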