PostgreSQL multicolumn btree index as constraint

I want to use a PostgreSQL btree index as a constraint, but when I run the same input twice there are duplicates in the final database. The database itself is hosted on AWS, hence I am using the Python AWS SDK, which works properly. Please note that I need to use a multi-column index.
I am wondering why.
I am creating the following table:
CREATE TABLE file (
"userid" serial Primary key,
"date" integer,
"place" VARCHAR
"title" VARCHAR,
"avg" INTEGER,
"recommendation" VARCHAR,
"type" VARCHAR,
"comment" VARCHAR,
"insert_date" VARCHAR
);
For the index I am using the following:
create unique index file_idx on file using btree(
date,
place,
title,
avg,
recommendation,
type
);
ALTER TABLE file add constraint test unique using index file_idx;
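(As far as I understand, this should be equivalent to declaring the constraint over the same columns directly; a sketch of that one-step form, for reference:)
-- alternative one-step form of the same unique constraint
ALTER TABLE file add constraint test unique (date, place, title, avg, recommendation, type);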
As input I am processing a file which has 3 test entries:
file = "xx.json"
insert_file = pd.read_json(file)
val = insert_file.loc[i, :].values.tolist()
cols = insert_file.column
query = f""" INSERT INTO file {cols} VALUES {val} on conflict on constraint test do nothing""".format(val=val, cols = cols)
# sending the query to AWS
aws.create_query(query)
When I run the file xx.json twice I get 6 entries instead of only 3.
Thanks!

Related

Postgres SQL Table Partitioning by Range Timestamp not Unique key Collision

I have an issue when trying to modify an existing PostgreSQL (version 13.3) table to support partitioning: it gets stuck when inserting the data from the old table, because the inserted timestamp in some cases may not be unique, so it fails on execution.
Partitioning forces me to make the primary key the range (timestamp) value. You can see the new table definition below:
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites_CreationDate" PRIMARY KEY ("CreationDate")
) partition by range ("CreationDate");
The original table didn't have a constraint on the timestamp to be either unique or a primary key, nor would we particularly want that, but it seems to be a requirement of partitioning. I am looking for alternatives or good ideas to solve the issue.
You can see the full code below:
alter table "UserFavorites" rename to "UserFavorites_old";
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites_CreationDate" PRIMARY KEY ("CreationDate")
) partition by range ("CreationDate");
-- From reference: https://stackoverflow.com/a/53600145/1190540
create or replace function createPartitionIfNotExists(forDate timestamp) returns void
as $body$
declare yearStart date := date_trunc('year', forDate);
declare yearEndExclusive date := yearStart + interval '1 year';
declare tableName text := 'UserFavorites_Partition_' || to_char(forDate, 'YYYY');
begin
if to_regclass(tableName) is null then
execute format('create table %I partition of "UserFavorites_master" for values from (%L) to (%L)', tableName, yearStart, yearEndExclusive);
-- Unfortunately Postgres forces us to define the index for each table individually:
--execute format('create unique index on %I (%I)', tableName, 'UserId'::text);
end if;
end;
$body$ language plpgsql;
do
$$
declare rec record;
begin
for rec in 2015..2030 loop
-- ... and create a partition for them
perform createPartitionIfNotExists(to_date(rec::varchar,'yyyy'));
end loop;
end
$$;
create or replace view "UserFavorites" as select * from "UserFavorites_master";
insert into "UserFavorites" ("Id", "UserId", "CardId", "CreationDate") select * from "UserFavorites_old";
It fails on the last line with the following error:
SQL Error [23505]: ERROR: duplicate key value violates unique constraint "UserFavorites_Partition_2020_pkey"
Detail: Key ("CreationDate")=(2020-11-02 09:38:54.997) already exists.
No, partitioning doesn't force you to create a primary key. Just omit that line, and your example should work.
However, you should definitely always have a primary key on your tables. Otherwise, you can end up with identical rows, which is a major headache in a relational database. You might have to clean up your data.
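If cleanup is needed, one possible approach is a rough sketch like the following (it assumes a duplicate means the same "UserId", "CardId" and "CreationDate", and keeps the row with the lowest "Id"; adjust to your own definition of a duplicate):
-- remove all but one row per duplicate group before adding a primary key
DELETE FROM "UserFavorites_old" a
USING "UserFavorites_old" b
WHERE a."Id" > b."Id"
AND a."UserId" = b."UserId"
AND a."CardId" = b."CardId"
AND a."CreationDate" = b."CreationDate";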
@Laurenz Albe is correct. It seems I also have the ability to specify multiple key columns, though it may affect performance, as referenced here: Multiple Keys Performance. Even indexing the creation date of the partition seemed to make the performance worse.
You can see a reference to multiple keys below; your mileage may vary.
CREATE TABLE "UserFavorites_master" (
"Id" int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"UserId" int4 NOT NULL,
"CardId" int4 NOT NULL,
"CreationDate" timestamp NOT NULL,
CONSTRAINT "PK_UserFavorites" PRIMARY KEY ("Id", "CreationDate")
) partition by range ("CreationDate");

PostgreSQL: some troubles to insert from select with on conflict

I have some Postgres tables:
CREATE TABLE source_redshift.staticprompts (
id INT,
projectid BIGINT,
scriptid INT,
promptnum INT,
prompttype VARCHAR(20),
inputs VARCHAR(2000),
attributes VARCHAR(2000),
text VARCHAR(2000),
corpuscode VARCHAR(2000),
comment VARCHAR(2000),
created TIMESTAMP,
modified TIMESTAMP
);
and
CREATE TABLE target_redshift.user_input_conf (
collect_project_id BIGINT NOT NULL UNIQUE,
prompt_type VARCHAR(20),
prompt_input_desc VARCHAR(300),
prompt_input_name VARCHAR(100),
no_of_prompt_count BIGINT,
prompt_input_value VARCHAR(100) UNIQUE,
prompt_input_value_id BIGSERIAL PRIMARY KEY,
script_id BIGINT,
corpuscode VARCHAR(20),
min_recordings VARCHAR(2000),
max_recordings VARCHAR(2000),
recordings_count VARCHAR(2000),
lease_duration VARCHAR(2000),
date_created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_different UNIQUE (prompt_input_value,collect_project_id)
);
I need to copy data from staticprompts to user_input_conf with these rules:
Primary Key : prompt_input_value_id
Unique Values : collect_project_id, prompt_input_value
Data Load Logic :
Insert only when a new prompt input value is found for a given collect project from the source. The inputs column stores the values in JSON format in the staticprompts table.
Insert:
Generate a unique sequence number for each new prompt input value for a collect project id from the source and store it in prompt_input_value_id.
Update:
If the prompt value already exists for a collect project and there are any value changes on prompt_input_desc, prompt_input_name or prompt_input_value, then update only those columns.
prompt_input_value_id - generate a unique sequence number for each combination of prompt_input_value and collect_project_id.
prompt_input_value - inputs.value is stored in the inputs column as JSON text. Create a unique record for each inputs.value. Look at the example below this table.
I try to use this query:
INSERT INTO target_redshift.user_input_conf AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
s.inputs::jsonb#>>'{inputs,0,desc}' AS desc,
s.inputs::jsonb#>>'{inputs,0,name}' AS name,
s.inputs::jsonb#>>'{inputs,0,values}' AS values,
s.scriptid,
s.corpuscode
FROM source_redshift.staticprompts AS s
ON CONFLICT (collect_project_id, prompt_input_value)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, prompt_input_value, date_updated) =
(EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, EXCLUDED.prompt_input_value, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
OR t.prompt_input_value != EXCLUDED.prompt_input_value;
""")
But I get an error:
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "user_input_conf_collect_project_id_key"
DETAIL: Key (collect_project_id)=(1) already exists.
I think there is a misunderstanding. A unique constraint over two columns does not mean that each of the columns is unique, but that the combination of the two columns is unique.
So your must_be_different is different (and weaker) than the unique constraints on prompt_input_value and collect_project_id. For example, if you have the three rows
 collect_project_id | prompt_input_value
--------------------+--------------------
                  1 | a
                  1 | b
                  2 | b
they will create a conflict with both single-column unique constraints, but not with must_be_different.
I guess that the underlying problem is that you want to use INSERT ... ON CONFLICT with multiple unique constraints. That cannot be done; see this question for a discussion and potential solutions.
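If the intended rule really is that only the combination has to be unique, one possible direction is to drop the redundant single-column constraints so that must_be_different remains the only conflict target. A sketch (the second constraint name is assumed from PostgreSQL's default naming and may differ in your database):
-- keep only the two-column constraint; ON CONFLICT (collect_project_id, prompt_input_value) then has a single match
ALTER TABLE target_redshift.user_input_conf
DROP CONSTRAINT user_input_conf_collect_project_id_key,
DROP CONSTRAINT user_input_conf_prompt_input_value_key;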

How to get primary key sequence in Postgres?

I have the following table:
-- DDL generated by Postico 1.5.10
-- Not all database features are supported. Do not use for backup.
-- Table Definition ----------------------------------------------
CREATE TABLE "Ticket" (
id bigint PRIMARY KEY,
"paymentId" text NOT NULL,
"transactionId" text,
"dateCreated" timestamp without time zone NOT NULL,
"dateValidated" timestamp without time zone,
"sellerPaymentId" text NOT NULL,
"sellerPaymentProvider" text NOT NULL,
"userId" bigint NOT NULL,
"userIdFb" text NOT NULL,
"userName" text NOT NULL,
"eventDescription" text NOT NULL,
"eventTimeId" text,
"eventId" text NOT NULL,
"eventName" text NOT NULL,
"startTime" timestamp without time zone,
"endTime" timestamp without time zone,
quantity bigint,
"unitPrice" text,
seats jsonb[],
location text NOT NULL,
link text,
"eventTimesSelected" jsonb,
"otherListsSelected" jsonb,
"transactionIdBarion1" text,
"transactionIdBarion2" text
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX "pk:Ticket.id" ON "Ticket"(id int8_ops);
When inserting a new row, I got this error:
[ ERROR ] PostgreSQLError.server.error._bt_check_unique: POST /startPayment duplicate key value violates unique constraint "pk:Ticket.id" (ErrorMiddleware.swift:26)
[ DEBUG ] Possible causes for PostgreSQLError.server.error._bt_check_unique: Key (id)=(1) already exists. (ErrorMiddleware.swift:26)
How the heck can I reset the primary key sequence? There are many answers on the internet, but what is the name of my sequence? :) I do not see any 'name' in my DDL.
I tried to fetch the sequence name like this:
select currval(pg_get_serial_sequence("Ticket", "id"));
no luck:
ERROR: column "Ticket" does not exist
LINE 1: select currval(pg_get_serial_sequence("Ticket", "id"));
pg_get_serial_sequence() expects string values, not identifiers. Your problems stem from the fact that you used those dreaded quoted identifiers when creating the table, which is strongly discouraged.
You need to pass the double quotes for the table name inside single quotes (the column name is taken literally, so it needs no extra quoting):
select currval(pg_get_serial_sequence('"Ticket"', 'id'));
You should reconsider the usage of quoted identifiers to avoid problems like that in the future.
"How the heck can I reset the primary key sequence?" See:
How to reset postgres' primary key sequence when it falls out of sync?
How to bulk update sequence ID postgreSQL for all tables
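For the reset itself, a minimal sketch (assuming the column is actually backed by a sequence; with a plain bigint primary key as in the DDL above, pg_get_serial_sequence() returns NULL and there is nothing to reset):
-- set the sequence so that the next generated value is max(id) + 1
select setval(pg_get_serial_sequence('"Ticket"', 'id'),
       coalesce((select max(id) from "Ticket"), 0) + 1,
       false);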

PostgreSQL query does not use index

Table definition is as follows:
CREATE TABLE public.the_table
(
id integer NOT NULL DEFAULT nextval('the_table_id_seq'::regclass),
report_timestamp timestamp without time zone NOT NULL,
value_id integer NOT NULL,
text_value character varying(255),
numeric_value double precision,
bool_value boolean,
dt_value timestamp with time zone,
exported boolean NOT NULL DEFAULT false,
CONSTRAINT the_table_fkey_valdef FOREIGN KEY (value_id)
REFERENCES public.value_defs (value_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.the_table
OWNER TO postgres;
Indices:
CREATE INDEX the_table_idx_id ON public.the_table USING brin (id);
CREATE INDEX the_table_idx_timestamp ON public.the_table USING btree (report_timestamp);
CREATE INDEX the_table_idx_tsvid ON public.the_table USING brin (report_timestamp, value_id);
CREATE INDEX the_table_idx_valueid ON public.the_table USING btree (value_id);
The query is:
SELECT * FROM the_table r WHERE r.value_id = 1064 ORDER BY r.report_timestamp desc LIMIT 1;
While running the query PostgreSQL does not use the_table_idx_valueid index.
Why?
If anything, this index will help:
CREATE INDEX ON the_table (value_id, report_timestamp);
Depending on the selectivity of the condition and the number of rows in the table, PostgreSQL may correctly deduce that a sequential scan plus a sort is faster than an index scan.
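To see which plan the optimizer actually picks and how its row estimates compare to reality, you can inspect the execution plan; a quick sketch:
-- shows the chosen plan, estimated vs. actual row counts, and buffer usage
explain (analyze, buffers)
select * from the_table r
where r.value_id = 1064
order by r.report_timestamp desc
limit 1;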

HSQL Triggers : user lacks privilege or object not found: NEWROW.ID

I am trying to implement triggers in HSQLDB: I have one table called component, and on a change in that table I want to log it in another table using an AFTER INSERT trigger, for which I am doing the following:
CREATE TABLE IF NOT EXISTS "component"(
"id" INTEGER IDENTITY,
"name" VARCHAR(100),
"configuration" LONGVARCHAR,
"owner_id" INTEGER );
CREATE TABLE IF NOT EXISTS "component_audit"(
"id" INTEGER IDENTITY,
"component_id" INTEGER ,
"action" VARCHAR(20),
"activity_time" BIGINT,
"user_id" INTEGER,
FOREIGN KEY ("component_id") REFERENCES "component"("id") ON UPDATE RESTRICT ON DELETE CASCADE
);
CREATE TRIGGER trig AFTER INSERT ON "component"
REFERENCING NEW ROW AS newrow
FOR EACH ROW
INSERT INTO "component_audit" ("id","component_id","action","activity_time","user_id")
VALUES (DEFAULT, 1, newrow.id, 123, 1);
On running this, HSQLDB throws the error:
Caused by: org.hsqldb.HsqlException: user lacks privilege or object
not found: NEWROW.ID
It's due to my id column being named "id", because I needed it in lower case (by default HSQLDB uses upper case).
How do I pass my variable substitution?
Just use the same naming as in your CREATE TABLE statement.
CREATE TRIGGER trig AFTER INSERT ON "component"
REFERENCING NEW ROW AS newrow
FOR EACH ROW
INSERT INTO "component_audit" ("id","component_id","action","activity_time","user_id")
VALUES (DEFAULT, 1, newrow."id", 123, 1);
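For the same reason, any other statement against these tables has to quote the lower-case identifiers as well; a small illustrative example (the query itself is made up):
-- unquoted names would be folded to upper case (ID, ACTION) and would not match the quoted columns
SELECT "id", "action" FROM "component_audit" WHERE "component_id" = 1;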