Postgres lowercase column and delete duplicates

I have the following table:
Customers
---------
name text
object_id integer
created_time timestamp with time zone
Indexes:
"my_index" UNIQUE CONSTRAINT, btree (name, object_id, created_time)
The unique index works fine but then I ended up with duplicate data like:
Name | object_id | created_time
------------------------------------
john | 1 | 2018-02-28 15:42:14.30573+00
JOHN | 1 | 2018-02-28 15:42:14.30573+00
So I tried to lowercase all my data in the name column with:
UPDATE customers SET name=lower(name) WHERE name != LOWER(name);
But this procedure generated an error because now I would be violating the index:
ERROR: duplicate key value violates unique constraint "my_index"
DETAIL: Key (name, object_id, created_time)=(john, 1, 2018-02-28 15:42:14.30573+00) already exists.
What procedure could I use to delete the rows that would violate the index once they are lowercased?

If you have 'JOHN' and 'John' in your table but not 'john', it gets messy. Here's one solution:
insert into customers
select distinct lower("name") ,object_id,created_time from customers
where name <> lower(name)
and not (lower("name") ,object_id,created_time)
in (select * from customers);
delete from customers where name <> lower(name);
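After that, consider switching the column to the case-insensitive citext type (provided by the citext extension):
create extension if not exists citext;
alter table customers alter column name type citext;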
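An alternative that avoids inserting new rows is to delete the would-be duplicates first and then run the original UPDATE. This is only a sketch, not the answerer's method: it assumes the customers table from the question, and it uses the system column ctid as an arbitrary tie-breaker when several mixed-case variants ('JOHN', 'John') exist without a lowercase row.

```sql
BEGIN;

-- 1. Drop mixed-case rows whose lowercase twin already exists.
DELETE FROM customers c
WHERE c.name <> lower(c.name)
  AND EXISTS (SELECT 1
              FROM customers d
              WHERE d.name = lower(c.name)
                AND d.object_id = c.object_id
                AND d.created_time = c.created_time);

-- 2. Among the remaining mixed-case rows that would collide with each other,
--    keep the one with the lowest ctid (an arbitrary but deterministic choice).
DELETE FROM customers c
USING customers d
WHERE c.name <> lower(c.name)
  AND d.name <> lower(d.name)
  AND lower(c.name) = lower(d.name)
  AND c.object_id = d.object_id
  AND c.created_time = d.created_time
  AND c.ctid > d.ctid;

-- 3. Now the UPDATE from the question can no longer hit the unique index.
UPDATE customers SET name = lower(name) WHERE name <> lower(name);

COMMIT;
```

Running the three statements in one transaction means a failure in any step leaves the table unchanged.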

Related

Insert into two referencing tables by selecting from a single table

I have 2 permanent tables in my PostgreSQL 12 database with a one-to-many relationship (thing, and thing_identifier). The second -- thing_identifier -- has a column referencing thing, such that thing_identifier can hold multiple, external identifiers for a given thing:
CREATE TABLE IF NOT EXISTS thing
(
thing_id SERIAL PRIMARY KEY,
thing_name TEXT, --this is not necessarily unique
thing_attribute TEXT --also not unique
);
CREATE TABLE IF NOT EXISTS thing_identifier
(
id SERIAL PRIMARY KEY,
thing_id integer references thing (thing_id),
identifier text
);
I need to insert some new data into thing and thing_identifier, both of which come from a table I created by using COPY to pull the contents of a large CSV file into the database, something like:
CREATE TABLE IF NOT EXISTS things_to_add
(
id SERIAL PRIMARY KEY,
guid TEXT, --a unique identifier used by the supplier
thing_name TEXT, --not unique
thing_attribute TEXT --also not unique
);
Sample data:
INSERT INTO things_to_add (guid, thing_name, thing_attribute) VALUES
('[111-22-ABC]','Thing-a-ma-jig','pretty thing'),
('[999-88-XYZ]','Herk-a-ma-fob','blue thing');
The goal is to have each row in things_to_add result in one new row, each, in thing and thing_identifier, as in the following:
thing:
| thing_id | thing_name | thing_attribute |
|----------|----------------|-----------------|
| 1 | thing-a-ma-jig | pretty thing |
| 2 | herk-a-ma-fob | blue thing |
thing_identifier:
| id | thing_id | identifier |
|----|----------|------------------|
| 8 | 1 | '[111-22-ABC]' |
| 9 | 2 | '[999-88-XYZ]' |
I could use a CTE INSERT statement (with RETURNING thing_id) to get the thing_id that results from the INSERT on thing, but I can't figure out how to get both that thing_id from the INSERT on thing and the original guid from things_to_add, which needs to go into thing_identifier.identifier.
Just to be clear, the only guaranteed unique column in thing is thing_id, and the only guaranteed unique columns in things_to_add are id (which we don't want to store) and guid (which is what we want in thing_identifier.identifier), so there isn't any way to join thing and things_to_add after the INSERT on thing.

You can retrieve things_to_add.guid with a JOIN:
WITH list AS
(
INSERT INTO thing (thing_name)
SELECT thing_name
FROM things_to_add
RETURNING thing_id, thing_name
)
INSERT INTO thing_identifier (thing_id, identifier)
SELECT l.thing_id, t.guid
FROM list AS l
INNER JOIN things_to_add AS t
ON l.thing_name = t.thing_name ;
However, if thing.thing_name is not unique, the problem is trickier. Updating both tables thing and thing_identifier from the same trigger on things_to_add may solve the issue (the function must be written in PL/pgSQL, because SQL-language functions cannot return trigger):
CREATE OR REPLACE FUNCTION after_insert_thing_to_add ()
RETURNS TRIGGER LANGUAGE plpgsql AS
$$
BEGIN
WITH list AS
(
INSERT INTO thing (thing_name)
VALUES (NEW.thing_name)
RETURNING thing_id
)
INSERT INTO thing_identifier (thing_id, identifier)
SELECT l.thing_id, NEW.guid
FROM list AS l ;
RETURN NEW ; -- the return value of an AFTER ROW trigger is ignored
END ;
$$ ;
DROP TRIGGER IF EXISTS after_insert ON things_to_add ;
CREATE TRIGGER after_insert
AFTER INSERT
ON things_to_add
FOR EACH ROW
EXECUTE PROCEDURE after_insert_thing_to_add ();
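For a staging table that is already loaded, a trigger-free alternative is to allocate the new thing_id values up front, so that the guid and the id travel together. The following is a minimal sketch rather than part of the answer above; it assumes the three tables from the question and that thing.thing_id is backed by its default serial sequence.

```sql
WITH src AS (
    -- Draw one new id per staged row; the CTE is referenced twice and
    -- contains a volatile function, so it is computed only once.
    SELECT nextval(pg_get_serial_sequence('thing', 'thing_id')) AS thing_id,
           guid,
           thing_name,
           thing_attribute
    FROM things_to_add
), ins AS (
    -- Insert the parent rows with the pre-allocated ids.
    INSERT INTO thing (thing_id, thing_name, thing_attribute)
    SELECT thing_id, thing_name, thing_attribute
    FROM src
)
-- Insert the child rows using the same ids; the foreign key is checked
-- at the end of the statement, when the parent rows exist.
INSERT INTO thing_identifier (thing_id, identifier)
SELECT thing_id, guid
FROM src;
```

Because no join on thing_name is involved, duplicate names in things_to_add do not matter here.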

Use INSERT ... ON CONFLICT DO NOTHING RETURNING failed rows

Suppose I have the following table:
CREATE TABLE tags (
id serial PRIMARY KEY,
name varchar(255),
CONSTRAINT name_unique UNIQUE(name)
);
I need a query that will insert tags that do not exist and return ids for all requested tags. Consider the following:
INSERT INTO tags (name) values ('tag10'), ('tag6'), ('tag11') ON CONFLICT DO NOTHING returning id, name
The output of this query is:
+---------------+
| id | name |
|---------------|
| 208 | tag10 |
|---------------|
| 209 | tag11 |
+---------------+
What I need is to have tag6 in the output.

A bit verbose, but I can't think of anything else:
with all_tags (name) as (
values ('tag10'), ('tag6'), ('tag11')
), inserted (id, name) as (
INSERT INTO tags (name)
select name
from all_tags
ON CONFLICT DO NOTHING
returning id, name
)
select t.id, t.name, 'already there'
from tags t
join all_tags at on at.name = t.name
union all
select id, name, 'inserted'
from inserted;
The outer select from tags sees the snapshot of the table as it was before the new tags were inserted. The third column with the constant is only there to test the query so that one can identify which rows were inserted and which not.

With this table:
CREATE TABLE tags (
id serial PRIMARY KEY,
name text UNIQUE
);
As long as the values inside the query are unique, a workaround is a no-op DO UPDATE, which makes RETURNING include the conflicting rows as well:
INSERT INTO tags (name)
VALUES ('tag10'), ('tag6'), ('tag11')
ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name RETURNING id, name;
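The uniqueness requirement matters because a single INSERT ... ON CONFLICT DO UPDATE is not allowed to touch the same row twice; with a repeated tag PostgreSQL raises "ON CONFLICT DO UPDATE command cannot affect row a second time". A small sketch of how the input could be de-duplicated first, assuming the tags table defined just above (the repeated 'tag10' is made up for illustration):

```sql
INSERT INTO tags (name)
SELECT DISTINCT name                       -- collapse repeated tags in the input
FROM (VALUES ('tag10'), ('tag6'), ('tag10')) AS v(name)
ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
RETURNING id, name;
```

Keep in mind that the no-op update still writes a new row version for every tag that already existed and fires any UPDATE triggers on the table.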

How to insert values from another table in PostgreSQL?

I have a table which references other tables:
CREATE TABLE scratch
(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
rep_id INT NOT NULL REFERENCES reps,
term_id INT REFERENCES terms
);
CREATE TABLE reps (
id SERIAL PRIMARY KEY,
rep TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
id SERIAL PRIMARY KEY,
terms TEXT NOT NULL UNIQUE
);
I wish to add a new record to scratch given the name, the rep and the terms values, i.e. I have neither corresponding rep_id nor term_id.
Right now the only idea that I have is:
insert into scratch (name, rep_id, term_id)
values ('aaa', (select id from reps where rep='Dracula' limit 1), (select id from terms where terms='prepaid' limit 1));
My problem is this. I am trying to use the parameterized query API (from node using the node-postgres package), where an insert query looks like this:
insert into scratch (name, rep_id, term_id) values ($1, $2, $3);
and then an array of values for $1, $2 and $3 is passed as a separate argument. In the end, once I am comfortable with the parameterized queries, the idea is to promote them to prepared statements, as the most efficient and safest way to query the database.
However, I am puzzled about how to do this in my example, where different tables have to be subqueried.
P.S. I am using PostgreSQL 9.2 and have no problem with a PostgreSQL specific solution.
EDIT 1
C:\Users\markk>psql -U postgres
psql (9.2.4)
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \c dummy
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
You are now connected to database "dummy" as user "postgres".
dummy=# DROP TABLE scratch;
DROP TABLE
dummy=# CREATE TABLE scratch
dummy-# (
dummy(# id SERIAL NOT NULL PRIMARY KEY,
dummy(# name text NOT NULL UNIQUE,
dummy(# rep_id integer NOT NULL,
dummy(# term_id integer
dummy(# );
NOTICE: CREATE TABLE will create implicit sequence "scratch_id_seq" for serial column "scratch.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "scratch_pkey" for table "scratch"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "scratch_name_key" for table "scratch"
CREATE TABLE
dummy=# DEALLOCATE insert_scratch;
ERROR: prepared statement "insert_scratch" does not exist
dummy=# PREPARE insert_scratch (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# SELECT $1, r.id, t.id
dummy-# FROM reps r, terms t
dummy-# WHERE r.rep = $2 AND t.terms = $3
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# DEALLOCATE insert_scratch2;
ERROR: prepared statement "insert_scratch2" does not exist
dummy=# PREPARE insert_scratch2 (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# VALUES ($1, (SELECT id FROM reps WHERE rep=$2 LIMIT 1), (SELECT id FROM terms WHERE terms=$3 LIMIT 1))
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# EXECUTE insert_scratch ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+-----+-------
(0 rows)
INSERT 0 0
dummy=# EXECUTE insert_scratch2 ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+----------+-------
1 | abc | Snowhite |
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch ('abcd', 'Snowhite', '30 days');
id | name | rep | terms
----+------+----------+---------
2 | abcd | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch2 ('abcd2', 'Snowhite', '30 days');
id | name | rep | terms
----+-------+----------+---------
3 | abcd2 | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=#
EDIT 2
We can utilize the fact that rep_id is required, even though term_id is optional, and use the following version of INSERT-SELECT:
PREPARE insert_scratch (text, text, text) AS
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM reps r
LEFT JOIN terms t ON t.terms = $3
WHERE r.rep = $2
RETURNING id, name, $2 rep, $3 terms;
This version, however, has two problems:
1. No distinction is made between a missing terms value (i.e. '') and an invalid terms value (i.e. a non-empty value missing from the terms table entirely). Both are treated as missing terms. (But the INSERT with two subqueries suffers from the same problem.)
2. The version depends on the fact that the rep is required. But what if rep_id was optional too?
EDIT 3
Found the solution for item 2, eliminating the dependency on rep being required. In addition, filtering in the WHERE clause has the problem that the SQL does not fail if the rep is invalid: it just inserts 0 rows, whereas I want it to fail explicitly in this case. My solution is simply to use a dummy one-row CTE:
PREPARE insert_scratch (text, text, text) AS
WITH stub(x) AS (VALUES (0))
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM stub
LEFT JOIN terms t ON t.terms = $3
LEFT JOIN reps r ON r.rep = $2
RETURNING id, name, rep_id, term_id;
If rep is missing or invalid, this sql will try to insert NULL into the rep_id field and since the field is NOT NULL an error would be raised - precisely what I need. And if further I decide to make rep optional - no problem, the same SQL works for that too.

INSERT into scratch (name, rep_id, term_id)
SELECT 'aaa'
, r.id
, t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = 'Dracula'
AND t.terms = 'prepaid'
;
Notes:
You don't need the ugly LIMITs, since r.rep and t.terms are unique (candidate keys).
You could replace FROM a, b with FROM a CROSS JOIN b.
The scratch table will probably need a UNIQUE constraint on (rep_id, term_id) (the nullability of term_id is questionable).
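For reference, the explicit CROSS JOIN spelling mentioned in the second note would look roughly like this; it is the same statement as above with only the FROM clause rewritten, assuming the reps, terms and scratch tables from the question:

```sql
INSERT INTO scratch (name, rep_id, term_id)
SELECT 'aaa', r.id, t.id
FROM reps AS r
CROSS JOIN terms AS t          -- same row set as "FROM reps r, terms t"
WHERE r.rep = 'Dracula'
  AND t.terms = 'prepaid';
```

As with the comma form, no row is inserted (and no error is raised) when either lookup value is missing.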
UPDATE: the same as a prepared statement, as described in the documentation:
PREPARE hoppa (text, text,text) AS
INSERT into scratch (name, rep_id, term_id)
SELECT $1 , r.id , t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = $2
AND t.terms = $3
;
EXECUTE hoppa ('bbb', 'Dracula' , 'prepaid' );
SELECT * FROM scratch;
UPDATE2: test data
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE reps ( id SERIAL PRIMARY KEY, rep TEXT NOT NULL UNIQUE);
CREATE TABLE terms ( id SERIAL PRIMARY KEY, terms TEXT NOT NULL UNIQUE);
CREATE TABLE scratch ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, rep_id INT NOT NULL REFERENCES reps, term_id INT REFERENCES terms);
INSERT INTO reps(rep) VALUES( 'Dracula' );
INSERT INTO terms(terms) VALUES( 'prepaid' );
Results:
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table tmp.reps
drop cascades to table tmp.terms
drop cascades to table tmp.scratch
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
PREPARE
INSERT 0 1
id | name | rep_id | term_id
----+------+--------+---------
1 | aaa | 1 | 1
2 | bbb | 1 | 1
(2 rows)

Prevent empty strings in CHARACTER VARYING field

I am using PostgreSQL and would like to prevent certain required CHARACTER VARYING (VARCHAR) fields from allowing empty string inputs.
These fields would also need to contain unique values, so I am already using a unique constraint; however, this does not prevent an original (unique) empty value.
Basic example, where username needs to be unique and not empty
| id | username | password |
+----+----------+----------+
| 1 | User1 | pw1 | #Allowed
| 2 | User2 | pw1 | #Allowed
| 3 | User2 | pw2 | #Already prevented by constraint
| 4 | '' | pw2 | #Currently allowed, but needs to be prevented

Use a check constraint:
CREATE TABLE foobar(
x TEXT NOT NULL UNIQUE,
CHECK (x <> '')
);
INSERT INTO foobar(x) VALUES('');

You can use the standard SQL 'CONSTRAINT...CHECK' clause when defining table fields:
CREATE TABLE test
(
nonempty VARCHAR NOT NULL UNIQUE CONSTRAINT non_empty CHECK(length(nonempty)>0)
)
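If the table already exists, the same kind of check can be added afterwards. The sketch below is an illustration only: the table name users is hypothetical (the question shows the columns but not the table name), and btrim() is used on the assumption that whitespace-only usernames should be rejected as well.

```sql
-- "users" is a hypothetical table name; username is the column from the question.
ALTER TABLE users
    ADD CONSTRAINT username_not_blank
    CHECK (btrim(username) <> '');   -- rejects '' and whitespace-only values
```

Adding the constraint fails if existing rows already violate it, so those rows would have to be cleaned up first.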

As a special kind of constraint, you can put the datatype+constraint into a DOMAIN:
-- set search_path='tmp';
DROP DOMAIN birthdate CASCADE;
CREATE DOMAIN birthdate AS date DEFAULT NULL
CHECK (value >= '1900-01-01' AND value <= now())
;
DROP DOMAIN username CASCADE;
CREATE DOMAIN username AS VARCHAR NOT NULL
CHECK (length(value) > 0)
;
DROP TABLE employee CASCADE;
CREATE TABLE employee
( empno INTEGER NOT NULL PRIMARY KEY
, dob birthdate
, zname username
, UNIQUE (zname)
);
INSERT INTO employee(empno,dob,zname)
VALUES (1,'1980-02-02', 'John Doe' ), (2,'1980-02-02', 'Jon Doeh' );
INSERT INTO employee(empno,dob,zname)
VALUES (3,'1980-02-02', '' ), (4,'1980-01-01', 'Joan Doh' );
This will allow you to reuse the domain again and again, without having to copy the constraint every time.
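-- UPDATE 2021-03-25 (Thanks to @AlexanderPavlov)
There appears to be a serious flaw in this approach: a NULL produced by an empty scalar subquery can be inserted into a column whose domain is declared NOT NULL, which puts the database into a forbidden state. The PostgreSQL documentation for CREATE DOMAIN describes this case and recommends applying NOT NULL to the column rather than to the domain.
Wrapping the subquery's column in a (nonsensical) COALESCE() "fixes" this behaviour, as the results below show.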
\echo literal NULL
INSERT INTO employee(empno,dob,zname) VALUES (5,'2021-02-02', NULL );
\echo empty (scalar) set
INSERT INTO employee(empno,dob,zname) VALUES (6,'2021-02-02', (select zname from employee where 1=0) );
\echo empty COALESCE((scalar, NULL) ) set
INSERT INTO employee(empno,dob,zname) VALUES (7,'2021-02-02', (select COALESCE(zname,NULL) from employee where 1=0) );
\echo empty set#2
INSERT INTO employee(empno,dob,zname) (select 8,'2021-03-03', zname from employee where 1=0 );
\echo duplicate the complete table
INSERT INTO employee(empno,dob,zname) (select 100+empno,dob+'1mon':: interval, upper(zname) from employee );
select * from employee;
Extra Results:
literal NULL
ERROR: domain username does not allow null values
empty (scalar) set
INSERT 0 1
empty COALESCE((scalar, NULL) ) set
ERROR: domain username does not allow null values
empty set#2
INSERT 0 0
duplicate the complete table
ERROR: domain username does not allow null values
empno | dob | zname
-------+------------+----------
1 | 1980-02-02 | John Doe
2 | 1980-02-02 | Jon Doeh
6 | 2021-02-02 |
(3 rows)
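One way to close that gap, as far as I understand the CREATE DOMAIN documentation, is to keep only the CHECK in the domain and declare NOT NULL on the column itself. The names nonempty_text and employee_strict below are hypothetical and chosen so as not to clash with the objects above.

```sql
-- Hypothetical names: nonempty_text and employee_strict are illustrative only.
CREATE DOMAIN nonempty_text AS varchar
    CHECK (length(value) > 0);            -- the domain only rejects empty strings

CREATE TABLE employee_strict
( empno INTEGER NOT NULL PRIMARY KEY
, zname nonempty_text NOT NULL UNIQUE     -- NOT NULL lives on the column
);

INSERT INTO employee_strict (empno, zname) VALUES (1, 'John Doe');

-- The empty scalar subquery yields NULL; with the column-level constraint
-- this insert is expected to be rejected instead of slipping through.
INSERT INTO employee_strict (empno, zname)
VALUES (2, (SELECT zname FROM employee_strict WHERE 1 = 0));
```

The domain remains reusable; only the NOT NULL has to be repeated on each column that needs it.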

Postgres: complex CASCADE question - making sure you only delete unique foreign key references?

I've got some linked tables in a Postgres database, as follows:
Table "public.key"
Column | Type | Modifiers
--------+------+-----------
id | text | not null
name | text |
Referenced by:
TABLE "enumeration_value" CONSTRAINT "enumeration_value_key_id_fkey" FOREIGN KEY (key_id) REFERENCES key(id)
Table "public.enumeration_value"
Column | Type | Modifiers
--------+------+-----------
id | text | not null
key_id | text |
Foreign-key constraints:
"enumeration_value_key_id_fkey" FOREIGN KEY (key_id) REFERENCES key(id)
Referenced by:
TABLE "classification_item" CONSTRAINT "classification_item_value_id_fkey" FOREIGN KEY (value_id) REFERENCES enumeration_value(id)
Table "public.classification_item"
Column | Type | Modifiers
----------------+------+-----------
id | text | not null
transaction_id | text |
value_id | text |
Foreign-key constraints:
"classification_item_transaction_id_fkey" FOREIGN KEY (transaction_id) REFERENCES transaction(id)
"classification_item_value_id_fkey" FOREIGN KEY (value_id) REFERENCES enumeration_value(id)
I want to
delete all classification_items associated with a certain transaction
delete all enumeration_values associated with those classification_items
and finally, delete all key items associated with those enumeration_values.
The difficulty is that the key items are NOT unique to enumeration_values associated (via classification_item) with a certain transaction. They get created independently, and can exist across multiple of these transactions.
So I know how to do the second two of these steps, but not the first one:
-- In the statement below: help! How do I make sure these keys are ONLY used with these values?
delete from key where id in (select key_id from enumeration_value where id in (select value_id from "classification_item" where transaction_id in (select id from "transaction" where slice_id = (select id from slice where name = 'barnet'))));
delete from enumeration_value where id in (select value_id from "classification_item" where transaction_id in (select id from "transaction" where slice_id = (select id from slice where name = 'barnet')));
delete from classification_item where transaction_id in (select id from "transaction" where slice_id = (select id from slice where name = 'barnet'));
If only postgres had a CASCADE DELETE statement....

"If only postgres had a CASCADE DELETE statement...."
PostgreSQL has had this option for a long time, at least since version 8.0 (five years ago at the time of writing): foreign keys can be declared with ON DELETE CASCADE. Just use it.
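Note that ON DELETE CASCADE propagates from the referenced row to the rows that reference it, so on its own it would not remove keys when their enumeration_values disappear. A possible way to handle the "only if no longer used" requirement explicitly is sketched below. It is not from the answer above, and it assumes the tables from the question plus the slice and "transaction" tables implied by its queries; the temporary table name doomed is made up. Values and keys still used elsewhere are deliberately kept.

```sql
BEGIN;

-- Remember which values (and their keys) the slice's classification items use.
CREATE TEMP TABLE doomed ON COMMIT DROP AS
SELECT DISTINCT ci.value_id, ev.key_id
FROM classification_item ci
JOIN enumeration_value ev ON ev.id = ci.value_id
WHERE ci.transaction_id IN (
    SELECT t.id
    FROM "transaction" t
    JOIN slice s ON s.id = t.slice_id
    WHERE s.name = 'barnet');

-- 1. Delete the classification items of those transactions.
DELETE FROM classification_item
WHERE transaction_id IN (
    SELECT t.id
    FROM "transaction" t
    JOIN slice s ON s.id = t.slice_id
    WHERE s.name = 'barnet');

-- 2. Delete the remembered values that no classification item uses any more.
DELETE FROM enumeration_value ev
WHERE ev.id IN (SELECT value_id FROM doomed)
  AND NOT EXISTS (SELECT 1 FROM classification_item ci
                  WHERE ci.value_id = ev.id);

-- 3. Delete the remembered keys that no enumeration value uses any more.
DELETE FROM "key" k
WHERE k.id IN (SELECT key_id FROM doomed)
  AND NOT EXISTS (SELECT 1 FROM enumeration_value ev
                  WHERE ev.key_id = k.id);

COMMIT;
```

Each statement sees the effects of the previous one, which is what lets the NOT EXISTS checks in steps 2 and 3 detect rows that have just become unreferenced.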
