Create a DB index if it doesn't exist

Create a DB index if it doesn't exist - postgresql

I have an Alembic migration which creates a few DB indexes that were missing in a database. Example:
op.create_index(op.f('ix_some_index'), 'table_1', ['column_1'], unique=False)
However, the migration fails in other environments that already have the index:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "ix_some_index" already exists
PostgreSQL supports an IF NOT EXISTS option for cases like this, but I don't see any way of invoking it using either Alembic or SQLAlchemy options. Is there a canonical way of checking for an existing index?

Here's a somewhat blunt solution that works for PostgreSQL. It simply checks whether there's an index with the same name before creating the new index.
Beware that it doesn't verify that the index is in the correct Postgres namespace or any other info that could be relevant. It works in my case because I know there's no other chance of name collision:
def index_exists(name):
connection = op.get_bind()
result = connection.execute(
"SELECT exists(SELECT 1 from pg_indexes where indexname = '{}') as ix_exists;"
.format(name)
).first()
return result.ix_exists
def upgrade():
if not index_exists('ix_some_index'):
op.create_index(op.f('ix_some_index'), 'table_1', ['column_1'], unique=False)

Related

Feedback on whether index was created on materialized views in postgresql

I created a unique index for a materialized view as :
create unique index if not exists matview_key on
matview (some_group_id, some_description);
I can't tell if it has been created
How do I see the index?
Thank you!

Two ways to verify index creation:
--In psql
\d matview
--Using SQL
select
*
from
pg_indexes
where
indexname = 'matview_key'
and
tablename = 'matview';
More information on pg_indexes.

Like has been commented, if the command finishes successfully and you don't get an error message, the index was created. Possible caveat: while the transaction is not committed, nobody else can see it (except the unique name is reserved now), and it still might get rolled back. Check in a separate transaction to be sure.
To be absolutely sure:
SELECT pg_get_indexdef(oid)
FROM pg_catalog.pg_class
WHERE relname = 'matview_key'
AND relkind = 'i'
-- AND relnamespace = 'public'::regnamespace -- optional, to make sure of the schema, too
This way you see whether an index of the given name exists, and also its exact definition to rule out a different index with the same name. Pure SQL, works from any client. (There is nothing special about an index on materialized views.)
Also filter for the schema to be absolutely sure. Would be the "default" schema (a.k.a. "current" schema) in your case, since you did not specify in the creation. See:
How does the search_path influence identifier resolution and the "current schema"
Related:
Create index if it does not exist
How to check if a table exists in a given schema
In psql:
\di public.matview_key
To only find indexes. Again, the schema is optional to narrow down.
Progress Reporting
If creating an index takes a long time, you can look up progress in pg_stat_progress_create_index since Postgres 12:
SELECT * FROM pg_stat_progress_create_index
-- WHERE relid = 'public.matview'::regclass -- optionally narrow down

Un alternative to looking into pg_indexes is pg_matviews (for a materialized view only)
select *
from pg_matviews
where matviewname = 'my_matview_name';

Postgres delete before insert in single transaction

PostgreSQL DB: v 9.4.24
create table my_a_b_data ... // with a_uuid, b_uuid, and c columns
NOTE: the my_a_b_data keeps the references to a and b table. So it keeps the uuids of a and b.
where: the primary key (a_uuid, b_uuid)
there is also an index:
create unique index my_a_b_data_pkey
on my_a_b_data (a_uuid, b_uuid);
In the Java jdbc-alike code, in the scope one single transaction: (start() -> [code (delete, insert)] ->commit()]) (org.postgresql:postgresql:42.2.5 driver)
delete from my_a_b_data where b_uuid = 'bbb';
insert into my_a_b_data (a_uuid, b_uuid, c) values ('aaa', 'bbb', null);
I found that the insert fails, because the delete is not yet deleted. So it fails because it can not be duplicated.
Q: Is it is some kind of limitation in PostgreSQL that DB can’t do a delete and insert in one transaction because PostgreSQL doesn’t update its indexes until the commit for the delete is executed, therefore the insert will fail since the id or key (whatever we may be using) already exists in the index?
What would be possible solution? Splitting in two transactions?
UPDATE: the order is exactly the same. When I test the sql alone in the SQL console. It works fine. We use JDBI library v 5.29.
there it looks like this:
#Transaction
#SqlUpdate("insert into my_a_b_data (...; // similar for the delete
public abstract void addB() ..
So in the code:
this.begin();
this.deleteByB(b_id);
this.addB(a_id, b_id);
this.commit();

I had a similar problem to insert duplicated values and I resolved it by using Insert and Update instead of Delete. I created this process on Python but you might be able to reproduce it:
First, you create a temporary table like the target table where you want to insert values, the difference is that this table is dropped after commit.
CREATE TEMP TABLE temp_my_a_b_data
(LIKE public.my_a_b_data INCLUDING DEFAULTS)
ON COMMIT DROP;
I have created a CSV (I had to merge different data to input) with the values that I want to input/insert on my table and I used the COPY function to insert them to the temp_table (temp_my_a_b_data).
I found this code on this post related to Java and COPY PostgreSQL - \copy command:
String query ="COPY tmp from 'E://load.csv' delimiter ','";
Use the INSERT INTO but with the ON_CONFLICT clause which you can decide to do an action when the insert cannot be done because of specified constrains, on the case below we do the update:
INSERT INTO public.my_a_b_data
SELECT *
FROM temp_my_a_b_data
ON CONFLICT (a_uuid, b_uuid,c) DO UPDATE
SET a_uuid = EXCLUDED.a_uuid,
b_uuid = EXCLUDED. c = EXCLUDED.c;`
Considerations:
I am not sure but you might be able to perform the third step without using the previous steps, temp table or copy from. You can just a loop over the values:
INSERT INTO public.my_a_b_data VALUES(value1, value2, null)
ON CONFLICT (a_uuid, b_uuid,c) DO UPDATE
SET a_uuid = EXCLUDED.a_uuid,
b_uuid = EXCLUDED.b_uuid, c = EXCLUDED.c;

How to replace row if primary key already exists ("IntegrityError: duplicate key value") [duplicate]

A very frequently asked question here is how to do an upsert, which is what MySQL calls INSERT ... ON DUPLICATE UPDATE and the standard supports as part of the MERGE operation.
Given that PostgreSQL doesn't support it directly (before pg 9.5), how do you do this? Consider the following:
CREATE TABLE testtable (
id integer PRIMARY KEY,
somedata text NOT NULL
);
INSERT INTO testtable (id, somedata) VALUES
(1, 'fred'),
(2, 'bob');
Now imagine that you want to "upsert" the tuples (2, 'Joe'), (3, 'Alan'), so the new table contents would be:
(1, 'fred'),
(2, 'Joe'), -- Changed value of existing tuple
(3, 'Alan') -- Added new tuple
That's what people are talking about when discussing an upsert. Crucially, any approach must be safe in the presence of multiple transactions working on the same table - either by using explicit locking, or otherwise defending against the resulting race conditions.
This topic is discussed extensively at Insert, on duplicate update in PostgreSQL?, but that's about alternatives to the MySQL syntax, and it's grown a fair bit of unrelated detail over time. I'm working on definitive answers.
These techniques are also useful for "insert if not exists, otherwise do nothing", i.e. "insert ... on duplicate key ignore".

9.5 and newer:
PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT (key) DO UPDATE (and ON CONFLICT (key) DO NOTHING), i.e. upsert.
Comparison with ON DUPLICATE KEY UPDATE.
Quick explanation.
For usage see the manual - specifically the conflict_action clause in the syntax diagram, and the explanatory text.
Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn't require exclusive locking or a retry loop.
The commit adding the feature is here and the discussion around its development is here.
If you're on 9.5 and don't need to be backward-compatible you can stop reading now.
9.4 and older:
PostgreSQL doesn't have any built-in UPSERT (or MERGE) facility, and doing it efficiently in the face of concurrent use is very difficult.
This article discusses the problem in useful detail.
In general you must choose between two options:
Individual insert/update operations in a retry loop; or
Locking the table and doing batch merge
Individual row retry loop
Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.
The PostgreSQL documentation contains a useful procedure that'll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED mode and is only safe if it's the only thing you do in the transaction, though. The function won't work correctly if triggers or secondary unique keys cause unique violations.
This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.
Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERTs; the other gets a duplicate key error and does an UPDATE instead. The UPDATE blocks waiting for the INSERT to rollback or commit. When it rolls back, the UPDATE condition re-check matches zero rows, so even though the UPDATE commits it hasn't actually done the upsert you expected. You have to check the result row counts and re-try where necessary.
Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:
-- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.
BEGIN;
UPDATE testtable
SET somedata = 'blah'
WHERE id = 2;
-- Remember, this is WRONG. Do NOT COPY IT.
INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);
COMMIT;
then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE at the same time, matching zero rows and continuing. Then they both do the EXISTS test, which happens before the INSERT. Both get zero rows, so both do the INSERT. One fails with a duplicate key error.
This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can't. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.
Please don't roll your own solution for this. Like with message queuing, it's probably wrong.
Bulk upsert with lock
Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.
In this case, you typically follow the following process:
CREATE a TEMPORARY table
COPY or bulk-insert the new data into the temp table
LOCK the target table IN EXCLUSIVE MODE. This permits other transactions to SELECT, but not make any changes to the table.
Do an UPDATE ... FROM of existing records using the values in the temp table;
Do an INSERT of rows that don't already exist in the target table;
COMMIT, releasing the lock.
For example, for the example given in the question, using multi-valued INSERT to populate the temp table:
BEGIN;
CREATE TEMPORARY TABLE newvals(id integer, somedata text);
INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');
LOCK TABLE testtable IN EXCLUSIVE MODE;
UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;
INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;
COMMIT;
Related reading
UPSERT wiki page
UPSERTisms in Postgres
Insert, on duplicate update in PostgreSQL?
http://petereisentraut.blogspot.com/2010/05/merge-syntax.html
Upsert with a transaction
Is SELECT or INSERT in a function prone to race conditions?
SQL MERGE on the PostgreSQL wiki
Most idiomatic way to implement UPSERT in Postgresql nowadays
What about MERGE?
SQL-standard MERGE actually has poorly defined concurrency semantics and is not suitable for upserting without locking a table first.
It's a really useful OLAP statement for data merging, but it's not actually a useful solution for concurrency-safe upsert. There's lots of advice to people using other DBMSes to use MERGE for upserts, but it's actually wrong.
Other DBs:
INSERT ... ON DUPLICATE KEY UPDATE in MySQL
MERGE from MS SQL Server (but see above about MERGE problems)
MERGE from Oracle (but see above about MERGE problems)

Here are some examples for insert ... on conflict ... (pg 9.5+) :
Insert, on conflict - do nothing.
insert into dummy(id, name, size) values(1, 'new_name', 3)
on conflict do nothing;`
Insert, on conflict - do update, specify conflict target via column.
insert into dummy(id, name, size) values(1, 'new_name', 3)
on conflict(id)
do update set name = 'new_name', size = 3;
Insert, on conflict - do update, specify conflict target via constraint name.
insert into dummy(id, name, size) values(1, 'new_name', 3)
on conflict on constraint dummy_pkey
do update set name = 'new_name', size = 4;

I am trying to contribute with another solution for the single insertion problem with the pre-9.5 versions of PostgreSQL. The idea is simply to try to perform first the insertion, and in case the record is already present, to update it:
do $$
begin
insert into testtable(id, somedata) values(2,'Joe');
exception when unique_violation then
update testtable set somedata = 'Joe' where id = 2;
end $$;
Note that this solution can be applied only if there are no deletions of rows of the table.
I do not know about the efficiency of this solution, but it seems to me reasonable enough.

SQLAlchemy upsert for Postgres >=9.5
Since the large post above covers many different SQL approaches for Postgres versions (not only non-9.5 as in the question), I would like to add how to do it in SQLAlchemy if you are using Postgres 9.5. Instead of implementing your own upsert, you can also use SQLAlchemy's functions (which were added in SQLAlchemy 1.1). Personally, I would recommend using these, if possible. Not only because of convenience, but also because it lets PostgreSQL handle any race conditions that might occur.
Cross-posting from another answer I gave yesterday (https://stackoverflow.com/a/44395983/2156909)
SQLAlchemy supports ON CONFLICT now with two methods on_conflict_do_update() and on_conflict_do_nothing():
Copying from the documentation:
from sqlalchemy.dialects.postgresql import insert
stmt = insert(my_table).values(user_email='a#b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
index_elements=[my_table.c.user_email],
index_where=my_table.c.user_email.like('%#gmail.com'),
set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html?highlight=conflict#insert-on-conflict-upsert

MERGE in PostgreSQL v. 15
Since PostgreSQL v. 15, is possible to use MERGE command. It actually has been presented as the first of the main improvements of this new version.
It uses a WHEN MATCHED / WHEN NOT MATCHED conditional in order to choose the behaviour when there is an existing row with same criteria.
It is even better than standard UPSERT, as the new feature gives full control to INSERT, UPDATE or DELETE rows in bulk.
MERGE INTO customer_account ca
USING recent_transactions t
ON t.customer_id = ca.customer_id
WHEN MATCHED THEN
UPDATE SET balance = balance + transaction_value
WHEN NOT MATCHED THEN
INSERT (customer_id, balance)
VALUES (t.customer_id, t.transaction_value)

WITH UPD AS (UPDATE TEST_TABLE SET SOME_DATA = 'Joe' WHERE ID = 2
RETURNING ID),
INS AS (SELECT '2', 'Joe' WHERE NOT EXISTS (SELECT * FROM UPD))
INSERT INTO TEST_TABLE(ID, SOME_DATA) SELECT * FROM INS
Tested on Postgresql 9.3

Since this question was closed, I'm posting here for how you do it using SQLAlchemy. Via recursion, it retries a bulk insert or update to combat race conditions and validation errors.
First the imports
import itertools as it
from functools import partial
from operator import itemgetter
from sqlalchemy.exc import IntegrityError
from app import session
from models import Posts
Now a couple helper functions
def chunk(content, chunksize=None):
"""Groups data into chunks each with (at most) `chunksize` items.
https://stackoverflow.com/a/22919323/408556
"""
if chunksize:
i = iter(content)
generator = (list(it.islice(i, chunksize)) for _ in it.count())
else:
generator = iter([content])
return it.takewhile(bool, generator)
def gen_resources(records):
"""Yields a dictionary if the record's id already exists, a row object
otherwise.
"""
ids = {item[0] for item in session.query(Posts.id)}
for record in records:
is_row = hasattr(record, 'to_dict')
if is_row and record.id in ids:
# It's a row but the id already exists, so we need to convert it
# to a dict that updates the existing record. Since it is duplicate,
# also yield True
yield record.to_dict(), True
elif is_row:
# It's a row and the id doesn't exist, so no conversion needed.
# Since it's not a duplicate, also yield False
yield record, False
elif record['id'] in ids:
# It's a dict and the id already exists, so no conversion needed.
# Since it is duplicate, also yield True
yield record, True
else:
# It's a dict and the id doesn't exist, so we need to convert it.
# Since it's not a duplicate, also yield False
yield Posts(**record), False
And finally the upsert function
def upsert(data, chunksize=None):
for records in chunk(data, chunksize):
resources = gen_resources(records)
sorted_resources = sorted(resources, key=itemgetter(1))
for dupe, group in it.groupby(sorted_resources, itemgetter(1)):
items = [g[0] for g in group]
if dupe:
_upsert = partial(session.bulk_update_mappings, Posts)
else:
_upsert = session.add_all
try:
_upsert(items)
session.commit()
except IntegrityError:
# A record was added or deleted after we checked, so retry
#
# modify accordingly by adding additional exceptions, e.g.,
# except (IntegrityError, ValidationError, ValueError)
db.session.rollback()
upsert(items)
except Exception as e:
# Some other error occurred so reduce chunksize to isolate the
# offending row(s)
db.session.rollback()
num_items = len(items)
if num_items > 1:
upsert(items, num_items // 2)
else:
print('Error adding record {}'.format(items[0]))
Here's how you use it
>>> data = [
... {'id': 1, 'text': 'updated post1'},
... {'id': 5, 'text': 'updated post5'},
... {'id': 1000, 'text': 'new post1000'}]
...
>>> upsert(data)
The advantage this has over bulk_save_objects is that it can handle relationships, error checking, etc on insert (unlike bulk operations).

How to use UPDATE ... FROM in SQLAlchemy?

I would like to write this kind of statement in SQLAlchemy / Postgres:
UPDATE slots
FROM (SELECT id FROM slots WHERE user IS NULL
ORDER BY id LIMIT 1000) AS available
SET user='joe'
WHERE id = available.id
RETURNING *;
Namely, I would like to update a limited number of rows matching specified criteria.
PG

I was able to do it this way:
limited_slots = select([slots.c.id]).\
where(slots.c.user==None).\
order_by(slots.c.id).\
limit(1000)
stmt = slots.update().returning(slots).\
values(user='joe').\
where(slots.c.id.in_(limited_slots))
I don't think its as efficient as the original SQL query, however if the database memory is large enough to hold all related segments it shouldn't make much difference.

It's been a while since I used sqlalchemy so consider the following as pseudocode:
for i in session.query(Slots).filter(Slots.user == None):
i.user = "Joe"
session.add(i)
session.commit()
I recommend the sqlalchemy ORM tutorial.

Activerecord-import & serial column in PostgreSQL

I am in the process of upgrading a Rails 2.3.4 project to Rails 3.1.1. The old version used ar-extensions to handle a data import. I pulled out ar-extensions and replaced it with activerecord-import, which I understand has exactly the same interfaces.
My code calls looks like this
Student.import(columns, values)
Both args are valid arrays holding the correct data, but I get a big fat error!
The error stack looks like this:
NoMethodError (You have a nil object when you didn't expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.split):
activerecord (3.1.1) lib/active_record/connection_adapters/postgresql_adapter.rb:828:in 'default_sequence_name'
activerecord (3.1.1) lib/active_record/base.rb:647:in `reset_sequence_name'
activerecord (3.1.1) lib/active_record/base.rb:643:in `sequence_name'
activerecord-import (0.2.9) lib/activerecord-import/import.rb:203:in `import'
Looking through the code it seems as though Activerecord-import calls activerecord which in turn looks for the name and next value of the Postgres sequence.
So activerecord-import looks for the sequence_name
lib/activerecord-import/import.rb:203
# Force the primary key col into the insert if it's not
# on the list and we are using a sequence and stuff a nil
# value for it into each row so the sequencer will fire later
-> if !column_names.include?(primary_key) && sequence_name && connection.prefetch_primary_key?
column_names << primary_key
array_of_attributes.each { |a| a << nil }
end
It calls active record ...
lib/active_record/base.rb:647:in `reset_sequence_name'
# Lazy-set the sequence name to the connection's default. This method
# is only ever called once since set_sequence_name overrides it.
def sequence_name #:nodoc:
-> reset_sequence_name
end
def reset_sequence_name #:nodoc:
-> default = connection.default_sequence_name(table_name, primary_key)
set_sequence_name(default)
default
end
The code errors when serial_sequence returns nil and default_sequence_name tries to split it.
lib/active_record/connection_adapters/postgresql_adapter.rb
# Returns the sequence name for a table's primary key or some other specified key.
def default_sequence_name(table_name, pk = nil) #:nodoc:
-> serial_sequence(table_name, pk || 'id').split('.').last
rescue ActiveRecord::StatementInvalid
"#{table_name}_#{pk || 'id'}_seq"
end
def serial_sequence(table, column)
result = exec_query(<<-eosql, 'SCHEMA', [[nil, table], [nil, column]])
SELECT pg_get_serial_sequence($1, $2)
eosql
result.rows.first.first
end
When I execute pg_get_serial_sequence() directly against the database I get no value returned:
SELECT pg_get_serial_sequence('student', 'id')
But I can see that in the database there is a sequence called student_id_seq
I am using the following versions of Ruby, rails PG etc..
Rails 3.1.1
Ruby 1.9.2
Activerecord-import 0.2.9
pg 0.12.2
psql (9.0.5, server 9.1.3)
I have migrated the database from MySQL to PostgreSQL, I don't think this has any bearing on the problem but I thought that I'd better add it for completeness.
I can't work out why this isn't working!

Summary of your description:
The table student exists.
The column id exists.
The sequence student_id_seq exists.
pg_get_serial_sequence('student', 'id') still returns NULL.
Two possible explanations:
1) The sequence is not linked to the column.
Column default and the tie between column and sequence are independent features. The mere existence of a fitting sequence does not mean it does what you presume. If you create a column as serial you get the whole package, though. Read the details in the manual.
To fix this (and if you are sure that's how it should be), you can mark the sequence as "owned by" student.id:
ALTER SEQUENCE student_id_seq OWNED BY student.id;
Also check if the column default is set as expected:
SELECT column_name, column_default
FROM information_schema.columns
WHERE table_name = 'student'
-- AND schema = 'your_schema' -- if needed
If not, repair:
ALTER TABLE student ALTER COLUMN id SET DEFAULT nextval('student.id')
2) A mixup of host address / port / database / schema / capitalization of the table name.
It happens all the time. Make sure you check the same database that your app connects to, With the same user or at least the same search_path. Make sure, the objects are in the schema where you expect them and there isn't, for instance, another student table in another schema that got mixed up.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Create a DB index if it doesn't exist - postgresql

Related

Feedback on whether index was created on materialized views in postgresql

Postgres delete before insert in single transaction

How to replace row if primary key already exists ("IntegrityError: duplicate key value") [duplicate]

How to use UPDATE ... FROM in SQLAlchemy?

Activerecord-import & serial column in PostgreSQL

Categories

Resources