Before I try to insert a row into a PostgreSQL table, should I query whether the insert would violate a constraint?
I do check when the insert would cause unwanted side-effects (e.g., auto-increment) upon an error.
But, if there are no possible side effects, is it OK to just blindly try to insert into a table? Or, is it better practice to prevent errors by anticipating them when possible (as advised in Objective-C)?
Also, when performing the insert inside an SQL function, will other queries (e.g., CTEs) inside the function get rolled back if the insert fails?
In general, testing beforehand is not a good idea, because it requires you to explicitly lock tables to prevent other clients from changing or inserting data between your test and your insert. Explicit locking is bad for concurrency.
A serial getting auto-incremented by a failed insert is generally not a problem. Just don't assume the values stored in the database are consecutive.
A database and Obj-C are two completely different things. Let the database check for problems; it is much easier to add the appropriate constraints to your schema than it is to check everything in your client program.
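For example, instead of querying for a duplicate before inserting, a unique constraint lets the database reject it for you (the table and column names here are only illustrative):
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
INSERT INTO users (email) VALUES ('a@example.com');
-- fails with SQLSTATE 23505 (unique_violation) if the address already exists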
The default is to roll back to the start of the transaction, but you can control that with savepoints and ROLLBACK TO SAVEPOINT. A CTE, however, is part of a single query, and a query is always rolled back completely when any part of it fails. You may be able to work around that by splitting the CTE off into a separate query that creates a temp table, and then using the temp table instead of the CTE.
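A rough sketch of the savepoint approach (table and values are placeholders):
BEGIN;
-- earlier statements in the transaction ...
SAVEPOINT before_insert;
INSERT INTO my_table (id, val) VALUES (1, 'x');
-- if the insert hits a constraint violation, undo only the insert:
ROLLBACK TO SAVEPOINT before_insert;
-- everything before the savepoint is still intact and can be committed
COMMIT;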
I'm migrating data from one table to another (about 80,000 rows in total) in an environment where long locks or downtime are not acceptable. Essentially the query boils down to this simple case:
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 on table_1.id = table_3.id
All 3 tables are being read from and could have an insert at any time. I want to just run the query above, but I'm not sure how the locking works and whether the tables will be totally inaccessible during the operation. My understanding tells me that only the affected rows (newly inserted) will be locked. Table 1 is just being selected, so no harm, and concurrent inserts are safe so table 2 should be freely accessible.
Is this understanding correct, and can I run this query in a production environment without fear? If it's not safe, what is the standard way to accomplish this?
You're fine.
If you're interested in the details, you can read up on multiversion concurrency control, or on the details of the Postgres MVCC implementation, or how its various locking modes interact, but the implications for your case are nicely summarised in the docs:
reading never blocks writing and writing never blocks reading
In short, every record stored in the database has some version number attached to it, and every query knows which versions to consider and which to ignore.
This means that an INSERT can safely write to a table without locking it, as any concurrent queries will simply ignore the new rows until the inserting transaction decides to commit.
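If you want to see this for yourself, here is a rough two-session sketch using the table names from the question (the comments mark which session runs what):
-- session 1
BEGIN;
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 ON table_1.id = table_3.id;
-- session 2, while session 1 is still open: reads and inserts on table_2 are not blocked,
-- they simply don't see session 1's uncommitted rows yet
SELECT count(*) FROM table_2;
-- session 1
COMMIT;  -- only now do the new rows become visible to other sessions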
To give some context, the command is issued inside a task, and many tasks might issue the same command from multiple workers at the same time.
Each task tries to create a Postgres schema. I often get the following error:
IntegrityError: (IntegrityError) duplicate key value violates unique constraint "pg_namespace_nspname_index"
DETAIL: Key (nspname)=(9621584361) already exists.
'CREATE SCHEMA IF NOT EXISTS "9621584361"'
Postgres version is PostgreSQL 9.4rc1.
Is it a bug in Postgres?
This is a bit of a wart in the implementation of IF NOT EXISTS for tables and schemas. Basically, they're an upsert attempt, and PostgreSQL doesn't handle the race conditions cleanly. It's safe, but ugly.
If the schema is being concurrently created in another session but isn't yet committed, then it both exists and does not exist, depending on who you are and how you look. Other transactions cannot "see" the new schema in the system catalogs because its entry in pg_namespace is uncommitted and therefore not visible to them. So CREATE SCHEMA / CREATE TABLE tries to create it because, as far as it's concerned, the object doesn't exist.
However, that inserts a row into a table with a unique constraint. Unique constraints must be able to see uncommitted rows in order to function. So the insert blocks (stops) until the first transaction that did the CREATE either commits or rolls back. If it commits, the second transaction aborts, because it tried to insert a row that violates a unique constraint. CREATE SCHEMA isn't smart enough to catch this case and re-try.
To properly fix this PostgreSQL would probably need predicate locking, where it could lock the potential for a row. This might get added as part of the current work going on for implementing UPSERT.
For these particular commands, PostgreSQL could probably do a dirty read of the system catalogs, where it can see uncommitted changes. Then it could wait for the uncommitted transaction to commit or roll back, re-do the dirty read to see if someone else is waiting, and retry. But this would have a race condition where someone else might create the schema between when you do the read to check for it and when you try to create it.
So the IF NOT EXISTS variants would have to:
Check whether the object exists; if it does, finish without doing anything.
Attempt to create the object.
If creation fails due to a unique constraint error, retry from the start.
If creation succeeds, finish.
As far as I know nobody's implemented that, or they tried and it wasn't accepted. There would be possible issues with transaction ID burn rate, etc, with this approach.
I think this is a bug of sorts, but it's a "yeah, we know" kind of bug, not a "we'll get right on fixing that" kind of bug. Feel free to post to pgsql-bugs about it; at the very least the documentation should mention this caveat about IF NOT EXISTS.
I don't recommend doing DDL concurrently like that.
I needed to work around this limitation in an application where schemas are created concurrently. What worked for me was adding
LOCK TABLE pg_catalog.pg_namespace
in the transaction that includes CREATE SCHEMA. It looks like a dirty and unsafe thing to do, but it solved the problem, which only occurred in tests anyway.
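For context, the transaction could look roughly like this (the schema name is just an example); note that locking pg_namespace serializes all concurrent schema creation, which is part of why it feels dirty:
BEGIN;
LOCK TABLE pg_catalog.pg_namespace;
CREATE SCHEMA IF NOT EXISTS "9621584361";
COMMIT;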
I am very much a newbie with this SQL*Plus and Oracle 10g thing, so please don't mind the stupid questions.
The problem I am facing is that whenever I run a query over a table:
SELECT * FROM emp;
The output comes out to be "no rows selected".
I am in an utter dilemma, as the table and its schema are clearly preserved but the data I entered is not being displayed. The same is happening for all the user-generated tables: the tuples are not being displayed. Is this a problem related to SQL*Plus?
Kindly help and give me a proper guide.
In Oracle, every statement that you issue is part of a transaction. Those transactions need to either be committed (in which case the changes are made permanent) or rolled back (in which case the changes are reverted) before another session can see the data. Some databases either do not support transactions (e.g. MyISAM tables in MySQL) or do not implicitly start transactions (e.g. SQL Server). The Oracle approach is generally far superior -- when you inadvertently run a delete statement that is missing an important predicate, the ability to roll back the operation when it deletes many more rows than you were expecting can be a real career saver.
In Oracle, when you've run whatever statements comprise your transaction and you are confident that your changes are correct, you need to explicitly issue a commit to make those changes visible to other sessions, e.g.
SQL> insert into some_table( <<columns>> ) values( <<values>> );
SQL> insert into some_other_table( <<columns>> ) values ( <<more values>> );
SQL> commit;
If you are really, really, really sure that you prefer the behavior you might be accustomed to in other tools, you can tell SQL*Plus to autocommit your changes
SQL> set autocommit on;
SQL> <<do whatever>>
That is generally a really bad idea. The tiny benefit you get from not having to explicitly issue a commit is far outweighed by the ability to ensure that other sessions don't see data in an inconsistent state (e.g. if you're transferring money from account A to account B by issuing two update statements, you don't want someone seeing an intermediate result where either both accounts have the money or neither account does) and the ability to roll back a change if it turns out to do something other than what you expected.
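To make the transfer example concrete, it might look like this in SQL*Plus (the accounts table and amounts are invented for illustration):
SQL> update accounts set balance = balance - 100 where account_id = 'A';
SQL> update accounts set balance = balance + 100 where account_id = 'B';
SQL> commit;
Until the commit, other sessions still see the old balances; a rollback at that point would undo both updates together.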
Is there any safe way of acquiring an advisory lock before executing a particular statement without using two separate queries? E.g., I assume that if I do something like the following, there is no guarantee that the lock will be acquired before the insert:
WITH x AS (SELECT pg_advisory_lock(1,2)) INSERT ...
But is there some similar way of getting the desired effect?
I'm pretty sure that SQL standards require implementations to behave as if the very first thing they do is to effectively materialize the common table expressions in the WITH clause. PostgreSQL complies with this requirement.
Common table expressions behave (mostly) as named objects. Multiple CTEs are materialized in the order they're declared. Backward references by name work as you'd expect, and forward references by name raise an error.
So I'm pretty sure that, in the general case, the CTE will have to materialize before the INSERT statement will run. But in your case, using PostgreSQL, I'm not dead certain, and here's why.
PostgreSQL's implementation [of common table expressions] evaluates only as many rows of a WITH query as are actually fetched by the parent query.
I'm not sure an INSERT statement fetches a row in this sense.
You haven't really told us enough about your use case to be sure, but in general, for explicit locks to be useful in PostgreSQL they need to be acquired before the transaction acquires its snapshot. Advisory locks can be acquired before you start the transaction, and most locks with transactional scope should be acquired right after you begin your transaction, before anything which will need a transaction ID.
If you really don't need to acquire the lock before you have your transaction ID assigned and your snapshot set, and it is important to you that you issue one statement to acquire the lock and perform the insert, create a function which does both.
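A minimal sketch of such a function, assuming a hypothetical target table and using the transaction-scoped variant of the advisory lock (which is released automatically at commit or rollback):
CREATE FUNCTION locked_insert(p_val text) RETURNS void AS $$
BEGIN
    PERFORM pg_advisory_xact_lock(1, 2);  -- same keys as in the question, taken before the insert
    INSERT INTO my_table (val) VALUES (p_val);
END;
$$ LANGUAGE plpgsql;
SELECT locked_insert('example');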
I am trying to find out what Postgres can handle safely inside of a transaction, but I cannot find the relevant information in the Postgres manual. So far I have found out the following:
UPDATE, INSERT and DELETE are fully supported inside transactions and are rolled back when the transaction is not committed
DROP TABLE is not handled safely inside a transaction and is undone with a CREATE TABLE, which recreates the dropped table but does not repopulate it
CREATE TABLE is also not truly transactional and is instead undone with a corresponding DROP TABLE
Is this correct? Also, I could not find any hints as to the handling of ALTER TABLE and TRUNCATE. In what way are those handled, and are they safe inside transactions? Is there a difference in handling between different types of transactions and different versions of Postgres?
DROP TABLE is transactional. To undo it, you need to issue a ROLLBACK, not a CREATE TABLE. The same goes for CREATE TABLE (which is also undone using ROLLBACK).
ROLLBACK is always the only correct way to undo a transaction - that includes ALTER TABLE and TRUNCATE.
The only thing in Postgres that is never transactional is the numbers generated by a sequence (CREATE/ALTER/DROP SEQUENCE themselves are transactional, though).
As far as I'm aware, all of these commands are transaction-aware, except for TRUNCATE ... RESTART IDENTITY (and even that one has been transactional since 9.1).
See the manual on concurrency control and transaction-related commands.
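A quick way to convince yourself in psql (the table and sequence names are throwaway examples):
BEGIN;
DROP TABLE important_data;          -- or CREATE TABLE / ALTER TABLE / TRUNCATE
SELECT nextval('some_sequence');    -- suppose this returns 42
ROLLBACK;
-- important_data is back with all its rows; the sequence, however, does not rewind:
SELECT nextval('some_sequence');    -- returns 43, not 42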