PostgreSQL SELECT query on a table that is being updated

I assume this question has been asked before, but unfortunately I cannot find the answer to my question.
I have a table, and I am using an UPDATE statement to update a column. At the same time, I am running a CREATE TABLE query with a SELECT statement that reads from the same table and column that is being updated.
My questions are: can this lead to wrong results in the output of the CREATE TABLE statement? Does the UPDATE finish first, and only then does the CREATE TABLE with the SELECT execute? All I know is that the CREATE TABLE statement is taking much longer to execute.

In PostgreSQL, readers never block writers, and writers never block readers. This is guaranteed by PostgreSQL's MVCC implementation, which keeps old row versions around.
If the updating transaction has not committed by the time the SELECT starts, the SELECT sees the old values, and the result is consistent.
There is nothing inside PostgreSQL that should slow down the SELECT statement noticeably, but of course I/O contention is a possible explanation.
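To make the MVCC point concrete, here is a toy Python model of row-version visibility. It is a deliberate simplification, not PostgreSQL's implementation: a snapshot is reduced to the set of transaction ids that had committed when it was taken, and each row version records the transaction that created it (`xmin`) and the one that replaced it (`xmax`). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that replaced it, if any


@dataclass(frozen=True)
class Snapshot:
    committed: frozenset  # transaction ids already committed when the snapshot was taken

    def sees(self, version: RowVersion) -> bool:
        # Visible if its creator had committed and its replacer (if any) had not.
        created = version.xmin in self.committed
        replaced = version.xmax is not None and version.xmax in self.committed
        return created and not replaced


def update(versions: List[RowVersion], old: RowVersion, new_value: str, xid: int) -> None:
    # A toy UPDATE never overwrites in place: it stamps the old version with
    # xmax = xid and appends a new version created by xid.
    old.xmax = xid
    versions.append(RowVersion(new_value, xmin=xid))


def read(versions: List[RowVersion], snapshot: Snapshot) -> List[str]:
    return [v.value for v in versions if snapshot.sees(v)]


if __name__ == "__main__":
    table = [RowVersion("old", xmin=1)]     # inserted by transaction 1, committed
    update(table, table[0], "new", xid=2)  # transaction 2 has NOT committed yet
    reader = Snapshot(frozenset({1}))      # SELECT starts while 2 is in progress
    print(read(table, reader))             # ['old']
    later = Snapshot(frozenset({1, 2}))    # a SELECT started after 2 committed
    print(read(table, later))              # ['new']
```

Because the writer only appends versions and stamps old ones, the reader never has to wait for it; the price is that dead versions accumulate until VACUUM removes them.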

Related

An UPDATE is locked by a SELECT on another table

I'm running a light UPDATE on a table, and it usually completes almost instantly without waiting on locks. When running multiple UPDATEs, I see some cases where the update query suddenly gets blocked. When I check which query is blocking it, I see that it's a SELECT query on a completely different table.
How can that be?

How to lock a SELECT in PostgreSQL?

I am used to doing this in MySQL:
INSERT INTO ... SELECT ...
which would lock the table I SELECT from.
Now I am trying to do something similar in PostgreSQL: I select a set of rows in a table, and then I insert some data into other tables based on those rows' values. I want to avoid working with outdated data, so I am wondering how I can lock a SELECT in PostgreSQL.
There is no need to explicitly lock anything. A SELECT statement will always see a consistent snapshot of the table, no matter how long it runs.
The result will be no different if you lock the table against concurrent modifications before starting the SELECT, but you will harm concurrency unnecessarily.
If you need several queries to see a consistent state of the database, start a transaction with the REPEATABLE READ isolation level. Then all statements in the transaction will see the same state of the database.
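The difference between the default READ COMMITTED level and REPEATABLE READ is when the snapshot is taken: once per statement, versus once at the first statement of the transaction. The toy model below illustrates that with hypothetical names and the committed data reduced to a dict; it is not how PostgreSQL stores snapshots. In PostgreSQL itself you would start the transaction with `BEGIN ISOLATION LEVEL REPEATABLE READ;`.

```python
from typing import Dict, Optional, Tuple

ISOLATION_LEVELS = ("READ COMMITTED", "REPEATABLE READ")


class ToyDB:
    """Committed state only; concurrent writers commit straight into it."""

    def __init__(self, data: Dict[str, int]):
        self.data = dict(data)

    def commit_write(self, key: str, value: int) -> None:
        # Stands in for another session's UPDATE that commits immediately.
        self.data[key] = value

    def snapshot(self) -> Dict[str, int]:
        return dict(self.data)  # frozen copy of the committed state


class Transaction:
    def __init__(self, db: ToyDB, isolation: str):
        if isolation not in ISOLATION_LEVELS:
            raise ValueError(f"unsupported isolation level: {isolation}")
        self.db = db
        self.isolation = isolation
        self._snapshot: Optional[Dict[str, int]] = None

    def select(self, key: str) -> int:
        if self.isolation == "READ COMMITTED":
            return self.db.snapshot()[key]       # fresh snapshot per statement
        if self._snapshot is None:               # REPEATABLE READ: snapshot taken
            self._snapshot = self.db.snapshot()  # at the first statement, then reused
        return self._snapshot[key]


def two_reads(isolation: str) -> Tuple[int, int]:
    db = ToyDB({"stock": 10})
    tx = Transaction(db, isolation)
    first = tx.select("stock")
    db.commit_write("stock", 7)  # a concurrent transaction commits in between
    second = tx.select("stock")
    return first, second


if __name__ == "__main__":
    print(two_reads("READ COMMITTED"))   # (10, 7)
    print(two_reads("REPEATABLE READ"))  # (10, 10)
```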

Postgres: parallel/efficient loading of a huge amount of data with psycopg

I want to load many rows from a CSV file.
The files contain data like this: "article_name,article_time,start_time,end_time"
There is a constraint on the table: for the same article name, I don't insert a new row if the new article_time falls within an existing range [start_time, end_time] for the same article.
i.e., don't insert row y if there exists a row x with article_name_x = article_name_y for which article_time_y lies inside the range [start_time_x, end_time_x].
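A minimal in-memory sketch of the rule as stated above, assuming rows are processed in input order and each row is a tuple `(article_name, article_time, start_time, end_time)` of comparable values (ints here; `datetime` objects would work the same way). The function name is hypothetical; it checks only the new row's `article_time` against the ranges already accepted for the same article, with closed bounds as in `[start_time, end_time]`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int, int]  # (article_name, article_time, start_time, end_time)


def filter_rows(rows: Iterable[Row], existing: Iterable[Row] = ()) -> List[Row]:
    """Keep the rows the rule allows, in input order."""
    ranges: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for name, _time, start, end in existing:
        ranges[name].append((start, end))

    accepted: List[Row] = []
    for row in rows:
        name, article_time, start, end = row
        # Reject y if some accepted x with the same name has start_x <= time_y <= end_x.
        if any(s <= article_time <= e for s, e in ranges[name]):
            continue
        ranges[name].append((start, end))  # y's own range now blocks later rows
        accepted.append(row)
    return accepted


if __name__ == "__main__":
    rows = [("a", 5, 0, 10), ("a", 7, 20, 30), ("a", 25, 25, 40), ("b", 5, 0, 1)]
    print(filter_rows(rows))  # [('a', 5, 0, 10), ('a', 25, 25, 40), ('b', 5, 0, 1)]
```

The linear scan per article is only for clarity. The example also shows that the outcome depends on the order in which rows arrive: the second "a" row is rejected only because the first one was accepted before it.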
I tried with psycopg by selecting the existing article names and checking manually for an overlap: too slow.
I tried again with psycopg, this time by defining an 'exclude using...' constraint and trying to insert with "on conflict do nothing" (so that it does not fail), but it was still too slow.
I tried the same thing, but this time inserting many values in each call to execute (psycopg): it got a little better (1M rows processed in almost 10 minutes), but it is still not as fast as it needs to be for the amount of data I have (500M+ rows).
I tried to parallelize by running the same script several times on different files, but the timing didn't get any better, I guess because of the locks on the table each time we want to write something.
Is there any way to lock only the rows with the same article_name (and not the whole table)?
Could you please help with any ideas to make this parallelizable and/or more time-efficient?
Lots of thanks, folks
Your idea with the exclusion constraint and INSERT ... ON CONFLICT is good.
You could improve the speed as follows:
Do it all in a single transaction.
Like Vao Tsun suggested, maybe COPY the data into a staging table first and do it all with a single SQL statement.
Remove all indexes except the exclusion constraint from the table where you modify data and re-create them when you are done.
Speed up insertion by disabling autovacuum and raising max_wal_size (or checkpoint_segments on older PostgreSQL versions) while you load the data.
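The staging-table approach can be sketched as an ordered list of SQL statements built in Python. This is an assumption-laden sketch, not a tested loader: the table names `articles` and `articles_staging` are hypothetical, the column list comes from the CSV layout in the question, and the target table is assumed to already carry the exclusion constraint so that `ON CONFLICT DO NOTHING` skips conflicting rows. The `COPY ... FROM STDIN` statement is meant to be fed by the client (for example with psycopg2's `cursor.copy_expert`).

```python
from __future__ import annotations

# Column list taken from the CSV layout in the question.
COLUMNS = "article_name, article_time, start_time, end_time"


def build_load_script(target: str = "articles", staging: str = "articles_staging") -> list[str]:
    """Return the statements to run, in order.

    The table names are hypothetical and interpolated directly,
    so they must be trusted identifiers, never user input.
    """
    return [
        # Disable autovacuum on the target while loading (reset at the end).
        f"ALTER TABLE {target} SET (autovacuum_enabled = false)",
        "BEGIN",  # everything up to COMMIT runs in one transaction
        # Empty temp table with the same column types; dropped at COMMIT.
        f"CREATE TEMP TABLE {staging} ON COMMIT DROP AS SELECT {COLUMNS} FROM {target} WITH NO DATA",
        # The client streams the CSV file into this statement.
        f"COPY {staging} ({COLUMNS}) FROM STDIN WITH (FORMAT csv)",
        # One set-based statement instead of many small INSERTs.
        f"INSERT INTO {target} ({COLUMNS}) SELECT {COLUMNS} FROM {staging} ON CONFLICT DO NOTHING",
        "COMMIT",
        f"ALTER TABLE {target} RESET (autovacuum_enabled)",
    ]


if __name__ == "__main__":
    for statement in build_load_script():
        print(statement + ";")
```

With psycopg2 you would execute these in order on a connection in autocommit mode (so the explicit BEGIN/COMMIT control the transaction), passing the open CSV file to `copy_expert` for the COPY statement. Raising `max_wal_size` is a server-level setting and is left out here.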

Alter a SELECT Query to Limit

I'm working on a table with 1M+ rows. The software that inserts the data sometimes tries to select all rows. When it does, it crashes.
I'm not able to modify the software so I'm trying to implement a fix on the Postgresql side.
I want PostgreSQL to limit the results of SELECT queries coming from one specific user to 1 row.
I tried to implement a RULE but haven't been able to make it work. Any suggestions are welcome.
You could rename the table and create a view with the name of the table (selecting from the renamed table).
Then you can include a LIMIT clause in the view definition.
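The rename-plus-view trick can be tried out without a PostgreSQL server by using Python's built-in sqlite3 module as a stand-in; the same `ALTER TABLE ... RENAME TO ...` and `CREATE VIEW ... LIMIT 1` statements are also valid in PostgreSQL. The table name `readings` is hypothetical. Two caveats: the view limits every user, not just one, and in PostgreSQL a view whose definition contains LIMIT is not automatically updatable, so if the software also INSERTs through the original name, that would need an INSTEAD OF trigger on the view.

```python
import sqlite3


def setup(n_rows: int = 1000) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO readings (value) VALUES (?)",
                     [(float(i),) for i in range(n_rows)])
    # 1. Move the real table out of the way.
    conn.execute("ALTER TABLE readings RENAME TO readings_data")
    # 2. Put a view under the old name; SELECTs through it return at most one row.
    conn.execute("CREATE VIEW readings AS SELECT * FROM readings_data ORDER BY id LIMIT 1")
    return conn


if __name__ == "__main__":
    conn = setup()
    print(conn.execute("SELECT * FROM readings").fetchall())              # [(1, 0.0)]
    print(conn.execute("SELECT COUNT(*) FROM readings_data").fetchone())  # (1000,)
```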
There is a chance you need an index. Here are a few scenarios:
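There is a uniqueness check on one of the fields but no corresponding index, so whenever you insert a record PostgreSQL has to scan the table to see whether a record with the same value in that field already exists. (A declared UNIQUE constraint always gets a unique index in PostgreSQL, so this applies to checks enforced some other way, for example by a trigger.)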
Your software mimics a unique constraint: before inserting a new record, it scans the table for a record with the same value in one of the fields to check whether such a record already exists. An index on the right field would definitely help.
Your software wants to compute the next "id" value. In this case it runs SELECT MAX(id) to find the next available value, so "id" needs an index.
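The effect of an index on the second scenario (looking up an existing value before inserting) is visible in the query plan. The sketch below uses Python's built-in sqlite3 as a self-contained stand-in for PostgreSQL, where you would compare `EXPLAIN` output instead; the table and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    # EXPLAIN QUERY PLAN returns rows whose 4th column is the human-readable step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[3]) for row in rows)


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, code TEXT, payload TEXT)")
    conn.executemany("INSERT INTO items (code, payload) VALUES (?, ?)",
                     [(f"c{i}", "x") for i in range(10_000)])
    # The "does this value already exist?" check the software runs before inserting.
    lookup = "SELECT id FROM items WHERE code = ?"
    before = plan(conn, lookup, ("c42",))
    conn.execute("CREATE INDEX items_code_idx ON items (code)")
    after = plan(conn, lookup, ("c42",))
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)  # a full SCAN of items
    print("with index:   ", after)   # a SEARCH using items_code_idx
```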
Try to find out whether indexing one of the table's fields helps. You can also trace and analyze the queries submitted to the server and see whether they would benefit from an index. To enable query logging, see the question "How to log PostgreSQL queries?".
Another guess is that your software buffers all records before processing them, and reading 1M records into memory may crash it. Limiting the fetch size may help (e.g. if your software uses JDBC, you could add the defaultRowFetchSize connection parameter to the connection string), though I realize you may not have the means to change the way the existing software fetches data from the DB.
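The fetch-size idea is to pull a bounded batch of rows at a time instead of the whole result. The Python sketch below shows the pattern with the standard DB-API `fetchmany` method, again using sqlite3 as a self-contained stand-in; the function name and batch size are illustrative. With psycopg2 against PostgreSQL, a named (server-side) cursor is what keeps the not-yet-fetched rows on the server.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(conn: sqlite3.Connection, sql: str,
                 batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield the result of `sql` in lists of at most `batch_size` rows."""
    cur = conn.execute(sql)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO big (id) VALUES (?)", [(i,) for i in range(2500)])
    sizes = [len(b) for b in iter_batches(conn, "SELECT id FROM big")]
    print(sizes)  # [1000, 1000, 500]
```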

PostgreSQL insert query succeeds but then row nowhere to be found

I am at a complete loss. My application runs queries against my data table very rapidly, and the data seems to be there. However, I have tried a few inserts by hand, and although I do not get any errors, I cannot find my last insert when I try to select it.
PostgreSQL 9.3. Any thoughts?