We currently store things in Redis for temporary aggregation and have a worker that periodically does bulk insertion into Postgres. Is there a way to do a bulk insert across multiple schemas in a single insert transaction? This would remove the need to aggregate things in Redis. Or is there a better way to aggregate the requests?
Thanks for the help in advance.
It really depends on what you mean by "single insert transaction".
A single INSERT statement can only affect one specific table. However, you can still BEGIN a transaction, perform all of your INSERTs inside it, and then COMMIT the transaction.
This is still more efficient than running each INSERT in its own transaction, since it avoids the redundant per-transaction overhead ("handshaking").
https://www.postgresql.org/docs/current/sql-begin.html
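A minimal sketch of that pattern, assuming two hypothetical tables schema_one.table_1 and schema_two.table_2:
BEGIN;
-- each statement targets one table, but both sit in the same transaction
INSERT INTO schema_one.table_1 (c1, c2) VALUES (1, 2), (10, 20);
INSERT INTO schema_two.table_2 (col1, col2) VALUES (1, 2), (10, 20);
COMMIT;  -- either both inserts become visible, or neither does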
Have you tried creating an updatable view that references two tables and then bulk inserting into this view?
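To be clear, in PostgreSQL a view that joins two tables is not automatically updatable, so this approach would need an INSTEAD OF INSERT trigger to route each row. A rough, untested sketch with hypothetical names:
CREATE VIEW combined_view AS
SELECT t1.c1, t1.c2, t2.col1, t2.col2
FROM schema_one.table_1 t1
JOIN schema_two.table_2 t2 ON t1.c1 = t2.col1;

CREATE FUNCTION combined_view_insert() RETURNS trigger AS $$
BEGIN
    -- split each incoming row across the two schemas
    INSERT INTO schema_one.table_1 (c1, c2) VALUES (NEW.c1, NEW.c2);
    INSERT INTO schema_two.table_2 (col1, col2) VALUES (NEW.col1, NEW.col2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER combined_view_insert_trg
    INSTEAD OF INSERT ON combined_view
    FOR EACH ROW EXECUTE FUNCTION combined_view_insert();
(EXECUTE FUNCTION requires PostgreSQL 11 or later; older versions use EXECUTE PROCEDURE.)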
Are you looking for something like this?
with data (c1, c2) as (
    values (1, 2), (10, 20), (30, 40)
), s1_insert as (
    -- data-modifying CTE: writes the rows into the first schema
    insert into schema_one.table_1 (c1, c2)
    select c1, c2
    from data
)
-- the outer statement writes the same rows into the second schema
insert into schema_two.table_2 (col1, col2)
select c1, c2
from data;
If you execute an INSERT statement, it can only insert into a single table, so inserting across multiple schemas with one plain INSERT is not possible; you would need to wrap several INSERTs in one transaction or use a writable CTE like the one above.
Related
There is an insert query inserting data into a partitioned table using a VALUES clause.
insert into t (c1, c2, c3) values (v1,v2,v3);
The database is AWS Aurora v11. Around 20 sessions run in parallel, executing ~2 million individual insert statements in total. I'm seeing DataFileRead as the wait event and wondering why this wait event would show up for an insert statement. Would it be because each insert statement has to check whether the PK/UK keys already exist in the table before committing the insert? Or are there other reasons?
Each inserted row has to read the relevant leaf pages of each of the table's indexes in order to do index maintenance (inserting the index entries for the new row into their proper locations dirties those pages, and a page must be read before it can be dirtied), and also to verify PK/UK constraints. It may also need to read index leaf pages of other tables' indexes in order to verify FKs.
If you insert the new tuples in the right order, you can hit the same leaf pages over and over in quick succession, maximizing cacheability. But if you have multiple indexes, there might be no ordering that satisfies all of them.
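To illustrate the ordering point with a hypothetical staging table (not from the original post): loading the batch pre-sorted by the busiest index's leading column keeps consecutive inserts on the same leaf pages.
-- assumes staging_t holds the batch and c1 is the leading column
-- of the index that gets the most traffic
INSERT INTO t (c1, c2, c3)
SELECT c1, c2, c3
FROM staging_t
ORDER BY c1;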
I'm using PostgreSQL and I want to insert into about 5 different tables, where the RETURNING value of an insert into one table is required in another table, and so on. E.g. if I insert data into table A, I need the returned id to insert into table B, and the id returned from that for table C, etc. In some of the tables I sometimes need to insert multiple rows.
I intend to wrap the entire thing up in a single function rather than having 5 different queries, because I want the entire thing to fail if one insert fails, and separate queries don't allow for this.
Now, I know how to achieve this, but my question is: is this safe to do, or are there any issues with this method, given that I have thousands of users?
Using a function will work well, just like running several INSERT statements in a single transaction or a stack of CTEs using INSERT INTO ... SELECT ... RETURNING ....
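A rough sketch of the chained-CTE variant, with hypothetical tables table_a, table_b and table_c (not from the question):
WITH a AS (
    INSERT INTO table_a (name) VALUES ('example')
    RETURNING id
), b AS (
    -- the id returned from table_a feeds table_b
    INSERT INTO table_b (a_id, detail)
    SELECT id, 'some detail' FROM a
    RETURNING id
)
-- and the id returned from table_b feeds table_c
INSERT INTO table_c (b_id, note)
SELECT id, 'some note' FROM b;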
The problem is the following: remove all records from one table and insert them into another.
I have a table that is partitioned by a date criterion. To avoid routing each record into its partition one by one, I collect the data in one table and periodically move it to the partitioned table. The copied records then have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do to clean up the mess left behind in the original table.
I'm trying to achieve the same effect (copy and remove records), but without creating additional work for the vacuum mechanism.
Since I'm removing all rows (DELETE without a WHERE condition), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table to automatically remove tuples from the page on delete, without waiting for vacuum, but I could not find whether that is possible.
Can you suggest something I could use to solve my problem?
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b, you can also use CREATE TABLE AS to do this
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;
I'm running a multi-master setup with bucardo and postgres.
I'm finding that some of my table sequences are getting out of sync with each other. Particularly the auto-incremented id.
example:
db1 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
db2 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
The id of the new row on db2 should be 2, because Bucardo has replicated the data from db1, but db2's auto-increment is based on:
nextval('oauth_sessions_id_seq'::regclass)
And if we check the "oauth_sessions_id_seq" we see the last value as 0.
phew... Make sense?
Anyway, can I do any of the following?
Replicate the session tables with bucardo, so each DB's session is shared?
Manipulate the default auto-increment function above to take into account the max existing items in the table?
If you have any better ideas, please feel free to throw them in. Questions? Just ask. Thanks for any help.
You are going to have to change your id generation method, because there is no Bucardo solution according to this comment in the FAQ.
Can Bucardo replicate DDL?
No, Bucardo relies on triggers, and Postgres does not yet provide DDL triggers or triggers on its system tables.
Since Bucardo uses triggers, it cannot "see" the sequence changes, only the data in tables, which it replicates. Sequences are interesting objects that do not support triggers, but you can manually update them. I suppose you could add something like the code below before the INSERT, but there still might be issues.
SELECT setval('oauth_sessions_id_seq', (SELECT MAX(did) FROM distributors));
See this question for more information.
I am not fully up on all the issues involved, but you could perform the maximum calculation manually and do the insert operation in a retry loop. I doubt it will work if you are actually doing inserts on both DBs and allowing Bucardo to replicate, but if you can guarantee that only one DB updates at a time, then you could try something like an UPSERT retry loop. See this post for more info. The "guts" of the loop might look like this:
INSERT INTO distributors (did, dname)
VALUES ((SELECT max(did)+1 FROM distributors), 'XYZ Widgets');
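If you go that route, the surrounding retry loop could look roughly like this untested sketch, assuming did has a unique or primary key constraint so a collision raises unique_violation:
DO $$
BEGIN
    LOOP
        BEGIN
            INSERT INTO distributors (did, dname)
            VALUES ((SELECT coalesce(max(did), 0) + 1 FROM distributors), 'XYZ Widgets');
            EXIT;  -- the insert went through, leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- another session took that id first; loop and recompute
        END;
    END LOOP;
END $$;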
Irrespective of the DB (PostgreSQL, Oracle, etc.), a sequence is created for each table that has a primary key associated with it.
Sequences go out of sync whenever a huge data import happens or someone manually modifies the table's sequence.
Solution: the only way to put the sequence back in sync is to take the max value of the table's PK and set the sequence's next value to it.
The query below lists all the sequences created in your DB schema:
SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
The next query gives the current max of the primary key, and the last one combines both steps to set the sequence directly:
SELECT MAX(the_primary_key) FROM the_table;
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
I am using Sybase DB with TSQL.
The following snippet of T-SQL code is very simple, and I need to execute it several hundred thousand times (large database), so I would really like to improve its performance in any way possible:
BEGIN TRANSACTION

INSERT INTO DESTINATION_TABLE
SELECT COLUMNS
FROM SOURCE_TABLE
WHERE ORDER_ID = #orderId

DELETE FROM SOURCE_TABLE
WHERE ORDER_ID = #orderId

COMMIT TRANSACTION
As one can see, I am inserting and removing the same set of rows based on the same condition.
Is there a way to improve the performance of this simple query?
Thanks.
If you are inserting more than a few rows, you really need to do a bulk insert. Calling this method 100,000 times, passing it an ID every time, is a linear-processing mindset. Databases are for set operations.
Construct a temporary table of the IDs that you need to insert and delete. Then do a bulk insert by joining on the IDs in that table, and similarly a bulk delete.
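Roughly like this (an untested sketch; #order_ids is a hypothetical temp table already loaded with the ids to move, and COLUMNS stands in for the real column list as in the question):
BEGIN TRANSACTION

-- move every staged order in one set-based pass
INSERT INTO DESTINATION_TABLE
SELECT COLUMNS
FROM SOURCE_TABLE
WHERE ORDER_ID IN (SELECT ORDER_ID FROM #order_ids)

DELETE FROM SOURCE_TABLE
WHERE ORDER_ID IN (SELECT ORDER_ID FROM #order_ids)

COMMIT TRANSACTION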