I am using Sybase DB with TSQL.
The following snippet of T-SQL is very simple, but I need to run it several hundred thousand times (large database), so I would really like to improve its performance in any way possible:
BEGIN TRANSACTION

INSERT INTO DESTINATION_TABLE
SELECT COLUMNS
FROM SOURCE_TABLE
WHERE ORDER_ID = #orderId

DELETE FROM SOURCE_TABLE
WHERE ORDER_ID = #orderId

COMMIT TRANSACTION
As one can see, I am inserting and removing the same set of rows based on the same condition.
Is there a way to improve the performance of this simple query?
Thanks.
If you are inserting more than a few rows, you really need to do a bulk insert. Calling this method 100,000 times, passing it an ID every time, is a linear-processing mindset. Databases are for set operations.
Construct a temporary table of ID's that you need to insert and delete. Then do a bulk insert by joining on the ID's in that table, and similarly a bulk delete.
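For illustration only, a rough T-SQL sketch of that idea, where #order_batch is a hypothetical temp table holding the batch of ORDER_IDs and the table names are the placeholders from the question:
-- hypothetical temp table holding the batch of IDs to move
CREATE TABLE #order_batch (ORDER_ID INT NOT NULL)
-- ...populate #order_batch with the ORDER_IDs for this batch...

BEGIN TRANSACTION

-- one set-based insert for the whole batch
INSERT INTO DESTINATION_TABLE
SELECT s.*
FROM SOURCE_TABLE s
JOIN #order_batch b ON b.ORDER_ID = s.ORDER_ID

-- one set-based delete for the same batch
DELETE FROM SOURCE_TABLE
WHERE ORDER_ID IN (SELECT ORDER_ID FROM #order_batch)

COMMIT TRANSACTION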
Related
In my Postgresql schema I have a jobs table and an accounts table. Once a day I need to schedule a job for each account by inserting a row per account into the jobs table. This can be done with a simple INSERT INTO .. SELECT FROM statement, but is there any empirical way to know whether I am straining my DB with this bulk insert, and whether I should chunk the inserts instead?
Postgres often does miraculous work so I have no idea if bulk inserting 500k records at a time is better than 100 x 5k, for example. The bulk insert works today but can take minutes to complete.
One additional data point: the jobs table has a uniqueness constraint on account ID, so this statement includes an ON CONFLICT clause too.
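For reference, the kind of statement being described might look roughly like this, assuming hypothetical jobs(account_id, run_date) and accounts(id) tables with the unique constraint on jobs.account_id:
-- schedule one job per account; skip accounts that already have a job
INSERT INTO jobs (account_id, run_date)
SELECT a.id, current_date
FROM accounts a
ON CONFLICT (account_id) DO NOTHING;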
In PostgreSQL, it doesn't matter how many rows you modify in a single transaction, so it is preferable to do everything in a single statement so that you don't end up with half the work done in case of a failure. The only consideration is that transactions should not take too long, but if that happens once a day, it is no problem.
My use-case is that I need to copy a few columns from TABLE A to TABLE B, and also derive the values of a few other columns of TABLE B by some calculation.
As per the current estimate, around 50,000 rows will be inserted into TABLE A on a daily basis.
TABLE B should be updated with all data before end of day.
Hence, I can either use a trigger that is invoked on each INSERT into TABLE A, or schedule some job at EOD that reads all the data in bulk from TABLE A, does the calculation, and inserts into TABLE B.
As I am new to triggers, I am not sure which option I should pick for this use-case. Any suggestion as to which would be the better approach?
From what I have read so far about triggers, they can slow down DB performance if they are invoked frequently.
As around 50,000 insert operations will happen daily, can I assume that 50,000 falls under heavy operations where triggers would not be beneficial?
EDIT 1: the 50,000 insert operations per day will eventually reach 100,000 per day.
Postgres DB is used.
If you are doing a bulk COPY into an unindexed table, adding a simple trigger will slow you down by a lot (around 5-fold). But if you are using single-row INSERTs or the table is indexed, the marginal slowdown from adding a simple trigger will be pretty small.
50,000 inserts per day is not very many. You should have no trouble using a trigger on that, unless the trigger has to scan a large table on each call or something.
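Just as a sketch (not the asker's real schema), a simple row-level trigger of the kind discussed could look like this, assuming hypothetical tables table_a(col1, col2) and table_b(col1, col2, derived), and PostgreSQL 11+ for EXECUTE FUNCTION:
CREATE FUNCTION copy_to_table_b() RETURNS trigger AS $$
BEGIN
    -- copy the columns and compute one derived value from the new row
    INSERT INTO table_b (col1, col2, derived)
    VALUES (NEW.col1, NEW.col2, NEW.col1 * NEW.col2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_a_copy
AFTER INSERT ON table_a
FOR EACH ROW EXECUTE FUNCTION copy_to_table_b();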
We currently store things into Redis for temporary aggregation and have a worker that goes and does insertion in bulk into Postgres. Is there a way that we can do bulk insert across multiple schemas in a single Insert transaction? This will remove the need to aggregate things in Redis. Or, is there a better way to aggregate the requests?
Thanks for the help in advance.
It really depends on what you mean by "single insert transaction".
A single INSERT statement can only affect one specific table. However, you can still BEGIN a transaction (depending on the implementation), perform all of your INSERTs inside it, and then COMMIT the transaction.
This is still more efficient than performing the INSERTs in many separate transactions, since it avoids redundant handshaking.
https://www.postgresql.org/docs/current/sql-begin.html
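A minimal sketch of that pattern (schema and table names are placeholders):
BEGIN;
-- each statement targets a single table, but they all commit or roll back together
INSERT INTO schema_one.table_1 (c1, c2) VALUES (1, 2);
INSERT INTO schema_two.table_2 (col1, col2) VALUES (1, 2);
COMMIT;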
Have you tried creating an updatable view that references the two tables and then bulk inserting into this view?
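Note that in PostgreSQL a view over two tables is not automatically updatable, so this usually means adding an INSTEAD OF INSERT trigger to the view; a rough sketch with hypothetical names (PostgreSQL 11+ for EXECUTE FUNCTION):
CREATE VIEW both_tables AS
SELECT c1, c2 FROM schema_one.table_1
UNION ALL
SELECT col1, col2 FROM schema_two.table_2;

CREATE FUNCTION both_tables_insert() RETURNS trigger AS $$
BEGIN
    -- route each row inserted into the view to both underlying tables
    INSERT INTO schema_one.table_1 (c1, c2) VALUES (NEW.c1, NEW.c2);
    INSERT INTO schema_two.table_2 (col1, col2) VALUES (NEW.c1, NEW.c2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER both_tables_insert_trg
INSTEAD OF INSERT ON both_tables
FOR EACH ROW EXECUTE FUNCTION both_tables_insert();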
Are you looking for something like this?
-- a single statement: the writable CTE s1_insert writes to schema_one.table_1,
-- and the outer INSERT writes the same rows to schema_two.table_2
with data (c1, c2) as (
    values (1, 2), (10, 20), (30, 40)
), s1_insert as (
    insert into schema_one.table_1 (c1, c2)
    select c1, c2
    from data
)
insert into schema_two.table_2 (col1, col2)
select c1, c2
from data;
If you execute an INSERT statement on its own, it runs in a single transaction and can only insert into a single table (so inserting across multiple schemas with one plain INSERT statement is not possible).
The problem is the following: remove all records from one table and insert them into another.
I have a table that is partitioned by date criteria. To avoid partitioning each record one by one, I'm collecting the data in one table and periodically moving it to another table. The copied records then have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do to clean up the mess in the original table.
I'm trying to achieve the same effect (copy and remove records), but without creating additional work for the vacuum mechanism.
As I'm removing all rows (DELETE without a WHERE condition), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table to automatically remove the tuple from the page on delete, without waiting for vacuum, but I did not find whether that is possible.
Can you suggest something, that I could use to solve my problem?
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b, you can also use CREATE TABLE AS to do this
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a (TRUNCATE does not leave dead tuples behind for vacuum)
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;
I am using a PostgreSQL database for a live project, in which I have one table with 8 columns.
This table contains millions of rows, so to make searches faster, I want to move old entries from this table into another, new table.
To do so, I know one approach:
first select some rows
create a new table
store these rows in that table
then delete them from the main table.
But it takes too much time and it is not efficient.
So I want to know: what is the best possible approach to do this in a PostgreSQL database?
PostgreSQL version: 9.4.2.
Approx number of rows: 8000000
I want to move rows: 2000000
You can use a CTE (common table expression) to move the rows in a single SQL statement (more in the documentation):
with delta as (
    delete from one_table where ...
    returning *
)
insert into another_table
select * from delta;
But think carefully whether you actually need it. Like a_horse_with_no_name said in the comment, tuning your queries might be enough.
This is sample code for copying data between two tables with the same structure.
Here I used two different databases: one is my production DB and the other is my testing DB.
INSERT INTO "Table2"
select * from dblink('dbname=DB1 dbname=DB2 user=postgres password=root',
'select "col1","Col2" from "Table1"')
as t1(a character varying,b character varying);