I need to insert a big amount of data (some millions of rows) and I need to do it quickly.
I have read about bulk insert via ODBC in .NET and Java, but I need to perform it directly on the database.
I also read about batch inserts, but what I have tried has not seemed to work:
Batch Insert, Example
I'm executing an INSERT ... SELECT, but it's taking something like 0.360 s per row, which is very slow, and I need to make some improvements here.
I would really appreciate some guidance, with examples and documentation if possible.
DATABASE: SYBASE ASE 15.7
Expanding on some of the comments ...
blocking, slow disk IO, and any other 'wait' events (i.e., anything other than actual insert/update activity) can be ascertained from the master..monProcessWaits table (where SPID = spid_of_your_insert_update_process) [see the P&T manual for Monitoring Tables (aka MDA tables)]; a query sketch follows this list
master..monProcessObject and master..monProcessStatement will show logical/physical IOs for currently running queries [again, see P&T manual for MDA tables]
master..monSysStatement will show logical/physical IOs for recently completed queries [again, see P&T manual for MDA tables]
for UPDATE statements you'll want to look at the query plan to see if you're suffering from a poor join order; also of key importance is the distinction between direct (fast/good) updates and deferred (slow/bad) updates; deferred updates can occur for many reasons, some fixable, some not: updating indexed columns, a poor join order, or updates that cause page splits and/or row forwardings
RI (PK/FK) constraints can be viewed with sp_helpconstraint table_name; query plans will also show the under-the-covers joins required when performing RI (PK/FK) validations during inserts/updates/deletes
triggers are a bit harder to locate (an official sp_helptrigger doesn't show up until ASE 16); check the sysobjects.[ins|upd|del]trig columns where name = your_table - these hold the object id(s) of any insert/update/delete triggers on the table; also check sysobjects records where type = 'TR' and deltrig = object_id(your_table) - these provide support for additional insert/update/delete triggers (I don't recall at the moment if this is just ASE 16+)
if triggers are being fired, need to review the associated query plans to make sure the inserted and deleted tables (if referenced) are driving any queries where these pseudo tables are joined with permanent tables
There are likely some areas I'm forgetting (off the top of my head) ... the key takeaway is that there can be many reasons for 'slow' DML statements.
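Roughly, those lookups translate to queries like these (just a sketch, assuming the MDA tables are enabled and configured; 123 and your_table are placeholders for your SPID and table name):
select WaitEventID, Waits, WaitTime     -- wait events for the DML session
from master..monProcessWaits
where SPID = 123                        -- substitute the SPID of your insert/update process
go
select LineNumber, CpuTime, LogicalReads, PhysicalReads
from master..monProcessStatement        -- IO for that session's currently running statement
where SPID = 123
go
exec sp_helpconstraint your_table       -- RI (PK/FK) constraints on the table
go
select name, instrig, updtrig, deltrig  -- object id(s) of any insert/update/delete triggers
from sysobjects
where name = 'your_table'
go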
One (relatively) quick way to find out if RI (PK/FK) constraints or triggers are at play ...
set showplan on
go
-- your insert/update/delete statement(s) here
go
Then review the resulting query plan(s); if you see references to any tables other than the ones explicitly listed in the insert/update/delete statements then you're likely dealing with RI constraints and/or triggers.
Related
I'm migrating data from one table to another in an environment where long locks or downtime are not acceptable, about 80,000 rows in total. Essentially the query boils down to this simple case:
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 on table_1.id = table_3.id
All 3 tables are being read from and could have an insert at any time. I want to just run the query above, but I'm not sure how the locking works and whether the tables will be totally inaccessible during the operation. My understanding tells me that only the affected rows (newly inserted) will be locked. Table 1 is just being selected, so no harm, and concurrent inserts are safe so table 2 should be freely accessible.
Is this understanding correct, and can I run this query in a production environment without fear? If it's not safe, what is the standard way to accomplish this?
You're fine.
If you're interested in the details, you can read up on multiversion concurrency control, or on the details of the Postgres MVCC implementation, or how its various locking modes interact, but the implications for your case are nicely summarised in the docs:
reading never blocks writing and writing never blocks reading
In short, every record stored in the database has some version number attached to it, and every query knows which versions to consider and which to ignore.
This means that an INSERT can safely write to a table without locking it, as any concurrent queries will simply ignore the new rows until the inserting transaction decides to commit.
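A quick way to see this for yourself, using the tables from the question in two psql sessions:
-- session 1: insert inside an open transaction, but don't commit yet
BEGIN;
INSERT INTO table_2
SELECT * FROM table_1
JOIN table_3 on table_1.id = table_3.id;

-- session 2, at the same time: not blocked, and simply doesn't see
-- the uncommitted rows yet
SELECT count(*) FROM table_2;

-- session 1: commit; snapshots taken after this point include the new rows
COMMIT;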
I am facing an issue that is possibly quite easy to solve; I am just new to advanced transaction settings.
Every 30 minutes I run an INSERT query that pulls the latest data from a linked server into a table on my client's server, which we can call ImportTable. For this I have a simple job that looks like this:
BEGIN TRAN
DELETE FROM ImportTable
INSERT INTO ImportTable (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer
COMMIT
The thing is, each time the job runs, ImportTable is locked for the duration of the query (2-5 minutes) and nobody can read its records. I want the table to be readable all the time, with as little downtime as possible.
Now, I have read that it is possible to allow SNAPSHOT ISOLATION in the database settings (currently set to FALSE), which could probably solve my problem, but I have never worked with different transaction isolation levels, and since this is not my database but my client's, I'd rather not alter any database settings unless I am sure it won't break anything.
I know I could use an intermediary table that the records are inserted into first and then copied to the final table, and that is certainly a possible solution; I was just hoping for something more sophisticated, and to learn something new in the process.
PS: My client's server and database are fairly new and barely used, so I expect very little impact if I change some settings, but still, I cannot just randomly change various settings for learning purposes.
Many thanks!
Inserts won't normally block the table unless the locks are escalated to table level. In this case you are deleting the whole table first and inserting the data again; why not insert only the updated data? For the query you are running, read committed snapshot (RCSI) or snapshot isolation will help, but it comes with the added overhead of row versioning, which means SQL Server will store row versions of the changed rows in tempdb.
Please see Kimberly Tripp's MCM isolation videos for an in-depth understanding, and don't forget to test in a staging environment first.
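If you do go down the snapshot route, the database-level switches look roughly like this (a sketch only; YourClientDb is a placeholder, and turning READ_COMMITTED_SNAPSHOT on needs a moment with no other active connections, so try it on a staging copy first):
-- option 1: read committed snapshot (RCSI); the default READ COMMITTED level
-- reads row versions from tempdb instead of blocking on writers
ALTER DATABASE YourClientDb SET READ_COMMITTED_SNAPSHOT ON;

-- option 2: allow explicit SNAPSHOT isolation, opted into per session
ALTER DATABASE YourClientDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- then, in the sessions that read ImportTable:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;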
You are making this harder than it needs to be.
The problem is that the 2-5 minute load is part of the transaction.
It is only a few thousand rows; that part of the copy takes just a few milliseconds.
If you need ImportTable to be available even during those few milliseconds, then put it under snapshot isolation.
DELETE FROM ImportTableStaging;

INSERT INTO ImportTableStaging (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer;

BEGIN TRAN
DELETE FROM ImportTable
INSERT INTO ImportTable WITH (TABLOCK) (columns)
SELECT (columns)
FROM ImportTableStaging
COMMIT
If you are worried about concurrent updates to ImportTableStaging, then use a #temp table instead, as sketched below.
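A minimal sketch of that #temp variant, keeping the same (columns) placeholder as above:
-- the slow linked-server query runs outside any transaction,
-- into a session-private temp table
SELECT (columns)
INTO #ImportTableStaging
FROM QueryGettingResultsFromLinkedServer;

-- only the quick swap is transactional
BEGIN TRAN
DELETE FROM ImportTable
INSERT INTO ImportTable WITH (TABLOCK) (columns)
SELECT (columns)
FROM #ImportTableStaging
COMMIT

DROP TABLE #ImportTableStaging;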
Before I try to insert a row into a PostgreSQL table, should I query whether the insert would violate a constraint?
I do check first when a failed insert would cause unwanted side effects (e.g., an auto-increment value being consumed).
But, if there are no possible side effects, is it OK to just blindly try to insert into a table? Or, is it better practice to prevent errors by anticipating them when possible (as advised in Objective-C)?
Also, when performing the insert inside an SQL function, will other queries (e.g., CTEs) inside the function get rolled back if the insert fails?
In general, testing beforehand is not a good idea, because it requires you to explicitly lock tables to prevent other clients from changing or inserting data between your test and your insert. Explicit locking is bad for concurrency.
Serials getting auto-incremented by failed inserts are generally not a problem. Just don't assume the values inserted into the database are consecutive.
A database and Objective-C are two completely different things. Let the database check for problems; it is much easier to add the appropriate constraints to your schema than it is to check everything in your client program.
The default is to roll back to the start of the transaction, but you can control that with savepoints and ROLLBACK TO SAVEPOINT. However, a CTE is part of the query, and queries are always rolled back completely when part of them fails. You might be able to work around that by splitting the CTE off into a separate query that creates a temp table; then you can use the temp table instead of the CTE. (See the sketch below.)
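For illustration, the savepoint behaviour looks like this (the table t and its unique constraint are hypothetical); inside a function, an exception block gives you the same effect:
-- plain SQL: undo only the failed statement, keep the rest of the transaction
BEGIN;
INSERT INTO t (id) VALUES (1);
SAVEPOINT before_risky_insert;
INSERT INTO t (id) VALUES (1);              -- fails: duplicate key
ROLLBACK TO SAVEPOINT before_risky_insert;  -- the transaction is usable again
COMMIT;                                     -- the first insert is kept

-- PL/pgSQL: an exception block acts as an implicit savepoint
DO $$
BEGIN
    INSERT INTO t (id) VALUES (1);
EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'row already exists, skipping';
END;
$$;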
I am a complete newbie with SQL*Plus and Oracle 10g, so please don't mind the stupid questions.
The problem I am facing is that whenever I run a query against a table:
SELECT * FROM emp;
The output comes back as "no rows selected".
I am in an utter dilemma, as the table and its schema are clearly preserved, but the data I entered is not being displayed. The same is happening for all my user-created tables; the tuples are not being displayed. Is this a problem related to SQL*Plus?
Kindly help and point me to a proper guide.
In Oracle, every statement that you issue is part of a transaction. Those transactions need to either be committed (in which case the changes are made permanent) or rolled back (in which case the changes are reverted) before another session can see the data. Some databases either do not support transactions (e.g., MyISAM tables in MySQL) or do not implicitly start transactions (e.g., SQL Server). The Oracle approach is generally far superior: when you inadvertently run a delete statement that is missing an important predicate, the ability to roll back the operation when it deletes many more rows than you were expecting can be a real career saver.
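For example, with the emp table from the question, an accidental delete with no WHERE clause can still be undone as long as nothing has been committed:
delete from emp;            -- oops: the WHERE clause was forgotten
rollback;                   -- nothing was committed, so the rows come back
select count(*) from emp;   -- the data is intact again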
In Oracle, when you've run whatever statements comprise your transaction and you are confident that your changes are correct, you need to explicitly issue a commit to make those changes visible to other sessions, i.e.
SQL> insert into some_table( <<columns>> ) values( <<values>> );
SQL> insert into some_other_table( <<columns>> ) values ( <<more values>> );
SQL> commit;
If you are really, really, really sure that you prefer the behavior you might be accustomed to in other tools, you can tell SQL*Plus to autocommit your changes
SQL> set autocommit on;
SQL> <<do whatever>>
That is generally a really bad idea. The tiny benefit you get from not having to explicitly issue a commit is far outweighed by the ability to ensure that other sessions don't see data in an inconsistent state (e.g., if you're transferring money from account A to account B by issuing two update statements, you don't want someone seeing an intermediate result where either both accounts have the money or neither account does) and the ability to roll back a change if it turns out to do something other than what you expected.
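To make that transfer example concrete (accounts is a hypothetical table), the point is that both updates become visible to other sessions only at the commit:
update accounts set balance = balance - 100 where account_id = 'A';
update accounts set balance = balance + 100 where account_id = 'B';
commit;  -- other sessions now see both changes at once, never just one of them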
I am currently transferring a large number of records from one table to another, summarizing in the process. So, I have a SQL statement in this general format:
INSERT INTO TargetTable
    (Col1,
     Col2,
     ...
     ColX,
     Tot)
SELECT
    Col1,
    Col2,
    ...
    ColX,
    SUM(TOT)
FROM
    SourceTable
GROUP BY
    Col1,
    Col2,
    ...
    ColX
Is there any performance advantage of moving this SQL into an SSIS task when transferring records from one table to another using a SQL SELECT as a source? For example, is logging turned off?
Secondary question: Are there any tactics that I could use to ensure a maximum transfer rate? For example, removing indexes from the Target table before inserting, locking the table, etc?
In my experience (and, bear in mind that it's been a year and change since I've done this), the only advantage you'd get from SSIS is its ability to make use of the bulk insert task. This adds an additional step, requiring you to export your source data to a flat file before you begin the import process.
Alternatively, if you stick with a SQL statement, the section in this article titled Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging provides the following suggestions:
You can use INSERT INTO SELECT FROM to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
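Putting those requirements together with the original statement, a minimally logged version would look roughly like this (column list abbreviated, and it assumes the recovery model has already been switched to simple or bulk-logged):
-- the TABLOCK hint on the target is one of the requirements listed above
INSERT INTO TargetTable WITH (TABLOCK)
    (Col1, Col2, ColX, Tot)
SELECT
    Col1, Col2, ColX, SUM(TOT)
FROM SourceTable
GROUP BY
    Col1, Col2, ColX;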
I personally dislike SSIS packages for a particular reason: I have never had a DBA who was dedicated to maintaining them. The data import projects I worked on required a lot of fiddling, as the source data wasn't clean (which I assume won't be a problem for you), so I had many packages that worked just fine in a testing environment with a limited data sample that crashed immediately when deployed into production, which made the process a pain in the neck to deal with.
This is just my opinion, but I would say that unless you or someone else you work with focuses on SSIS packages as a part of database maintenance, then it's easier to maintain and document a process that lives inside a stored procedure.
Set the recovery model to simple. Set the log size high enough to handle the insert. Are there others on the system? A TABLOCK hint will help the insert: TargetTable WITH (TABLOCK). If you have a clustered index on TargetTable, order the data that way in the SELECT. If you can accept dirty reads, use SourceTable WITH (NOLOCK). If you are inserting more than 100,000 records, you might want to break up the insert using a WHERE clause.
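A sketch of that last batching idea, assuming Col1 is a numeric key you can range over (the batch size and hints are just the ones suggested above):
DECLARE @batchStart INT, @maxKey INT, @batchSize INT = 100000;

SELECT @batchStart = MIN(Col1), @maxKey = MAX(Col1) FROM SourceTable;

WHILE @batchStart <= @maxKey
BEGIN
    -- each pass summarizes one key range; grouping by Col1 keeps
    -- groups from being split across batches
    INSERT INTO TargetTable WITH (TABLOCK)
        (Col1, Col2, ColX, Tot)
    SELECT Col1, Col2, ColX, SUM(TOT)
    FROM SourceTable WITH (NOLOCK)
    WHERE Col1 >= @batchStart AND Col1 < @batchStart + @batchSize
    GROUP BY Col1, Col2, ColX;

    SET @batchStart = @batchStart + @batchSize;
END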