I have been playing with Redshift recently and found an odd (or maybe not so odd) behavior. When a COPY (from S3) is in progress, if I run an INSERT INTO on a completely different table in a different schema, the INSERT INTO query takes far too long. When nothing else is running on the Redshift cluster, the INSERT INTO query finishes within 3-5 minutes. But when a COPY is in progress, the same INSERT INTO query takes 1-2 hours.
Looking at the Redshift dashboard, the odd thing is that read throughput is close to zero. Given that my INSERT INTO query contains a SELECT, I would expect the read throughput to be higher. So it feels like the COPY query is blocking all other writes. I have checked the locks table (STV_LOCKS) and there is no conflict between the locks held by the COPY and the INSERT INTO. Is it possible that the COPY query blocks all other writes?
Thanks in advance
You need to check the parameter group configuration for your cluster in the AWS console, under Workload Management Configuration.
Check the concurrency setting. By default it is 5; you can increase the value (the maximum is 50). This controls how many queries can run concurrently. While your COPY command is running it occupies one of those slots, so there may be none left for the INSERT INTO query. Increase the concurrency and check again.
Hope this helps
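As a quick way to confirm whether the INSERT is actually queued behind the COPY rather than blocked by a lock, you can look at the WLM state table. A minimal sketch (column names as I understand STV_WLM_QUERY_STATE, so double-check against your cluster version):
-- Queries currently queued or executing in each WLM service class.
-- If the INSERT sits in state 'Queued' with a growing queue_time while the
-- COPY is 'Executing', the slowdown is a WLM slot shortage, not a lock conflict.
SELECT query, service_class, state, queue_time, exec_time
FROM stv_wlm_query_state
ORDER BY service_class, queue_time DESC;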
I have been working on a reporting database in DB2 for a month or so, and I have it set up to a pretty decent degree of what I want. I am, however, noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive, and running all 66 tables takes about 2 minutes.
The issue is that once it says it completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run the statements independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I delay the CREATE after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2; the source DB is remote.)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER) AS TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
) WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
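A rough sketch of that variant, reusing the table from your example: keep the table definition in place and only reload the rows. (Note that TRUNCATE in Db2 requires the IMMEDIATE keyword and must be the first statement of its transaction.)
TRUNCATE TABLE TEST.TIMESHEET IMMEDIATE;
-- Reload from the remote nickname; the column list is elided as in the original example
INSERT INTO TEST.TIMESHEET
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER) AS TIMESHEET_ID
       -- .. (remaining columns)
FROM REMOTE_DB.TIMESHEET;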
As you say, you are maybe not doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
Context:
Using PostgreSQL (9.6), for a custom synchronisation project, we have an agent that makes a lot of INSERTs between a database_1 and a database_2 when syncing data.
For example: DB2 is down for 5 minutes and 40,000 new lines accumulate in DB1, so when DB2 is up again, all 40,000 lines are immediately synced from DB1 to DB2.
All this works great.
Problem/Fact:
During the synchronisation, the INSERT rate is around 1000 lines / second.
However, when we run a simple SELECT count(*) FROM table during the sync (in the middle of these thousands of INSERTs), we notice that the INSERT rate falls to a few dozen per second (instead of around 1000 per second).
Question:
Is there any reason why a SELECT operation (made inside pgAdmin, by a process other than the syncing process) slows down the batch of INSERTs?
Any locking or internal reason that might explain this?
Or should I provide more information? How can I debug more?
Hints:
Logging is fully enabled and each INSERT always takes around 0.700 ms (before the slowdown and the same during the slowdown); it doesn't change.
INSERTs are currently performed one row at a time.
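One check I can run during the slowdown (a sketch using the standard PostgreSQL 9.6 catalog views; nothing here is specific to our setup):
-- Is the inserting backend actually waiting on something?
-- (9.6 exposes wait_event_type / wait_event in pg_stat_activity.)
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';
-- Are there any ungranted locks at that moment?
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;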
(I'll be happy to provide more information)
I have a huge process (a program using ActiveRecord) which locks different tables for some amount of time.
Now I want to check all my locks during the process: which tables are locked, and for how long. I could use Activity Monitor, but I need more information.
Is there a tool like SQL Server Profiler that lists all locks during a process? Or is there a log table somewhere that I can check?
Further Information:
There is a process in our program which uses half of the tables in our database: it creates new rows, updates existing rows, selects information, and so on. The process currently runs only during the night. Now they want to run it during the day, and I have to evaluate the feasibility of that request. I have already checked the source code, but I also want to check the database for long-held locks, table locks and similar things, just to be sure. The idea is to start the process in our test environment and collect all the lock information. But I don't see all locks in Activity Monitor, and I can't watch Activity Monitor for an hour.
There are many DMVs which will help you gather lock stats. Run this query at whatever frequency you need through a SQL Agent job and log the results to a table for later analysis.
-- This shows all the locks involved in each session
SELECT resource_type, resource_associated_entity_id,
       request_status, request_mode, request_session_id,
       resource_description
FROM sys.dm_tran_locks lck
WHERE resource_database_id = DB_ID();
-- You can also use the sys.dm_exec_requests DMV to gather blocking and wait_type details for more insight
SELECT status, wait_type, last_wait_type, txt.text
FROM sys.dm_exec_requests ec
CROSS APPLY sys.dm_exec_sql_text(ec.sql_handle) txt;
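A minimal sketch of the logging part (the table name dbo.LockHistory and the once-a-minute schedule are just placeholders; the column types mirror sys.dm_tran_locks):
CREATE TABLE dbo.LockHistory
(
    capture_time                  DATETIME2    NOT NULL DEFAULT SYSDATETIME(),
    resource_type                 NVARCHAR(60),
    resource_associated_entity_id BIGINT,
    request_status                NVARCHAR(60),
    request_mode                  NVARCHAR(60),
    request_session_id            INT,
    resource_description          NVARCHAR(256)
);
-- Put this INSERT in a SQL Agent job that runs, say, once a minute while the process runs
INSERT INTO dbo.LockHistory
    (resource_type, resource_associated_entity_id, request_status,
     request_mode, request_session_id, resource_description)
SELECT resource_type, resource_associated_entity_id, request_status,
       request_mode, request_session_id, resource_description
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();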
I am facing an issue that is possibly quite easy to solve; I am just new to advanced transaction settings.
Every 30 minutes I run an INSERT query that gets the latest data from a linked server into a table on my client's server, which we can call ImportTable. For this I have a simple job that looks like this:
BEGIN TRAN
DELETE FROM ImportTable
INSERT INTO ImportTable (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer
COMMIT
The thing is, each time the job runs, ImportTable is locked for the duration of the query (2-5 minutes) and nobody can read its records. I want the table to be read-accessible all the time, with as little downtime as possible.
Now, I read that it is possible to allow SNAPSHOT isolation in the database settings (it is set to FALSE at the moment), which could probably solve my problem, but I have never worked with different transaction isolation levels, and as this is my client's database rather than mine, I'd rather not alter any database settings if I am not sure whether it could break something.
I know I could have an intermediary table that the records are inserted into and then copied to the final table, and that is certainly a possible solution; I was just hoping for something more sophisticated, and to learn something new in the process.
PS: My client's server & database are fairly new and barely used, so I expect very little impact if I change some settings, but still, I cannot just randomly change various settings for learning purposes.
Many thanks!
Inserts won't normally block the table unless the lock is escalated to table level. In this case, though, you are deleting the whole table first and inserting the data again; why not insert only the changed data? For the pattern you are using, read committed snapshot isolation (RCSI) or snapshot isolation will help, but you will have the added impact of row versioning, which means SQL Server will store versions of the changed rows in tempdb.
Please see Kimberly Tripp's MCM isolation videos for an in-depth understanding, and don't forget to test in a staging environment first.
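For reference, turning it on is a one-time database setting. A sketch (the database name is a placeholder, and both options below make SQL Server keep row versions in tempdb, so test on a staging copy first):
-- Make ordinary READ COMMITTED readers use row versions instead of shared locks
ALTER DATABASE [ClientDb] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- Or allow explicit SNAPSHOT transactions without changing default reader behaviour
ALTER DATABASE [ClientDb] SET ALLOW_SNAPSHOT_ISOLATION ON;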
You are making this harder than it needs to be.
The problem is the 2-5 minutes that you are letting be part of the transaction.
It is only a few thousand rows - that part takes just a few milliseconds.
If you need ImportTable to be available during those few milliseconds, then run that short transaction under snapshot isolation.
DELETE FROM ImportTableStaging;
INSERT INTO ImportTableStaging (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer;
BEGIN TRAN
DELETE FROM ImportTable;
INSERT INTO ImportTable WITH (TABLOCK) (columns)
SELECT (columns)
FROM ImportTableStaging;
COMMIT
If you are worried about concurrent updates to ImportTableStaging, then use a #temp table.
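A sketch of that variant, using the same placeholder names as above: load into a session-local #temp table instead, so no other connection can touch it between the load and the swap.
-- Session-local staging table; other connections cannot modify it
SELECT (columns)
INTO #ImportTableStaging
FROM QueryGettingResultsFromLinkedServer;
BEGIN TRAN
DELETE FROM ImportTable;
INSERT INTO ImportTable WITH (TABLOCK) (columns)
SELECT (columns)
FROM #ImportTableStaging;
COMMIT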
I need to populate a table from a master table that has 2 billion records. The insert needs to satisfy some conditions, and some columns also have to be calculated before the rows are inserted.
I have 2 options, but I don't know which one to follow for better performance.
Option 1
Open a cursor that filters the master table with the conditions, fetch the records one by one, perform the calculation, and finally insert into the child table.
Option 2
Insert first using the conditions, and then do the calculation with an UPDATE statement.
Please assist.
Having a cursor fetch the data, perform the calculation, and then insert into the database will be time consuming; my guess is that this is because it involves a connection round trip and I/O for each retrieval and insertion (for both databases).
Databases are usually better at bulk operations, so Option 2 will definitely give you better performance. Option 2 is also better for troubleshooting (the process is cleanly separated - step 1: load, step 2: calculate) than Option 1, where an error in the middle of the process forces you to redo all the steps again.
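A rough sketch of Option 2 (all table, column and filter names here are made up for illustration):
-- Step 1: one set-based insert of the rows that satisfy the conditions
INSERT INTO CHILD_TABLE (ID, COL_A, COL_B, CALC_COL)
SELECT ID, COL_A, COL_B, NULL
FROM MASTER_TABLE
WHERE STATUS = 'ACTIVE';
-- Step 2: one set-based update to fill in the calculated column
UPDATE CHILD_TABLE
SET CALC_COL = COL_A * COL_B
WHERE CALC_COL IS NULL;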
Opening a cursor and inserting records one by one might have serious performance issues at volumes on the order of a billion rows, especially if you have a weak network between your database tier and app tier. The fastest way to do this could be to use the Db2 EXPORT utility to download the data, let the program manipulate the data in the file, and later load the file back into the child table (see the sketch at the end of this answer). Apart from the file-based option, you can also consider the following approaches:
1) Write an SQL stored procedure (no need to ship the data out of the database to make changes)
2) If you are using Java/JDBC, use the batch update feature to apply multiple records at the same time
3) If you are using a tool like Informatica, turn on its bulk load feature
Also see the IBM developerWorks article on improving insert performance. The article is a little bit older but the concepts are still valid: http://www.ibm.com/developerworks/data/library/tips/dm-0403wilkins/
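A rough sketch of the export/load approach mentioned above, run from the Db2 command line (the file path, table names and filter are placeholders):
-- Step 1: export the qualifying rows to a delimited file
EXPORT TO /tmp/master_subset.del OF DEL
  SELECT ID, COL_A, COL_B FROM MASTER_TABLE WHERE STATUS = 'ACTIVE';
-- Step 2: after the program has manipulated the file, load it into the child table
LOAD FROM /tmp/master_subset.del OF DEL
  INSERT INTO CHILD_TABLE (ID, COL_A, COL_B);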