INSERT OPENQUERY timeout - tsql

I'm trying to execute an insert query against a linked server in SQL Server.
For that I'm using an INSERT INTO OPENQUERY statement.
The linked server is an Apache HIVE using Cloudera ODBC Provider.
The insert operation takes around 1 minute in my setup when performed from HIVE client.
However, SQL INSERT always times out after 30 seconds.
I set the Query Timeout parameter to 0, but it does not seem to affect the INSERT statement; it works fine for SELECT statements that take longer.
Is this a known limitation?
Is there a way to change the timeout for the insert statement when using OPENQUERY?
EDIT
I would like to clarify the setup I'm working with.
MS SQL => Linked Server => Hive ODBC Provider => Hive Server
In Hive, I have a table called calc_result where I would like to periodically store calculation results from the SQL server. For example, I try to insert using a query like this:
insert openquery(HIVE, 'select timestamp timestamp, tag tag, value value from calc_result')
values('2019-04-22 11:50:41', 'test', 2.0)
The insert operation is captured correctly by HIVE server and a MapReduce job starts. However, the job will be killed after 30 seconds due to timeout.
The SQL server will show the below error message.
OLE DB provider "MSDASQL" for linked server "HIVE" returned message "[Cloudera][Hardy] (72) Query execution timeout expired.".
However, SELECT OPENQUERY works fine and would follow Query Timeout settings of the linked server (Which is set to 0 in this case).

EDIT: that is a completely different use case from what I had imagined. In that case there should not be any difference between SELECT and INSERT.
Since you have already configured the linked server's Query Timeout, there is a second place to check: the Command Timeout setting, which can be supplied in the provider string in the linked server properties.
Another option that comes to mind is the instance-wide remote query timeout. It defaults to 600 seconds (10 minutes), which is well above your 30 seconds, but you can still try changing it to see if it has any impact.
For an infinite wait:
sp_configure 'show advanced options',1
go
reconfigure
go
sp_configure 'remote query timeout (s)',0
go
reconfigure
go
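To confirm the change took effect, you can read the setting back; run_value should now show 0:
sp_configure 'remote query timeout (s)'
go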

I would try using SELECT INTO a temporary table and then materializing it with a regular INSERT INTO:
SELECT c1, c2
INTO #temp_tab
FROM OPENQUERY(mylinkedserver, 'SELECT c1, c2 FROM remote_table');
INSERT INTO normal_table(col1, col2)
SELECT c1, c2
FROM #temp_tab;
EDIT:
You could try wrapping it in a transaction and removing the aliases:
BEGIN TRAN;
insert openquery(HIVE, 'select timestamp, tag, value from calc_result')
values('2019-04-22 11:50:41', 'test', 2.0);
COMMIT;
If necessary set up DTC: How can I enable distributed transactions for a linked server?

While I didn't find a way to change the OPENQUERY timeout from 30 seconds, I found that using EXEC ... AT LinkedServer works fine for INSERT queries while adhering to the linked server's timeout settings.
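For reference, a minimal sketch of that pattern using the table and values from the question (this assumes the linked server is named HIVE and has the RPC Out option enabled; timestamp is a reserved word in Hive, so it may need backtick quoting on the Hive side):
EXEC ('insert into calc_result (`timestamp`, tag, value) values (''2019-04-22 11:50:41'', ''test'', 2.0)') AT HIVE;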
I accidentally stumbled upon the solution in this 2009 blog post. Databases might not be my strength, but I feel SQL Server documentation can be improved. A simple page that lists possible ways to interact with a Linked Server could've saved me lots of retries.

Related

Logging slow queries on Google Cloud SQL PostgreSQL instances

The company I work for uses Google Cloud SQL to manage their SQL databases in production.
We're having performance issues and I thought it'd be a good idea (among other things) to see/monitor all queries above a specific threshold (e.g. 250ms).
By looking at the PostgreSQL documentation I think log_min_duration_statement seems like the flag I need.
log_min_duration_statement (integer)
Causes the duration of each completed statement to be logged if the statement ran for at least the specified number of milliseconds. Setting this to zero prints all statement durations.
But judging from the Cloud SQL documentation, I see that it is only possible to set a narrow set of database flags per instance and, as you can see from here, log_min_duration_statement is not among those supported flags.
So here comes the question. How do I log/monitor my slow PostgreSQL queries with Google Cloud SQL? If not possible then what kind of tool/methodologies do you suggest I use to achieve a similar result?
April 3, 2019 UPDATE
It is now possible to log slow queries on Google Cloud SQL PostgreSQL instances, see https://cloud.google.com/sql/docs/release-notes#april_3_2019:
database_flags = [
  {
    name  = "log_min_duration_statement"
    value = "1000"
  },
]
Once you enable log_min_duration_statement, you can view the logs using Stackdriver Logging. Select Cloud SQL Database -> cloudsql.googleapis.com/postgres.log and you will see log lines like this:
[103402]: [9-1] db=cloudsqladmin,user=cloudsqladmin LOG: duration: 11.211 ms statement: [YOUR SQL HERE]
References:
Full list of supported flags (CTRL+F for log_min_duration_statement): https://cloud.google.com/sql/docs/postgres/flags#postgres-l
Issue tracker: https://issuetracker.google.com/issues/74578509#comment54
PostgreSQL docs: https://www.postgresql.org/docs/9.6/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
The possibility of monitoring slow PostgreSQL queries for Cloud SQL instances is currently not available. As you mention, the log_min_duration_statement flag is currently not supported by Cloud SQL.
Right now, work is being done on adding this feature to Cloud SQL, and you can keep track of the progress through this link. You can click the star icon in the top left corner to get email notifications whenever significant progress has been made.
There is a way to log slow queries through the pg_stat_statements extension which is supported by Cloud SQL.
Since Cloud SQL doesn't grant superuser rights to any user, you need to use a workaround.
First, you need to enable the extension with
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
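A quick sanity check that the extension registered (the pg_extension catalog should now list it):
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';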
Then you can check slow queries with a query like:
SELECT pd.datname,
us.usename,
pss.userid,
pss.query AS SQLQuery,
pss.rows AS TotalRowCount,
(pss.total_time / 1000) AS TotalSecond, -- note: in PostgreSQL 13+ total_time was renamed to total_exec_time
((pss.total_time / 1000) / pss.calls) AS TotalAverageSecond
FROM pg_stat_statements AS pss
INNER JOIN pg_database AS pd
ON pss.dbid = pd.oid
INNER JOIN pg_user AS us
ON pss.userid = us.usesysid
ORDER BY TotalAverageSecond DESC
LIMIT 10;
As the postgres user you can have a look at all slow queries, but since that user is not a superuser you will see <insufficient privilege> for all other users' queries.
To get around this limitation you can install the extension on the other databases too (normally only the postgres user has rights to install extensions) and check the query texts as the owner of each database.
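One more note on this approach: the pg_stat_statements counters accumulate until reset, so for a fresh measurement window you can clear them first (assuming your Cloud SQL user is allowed to execute the function):
SELECT pg_stat_statements_reset();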
Not ideal by any measure, but what we do is run something like this on a cron once a minute and log out the result:
SELECT EXTRACT(EPOCH FROM now() - query_start) AS seconds, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '1 seconds' AND query NOT LIKE '%pg_stat_activity%'
ORDER BY seconds DESC LIMIT 20
You'd need to fiddle with the query to get millisecond granularity, and even then it'll only catch queries that overlap with your cron frequency, but probably better than nothing?
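For example, a millisecond-granularity variant of the same query might look like this (an untested sketch, using 250 ms as an arbitrary threshold):
SELECT EXTRACT(EPOCH FROM now() - query_start) * 1000 AS milliseconds, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '250 milliseconds'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY milliseconds DESC
LIMIT 20;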

View parameter values on currently running procedure?

Using Transact SQL
Just curious: is there a way to view the parameter values (i.e., the EXEC line used to execute the proc) of a proc that is currently running?
Example, I run:
EXEC HelloWorld @SQL = 1
Is there a table or log or anything I can look at WHILE the proc is still running, and see @SQL = 1?
While a procedure is running, you can execute a DBCC INPUTBUFFER command in another window. You will need to know the SPID that is executing your HelloWorld procedure.
If you are running HelloWorld within SQL Server Management Studio, you can see the SPID shown in the status bar at the very bottom of the window. My IDE shows 6 panels on the status bar; the 3rd panel shows the login name with the SPID in parentheses, for example "YourDomain\YourLogin (59)". The 59 is the SPID you are looking for.
If you are not running the query in SQL Server Management Studio and do not have the SPID readily available, you can execute the following command:
sp_who2
This will show a result set with a row for every connection to the SQL Server Instance. Any SPID below 50 represents an internal process. Anything greater than 50 is a user connection. Based on the information you see in this result set, hopefully you will be able to determine the SPID that is executing HelloWorld.
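Alternatively, a query against the sessions DMV can narrow things down faster than scanning the sp_who2 output (a sketch; filter on whichever columns identify your session):
SELECT session_id, login_name, host_name, program_name, status
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;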
Once you know the SPID, you can see the command it is currently executing by issuing the following command in a new query window.
DBCC INPUTBUFFER(59)
You will want to replace the 59 above with the actual SPID that you previously determined. Executing the command above will show you the command that is currently executing, including the parameter values.
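On SQL Server 2014 SP2 and later, the same information is also exposed through a management function that is easier to join against other DMVs (replace 59 with your SPID; requires VIEW SERVER STATE permission):
SELECT * FROM sys.dm_exec_input_buffer(59, NULL);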
The best way to get the parameter values passed into procs, especially when executed remotely, is via SQL Server Profiler or Extended Events. If using SQL Server Profiler, you need to capture the following events:
(in Stored Procedures)
RPC: Completed
SP: Completed
(in TSQL)
SQL: BatchCompleted
And you will see the values in the "TextData" field (so you need to select that column for all 3 of those events).
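If you prefer Extended Events over the deprecated Profiler, a minimal session capturing the equivalent events might look like this (a sketch; the session and file names are arbitrary):
CREATE EVENT SESSION capture_proc_params ON SERVER
ADD EVENT sqlserver.rpc_completed,
ADD EVENT sqlserver.sql_batch_completed
ADD TARGET package0.event_file (SET filename = N'capture_proc_params.xel');
GO
ALTER EVENT SESSION capture_proc_params ON SERVER STATE = START;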
There is no way to get the parameter values from the dynamic management views or functions. But in your case I can recommend using the following code inside your procedure:
declare @SQL int = 1
raiserror('@SQL = %i',0,0,@SQL) WITH NOWAIT
WITH NOWAIT is the important part: it sends the message to the client immediately, instead of buffering it until the batch completes, as PRINT does.

SQL Server OpenQuery() behaving differently than a direct query from TOAD

The following query works efficiently when run directly against Oracle 11 using TOAD (with native Oracle drivers)
select ... from ... where ...
and srvg_ocd in (
select ocd
from rptofc
where eff_endt = to_date('12/31/9999','mm/dd/yyyy')
and rgn_nm = 'Boston'
) ...
;
The exact same query "never" returns if passed from SQL Server 2008 to the same Oracle database via openquery(). SQL Server has a link to the Oracle database using an Oracle Provider OLE DB driver.
select * from openquery( servername, '
select ... from ... where ...
and srvg_ocd in (
select ocd
from rptofc
where eff_endt = to_date(''12/31/9999'',''mm/dd/yyyy'')
and rgn_nm = ''Boston''
) ...
');
The query doesn't return in a reasonable amount of time, and the user kills the query. I don't know if it would eventually return with the correct result.
This result where the direct TOAD query works efficiently and the openquery() version "never" returns is reproducible.
A small modification to the openquery() gives the correct efficient result: Change eff_endt to trunc(eff_endt).
That is well and good, but it doesn't seem like the change should be necessary.
openquery() is supposed to be pass through, so how can there be a difference between the TOAD and openquery() behavior?
The reason we care is because we frequently develop complex queries with TOAD directly accessing Oracle. Once we have the query functioning and optimized, we convert it to an openquery() string for use in a SQL Server application. It is extremely aggravating to have a query suddenly fail with openquery() when we know it worked as a direct query. Then we have to search for a work-around through trial and error.
I would like to see the Oracle trace files for the two scenarios, but the Oracle server is within another organization, and we are not getting cooperation from the Oracle DBAs.
Does anyone know of any driver, or TOAD, or ??? issues that could account for the discrepancy? Is there any way to eliminate the problem such that both methods always give the same result?
I know you asked this a while ago but I just came across your question.
I agree, they should be the same. Obviously there is a difference. We need to find out where the difference is.
I am thinking out loud as I type...
What happens if you specify just a few columns instead of select * from openquery?
How many rows are supposed to be returned?
What if, in the oracle select, you limit the returned rows?
How quickly does the openquery timeout?
Are TOAD and SS on the same machine? Are you RDPing into the SS and running toad from there?
Are they using the same drivers? including bit? (32/64) version?
Are they using the same account on oracle?
It is interesting that using the trunc() makes a difference. I assume [eff_endt] is one of the returned fields?
I am wondering if SS is getting all the rows back but it is choking on doing the date conversions. The date type in oracle may need to be converted to a ss date type before ss shows it to you.
What if you insert the rows from the openquery into a table where the date field is just a (n)varchar. I am thinking ss might just dump the date it is getting back from oracle into that text field without trying to convert it.
something like:
insert into mytable(f1,f2,f3,datetimeX)
select f1,f2,f3,datetimeX from openquery( servername, '
select f1,f2,f3,datetimeX from ... where ...
and srvg_ocd in (
select ocd
from rptofc
where eff_endt = to_date(''12/31/9999'',''mm/dd/yyyy'')
and rgn_nm = ''Boston''
) ...
');
What if toad or ss is modifying the query statement before sending it to oracle. You could fire up wireshark and see what toad and ss are actually sending.
I would be very curious if you get this resolved. I link ss to oracle often and have not run into this issue.
Here are basic things you can check for to see what the database is doing after it receives the query. First, check that the execution plans are the same in TOAD as when the query runs using openquery. You could plan the query yourself in TOAD using:
explain plan set statement_id = 'openquery_test' for <your query here>;
select *
from table(dbms_xplan.display(statement_id => 'openquery_test'));
then have someone initiate the query using openquery() and have someone with permissions to view v$ tables to run:
select sql_id from v$session where username = '<user running the query>';
(If there's more than one connection with the same user, you'll have to find an additional attribute to isolate the row representing the session running the query.)
select *
from table(dbms_xplan.display_cursor('<value from query above>'));
If those look the same then I'd move on to checking database waits and see what it's stuck on.
select se.username
, sw.event
, sw.p1text
, sw.p2text
, sw.p3text
, sw.wait_time_micro/1000000 as seconds_in_wait
, sw.state
, sw.time_since_last_wait_micro/1000000 as seconds_since_last_wait
from v$session se
inner join
v$session_wait sw
on se.sid = sw.sid
where se.username = '<user running the query>'
;
(again, if there's more than one session with the same username, you'll need another attribute to whittle it down to the one you're interested in.)
If the plans are different, then you need to find out why, or if they're the same, look into what it's waiting on (e.g. SQL*Net message to client ?) and why.
I noticed a difference using OLEDB through MS Access (2013) connecting to Oracle 10g & 11g tables, in that it did not always recognize indexes or primary keys on the Oracle tables properly. The same query through an MS Access 2000 database (using odbc) worked fine / had no problem with indexes & keys. The only way I found to fix the OLEDB version was to include all of the key fields in the SELECT -- which was not a satisfying answer, but it's all I could find. This might be an option to try through SSMS / OpenQuery(...) as well.
Besides that... you can try some alternatives to OPENQUERY, such as:
4-part names: SELECT ... FROM Server..Schema.Table
Execute AT: EXEC ('select...') at linked server
But as for why the OLEDB provider works differently than the native Oracle provider: the providers are not identical, and the native provider would be more likely to pave over Oracle quirks than the more generic OLEDB provider would.

Transaction context in use by another session

I have a table called MyTable on which I have defined a trigger, like so:
CREATE TRIGGER dbo.trg_Ins_MyTable
ON dbo.MyTable
FOR INSERT
AS
BEGIN
SET NOCOUNT ON;
insert SomeLinkedSrv.Catalog.dbo.OtherTable
(MyTableId, IsProcessing, ModifiedOn)
values (-1, 0, GETUTCDATE())
END
GO
Whenever I try to insert a row in MyTable, I get this error message:
Msg 3910, Level 16, State 2, Line 1
Transaction context in use by another session.
I have SomeLinkedSrv properly defined as a linked server (for example, select * from SomeLinkedSrv.Catalog.dbo.OtherTable works just fine).
How can I avoid the error and successfully insert record+execute the trigger?
Loopback linked servers can't be used in a distributed transaction if MARS is enabled.
Loopback linked servers cannot be used in a distributed transaction.
Trying a distributed query against a loopback linked server from
within a distributed transaction causes an error, such as error 3910:
"[Microsoft][ODBC SQL Server Driver][SQL Server]Transaction context in
use by another session." This restriction does not apply when an
INSERT...EXECUTE statement, issued by a connection that does not have
multiple active result sets (MARS) enabled, executes against a
loopback linked server. Note that the restriction still applies when
MARS is enabled on a connection.
http://msdn.microsoft.com/en-us/library/ms188716(SQL.105).aspx
I solved it.
I was using a linked server to call a second procedure, and inside that procedure I was using the same linked server again.
It's quite simple once you know the restrictions of linked servers.
I resolved it by removing the linked server reference from inside a stored procedure that was itself being called over the same linked server. It wasn't working in the DEV environment.
One cause of this situation is a trigger that writes to a table over a linked server; the version of SQL Server involved can also matter. To avoid this error during query execution, you can temporarily disable the relevant triggers, run the update, and re-enable them afterwards, guarding everything with a database name check. Here is an example:
Select * From People where PersonId In (@PersonId, @PersonIdRight)
IF 'DOUBLE' = DB_NAME()
ALTER TABLE [dbo].[PeopleSites] DISABLE TRIGGER [PeopleSites_ENTDB_UPDATE]
Update PeopleSites Set PersonId = @PersonIdRight Where PersonId = @PersonId
IF 'DOUBLE' = DB_NAME()
ALTER TABLE [dbo].[PeopleSites] ENABLE TRIGGER [PeopleSites_ENTDB_UPDATE]
Select * From PeopleSites where PersonId In (@PersonId, @PersonIdRight)
I also got the same error in our DEV environment; moving the linked databases to another SQL instance resolved the issue. In our production environment these databases are already on separate instances.
In my case I was using SQL Server 2005 and got "transaction context in use by another session" when running INSERT...EXEC over a linked server. The fix for me was to patch from SP2 build 3161 to SP3; SP2 Cumulative Update 5 is supposed to fix it as well.
https://support.microsoft.com/en-us/kb/947486
When the remote database sits on the same server, configure the linked server without specifying the database server IP / hostname and port. Just the database name should be sufficient.
I was getting the same "transaction context in use by another session error" when trying to run an UPDATE query:
BEGIN TRAN
--ROLLBACK TRAN
--COMMIT TRAN
UPDATE did
SET did.IsProcessed = 0,
did.ProcessingLockID = NULL
FROM [proddb\production].DLP.dbo.tbl_DLPID did (NOLOCK)
WHERE did.dlpid IN ('bunch of GUIDs')
--WHERE did.DLPID IN (SELECT DLPID FROM #TableWithData)
However I didn't realize I was already trying to run this on the DLP database on the ProdDb\Production server. Once I removed that "[proddb\production].DLP.dbo." prefix from the query, it worked fine.

Creating a connection from Microsoft SQL server to an AS/400

I'm trying to connect from Microsoft SQL Server to an AS/400 so I can pull data from the AS/400 and then flag the data as being pulled.
I've successfully created an OLE DB "IBMDASQL" connection and am able to pull some data, but I'm running into an issue when I try to pull data from a very large table.
This runs fine, and returns a count of 170 million:
select count(*)
from transactions
This query executed for 15 hours before I gave up on it. (It should return zero since I haven't flagged anything as 'In process' yet.)
select count(*)
from transactions
where processed = 'In process'
I'm a Microsoft guy, but my AS/400 guy says that there is an index on the 'processed' column and that locally the query runs instantaneously.
Any thoughts on what I might be doing wrong? I found a table with only 68 records in it, and was able to run this query in about a second:
select count(*)
from smallTable
where RandomColumn = 'randomValue'
So I know that the AS/400 is at least able to understand that type of query.
I have had to fight this battle many times.
There are two ways of approaching this.
1) Stage your data from the AS/400 into SQL Server, where you can optimize your indexes.
2) Ask the AS/400 folks to create logical files, which speed up data retrieval. Your AS/400 programmer is correct that the index will help; the AS/400 term for a view-like object, similar to a SQL Server view, is a 'logical' file (as opposed to the 'physical' file that holds the data). Logical is what you want.
Thirdly, 170 million is a lot of records, even for a relational database like SQL Server. Have you considered running a nightly SSIS package that stages your data into your own SQL Server table, to see if it improves performance?
I would suggest the following approach for good performance. I assume you have at least SQL Server 2005; I haven't tested this yet, but here is a tip:
Let the AS/400 perform the select natively by creating stored procedures on the AS/400.
Open an AS/400 session, launch STRSQL, and create an AS/400 stored procedure like this to get the recordset:
CREATE PROCEDURE MYSELECT (IN PARAM CHAR(10))
LANGUAGE SQL
DYNAMIC RESULT SETS 1
BEGIN
DECLARE C1 CURSOR FOR SELECT * FROM MYLIB.MYFILE WHERE MYFIELD=PARAM;
OPEN C1;
RETURN;
END
Then create an AS/400 stored procedure to update the recordset:
CREATE PROCEDURE MYUPDATE (IN PARAM CHAR(10))
LANGUAGE SQL
RESULT SETS 0
BEGIN
UPDATE MYLIB.MYFILE SET MYFIELD='newvalue' WHERE MYFIELD=PARAM;
END
Finally, call those AS/400 stored procedures from SQL Server:
declare @myParam char(10)
set @myParam = 'In process'
-- get the recordset
EXEC ('CALL NAME_AS400.MYLIB.MYSELECT(?) ', @myParam) AT AS400 -- AS400 = name of the linked server
-- update
EXEC ('CALL NAME_AS400.MYLIB.MYUPDATE(?) ', @myParam) AT AS400
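Note that EXEC ... AT requires the RPC Out option to be enabled on the linked server; if it is not already, something like this should do it (assuming the linked server is named AS400):
EXEC sp_serveroption 'AS400', 'rpc out', 'true'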
Hope it helps
I recommend following the suggestions in the IBM Redbook SQL Performance Diagnosis on IBM DB2 Universal Database for iSeries to determine what's really happening.
IBM technical support can also be extremely helpful in diagnosing issues such as these. Don't be afraid to get in touch with them as the software support is generally included as part of the maintenance contract and there is no charge to talk to them.
I've seen OLEDB connections eat up 100% cpu for hours and when the same query is run through VisualExplain (query analyzer) it estimates mere seconds to execute.
We found that running the query like this performed as expected:
SELECT *
FROM OpenQuery( LinkedServer,
'select count(*)
from transactions
where processed = ''In process''')
GO
Could this be a collation problem? Your WHERE clause is testing a text field, and if the collations of the two servers don't match, the clause will be applied client-side rather than server-side: you first pull all 170 million records down to the client and then apply the WHERE clause there.
Based on past interactions I have had, the query should take about the same amount of time no matter how you access the data. Another thought would be to create a view on the table that returns just the data you need, or to use a stored procedure.