In Azure SQL DB, when an UPDATE statement is executed, we can get the affected row count from @@ROWCOUNT. In Azure SQL Data Warehouse I am unable to get the affected row count the same way. Is there a way to fetch the affected row count in Azure SQL Data Warehouse?
You can find common workarounds for SQL DW here: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-migrate-code/
The workaround for @@ROWCOUNT is:
SELECT SUM(row_count) AS row_count
FROM sys.dm_pdw_sql_requests
WHERE row_count <> -1
AND request_id IN
( SELECT TOP 1 request_id
FROM sys.dm_pdw_exec_requests
WHERE session_id = SESSION_ID()
ORDER BY end_time DESC
)
;
I have noticed the above workarounds can produce erroneous results because they can be affected by other request steps. For example, when performing an UPDATE, one of my tables had related materialised views, so INSERT actions were also performed as additional request/DMV steps, which inflated the row counts.
For SELECTs I also use the modified code below, which is normally embedded into a rowcount stored procedure with a @QueryLabel parameter. I generally always use query labels, and for queries where I need a rowcount I add @@SPID into the label text. This is also because I can't rely on the session id used in the example above, due to heavy use of dynamic queries/EXEC, which spawns different session ids.
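For context, attaching a label with the session id baked in can look something like this (a sketch; the label text, table, and variable names are illustrative, and since OPTION (LABEL = ...) needs a literal, the label is embedded via dynamic SQL):
-- Build a label containing @@SPID, then run the labelled statement via dynamic SQL
DECLARE @QueryLabel nvarchar(100) = 'MyProc_Step1_' + CAST(@@SPID AS nvarchar(10));
DECLARE @sql nvarchar(max) =
    N'UPDATE dbo.MyTable SET SomeCol = 1 OPTION (LABEL = ''' + @QueryLabel + ''')';
EXEC (@sql);
-- @QueryLabel can then be passed to the rowcount procedure that runs the query below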
-- Use a simple method for SELECT statements first
SET @RowCount =
(
    SELECT TOP 1 row_count
    FROM sys.dm_pdw_request_steps
    WHERE row_count >= 0
    AND operation_type = 'ReturnOperation' -- the step that returns the result set
    AND request_id IN
    (
        SELECT TOP 1 request_id
        FROM sys.dm_pdw_exec_requests
        WHERE CHARINDEX(@QueryLabel, [label]) > 0
        ORDER BY end_time DESC -- ok if the request has ended
    )
)
For UPDATE/INSERT/DELETE I looked at separate code to see whether additional predicates were needed to correlate to the specific request/DMV actions for the DML I was interested in. But after profiling the various system views while performing UPDATE/INSERT/DELETE, the solution appeared to be the same as for SELECT, just with a different operation_type.
So the code which seems to work OK (more testing required perhaps) for UPDATE/INSERT/DELETE is:
-- Query which works for UPDATE/INSERT/DELETE
SET @RowCount =
(
    SELECT TOP 1 row_count
    FROM sys.dm_pdw_request_steps
    WHERE row_count >= 0
    AND operation_type = 'OnOperation' -- the step that runs the DML on the distributions
    AND request_id IN
    (
        SELECT TOP 1 request_id
        FROM sys.dm_pdw_exec_requests
        WHERE CHARINDEX(@QueryLabel, [label]) > 0
        ORDER BY end_time DESC
    )
)
In addition to the correct answer above: if you use dynamic SQL, you would have to use "TOP 4" to get to the row count; otherwise the @@ROWCOUNT workaround returns nothing. Correction: changing the sub-select to the following fixes the dynamic SQL rowcount issue:
SELECT TOP 1 request_id
FROM sys.dm_pdw_exec_requests
WHERE session_id = SESSION_ID()
ORDER BY end_time DESC, start_time DESC
I have an app that vends a 'code' to users through an API. A code belongs to a pool of codes; when a user hits an endpoint, they get a code from this pool. At the moment there is only one pool from which a code can be vended. That idea is best expressed in the following SQL.
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
)
RETURNING *;
SQL
So basically, when the endpoint is pinged, I am returning a code whose batch is active and whose vended_at field is NULL.
Now what I need to do is build off of this SQL so that a user can get a code from this pool or from a second pool. For example, if the user couldn't get a code from this pool (call it A, represented by the SQL above), I need to vend a code from another pool (call it B).
I looked at the PostgreSQL documentation and I think what I want to do is either 1) use a UNION somehow to combine pools A and B into one megapool to vend a code from, or 2) if I can't vend a code through pool A, use an OR in the WHERE clause to select from pool B.
The problem is that I can't seem to get either syntax to work. I've tried variations along these lines:
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) UNION (
######## SELECT SOME OTHER STUFF #########
)
RETURNING *;
SQL
or
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) OR (
######## SELECT SOME OTHER STUFF USING OR #########
)
RETURNING *;
SQL
So far the syntax is off, and I'm starting to wonder whether I can even use this approach for what I'm trying to do. I can't tell whether my approach is wrong or whether I'm using UNION, OR, and sub-selects incorrectly. Does anyone have any advice? Thank you.
####### EDIT ########
To illustrate and make the concept even clearer, I essentially want to do this:
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
CRITERIA 1
)
OR/UNION
(
CRITERIA 2
)
RETURNING *;
SQL
Use one table to store both pools.
Add a pool_number column to the codes table to indicate which pool the code is in, then just add
ORDER BY pool_number
to your existing query.
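For example, the existing query would only need its ORDER BY extended (a sketch; pool_number is the new column, with pool A assumed to have the lower number):
UPDATE codes SET vended_at = NOW()
WHERE id = (
  SELECT "codes"."id"
  FROM "codes"
  INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
  WHERE "codes"."vended_at" IS NULL
  AND "code_batches"."active" = true
  ORDER BY "codes"."pool_number" ASC, "code_batches"."end_at" ASC  -- prefer pool A first
  FOR UPDATE OF "codes" SKIP LOCKED
  LIMIT 1
)
RETURNING *;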
I have a very large table with 100M rows in which I want to update a column with a value on the basis of another column. The example query to show what I want to do is given below:
UPDATE mytable SET col2 = 'ABCD'
WHERE col1 is not null
This is a master DB in a live environment with multiple slaves, and I want to update it without locking the table or affecting the performance of the live environment. What would be the most effective way to do it? I'm thinking of making a procedure that updates rows in batches of 1,000 or 10,000 using something like LIMIT, but I'm not quite sure how to do it, as I'm not that familiar with Postgres and its pitfalls. Also, neither of the two columns has an index, but the table has other columns that do.
I would appreciate a sample procedure code.
Thanks.
There is no update without locking, but you can strive to keep the row locks few and short.
You could simply run batches of this:
UPDATE mytable
SET col2 = 'ABCD'
FROM (SELECT id
FROM mytable
WHERE col1 IS NOT NULL
AND col2 IS DISTINCT FROM 'ABCD'
LIMIT 10000) AS part
WHERE mytable.id = part.id;
Just keep repeating that statement until it modifies less than 10000 rows, then you are done.
Note that mass updates don't lock the table, but of course they lock the updated rows, and the more of them you update, the longer the transaction, and the greater the risk of a deadlock.
To make that performant, an index like this would help:
CREATE INDEX ON mytable (col2) WHERE col1 IS NOT NULL;
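If you want to drive the repetition from the database side, a minimal sketch could look like this (assuming PostgreSQL 11 or later, where COMMIT is allowed inside a DO block, and an id primary key):
DO $$
DECLARE
  batch bigint;
BEGIN
  LOOP
    UPDATE mytable
    SET col2 = 'ABCD'
    FROM (SELECT id
          FROM mytable
          WHERE col1 IS NOT NULL
          AND col2 IS DISTINCT FROM 'ABCD'
          LIMIT 10000) AS part
    WHERE mytable.id = part.id;
    GET DIAGNOSTICS batch = ROW_COUNT;
    COMMIT;  -- release the row locks between batches
    EXIT WHEN batch < 10000;
  END LOOP;
END $$;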
Just an off-the-wall, out-of-the-box idea. The requirement that both col1 and col2 be null to qualify precludes using an index, so perhaps building a pseudo-index might be an option. This 'index' would of course be a regular table, but it would only exist for a short period. Additionally, this relieves the worry about lock time.
create table indexer (mytable_id integer primary key);
insert into indexer(mytable_id)
select mytable_id
from mytable
where col1 is null
and col2 is null;
The above creates our 'index', containing only the qualifying rows. Now wrap an update/delete statement in an SQL function. This function updates the main table, deletes the updated rows from the 'index', and returns the number of rows remaining.
create or replace function set_mytable_col2(rows_to_process_in integer)
returns bigint
language sql
as $$
with idx as
( update mytable
set col2 = 'ABCD'
where col2 is null
and mytable_id in (select mytable_id
from indexer
limit rows_to_process_in
)
returning mytable_id
)
delete from indexer
where mytable_id in (select mytable_id from idx);
select count(*) from indexer;
$$;
When the function returns 0, all rows initially selected have been processed. At this point, repeat the entire process to pick up any rows added or updated that the initial selection didn't identify. It should be a small number, and the process is still available if needed later.
Like I said just an off-the-wall idea.
Edited
I must have read something into the question concerning col1 that wasn't there. However, the idea remains the same; just change the INSERT statement for 'indexer' to meet your requirements. As far as storing the value in the 'index' goes: no, the 'index' contains a single column, the primary key of the big table (and of itself).
Yes, you would need to run it multiple times, unless you give it the total number of rows to process as the parameter. The DO block below satisfies your condition. It processes 200,000 rows on each pass; change that to fit your needs.
Do $$
declare
rows_remaining bigint;
begin
loop
rows_remaining = set_mytable_col2(200000);
commit;
exit when rows_remaining = 0;
end loop;
end; $$;
I have a query which takes a parameter and one which doesn't.
I was trying to compare their performance.
Currently, for query 1 alone I see two results, so in total I get three cost percentages in my execution plan.
Is there a way to get two performance cost percentages instead of three?
----Query 1
DECLARE @p2 DATETIME;
SET @p2 = (
    SELECT max(date_created)
    FROM log_table
    WHERE user_id = 1
)
SELECT max(date_created) AS last_login
FROM log_table
WHERE date_created <= @p2
----Query 2
SELECT max(date_created) AS last_login
FROM log_table
WHERE user_id = 1
There is no way to do what you're looking for.
SQL Server produces a plan for each query in the batch.
And you have exactly 3 queries in your batch, because
SELECT max(date_created)
FROM log_table
WHERE user_id = 1
is also a query.
A comparison of query costs is not always the best way to compare performance.
I suggest you compare the number of reads or the average execution time.
Or simply execute them separately; that'll do the trick as well.
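For example, you could run each statement on its own with I/O and timing statistics enabled and compare logical reads and elapsed time in the Messages tab (a sketch using the table from the question):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT max(date_created) AS last_login
FROM log_table
WHERE user_id = 1;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;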
I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column - load the table with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select and try to apply a numeric WHERE clause, it fails, saying it cannot convert the nvarchar value 'apple' to data type int. How has it got the value 'apple' back??
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare @temptable as table ( a nvarchar(30) );
INSERT INTO @temptable( a )
VALUES ('apple'), ('1'), ('2');
with Numbers as (
    select cast(a as int) as NumA
    from @temptable
    where IsNumeric(a) = 1
)
select * from Numbers
The reason you are getting this is fairly simple. When a query is executed, several steps are followed: parse, algebrize, optimize, and compile.
The algebrize step in this case binds all the objects you need for the query. The optimizer uses these objects to create the best query plan, which is then compiled and executed.
So when you look at that part, you will see it does a table scan on #temptable, and #temptable is defined the way you created it. That you then do some computation on it is a different thing; the column still has the nvarchar datatype.
To understand how this works, you have to know how a query is read: first all the objects are retrieved (FROM table, INNER JOIN table), then the predicates (WHERE, ON), then the grouping and such, then the SELECT of the columns (with the cast), and then the ORDER BY.
So with that in mind, when you have a combination of selects, the optimizer will still process it that way. Since your select is subordinate to the FROM and JOIN parts of your query, that is why you get this error.
I hope that makes it a little clearer.
The optimizer is free to move expressions in the query plan in order to produce the most cost-efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using a CASE expression like the one below produces a NULL in the absence of an ELSE clause and thus takes 'apple' out:
select a from #temptable where case when isnumeric(a) = 1 then a end > 1
How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know that in MySQL you write LIMIT 0,10000 at the end of the query for rows 0 to 10,000, and LIMIT 10000,10000 for rows 10,000 to 20,000.
So how is this done in DB2? What's the code and syntax?
(A full query example is appreciated.)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (available since V5R4) and use that within the WHERE clause (adapted from http://www.justskins.com/forums/db2-select-how-to-123209.html):
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
I developed this method:
You need a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
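With those levels installed, the ranges from the question can be written directly (a sketch; table and ordering column are illustrative):
-- records 0 to 10,000
SELECT * FROM schema.mytable ORDER BY userId LIMIT 10000 OFFSET 0;
-- records 10,000 to 20,000
SELECT * FROM schema.mytable ORDER BY userId LIMIT 10000 OFFSET 10000;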
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
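For example, with an integer key (names and values illustrative):
-- first chunk, LASTVAL initialised to 0
SELECT id, name FROM mytable WHERE id > 0 ORDER BY id FETCH FIRST 10000 ROWS ONLY;
-- if the last id returned was 10250, the next chunk becomes
SELECT id, name FROM mytable WHERE id > 10250 ORDER BY id FETCH FIRST 10000 ROWS ONLY;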
@elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with the row number of the last record I need and {length} with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() R FROM TABLE T
)
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
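For example, both of the following skip the first 10 rows and return the next 5 (table and column names are illustrative):
SELECT empno, lastname FROM emp ORDER BY empno LIMIT 5 OFFSET 10;
SELECT empno, lastname FROM emp ORDER BY empno LIMIT 10, 5;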
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
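For example, the EMP query shown earlier could hint at how many rows will actually be consumed (a sketch):
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;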
There are 2 techniques to paginate efficiently on a DB2 table:
1 - The technique using the ROW_NUMBER() function and the OVER clause, which has been presented in another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables I have sometimes noticed a degradation of performance.
2 - The technique using a scrollable cursor. The implementation depends on the language used. This technique seems more robust on big tables.
I presented the 2 techniques, implemented in PHP, during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
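As a rough illustration of technique 2 in embedded SQL (a sketch; host variables and names are illustrative, and the exact syntax depends on the language and DB2 platform):
DECLARE c1 INSENSITIVE SCROLL CURSOR FOR
  SELECT code, name FROM contacts ORDER BY code;
OPEN c1;
FETCH ABSOLUTE 10001 FROM c1 INTO :code, :name;  -- position directly on row 10,001
FETCH NEXT FROM c1 INTO :code, :name;            -- then read forward row by row
CLOSE c1;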
There are several options available:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" in feature.
In this case you can open a cursor and, instead of re-issuing a query you can FETCH forward and backward.
This works great if your application can hold state since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL
You can use the ROWNUM pseudo column, which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you lean more toward a MySQL or PostgreSQL dialect.