Removing empty tables from TSQL sproc output data sets?

I have a T-SQL sproc that runs up to three loops in order to find relevant data. If the first loop returns no results, the second one usually does. I also append another table with multiple values that I can use later on.
So at most I should only have two tables returned in the data set from the sproc.
The issue is that if the first loop comes back empty, I end up with three data tables in my data set.
In my C# code I can remove this empty table, but I would rather not have it returned from the sproc at all.
Is there a way to remove the empty table from within the sproc, given the following:
EXEC (@sqlTop + @sqlBody + @sqlBottom)
SET @NumberOfResultsReturned = @@ROWCOUNT;
.
.
.
IF @NumberOfResultsReturned = 0
BEGIN
    SET @searchLoopCount = @searchLoopCount + 1
END
ELSE
BEGIN
    -- we have data, so no need to run again
    BREAK
END
The process goes as follows: on the first loop there may be no results, so the row count is zero (the EXEC runs a dynamically created SQL query), but that empty result set still counts as one table.
In the next iteration results are returned, making two data tables in the output, plus my third one added on the end.
I didn't want to do a COUNT(*) first and only run the query if it is greater than zero, because I want to minimize the number of queries.
Thanks.

You can put the result of your dynamic SQL in a table variable and then check if the table variable has any data in it.
Something like this, for a query that returns one integer column.
declare @T table(ID int)
declare @SQL varchar(25)

-- Create dynamic SQL
set @SQL = 'select 1'

-- Insert result from @SQL into @T
insert into @T
exec (@SQL)

-- Check for data
if not exists(select * from @T)
begin
    -- No data, continue the loop
    set @searchLoopCount = @searchLoopCount + 1
end
else
begin
    -- Have data, so we need to query the data
    select *
    from @T

    -- Terminate the loop
    break
end
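As a rough sketch of how this could plug into the loop from the question (hypothetical: @Results and @maxSearchLoops are illustrative names, and the table variable's columns must match the dynamic query's select list):
DECLARE @Results TABLE (ID int);
WHILE @searchLoopCount < @maxSearchLoops
BEGIN
    DELETE FROM @Results; -- reset between iterations, since the table variable is reused
    INSERT INTO @Results
    EXEC (@sqlTop + @sqlBody + @sqlBottom);

    IF EXISTS (SELECT * FROM @Results)
    BEGIN
        SELECT * FROM @Results; -- the only extra result set sent back to the client
        BREAK;
    END

    SET @searchLoopCount = @searchLoopCount + 1;
END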

Related

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based on unique ids. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite a few databases with different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are perceived as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .
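For illustration, a call could look something like this (using the placeholder values from the question):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1','2','3']);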
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead if you want to wrap it in an outer transaction (as you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
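A quick, hypothetical illustration of why format() with %I matters here (the identifier values are made up, not from the question):
SELECT format('DELETE FROM %I.%I', 'MySchema', 'my table; DROP TABLE users');
-- returns: DELETE FROM "MySchema"."my table; DROP TABLE users"  (the identifier is safely quoted)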
I made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id column!
The variable _sql holds the query string, so it can easily be debugged before actually executing; use RAISE NOTICE '%', _sql; for that purpose.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since it has to find the rows only once, it comes at about half the cost of a separate SELECT and DELETE. And it's perfectly safe: if anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
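A hypothetical psql sketch of that approach (s1/s2 and the key list are the sample values from above; it assumes the matching *_duplicates tables already exist in s2, and \gexec then runs each generated statement):
SELECT format(
         'WITH del AS (DELETE FROM %I.%I WHERE id = ANY(%L::int[]) RETURNING *)
          INSERT INTO %I.%I SELECT * FROM del',
         's1', tablename, '{1,3}', 's2', tablename || '_duplicates')
FROM pg_tables
WHERE schemaname = 's1' \gexec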
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
  (_source_schema text, _source_table text,
   _unique_key int[], _target_schema text)
language plpgsql as
$$
declare
    _row_count    bigint;
    _target_table text := '';
begin
    _target_table := quote_ident(_source_table) || '_' || array_to_string(_unique_key, '_');
    raise notice 'the deleted table records will be stored in %.%', _target_schema, _target_table;

    execute format('create table %I.%I as select * from %I.%I limit 0'
                  , _target_schema, _target_table, _source_schema, _source_table);

    execute format('with mm as (delete from %I.%I where id = any (%L) returning *) insert into %I.%I table mm'
                  , _source_schema, _source_table, _unique_key, _target_schema, _target_table);

    get diagnostics _row_count = row_count;
    raise notice 'rows influenced, %', _row_count;
end
$$;
--
If your _unique_key array is not too large, this solution also creates the target table for you. Obviously you need to create the target schema yourself.
If the array is large, you can customize how the dumped table is named.
Let's call it:
call pg_temp.p_archive_dump('s1', 't1', '{1,2}', 's2');
s1 is the source schema, t1 is the source table, {1,2} are the unique keys you want to extract to the new table, and s2 is the target schema.

Import Proc definition from network file with BULK

I'm trying to create 1000+ procs in MS SQL from supplied physical files located on a network share, as part of a legacy migration. For now I plan to use an SP with dynamic SQL to loop over all of them, like in the segment below. I had a problem with the BULK ROWTERMINATOR, so I just dummied it with a bunch of Zs. Is there a correct way to set it to NONE, so the whole string is loaded into a single row for the run? I also use NVARCHAR(MAX) for my field.
DROP TABLE IF EXISTS #imp;
CREATE TABLE #imp (Col varchar(max))

BULK INSERT #imp
FROM '//TFSNetwork/log/Install/sp_Test02.sql'
WITH (ROWTERMINATOR = '\nzzzzzzzzzZZZ') ---<< ?????

DECLARE @Sql varchar(max)
SELECT TOP 1 @Sql = Col FROM #imp
EXEC (@Sql);
----------------------------------------------------sp_Test02.sql
CREATE PROCEDURE [dbo].[sp_Test]
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET NOCOUNT ON;
SELECT GETDATE() AS TS
END
-----------------------------------------------------------------
Load whole file into single row/column
ROWTERMINATOR = '\n' is what is used by default; that's why you get it when the option is omitted altogether. I don't think you can (or would want to) change this behavior, other than using your Z combo.
The same thing can be done with another BULK mechanism; in this case no ROWTERMINATOR option is needed at all.
declare @myFile varchar(max)

select @myFile = BulkColumn
from openrowset(BULK '//Network/Path/Test02.sql', single_blob) x;

SELECT @myFile
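If it helps, the loaded text can then be executed directly; a minimal follow-up sketch (this assumes each file contains a single CREATE PROCEDURE batch with no GO separators, since EXEC cannot process GO):
exec (@myFile)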

STRING_SPLIT performance issue

I have two queries which split a comma-separated list into rows and insert the result into a table variable.
For the first query I have used a custom function:
User-defined function for split:
CREATE FUNCTION [dbo].[Split_S]
(
    @sInputList VARCHAR(MAX)
   ,@sDelimiter VARCHAR(8)
)
RETURNS @List TABLE ([item] VARCHAR(8000))
AS
BEGIN
    DECLARE @sItem VARCHAR(MAX)
    WHILE CHARINDEX(@sDelimiter, @sInputList, 0) <> 0
    BEGIN
        SELECT
            @sItem = RTRIM(LTRIM(SUBSTRING(@sInputList, 1, CHARINDEX(@sDelimiter, @sInputList, 0) - 1)))
           ,@sInputList = RTRIM(LTRIM(SUBSTRING(@sInputList, CHARINDEX(@sDelimiter, @sInputList, 0) + LEN(@sDelimiter), LEN(@sInputList))))

        IF LEN(@sItem) > 0
            INSERT INTO @List SELECT @sItem
    END

    IF LEN(@sInputList) > 0
        INSERT INTO @List SELECT @sInputList -- Put the last item in

    RETURN
END
Query 1:
DECLARE @F TABLE(F BIGINT)

INSERT INTO @F
SELECT [item] FROM [dbo].[Split_S](N'82,13,51,68,6', ',')

Query 2:
DECLARE @F2 TABLE(F BIGINT)

INSERT INTO @F2
SELECT Value
FROM STRING_SPLIT(N'82,13,51,68,6', ',')
Query plan of both queries:
Why does the custom-function query show 37% while STRING_SPLIT shows 63%?
But if I only compare the SELECT statements, the query cost of STRING_SPLIT is 1%.
Which query has better performance, and why?
If you check only the part of the query that includes the SELECT, you will find that STRING_SPLIT gives much better performance according to the execution plan; the result will be 99% vs 1%.
But when you consume the data returned by STRING_SPLIT (for example with SELECT ... INTO, or as in your case INSERT ... SELECT), you may notice that the server uses a Table Spool (Eager Spool) operator, which makes the difference. This operator takes the rows and stores them in a hidden temporary object in the tempdb database (the idea behind this logic is that the spooled data can be reused later in the execution plan). The eager spool takes ALL rows from the previous operator at one time, which means it is a blocking operator.
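Plan-cost percentages are only estimates, so if you want to judge which approach is actually faster, measuring elapsed time is more reliable; a rough, hypothetical harness using the sample data from the question:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

DECLARE @F TABLE(F BIGINT);
INSERT INTO @F
SELECT [item] FROM [dbo].[Split_S](N'82,13,51,68,6', ',');

DECLARE @F2 TABLE(F BIGINT);
INSERT INTO @F2
SELECT value FROM STRING_SPLIT(N'82,13,51,68,6', ',');

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;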

UPDATE-no-op in SQL MERGE statement

I have a table with some persistent data in it. Now when I query it, I also have a pretty complex CTE which computes the values required for the result and I need to insert missing rows into the persistent table. In the end I want to select the result consisting of all the rows identified by the CTE but with the data from the table if they were already in the table, and I need the information whether a row has been just inserted or not.
Simplified, this works like this (the following code runs as a normal query if you'd like to try it):
-- Set-up of test data, this would be the persisted table
DECLARE @target TABLE (id int NOT NULL PRIMARY KEY);
INSERT INTO @target (id) SELECT v.id FROM (VALUES (1), (2)) v(id);

-- START OF THE CODE IN QUESTION

-- The result table variable (will be several columns in the end)
DECLARE @result TABLE (id int NOT NULL, new bit NOT NULL);

WITH Source AS (
    -- Imagine a fairly expensive, recursive CTE here
    SELECT * FROM (VALUES (1), (3)) AS Source (id)
)
MERGE INTO @target AS Target
USING Source
    ON Target.id = Source.id
-- Perform a no-op on the match to get the output record
WHEN MATCHED THEN
    UPDATE SET Target.id = Target.id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id) VALUES (Source.id)
-- select the data to be returned - will be more columns
OUTPUT Source.id, CASE WHEN $action = 'INSERT' THEN CONVERT(bit, 1) ELSE CONVERT(bit, 0) END
INTO @result;

-- Select the result
SELECT * FROM @result;
I don't like the WHEN MATCHED THEN UPDATE part; I'd rather leave the redundant update away, but then I don't get the result row in the OUTPUT clause.
Is this the most efficient way to do this kind of completing and returning data?
Or would there be a more efficient solution without MERGE, for instance by pre-computing the result with a SELECT and then performing an INSERT of the rows which are new? I have difficulties interpreting the query plan, since it basically boils down to a "Clustered Index Merge", which is pretty vague to me performance-wise compared to the separate SELECT followed by INSERT variant. And I wonder if SQL Server (2008 R2 with CU1) is actually smart enough to see that the UPDATE is a no-op (e.g. no write required).
You could declare a dummy variable and set its value in the WHEN MATCHED clause.
DECLARE @dummy int;
...
MERGE
...
WHEN MATCHED THEN
    UPDATE SET @dummy = 0
...
I believe it should be less expensive than the actual table update.
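For comparison, the MERGE-free variant the question mentions (pre-compute the result with a SELECT, then INSERT the rows that are new) could be sketched roughly like this against the question's sample setup; note that, unlike MERGE, the two statements do not lock the matched rows between steps:
INSERT INTO @result (id, new)
SELECT s.id,
       CASE WHEN t.id IS NULL THEN CONVERT(bit, 1) ELSE CONVERT(bit, 0) END
FROM (VALUES (1), (3)) AS s(id)
LEFT JOIN @target AS t ON t.id = s.id;

INSERT INTO @target (id)
SELECT id FROM @result WHERE new = 1;

SELECT * FROM @result;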

Navigating the results of a stored procedure via a cursor using T-SQL

Due to a legacy report generation system, I need to use a cursor to traverse the result set from a stored procedure. The system generates report output by PRINTing data from each row in the result set. Refactoring the report system is way beyond scope for this problem.
As far as I can tell, the DECLARE CURSOR syntax requires that its source be a SELECT clause. However, the query I need to use lives in a 1000+ line stored procedure that generates and executes dynamic sql.
Does anyone know of a way to get the result set from a stored procedure into a cursor?
I tried the obvious:
Declare Cursor c_Data For my_stored_proc @p1='foo', @p2='bar'
As a last resort, I can modify the stored procedure to return the dynamic sql it generates instead of executing it and I can then embed this returned sql into another string and, finally, execute that. Something like:
Exec my_stored_proc @p1='foo', @p2='bar', @query='' OUTPUT
Set @sql = '
Declare Cursor c_Data For ' + @query + '
Open c_Data
-- etc. - cursor processing loop etc. goes here '
Exec (@sql)
Any thoughts? Does anyone know of any other way to traverse the result set from a stored proc via a cursor?
Thanks.
You could drop the results from the stored proc into a temp table and select from that for your cursor.
CREATE TABLE #myResults
(
Col1 INT,
Col2 INT
)
INSERT INTO #myResults(Col1,Col2)
EXEC my_Sp
DECLARE sample_cursor CURSOR
FOR
SELECT
Col1,
Col2
FROM
#myResults
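If it's useful, a hypothetical continuation of that cursor boilerplate for the PRINT-based report (Col1/Col2 are just the placeholder columns from above):
DECLARE @c1 INT, @c2 INT;

OPEN sample_cursor;
FETCH NEXT FROM sample_cursor INTO @c1, @c2;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT CAST(@c1 AS varchar(12)) + ' ' + CAST(@c2 AS varchar(12)); -- legacy per-row report output
    FETCH NEXT FROM sample_cursor INTO @c1, @c2;
END

CLOSE sample_cursor;
DEALLOCATE sample_cursor;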
Another option may be to convert your stored procedure into a table valued function.
DECLARE sample_cursor CURSOR
FOR
SELECT
Col1,
Col2
FROM
dbo.NewFunction('foo', 'bar')
You use INSERT ... EXEC to push the result of the procedure into a table (which can be a temp #table or a @table variable), then you open the cursor over this table. The article in the link discusses the problems that may occur with this technique: it cannot be nested and it forces a transaction around the procedure.
You could execute your SP into a temporary table and then iterate over the temporary table with the cursor
create table #temp (columns)
insert into #temp exec my_stored_proc ....
perform cursor work
drop table #temp