How to declare Array variable in Azure Data Warehouse? - tsql

I have a list of strings I use in a sql query similar to this:
select count(*) from sometable where somefield in ('val1','val2',...'valn')
I use this pattern in several queries in a single stored proc. I want to reuse the stored proc, changing the values in the array periodically. Using normal SQL databases, you can declare a table variable type but that is not supported in SQL Data Warehouse. You can use a temp table, but these and table variables require more editing when the values change (requiring insert statements or unions to populate the table). How can I declare an array variable?

Create a varchar variable and use the STRING_SPLIT function in a select statement:
DECLARE @ids varchar(8000)
set @ids = 'Val1,Val2,...ValN'
select count(*) from sometable
where somefield in (SELECT value FROM STRING_SPLIT(@ids, ','))
While this works, I'm not sure how well it scales; for performance reasons, you can fall back to using a temp table - then use VSCode to edit the insert statements (Ctrl+Shift+L is your friend).

Related

Process a row with unknown structure in a cursor

I am new to using cursors for looping through a set of rows. But so far I had prior knowledge of which columns I am about to read.
E.g.
DECLARE db_cursor CURSOR FOR
SELECT Column1, Column2
FROM MyTable
DECLARE @ColumnOne VARCHAR(50), @ColumnTwo VARCHAR(50)
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @ColumnOne, @ColumnTwo
...
But the tables I am about to read into my key/value table have no specific structure and I should be able to process them one row at a time. How, using a nested cursor, can I loop through all the columns of the fetched row and process them according to their type and name?
TSQL cursors are not really designed to read data from tables of unknown structure. The two possibilities I can think of to achieve something in that direction are:
First read the column names of an unknown table from the Information Schema Views (see System Information Schema Views (Transact-SQL)). Then use dynamic SQL to create the cursor.
If you simply want to get any columns as a large string value, you might also try a simple SELECT * FROM TABLE_NAME FOR XML AUTO and further process the retrieved data for your purposes (see FOR XML (SQL Server)).
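A minimal sketch of the first option, under some assumptions: dbo.MyTable is a placeholder name, and every column is assumed to be castable to NVARCHAR(MAX), which does not hold for all data types (image, for example). A cursor over INFORMATION_SCHEMA.COLUMNS supplies each column's name and type, and dynamic SQL then reads that column as key/value pairs.

```sql
-- Placeholder target table; replace with the table of unknown structure.
DECLARE @schema SYSNAME = N'dbo', @table SYSNAME = N'MyTable';
DECLARE @col SYSNAME, @type NVARCHAR(128), @sql NVARCHAR(MAX);

-- Outer cursor: one row per column of the target table.
DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = @schema AND TABLE_NAME = @table
    ORDER BY ORDINAL_POSITION;

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @col, @type;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME guards the identifiers; with '''' it yields a quoted string literal.
    SET @sql = N'SELECT ' + QUOTENAME(@col, '''') + N' AS ColumnKey, '
             + QUOTENAME(@type, '''') + N' AS ColumnType, '
             + N'CAST(' + QUOTENAME(@col) + N' AS NVARCHAR(MAX)) AS ColumnValue '
             + N'FROM ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + N';';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM col_cursor INTO @col, @type;
END;
CLOSE col_cursor;
DEALLOCATE col_cursor;
```

This walks the table column by column rather than row by row; @type is available inside the loop if different types need different handling.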
SQL is not very good at dealing with sets generically. In most cases you must know column names, data types and much more in advance. But there is XQuery. You can transform any SELECT into XML rather easily and use XQuery's powerful abilities to deal with generic structures there. I would not recommend this, but it might be worth a try:
CREATE PROCEDURE dbo.Get_EAV_FROM_SELECT
(
@SELECT NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @tmptbl TABLE(TheContent XML);
DECLARE @cmd NVARCHAR(MAX)= N'SELECT (' + @SELECT + N' FOR XML RAW, ELEMENTS XSINIL);';
INSERT INTO @tmptbl EXEC(@cmd);
SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
,c.value('text()[1]','nvarchar(max)') AS ColumnValue
FROM @tmptbl t
CROSS APPLY t.TheContent.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c)
END;
GO
EXEC Get_EAV_FROM_SELECT @SELECT='SELECT TOP 10 o.object_id,o.* FROM sys.objects o';
GO
--Clean-Up for test purpose
DROP PROCEDURE Get_EAV_FROM_SELECT;
The idea in short
The select is passed into the procedure as string. With the SP we create a statement dynamically and create XML from it.
The very first column is considered to be the Row's ID, if not (like in sys.objects) we can write the SELECT and force it that way.
The inner SELECT will read each row and return a classical EAV-list.

How to avoid static query in Postgres?

In a function, I need an array of values which is a result of a simple query like:
SELECT array_agg( some_col ) FROM some_table;
I could declare it in function like:
my_array text[] := (SELECT array_agg( some_col ) FROM some_table);
But:
this dataset changes maybe once in some years
this dataset is really small
this function would be called a lot
this dataset needs to be up to date
Is there a way to avoid executing the same query over and over? It is not particularly expensive to call, but due to its static nature, I'd like to avoid it.
I could set trigger on some_table to generate the cached version of my_array on any mutation on the table, but is there a way to hold such a variable all the time for every connection?
I'd like to write this function in SQL or PLPGSQL.
In Postgres you can create materialized views (see the docs). A materialized view stores the result of a query, and you can refresh it whenever you want.
The result is stored like a table, so it is very cheap to query.
CREATE MATERIALIZED VIEW mymatview AS SELECT array_agg( some_col ) FROM some_table;
And when you want to refresh it:
REFRESH MATERIALIZED VIEW mymatview;
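Since the dataset needs to be up to date, one possible way to automate the refresh is a statement-level trigger on some_table. This is only a sketch: the function and trigger names (refresh_mymatview, some_table_refresh_mymatview) are made up for illustration, and it assumes a full refresh on every change is acceptable because the table changes so rarely.

```sql
-- Hypothetical trigger function: rebuilds the cached result.
CREATE OR REPLACE FUNCTION refresh_mymatview() RETURNS trigger AS $$
BEGIN
    REFRESH MATERIALIZED VIEW mymatview;
    RETURN NULL;  -- the return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

-- Fires once per modifying statement on some_table, not once per row.
CREATE TRIGGER some_table_refresh_mymatview
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON some_table
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_mymatview();
```

The function that needs the array can then read it with a plain SELECT from mymatview instead of re-running the aggregate.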

IF... ELSE... two mutually exclusive inserts INTO #temptable

I need to insert either set A or set B of records into a #temptable, depending on a certain condition.
My pseudo-code:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF {some-condition}
SELECT {columns}
INTO #t1
FROM {some-big-table}
WHERE {some-filter}
ELSE
SELECT {columns}
INTO #t1
FROM {some-other-big-table}
WHERE {some-other-filter}
The two SELECTs above are exclusive (guaranteed by the ELSE operator). However, SQL compiler tries to outsmart me and throws the following message:
There is already an object named '#t1' in the database.
My idea of "fixing" this is to create #t1 upfront and then executing a simple INSERT INTO (instead of SELECT... INTO). But I like minimalism and am wondering whether this can be achieved in an easier way i.e. without explicit CREATE TABLE #t1 upfront.
Btw why is it NOT giving me an error on a conditional DROP TABLE in the first line? Just wondering.
You can't have 2 temp tables with the same name in a single SQL batch. One of the MSDN articles says "If more than one temporary table is created inside a single stored procedure or batch, they must have different names". You can have this logic with 2 different temp tables or a table variable/temp table declared outside the IF-ELSE block.
Using dynamic SQL we can handle this situation. As a developer, it's not good practice, though. It is best to use a table variable or a temp table.
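A minimal sketch of the "declared outside the IF-ELSE block" variant. The column list, the @useA flag, and the use of sys.objects as a stand-in for the two big tables are assumptions made only so the example runs anywhere; substitute your own columns, condition, and sources.

```sql
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;

-- Create the temp table once, up front, so the batch contains a single definition of #t1.
CREATE TABLE #t1 (id INT, name SYSNAME);

DECLARE @useA BIT = 1;  -- hypothetical stand-in for {some-condition}

IF @useA = 1
    INSERT INTO #t1 (id, name)
    SELECT object_id, name FROM sys.objects WHERE type = 'U';  -- stand-in for set A
ELSE
    INSERT INTO #t1 (id, name)
    SELECT object_id, name FROM sys.objects WHERE type = 'V';  -- stand-in for set B

SELECT * FROM #t1;
```

Both branches now use INSERT INTO rather than SELECT ... INTO, so neither one tries to create #t1 a second time.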
IF 1=2
BEGIN
EXEC ('SELECT 1 ID INTO #TEMP1
SELECT * FROM #TEMP1
')
END
ELSE
EXEC ('SELECT 2 ID INTO #TEMP1
SELECT * FROM #TEMP1
')

SELECT .. INTO to create a table in PL/pgSQL

I want to use SELECT INTO to make a temporary table in one of my functions. SELECT INTO works in SQL but not PL/pgSQL.
This statement creates a table called mytable (If orig_table exists as a relation):
SELECT *
INTO TEMP TABLE mytable
FROM orig_table;
But put this function into PostgreSQL, and you get the error: ERROR: "temp" is not a known variable
CREATE OR REPLACE FUNCTION whatever()
RETURNS void AS $$
BEGIN
SELECT *
INTO TEMP TABLE mytable
FROM orig_table;
END; $$ LANGUAGE plpgsql;
I can SELECT INTO a variable of type record within PL/pgSQL, but then I have to define the structure when getting data out of that record. SELECT INTO is really simple - automatically creating a table of the same structure of the SELECT query. Does anyone have any explanation for why this doesn't work inside a function?
It seems like SELECT INTO works differently in PL/pgSQL, because you can select into the variables you've declared. I don't want to declare my temporary table structure, though. I wish it would just create the structure automatically like it does in SQL.
Try
CREATE TEMP TABLE mytable AS
SELECT *
FROM orig_table;
Per http://www.postgresql.org/docs/current/static/sql-selectinto.html
CREATE TABLE AS is functionally similar to SELECT INTO. CREATE TABLE AS is the recommended syntax, since this form of SELECT INTO is not available in ECPG or PL/pgSQL, because they interpret the INTO clause differently. Furthermore, CREATE TABLE AS offers a superset of the functionality provided by SELECT INTO.

Navigating the results of a stored procedure via a cursor using T-SQL

Due to a legacy report generation system, I need to use a cursor to traverse the result set from a stored procedure. The system generates report output by PRINTing data from each row in the result set. Refactoring the report system is way beyond scope for this problem.
As far as I can tell, the DECLARE CURSOR syntax requires that its source be a SELECT clause. However, the query I need to use lives in a 1000+ line stored procedure that generates and executes dynamic sql.
Does anyone know of a way to get the result set from a stored procedure into a cursor?
I tried the obvious:
Declare Cursor c_Data For my_stored_proc @p1='foo', @p2='bar'
As a last resort, I can modify the stored procedure to return the dynamic sql it generates instead of executing it and I can then embed this returned sql into another string and, finally, execute that. Something like:
Exec my_stored_proc @p1='foo', @p2='bar', @query='' OUTPUT
Set @sql = '
Declare Cursor c_Data For ' + @query + '
Open c_Data
-- etc. - cursor processing loop etc. goes here '
Exec @sql
Any thoughts? Does anyone know of any other way to traverse the result set from a stored proc via a cursor?
Thanks.
You could drop the results from the stored proc into a temp table and select from that for your cursor.
CREATE TABLE #myResults
(
Col1 INT,
Col2 INT
)
INSERT INTO #myResults(Col1,Col2)
EXEC my_Sp
DECLARE sample_cursor CURSOR
FOR
SELECT
Col1,
Col2
FROM
#myResults
Another option may be to convert your stored procedure into a table valued function.
DECLARE sample_cursor CURSOR
FOR
SELECT
Col1,
Col2
FROM
dbo.NewFunction('foo', 'bar')
You use INSERT ... EXEC to push the result of the procedure into a table (can be a temp #table or a @table variable), then you open the cursor over this table. The article in the link discusses the problems that may occur with this technique: it cannot be nested and it forces a transaction around the procedure.
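Put together, the whole pattern might look like the sketch below. It assumes my_stored_proc returns exactly two INT columns, as in the example above; the variable names are illustrative, and the PRINT line stands in for whatever the report system needs to output.

```sql
-- Table variable shaped like the procedure's result set (assumed: two INT columns).
DECLARE @results TABLE (Col1 INT, Col2 INT);

INSERT INTO @results (Col1, Col2)
EXEC my_stored_proc @p1 = 'foo', @p2 = 'bar';

DECLARE @c1 INT, @c2 INT;

DECLARE c_Data CURSOR LOCAL FAST_FORWARD FOR
    SELECT Col1, Col2 FROM @results;

OPEN c_Data;
FETCH NEXT FROM c_Data INTO @c1, @c2;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- One output line per row of the captured result set.
    PRINT CAST(@c1 AS VARCHAR(12)) + ' ' + CAST(@c2 AS VARCHAR(12));
    FETCH NEXT FROM c_Data INTO @c1, @c2;
END;
CLOSE c_Data;
DEALLOCATE c_Data;
```

The table definition must match the procedure's result set column for column, otherwise the INSERT ... EXEC fails.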
You could execute your SP into a temporary table and then iterate over the temporary table with the cursor
create table #temp (columns)
insert into #temp exec my_stored_proc ....
perform cursor work
drop table #temp