SQL Server 2012 - Is use of cursor/dynamicSQL combo good practice? - tsql

Most common answers are: NO, don't use cursors, don't use dynamic SQL
But this question is to solicit feedback on a coding style that seems nifty but may be bad practice, slower to process, and open to SQL injection(?).
I learned this style due to an annoyance with copy-pasting set queries of which only one or two items change between each query. I find this style easier to perform code-reviews due to having only one code block, removing the need to scroll up and down.
An example use case is: a big slow table with historical data from 20 different insurance companies that is 1.7 billion rows by 200 columns. To do productive analysis, 10 columns are retrieved into separate tables for each of the 20 insurance companies.
Before the cursor/dynamic combination, a query was built and code-reviewed for one plan, then copied 19 times, each time retrieving a different plan.
After utilizing cursor/dynamic combo, there is one 20 item cursor and one dynamic SQL block. Review wise, it seems more consistent and less prone to human error.
A code example of the combo is below:
Declare @company_name varchar(10)
,@SQL_STATEMENT varchar(100)
Declare company_cursor cursor fast_forward for (
SELECT * FROM (VALUES('APPLE'),('GOOGLE'),('AMAZON')) AS TABLE_NAME(COLUMN_NAME)
)
open company_cursor
fetch next from company_cursor into @company_name
while @@fetch_status = 0
begin
set @SQL_STATEMENT = 'select * from database.schema.'+@company_name
print (@SQL_STATEMENT)
fetch next from company_cursor into @company_name
end
close company_cursor
deallocate company_cursor
I also noticed using a PRINT instead of EXEC instantaneously prints the query statement and acts as a code generator if the PRINT results are copied into the SQL editor.
Can somebody offer opinions, advice, or general practice rules surrounding this style of T-SQL coding? (bracing for downvotes...sniff)
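One way to keep the cursor/dynamic SQL pattern while reducing the injection risk is sketched below. It is only a sketch under assumptions: the per-company tables are assumed to live in the dbo schema, and the three names are the placeholders from the example above. QUOTENAME brackets each name so it is treated as an identifier, and PRINT still acts as the code generator until the EXEC line is enabled.

```sql
-- Sketch only: the dbo schema and the per-company table names are assumptions.
DECLARE @company_name sysname,
        @sql          nvarchar(max);

DECLARE company_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT v.company_name
    FROM (VALUES (N'APPLE'), (N'GOOGLE'), (N'AMAZON')) AS v(company_name);

OPEN company_cursor;
FETCH NEXT FROM company_cursor INTO @company_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME wraps the name in brackets, so it is treated as an identifier only.
    SET @sql = N'SELECT * FROM dbo.' + QUOTENAME(@company_name) + N';';

    PRINT @sql;                       -- review the generated statement first
    -- EXEC sys.sp_executesql @sql;   -- uncomment to actually run it

    FETCH NEXT FROM company_cursor INTO @company_name;
END;

CLOSE company_cursor;
DEALLOCATE company_cursor;
```

Using nvarchar(max) for the statement also avoids the silent truncation that a varchar(100) variable would cause once the query text grows.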

Related

Statistics of all/many tables in FileMaker

I'm writing a kind of summary page for my FileMaker solution.
For this, I have defined a "statistics" table, which uses formula fields with ExecuteSQL to gather info from most tables, such as number of records, recently changed records, etc.
This strangely takes a long time - around 10 seconds when I have a total of about 20k records in about 10 tables. The same SQL on any database system shouldn't take more than some fractions of a second.
What could the reason be, what can I do about it and where can I start debugging to figure out what's causing all this time?
The actual code looks like this:
SQLAusführen ( "SELECT COUNT(*) FROM " & _Stats::Table ; "" ; "" )
SQLAusführen ( "SELECT SUM(\"some_field_name\") FROM " & _Stats::Table ; "" ; "" )
Where "_Stats" is my statistics table, and it has a string field "Table" where I store the name of the other tables.
So each row in this _Stats table should have the stats for the table named in the "Table" field.
Update: I'm not using FileMaker server, this is a standalone client application.
We can definitely talk about why it may be slow. Usually this has mostly to do with the size and complexity of your schema. That is "usually", as you have found.
Can you use the DDR (Database Design Report) instead? Much will depend on what you are actually doing with this data. Tools like FMPerception will also give you many of the stats you are looking for. Again, it depends on what you are doing with it.
Also, can you post your actual calculation? Is the statistic table using unstored calculations? Is the statistics table related to any of the other tables? These are a couple things that will affect how ExecuteSQL performs.
One thing to keep in mind, whether ExecuteSQL, a Perform Find, or relationship, it's all the same basic query under-the-hood. So if it would be slow doing it one way, it's going to likely be slow with any other directly related approach.
Taking these one at a time:
All records count.
Placing an unstored calc in the target table allows you to get the count of the records through the relationship, without triggering a transfer of all records to the client. You can get the value from the first record in the relationship. Super light way to get that info vs using Count which requires FileMaker to touch every record on the other side.
Sum of Records Matching a Value.
Using a field on the _Stats table with a relationship to the target table will reduce how much work FileMaker has to do to give you an answer.
Then having a Summary field in the target table to sum the records may prove to be more efficient than using an aggregate function. The summary field will also only sum the records that match the relationship. (Just don't show that field on any of your layouts if you don't need it.)
ExecuteSQL is fastest when it can just rely on a simple index lookup. Once you get outside of that, it's primarily about testing to find the sweet-spot. Typically, I will use ExecuteSQL for retrieving either a JSON object from a user table, or verifying a single field value. Once you get into sorting and aggregate functions, you step outside of the optimizations of the function.
Also note, if you have an open record ( that means you as the current user ), FileMaker Server doesn't know what data you have on the client side, and so it sends ALL of the records. That's why I asked if you were using unstored calcs with ExecuteSQL. It can seem slow when you can't control when the calculations fire. Often I will put the updating of that data into a scheduled script.

where column in (single value) performance

I am writing dynamic SQL code and it would be easier to use a generic where column in (<comma-separated values>) clause, even when the clause might have 1 term (it will never have 0).
So, does this query:
select * from table where column in (value1)
have any different performance than
select * from table where column=value1
?
All my tests result in the same execution plans, but if there is some knowledge/documentation that sets it in stone, it would be helpful.
This might not hold true for each and every RDBMS, nor for each and every query with its specific circumstances.
The engine will translate WHERE id IN(1,2,3) to WHERE id=1 OR id=2 OR id=3.
So your two ways to articulate the predicate will (probably) lead to exactly the same interpretation.
As always: We should not really bother about the way the engine "thinks". This was done pretty well by the developers :-) We tell - through a statement - what we want to get and not how we want to get this.
Some more details here, especially the first part.
I think this will depend on the platform you are using (the optimizer of the given SQL engine).
I did a little test using MySQL Server and:
When I query select * from table where id = 1; I get 1 total, and the query took 0.0043 seconds.
When I query select * from table where id IN (1); I get 1 total, and the query took 0.0039 seconds.
I know this depends on the server, the PC and so on, but the results are very close.
Note that IN with a list of constants is sargable (search-argument-able), just like =, so both forms can use an index on the column to resolve the query.
If you want to know which one is best, you should test them in your environment, because both work well.
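For SQL Server, a quick way to check this yourself is sketched below. The table variable @t and its columns are made-up names for illustration, not anything from the question. Run it with "Include Actual Execution Plan" enabled and compare the two plans and the logical reads reported for each statement.

```sql
-- Illustrative only: @t and its columns are hypothetical.
DECLARE @t TABLE (id int PRIMARY KEY, val varchar(10));

INSERT INTO @t (id, val) VALUES (1, 'a');
INSERT INTO @t (id, val) VALUES (2, 'b');
INSERT INTO @t (id, val) VALUES (3, 'c');

SET STATISTICS IO ON;

-- Equality predicate
SELECT id, val FROM @t WHERE id = 1;

-- Single-element IN list
SELECT id, val FROM @t WHERE id IN (1);

SET STATISTICS IO OFF;
```

A three-row table proves nothing about timing, so treat this as a template and point the two SELECT statements at a real, indexed table before drawing conclusions.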

T-SQL - Trying to query something across all databases on my server

I've got an environment where my server is hosting a variable number of databases, all of which utilize the same table structures/schemas. I need to pull a sum of customers that meet a certain series of constraints with say, the user table. I also need to show which database I am showing the sum for.
I already know all I need to get the sum in a db by db query, but what I'm really looking to do is have one script that hits all of the non-system DBs currently on my server to grab this info.
Please forgive my ignorance in this, just starting out.
Update-
So, to clarify things somewhat; I'm using MS SQL 2014. I know how to pull a listing of the dbs I want to hit by using:
SELECT name
FROM sys.databases
WHERE name not in ('master', 'model', 'msdb', 'tempdb')
AND state = 0
And for the purposes of gathering the data I need from each, let's just say I've got something like:
select count(u.userid)
from users u
join UserAttributes ua on u.userid = ua.userid
where ua.status = 2
New Update:
So, I went ahead and added the stored procedure sp_foreachdb as suggested by @Philip Kelley, and I'm now running into a problem when trying to run it (admittedly, I can tell I'm closer to a solution). This is what I'm using to call the sp:
USE [master]
GO
DECLARE @return_value int
EXEC @return_value = [dbo].[sp_foreachdb]
@command = N'select count(userid) as number from ?..users',
@print_dbname = 1,
@user_only = 1
SELECT 'Return Value' = @return_value
GO
This provides a nice and clean output showing a count, but what I'd like to see is the db name in addition to the count, something like this:
|[DB_NAME]|[COUNT]|
But for each DB
Is this even possible?
Source Code: https://codereview.stackexchange.com/questions/113063/executing-dynamic-sql-programmatically
Example Usage:
declare @options int = (
select a.ExcludeSystemDatabases
from dbo.ForEachDatabaseOptions() as a
);
execute dbo.usp_ForEachDatabase
@Command = N'print Db_Name();'
, @Options = @options;
@Command can be anything you want but obviously it needs to be a query that every single database can understand. @Options currently has 3 built-in settings but can be expanded however you see fit.
I wrote this to mimic/expand upon the master.sys.sp_MSforeachdb procedure but it could still use a little bit of polish (especially around the "logic" that replaces ? with the current database name).
Enumerate the databases from schema / sysdatabases. At least in situations without replication, excluding db_ids 1 to 4 as system databases should be reasonably robust:
SELECT [name] FROM master.dbo.sysdatabases WHERE dbid NOT IN (1,2,3,4)
Other methods exist, see here: Get list of databases from SQL Server and here: SQL Server: How to tell if a database is a system database?
Then prefix the query or stored procedure call with the database name, and in a cursor loop over the resultset of the first query, store that in a sysname variable to construct a series of statements like that:
SELECT column FROM databasename.schema.Viewname WHERE ...
and call that using the string execute function
EXECUTE('SELECT ... FROM '+@fully_qualified_table_name+' WHERE ...')
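Put together, the cursor-over-databases approach might look like the sketch below, which also returns the database name next to each count. The users and UserAttributes tables and the status = 2 filter come from the question; the dbo schema, the @results table variable, and the column names database_name and user_count are assumptions made for illustration.

```sql
-- Sketch only: assumes every user database has dbo.users and dbo.UserAttributes.
DECLARE @results TABLE (database_name sysname, user_count int);
DECLARE @db  sysname,
        @sql nvarchar(max);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name NOT IN (N'master', N'model', N'msdb', N'tempdb')
      AND state = 0;   -- online databases only

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- The database name is quoted as an identifier and also passed as a parameter
    -- so it can be returned as a column value.
    SET @sql = N'SELECT @db, COUNT(u.userid)
                 FROM ' + QUOTENAME(@db) + N'.dbo.users AS u
                 JOIN ' + QUOTENAME(@db) + N'.dbo.UserAttributes AS ua
                   ON u.userid = ua.userid
                 WHERE ua.status = 2;';

    INSERT INTO @results (database_name, user_count)
    EXEC sys.sp_executesql @sql, N'@db sysname', @db = @db;

    FETCH NEXT FROM db_cursor INTO @db;
END;

CLOSE db_cursor;
DEALLOCATE db_cursor;

SELECT database_name, user_count
FROM @results
ORDER BY database_name;
```

Collecting the rows into one table variable gives a single result set with one row per database, rather than one result set per database.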
There’s the undocumented system procedure, sp_msForEachDB, as found in the master database. Many pundits on the internet recommend not using this, as under obscure fringe cases it can be unreliable and somehow skip random databases. Count me as one of them; this caused me serious grief a few months back.
You can write your own routine to provide this kind of functionality. This is a common task, however, and many people have already done it and posted their code online… so why re-invent the wheel?
@kittoes0124 posted a link to “usp_ForEachDatabase”. This probably works, though pro forma I hate any stored procedure that begins with usp_. I ended up with Aaron Bertrand’s utility, which can be found at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/.
Install a version of this routine, figure out how it works, plug in your script, and go!

Catch-all-search, dynamic SQL?

I asked a question yesterday about a procedure we're trying to re-write/optimize in our application. It's off of a search form with a bunch of criteria the user can specify. 40 parameters, 3 of which are long strings of Guids that I am passing into a UDF that returns a Table variable, all 3 of which we JOIN into our main FROM statement.
We did much of this query using Dynamic SQL, one of the main reasons we're re-writing the whole thing is because it's Dynamic SQL. Everything I ever read about Dynamic SQL is bad, especially for execution plans and optimization. Then I start coming across articles like these two....
Sometimes the Simplest Solution isn't the Best Solution
Erland Sommarskog - Dynamic SQL Conditions in T-SQL
I've always thought Dynamic SQL was bad for security and optimization, and we've tried removing it from our system wherever possible. Now we're restructuring the most executed query in our system (the main search query) and we thought stripping all the Dynamic SQL was going to help.
Basically replacing
IF (@Param1 IS NOT NULL)
SET @SQLString = @SQLString + ' AND FieldX = @Param1'
...execute the @SQLString
with one large SQL block that has
WHERE (@Param1 IS NULL OR FieldX = @Param1)
Reading those two articles it seems like this is going to work against me. I can't use RECOMPILE because we're in 2k5 still and even if we could this stored procedure is very high-use. Do I really want to write this Query in Dynamic SQL? How can it be faster if no execution plans can be stored?
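On the plan-caching worry: dynamic SQL that is executed through sp_executesql with real parameters is cached like any other parameterized statement, one plan per distinct statement text. A minimal sketch follows, written to run on SQL Server 2005. The table dbo.SearchTable, the column FieldY, the second parameter @Param2, and the data types are hypothetical; only @Param1, FieldX and @SQLString come from the question.

```sql
-- Sketch only: dbo.SearchTable, FieldY, @Param2 and the data types are assumptions.
DECLARE @Param1 int,
        @Param2 nvarchar(50),
        @SQLString nvarchar(max);

SET @Param1 = 42;      -- supplied by the caller
SET @Param2 = NULL;    -- not supplied, so no predicate is generated for it

SET @SQLString = N'SELECT * FROM dbo.SearchTable WHERE 1 = 1';

-- Append only the predicates that are actually needed.
IF @Param1 IS NOT NULL
    SET @SQLString = @SQLString + N' AND FieldX = @Param1';
IF @Param2 IS NOT NULL
    SET @SQLString = @SQLString + N' AND FieldY = @Param2';

-- Values are passed as parameters, never concatenated into the string.
EXEC sys.sp_executesql
     @SQLString,
     N'@Param1 int, @Param2 nvarchar(50)',
     @Param1 = @Param1,
     @Param2 = @Param2;
```

Because the values never become part of the statement text, each combination of supplied criteria produces one reusable plan, and the injection risk of string concatenation is avoided.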

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging by number of results.
And I'm now faced with the question of whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is a function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is that SQL Server has to evaluate the function for every record it examines.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
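A simple way to take that measurement in SQL Server is sketched below. The table dbo.People and the test letter are hypothetical stand-ins for your own table and data; the two predicates are the ones from the question.

```sql
-- Sketch only: dbo.People is a hypothetical table with an indexed nvarchar column [name].
DECLARE @firstletter nvarchar(1);
SET @firstletter = N'A';

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: LIKE with a trailing wildcard
SELECT * FROM dbo.People WHERE name LIKE @firstletter + N'%';

-- Variant 2: LEFT() applied to the column
SELECT * FROM dbo.People WHERE LEFT(name, 1) = @firstletter;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The Messages tab then shows logical reads and CPU/elapsed time per statement. Run each variant more than once so that a cold cache does not skew the comparison.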
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most databases are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
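In SQL Server specifically, a persisted computed column can play the role of the trigger-maintained column. Note that this swaps in a different technique from the trigger described above. The table name dbo.People and the index name are hypothetical; the column name name_first_char_lower is taken from the query above.

```sql
-- Sketch only: dbo.People and the index name are assumptions.
-- The engine keeps the persisted column in sync on every insert/update.
ALTER TABLE dbo.People
    ADD name_first_char_lower AS LOWER(LEFT(name, 1)) PERSISTED;
GO

CREATE INDEX IX_People_name_first_char_lower
    ON dbo.People (name_first_char_lower);
GO

DECLARE @firstletter nvarchar(1);
SET @firstletter = N'a';

-- Plain equality on an indexed column, so an index seek is possible.
SELECT * FROM dbo.People WHERE name_first_char_lower = @firstletter;
```

The GO separators matter: the new column has to exist before a later batch that references it is compiled.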
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data is faster with the left 5. As an aside my overall query does hit some indexes.
I would always suggest to use like operator when the search column contains index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)= 'ABA' OR ... up to 9 OR clauses. My count displays 7301477 records with 4 secs in left and 1 second in like i.e where column_name like 'AAA%' OR Column_Name like 'ABA%' or ... up to 9 like clauses.
Calling a function in the where clause is not a best practice. Refer to http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non core) EntityFunctions so I'm not sure how to do it for EF6.