There is some new feature of SQL, I think it was introduced in 2005, that I have never used. I can't remember what it was called, but I recall reading about it. It was an acronym similar to CT_ or something....
It was used to avoid repeating the same code within a single SQL statement, for readability and (I assume) for performance reasons, because the common code would only be evaluated once.
What was it called? I am having a bit of trouble trying to find it and thought someone could easily tell me.
You're talking about a Common Table Expression or CTE. It's especially useful for recursive queries.
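For reference, here's a minimal sketch of one (the table-of-numbers example is illustrative, not from the question):

WITH Numbers AS (
    SELECT 1 AS n               -- anchor member
    UNION ALL
    SELECT n + 1 FROM Numbers   -- recursive member
    WHERE n < 10
)
SELECT n FROM Numbers;

The CTE (Numbers) is defined once at the top and can be referenced multiple times, including by itself, in the statement that follows it.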
I'm pretty new to PostgreSQL, so I guess I'm missing some basic information that I didn't quite find while googling; I guess I didn't really know the right keywords. Hopefully I'll get the missing information here :)
I'm using PostgreSQL 11.4.
I've encountered many issues when I create a function that returns a query result as a table: it executes about 50 times slower than running the actual query, sometimes even more than that.
I understand that IMMUTABLE can be used when there are no table scans, just manipulating and returning data based on the function parameters, and STABLE when the query does scan tables but always returns the same results for the same parameters.
So the format of my function creation is this:
CREATE FUNCTION fnc_name(parameters...)
RETURNS TABLE (columns...) STABLE AS $func$
BEGIN
    RETURN QUERY
    SELECT ...;
END $func$ LANGUAGE plpgsql;
I can't show the query here since it's work related, but still... there is something I didn't quite understand about creating functions. Why is it so slow? I need to fully understand this issue because I need to create many more functions, and right now it seems that I need to run the actual query to get proper performance instead of using functions - and I still don't really have a clue as to why!
Any information regarding this issue would be greatly appreciated.
It all depends on how the function is used and on the size of the returned relation.
First I have to say: don't write these functions. It is a known antipattern, and I'll try to explain why. Use views instead.
The result of table functions written in higher PL languages like Perl, Python, or PL/pgSQL is materialized. When the result is small (it fits within work_mem), it is stored in memory; bigger results are stored in a temporary file. This can have significant overhead.
A function is a black box for the optimizer: it is not possible to push down predicates, there are no accurate statistics, and the planner cannot rearrange the form or order of joins. So some non-trivial queries can be slower (a little or significantly) because those optimizations are impossible.
There is an exception to these rules: simple SQL functions. SQL functions (functions with a single SQL statement) can be inlined when certain prerequisites are met. With inlining, the body of the function is merged into the body of the outer SQL query, and the result is the same as if you had written the subquery directly. So the result is not materialized, and the function is not a barrier to optimization.
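As a minimal sketch of such an inlinable function (the orders table and its columns are hypothetical, not from the question):

CREATE FUNCTION active_orders(p_customer_id int)
RETURNS TABLE (order_id int, total numeric)
STABLE
LANGUAGE sql
AS $$
    SELECT order_id, total
    FROM orders
    WHERE customer_id = p_customer_id
      AND status = 'active';
$$;

-- Used like a table; after inlining, the planner sees the SELECT directly,
-- so predicates can be pushed down and nothing is materialized:
-- SELECT * FROM active_orders(42) WHERE total > 100;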
There is a basic rule: use functions only when you cannot calculate the data in SQL. Don't try to hide or encapsulate SQL (put another way: to simplify complex queries, use views, not functions). The same rules are valid for all SQL databases (Oracle, DB2, MSSQL); Postgres is not an exception.
This note is not against stored procedures (functions) - it is a great technology. But it requires a specific style of programming. Wrapping queries in functions (when there is no other reason to) is bad.
Which performs better: common table expressions or table-valued functions? I'm designing a process where I could use either and am unable to find any real data either way. Whichever route I choose would be executed via an SP, and the data would ultimately update a table connected through a linked server (unfortunately there is no way around this). Insights appreciated.
This isn't really a performance question. You are comparing tuna fish and watermelons. A CTE is an inline view that can be used by the next query only. A TVF is a complete unit of work that can function on its own, unlike a CTE. They both have their place, and when used correctly they are incredibly powerful tools.
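To make the contrast concrete, a sketch (the Sales table and its columns are hypothetical, not from the question):

-- A CTE is scoped to the single statement that follows it:
WITH RecentSales AS (
    SELECT CustomerID, SUM(Amount) AS Total
    FROM dbo.Sales
    WHERE SaleDate >= '20230101'
    GROUP BY CustomerID
)
SELECT * FROM RecentSales WHERE Total > 1000;
GO

-- An inline TVF is a standalone object that any query can reuse:
CREATE FUNCTION dbo.fn_RecentSales (@Since date)
RETURNS TABLE
AS RETURN (
    SELECT CustomerID, SUM(Amount) AS Total
    FROM dbo.Sales
    WHERE SaleDate >= @Since
    GROUP BY CustomerID
);
GO

SELECT * FROM dbo.fn_RecentSales('20230101') WHERE Total > 1000;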
I am writing an app that will use many tables, and I have been told that using stored procs in the app is not the way to go - that it is too slow.
It has been suggested I use T-SQL. I have only used stored procs until now. In what way is using T-SQL different, and how can I get up to speed? In fact, is this the way to go for faster data access, or are there other methods?
T-SQL is the Microsoft and Sybase SQL dialect, so if you use SQL Server, your stored procedures are already written in T-SQL.
In most cases, properly written stored procedures outperform ad hoc queries.
On the other hand, writing procedures requires more skill, and debugging is quite a tedious process. It's really hard to give advice without seeing your procedures, but there are some common things that slow SPs down.
The execution plan is generated on the first run, but sometimes the optimal plan depends on the input parameters (this is known as parameter sniffing). See here for more details.
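One common mitigation, sketched here against a hypothetical dbo.Orders table (not the only option, and it trades plan reuse for per-call compilation cost), is to request a fresh plan on each execution:

CREATE PROCEDURE dbo.GetOrdersByStatus @Status varchar(20)
AS
BEGIN
    -- OPTION (RECOMPILE) builds a plan tailored to the current
    -- parameter value instead of reusing one sniffed from the
    -- first caller's parameters.
    SELECT OrderID, CustomerID, Total
    FROM dbo.Orders
    WHERE Status = @Status
    OPTION (RECOMPILE);
END;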
Another thing that prevents generating an optimal plan is using conditional logic in the SP body.
For example,
IF (something)
BEGIN
    SELECT ... FROM table1
    INNER JOIN table2 ...
    .....
END
ELSE
BEGIN
    SELECT ... FROM table2
    INNER JOIN table3 ...
    .....
END
should be refactored to
IF (something)
    EXEC proc1; -- create a new SP and move the code from the IF branch there
ELSE
    EXEC proc2; -- create a new SP and move the code from the ELSE branch there
The traditional argument for using SPs was always that they're compiled so they run faster. That hasn't been true for many years but nor is it true, in general, that SPs run slower.
If the reference is to development time rather than runtime then there may be some truth to this but, considering your skills, it may be that learning a new approach would slow you down more than using SPs.
If your system uses Object-Relational Mapping (ORM) then SPs will probably get in your way but then you wouldn't really be using T-SQL either - it'll be done for you.
Stored procs are written in T-SQL, so it's a bit odd that someone would make such a statement.
Daniel is right, ORM is a good option. If you're doing any data-intensive operations (such as parsing content), I'd look at the database first and foremost. You might want to do some reading on SPs, as speed isn't everything... there are other benefits. This was one hit from Google, but you can do more research yourself:
http://msdn.microsoft.com/en-us/library/ms973918.aspx
I've been having this argument about using cursors in T-SQL recently...
First of all, I'm not a cheerleader in the debate. But every time someone says cursor, there's always some knucklehead (or 50) who pounces with the obligatory 'cursors are evil' mantra. I know SQL Server was optimized for set-based operations, and maybe cursors truly ARE evil incarnate, but if I wanted to put some objective thought behind that...
Here's where my mind is going:
Is the only difference between cursors and set operations one of performance?
Edit: There's been a good case made for it not being simply a matter of performance - such as running a single batch over and over for a list of IDs, or alternatively, executing actual SQL text stored in a table field row by row.
Follow-up: do cursors always perform worse?
EDIT: #Martin shows a good case where cursors outperform set-based operations fairly dramatically. I suspect this isn't the kind of thing you'd do too often (before you resorted to some kind of OLAP / data-warehouse solution), but nonetheless it seems like a case where you really couldn't live without a cursor.
Reference to TPC benchmarks suggesting cursors may be more competitive than folks generally believe.
Reference to memory-usage optimizations for cursors since SQL Server 2005.
Are there any problems you can think of that cursors are better suited to solve than set-based operations?
EDIT: Set-based operations literally cannot execute stored procedures, etc. (see the edit for item 1 above).
EDIT: Set-based operations can be far slower than row-by-row processing when it comes to running aggregations over large data sets, since the set-based "triangular join" approach grows quadratically with the row count.
Article from MSDN explaining their perspective on some of the most common problems people resort to cursors for (and some explanation of set-based techniques that would work better).
Microsoft says (vaguely) in the 2008 Transact-SQL Reference on MSDN: "...there are times when the results are best processed one row at a time", but they don't give any examples of what cases they're referring to.
Mostly, I'm of a mind to convert cursors to set-based operations in my old code if/as I do any significant upgrades to various applications, as long as there's something to be gained from it. (I tend toward laziness over purity a lot of the time -- i.e., if it ain't broke, don't fix it.)
To answer your question directly:
I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too variant.
That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.
And for what I really want to say:
When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).
This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.
Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.
Running totals are the classic case where, as the number of rows gets larger, cursors can outperform set-based operations: despite the cursor's higher fixed cost, the work it does grows linearly, whereas the set-based "triangular join" approach grows quadratically.
Itzik Ben-Gan does some comparisons here.
Denali has more complete support for the OVER clause, however, which should make this use redundant.
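For concreteness, a sketch of both set-based forms against a hypothetical Ledger(ID, Amount) table:

-- The "triangular join": every row re-aggregates all earlier rows,
-- so the work grows roughly quadratically with the row count.
SELECT l1.ID, SUM(l2.Amount) AS RunningTotal
FROM dbo.Ledger l1
INNER JOIN dbo.Ledger l2 ON l2.ID <= l1.ID
GROUP BY l1.ID;

-- From Denali (SQL Server 2012) onward, the windowed form does it
-- in a single ordered pass:
SELECT ID,
       SUM(Amount) OVER (ORDER BY ID
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW) AS RunningTotal
FROM dbo.Ledger;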
Since I've seen people manage to re-implement cursors (in all their varied forms) using other T-SQL constructs (usually involving at least one WHILE loop), there's nothing cursors can achieve that can't be done using other constructs.
That's not to say those re-implementations aren't just as inefficient as the cursors that were avoided by not including the word "cursor" in the solution. Some people seem to purely hate the word, not the mechanics.
One place I've successfully argued to keep cursors was for a data transfer/transform between two different databases (we were dealing with clients here). Whilst we could have implemented this transfer in a set-based manner (indeed, we previously had), there was problematic data that could cause issues for a few clients. In a set-based solution, we had either to:
Continue the transfer, excluding failed client data at each table, leaving those clients partially transferred, or,
abort the entire batch
Whereas, by making the unit of transfer the individual client (using a cursor to select each client), we could make each client's transfer between the systems either work fully or be entirely rolled back (i.e. by placing each transfer in its own transaction).
I can't think of any situations where I've wanted to use a cursor below the "top level" of such transfers though (e.g. selecting which client to transfer next)
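A sketch of that per-client pattern (the table, column, and procedure names are all hypothetical):

DECLARE @ClientID int;

DECLARE client_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ClientID FROM dbo.Clients;

OPEN client_cursor;
FETCH NEXT FROM client_cursor INTO @ClientID;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        EXEC dbo.TransferClient @ClientID;  -- hypothetical transfer proc
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- one problematic client rolls back alone; the rest continue
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    END CATCH;

    FETCH NEXT FROM client_cursor INTO @ClientID;
END

CLOSE client_cursor;
DEALLOCATE client_cursor;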
Often when you build dynamic SQL, you have to use cursors. Imagine a script that searches through all tables in the database for the same value in different fields. The best solution will be a cursor. The question where the problem was raised is here: How to use EXEC or sp_executeSQL without looping in this case? I will be really impressed if anyone can solve that better without a cursor.
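A sketch of that pattern (the per-table search itself is elided; adjust the dynamic statement to your case):

DECLARE @TableName sysname, @Sql nvarchar(max);

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.tables;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- build and run one dynamic statement per table
    SET @Sql = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName) + N';';
    EXEC sp_executesql @Sql;

    FETCH NEXT FROM table_cursor INTO @TableName;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;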
I am very eager to know the real cause, though I have picked up some knowledge from googling.
Thanks in advance.
Because SQL is a really poor language for writing procedural code, and because the SQL engine, storage, and optimizer are designed to make it efficient to assemble and join sets of records.
(Note that this isn't just applicable to SQL Server, but I'll leave your tags as they are)
Because, in general, the hundreds of man-years of development time that have gone into the database engine and optimizer, and the fact that it has access to real-time statistics about the data, have resulted in it being better than the user at working out the best way to process the data for a given request.
Therefore, by saying what we want to achieve (with a set-based approach) and letting it decide how to do it, we generally achieve better results than by spelling out exactly how to process the data, line by line.
For example, suppose we have a simple inner join from table A to table B. At design time, we generally don't know "which way round" will be most efficient to process: keep a list of all the values on the A side and go through B matching them, or vice versa. But at runtime the query optimizer knows the number of rows in each table, and the most recent statistics may provide more information about the values themselves. So this decision is obviously better made at runtime, by the optimizer.
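For instance, given a query like this (the tables are hypothetical), nothing in the text dictates the join strategy:

SELECT a.ID, b.Value
FROM dbo.TableA a
INNER JOIN dbo.TableB b ON b.AID = a.ID;

-- The optimizer picks nested loops, hash, or merge join, and which
-- table to drive from, based on current row counts and statistics.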
Finally, note that I have put a number of 'generally's in this post - there will always be times when we know better than the optimizer will, and for such times we can provide hints (NOLOCK etc).
Set-based approaches are declarative, so you don't describe how the work will be done, only what you want the result to look like. The server can decide between several strategies for how to comply with your request, and hopefully choose an efficient one.
If you write procedural code, that code will at best be less than optimal in some situations.
Because using a set-based approach to SQL development conforms to the design of the data model. SQL is a very set-based language, used to build sets, subsets, unions, etc., from data. Keeping that in mind while developing in T-SQL will generally lead to more natural algorithms. T-SQL makes many procedural commands available that don't exist in plain SQL, but don't let that push you toward a procedural methodology.
This makes me think of one of my favorite quotes from Rob Pike in Notes on Programming C:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
SQL databases and the way we query them are largely set-based. Thus, so should our algorithms be.
From an even more tangible standpoint, SQL servers are optimized with set-based approaches in mind. Indexing, storage systems, query optimizers, and other optimizations made by various SQL database implementations will do a much better job if you simply tell them what data you need, through a set-based approach, rather than dictating how to get it procedurally. Let the SQL engine worry about the best way to get you the data; you just worry about telling it what data you want.
As everyone has explained, let the SQL engine help you - believe me, it is very smart.
If you are used to writing procedural code rather than set-based solutions, you will have to spend some time before you can write well-formed set-based solutions; this is a barrier for most people. A tip if you wish to start coding set-based solutions: stop thinking about what you can do with rows, and start thinking about what you can do with columns - and practice functional languages. A concrete (hypothetical) illustration of that shift follows below.
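-- Procedural habit: touch one row at a time
-- (a sketch against a hypothetical dbo.Employees table).
DECLARE @ID int;
DECLARE emp_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmployeeID FROM dbo.Employees;
OPEN emp_cursor;
FETCH NEXT FROM emp_cursor INTO @ID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Employees SET Salary = Salary * 1.05 WHERE EmployeeID = @ID;
    FETCH NEXT FROM emp_cursor INTO @ID;
END
CLOSE emp_cursor;
DEALLOCATE emp_cursor;

-- Set-based habit: describe the result for the whole column at once.
UPDATE dbo.Employees SET Salary = Salary * 1.05;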