I have read the official documentation and researched this, but I still don't know which to use in which situations. Could you give me a simple example and explain the difference? What are the advantages of writing a RULE?
Forget about rules.
The Postgres Wiki recommends never using them:
Why not?
Rules are incredibly powerful, but they don't do what they look like they do. They look like conditional logic, but they actually rewrite a query, modifying it or adding additional queries to it.
That means that all non-trivial rules are incorrect.
Depesz has more to say about them.
When should you?
Never. While the rewriter is an implementation detail of VIEWs, there is no reason to pry up this cover plate directly.
(emphasis mine)
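To see concretely why rules "don't do what they look like they do", here is a minimal sketch (table and rule names are hypothetical). Because a rule rewrites the query by splicing in the original query's *expressions*, not its values, a volatile expression like `random()` is evaluated once per generated query:

```sql
CREATE TABLE t     (x int);
CREATE TABLE t_log (x int, logged_at timestamptz DEFAULT now());

-- Looks like "also log every inserted value"...
CREATE RULE log_insert AS ON INSERT TO t
    DO ALSO INSERT INTO t_log (x) VALUES (NEW.x);

-- ...but NEW.x is the original *expression*, re-evaluated in the extra
-- query, so t and t_log almost certainly end up with DIFFERENT values:
INSERT INTO t VALUES ((random() * 100)::int);
```

A trigger, by contrast, sees the actual row being inserted, which is what most people expect.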
I'm learning SQL, and my teacher gave us this question:
Create a trigger (T_updates_stock_close) that does not allow changing the primary key of the inventory close records. Display a message stating that the record is locked and abort the process without allowing the change.
I just want one example of how to do it.
Here is the section in the official manuals covering triggers. Here is the part of the manuals covering writing triggers in PL/pgSQL. PostgreSQL supports a range of procedural languages, but PL/pgSQL ships with it and is a good choice for simple triggers.
You will want to understand the difference between a BEFORE and an AFTER trigger, and in this case you will want a row-level trigger. If I were you I would start off with a trigger function that just rejects any change on the target table and then work step by step towards your goal.
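As a minimal sketch of where that leads (the table name inventory_close and its key column id are hypothetical; adapt them to your schema):

```sql
-- Refuse any UPDATE that changes the primary key column.
CREATE FUNCTION t_updates_stock_close_fn() RETURNS trigger AS $$
BEGIN
    IF NEW.id IS DISTINCT FROM OLD.id THEN
        RAISE EXCEPTION 'Primary key of inventory close records is locked; change aborted.';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_updates_stock_close
    BEFORE UPDATE ON inventory_close
    FOR EACH ROW
    EXECUTE FUNCTION t_updates_stock_close_fn();  -- PostgreSQL 11+; use EXECUTE PROCEDURE on older versions
```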
If that doesn't get you started and you can't understand the manuals, take a step back and think about whether a computing-based course is the correct choice for you. The PostgreSQL manuals are excellent, and there are a lot of resources available to learn from nowadays. If you have put in some serious effort and still can't get anywhere, then computing might not be for you.
I'd like to use Postgres as a web API storage backend. I certainly need (at least some) glue code to implement my REST interface (and/or WebSocket). I'm thinking about two options:
Implement most of the business logic as stored procedures (PL/pgSQL), with a very thin middle layer to handle the REST/WebSocket part.
The middle layer implements most of the business logic and reaches Postgres over its abstract DB interface.
My question is: what are the possible benefits/hindrances of the above designs, compared to each other, regarding flexibility, scalability, maintainability, and availability?
I don't really care about the exact middle-layer implementation (it can be PHP, node.js, Python, or whatever); I'm interested in the benefits and pitfalls of the actual architectural design choice.
I'm aware that I lose some flexibility by choosing (1), since it would be difficult to port the system to anything other than maybe Oracle, and my users will be bound to Postgres. In my case that's not very important; the database is intended to be an integral part of the system anyway.
I'm especially interested in the benefits lost in case of choosing (2), and possible pitfalls of either case.
I think both options have their benefits and drawbacks.
Approach (2) is solid and well known; most simple applications and web services use it. But sometimes stored procedures are a much better fit than (2).
Here are some examples which, IMHO, are good to implement with stored procedures:
tracking changes of rows, e.g. you have a table with items that are regularly updated and you want another table with all changes, and the dates of those changes, for every item (a sketch follows this list).
custom algorithms, when your functions can be used as expressions for indexing data (i.e. in expression indexes).
sharing some logic between several micro-services. If the micro-services are implemented in different languages, you have to re-implement parts of the business logic for every language and micro-service; stored procedures obviously help avoid this.
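A minimal sketch of the change-tracking case (all table and column names here are hypothetical):

```sql
-- History table plus a trigger that records every price change.
CREATE TABLE item_history (
    item_id    integer     NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_price  numeric,
    new_price  numeric
);

CREATE FUNCTION track_item_changes() RETURNS trigger AS $$
BEGIN
    IF NEW.price IS DISTINCT FROM OLD.price THEN
        INSERT INTO item_history (item_id, old_price, new_price)
        VALUES (OLD.id, OLD.price, NEW.price);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_track_changes
    AFTER UPDATE ON items
    FOR EACH ROW EXECUTE FUNCTION track_item_changes();
```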
Some benefits of approach (2) (with some "however"s, of course, to confuse you :D):
You can use your favorite programming language to write the business logic.
However: with approach (1) you can write procedures using the pl/v8, pl/php, pl/python, or pl/whatever extension, i.e. still in your favorite language.
Maintaining application code is easier than maintaining stored procedures.
However: there are good methods to avoid such code-maintenance headaches, e.g. migrations, which are a good thing under either approach.
Also, you can put your functions into their own namespace (schema), so to re-deploy procedures into the database you just drop and re-create that schema rather than each function individually. This can be done with a simple script (see the sketch after this list).
You can use various ORMs to query data and get abstraction layers which can carry much more complex logic and inheritance. With (1) it would be hard to use OOP patterns.
I think this is the most powerful argument against approach (1), and I can't add any "however" to it.
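Returning to the maintenance "however" above, here is a minimal sketch of the drop-and-recreate deployment trick (the schema name api and the file name are hypothetical):

```sql
-- psql deploy script: all stored procedures live in one schema,
-- so re-deploying them is a single drop-and-recreate.
DROP SCHEMA IF EXISTS api CASCADE;
CREATE SCHEMA api;
\i api_functions.sql  -- re-creates every function inside schema api
```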
I created my own R-Tree and I would like to add it to PostgreSQL. I was reading about PostGIS; however, I don't know exactly how I can do that.
The R-Tree is implemented in PostgreSQL as a GiST index for two-dimensional geometric data types. To add your own implementation you should consider using the GiST infrastructure too. Citing the documentation:
Traditionally, implementing a new index access method meant a lot of difficult work. It was necessary to understand the inner workings of the database, such as the lock manager and Write-Ahead Log. The GiST interface has a high level of abstraction, requiring the access method implementer only to implement the semantics of the data type being accessed. The GiST layer itself takes care of concurrency, logging and searching the tree structure.
So, first read this chapter to make sure you understand the concepts of index methods, operator classes and families.
Then, read about the GiST index and its API. There you can find useful examples that will help you.
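To give a flavor of what the end result looks like, here is a schematic sketch of registering a GiST operator class. The type my_box, the choice of the && operator, and all the my_* support functions are hypothetical placeholders for code you would write (typically in C):

```sql
-- Register a GiST operator class for a custom two-dimensional type.
CREATE OPERATOR CLASS my_rtree_ops
    DEFAULT FOR TYPE my_box USING gist AS
        OPERATOR 3  && ,  -- "overlaps": the query operator the index supports
        FUNCTION 1  my_consistent (internal, my_box, smallint, oid, internal),
        FUNCTION 2  my_union (internal, internal),
        FUNCTION 3  my_compress (internal),
        FUNCTION 4  my_decompress (internal),
        FUNCTION 5  my_penalty (internal, internal, internal),
        FUNCTION 6  my_picksplit (internal, internal),
        FUNCTION 7  my_same (my_box, my_box, internal);
```

The numbered support functions (consistent, union, compress, decompress, penalty, picksplit, same) are the pieces the GiST chapter of the manual walks you through; the GiST layer supplies the tree handling around them.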
Also, you can find a lot of helpful information in the development section of the PostgreSQL site.
You may address any programming questions to the PostgreSQL developers' mailing list.
I've been having this argument about using cursors in T-SQL recently...
First of all, I'm not a cheerleader in the debate. But every time someone says cursor, there's always some knucklehead (or 50) who pounces with the obligatory "cursors are evil" mantra. I know SQL Server was optimized for set-based operations, and maybe cursors truly ARE evil incarnate, but if I wanted to put some objective thought behind that...
Here's where my mind is going:
Is the only difference between cursors and set operations one of performance?
Edit: There's been a good case made for it not being simply a matter of performance, such as running a single batch over and over for a list of IDs, or, alternatively, executing actual SQL text stored in a table field row by row.
Follow-up: do cursors always perform worse?
EDIT: @Martin shows a good case where cursors out-perform set-based operations fairly dramatically. I suspect this isn't the kind of thing you'd do too often (before you resorted to some kind of OLAP / data-warehouse solution), but nonetheless it seems like a case where you really couldn't live without a cursor.
A reference to TPC benchmarks suggesting cursors may be more competitive than folks generally believe.
A reference to memory-usage optimizations for cursors since SQL Server 2005.
Are there any problems you can think of, that cursors are better suited to solve than set-based operations?
EDIT: Set-based operations literally cannot execute stored procedures, etc. (see the edit for item 1 above).
EDIT: Set-based operations can be quadratically slower than row-by-row when it comes to running aggregates over large data sets (see @Martin's triangular-join example below).
An article from MSDN explaining their perspective on the most common problems people resort to cursors for (and some explanation of set-based techniques that would work better).
Microsoft says (vaguely) in the 2008 Transact-SQL Reference on MSDN that "...there are times when the results are best processed one row at a time", but they don't give any examples of what cases they're referring to.
Mostly, I'm of a mind to convert cursors to set-based operations in my old code if/as I do any significant upgrades to various applications, as long as there's something to be gained from it. (I tend toward laziness over purity a lot of the time -- i.e., if it ain't broke, don't fix it.)
To answer your question directly:
I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too variant.
That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.
And for what I really want to say:
When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).
This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.
Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.
Running totals are the classic case where, as the number of rows grows, cursors can out-perform set-based operations: despite the cursor's higher fixed cost, the work required grows linearly, rather than quadratically as with the set-based "triangular join" approach.
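For reference, a sketch of the "triangular join" form of a running total (table and column names are hypothetical): each row re-aggregates all preceding rows, so roughly n²/2 rows are touched in total.

```sql
-- Set-based running total via a triangular join: O(n^2) row touches.
SELECT t1.id,
       SUM(t2.amount) AS running_total
FROM   dbo.Transactions AS t1
JOIN   dbo.Transactions AS t2
  ON   t2.id <= t1.id
GROUP  BY t1.id;
```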
Itzik Ben-Gan does some comparisons here.
Denali (SQL Server 2012) has more complete support for the OVER clause, however, which should make this use of cursors redundant.
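A sketch of the windowed running total that the extended OVER clause allows (same hypothetical table as above); here the work grows linearly with the row count:

```sql
-- Windowed running total (SQL Server 2012+): one ordered pass.
SELECT id,
       amount,
       SUM(amount) OVER (ORDER BY id
                         ROWS UNBOUNDED PRECEDING) AS running_total
FROM   dbo.Transactions;
```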
Since I've seen people manage to re-implement cursors (in all their varied forms) using other T-SQL constructs (usually involving at least one WHILE loop), there's nothing that cursors can achieve that can't be done with other constructs.
That's not to say the re-implementations aren't just as inefficient as the cursors that were "avoided" by not including the word cursor in the solution. Some people seem to hate the word, not the mechanics.
One place I've successfully argued to keep cursors was for a data transfer/transform between two different databases (we were dealing with clients here). Whilst we could have implemented this transfer in a set-based manner (indeed, we previously had), there was problematic data that could cause issues for a few clients. In a set-based solution, we had either to:
Continue the transfer, excluding failed client data at each table, leaving those clients partially transferred, or,
Abort the entire batch.
Whereas by making the unit of transfer the individual client (using a cursor to select each client), we could make each client's transfer between the systems either work fully or be rolled back entirely (i.e. place each transfer in its own transaction). A sketch of the pattern follows.
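A minimal sketch of that per-client pattern (the table dbo.ClientsToTransfer and the procedure dbo.TransferClient are hypothetical stand-ins for the real transfer logic):

```sql
DECLARE @ClientId int;

DECLARE client_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ClientId FROM dbo.ClientsToTransfer;

OPEN client_cursor;
FETCH NEXT FROM client_cursor INTO @ClientId;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        EXEC dbo.TransferClient @ClientId;  -- set-based inside, per client
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        -- log the failure for this client and continue with the next one
    END CATCH;

    FETCH NEXT FROM client_cursor INTO @ClientId;
END

CLOSE client_cursor;
DEALLOCATE client_cursor;
```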
I can't think of any situations where I've wanted to use a cursor below the "top level" of such transfers, though (e.g. selecting which client to transfer next).
Often when you build dynamic SQL, you have to use cursors. Imagine a script that searches through all tables in the database for the same value in different fields: the best solution will be a cursor. The question where the problem was raised is here: How to use EXEC or sp_executesql without looping in this case? I will be really impressed if anyone can solve that better without a cursor.
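A sketch of that pattern (the column name SomeColumn and the searched value 42 are hypothetical): a cursor walks the catalog and runs one dynamic query per matching table.

```sql
-- Search every table that has a given column for a given value.
DECLARE @table nvarchar(520), @sql nvarchar(max);

DECLARE tbl_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME)
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  COLUMN_NAME = N'SomeColumn';

OPEN tbl_cursor;
FETCH NEXT FROM tbl_cursor INTO @table;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'SELECT ''' + @table + N''' AS tbl, * FROM ' + @table
             + N' WHERE SomeColumn = @val;';
    EXEC sp_executesql @sql, N'@val int', @val = 42;
    FETCH NEXT FROM tbl_cursor INTO @table;
END
CLOSE tbl_cursor;
DEALLOCATE tbl_cursor;
```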
I am very eager to know the real cause, though I have picked up some knowledge from googling.
Thanks in advance.
Because SQL is a really poor language for writing procedural code, and because the SQL engine, storage, and optimizer are designed to make it efficient to assemble and join sets of records.
(Note that this isn't just applicable to SQL Server, but I'll leave your tags as they are)
Because, in general, the hundreds of man-years of development time that have gone into the database engine and optimizer, and the fact that it has access to real-time statistics about the data, have resulted in it being better than the user at working out the best way to process the data for a given request.
Therefore, by saying what we want to achieve (with a set-based approach) and letting it decide how to do it, we generally achieve better results than by spelling out exactly how to process the data, line by line.
For example, suppose we have a simple inner join from table A to table B. At design time, we generally don't know "which way round" it will be most efficient to process: keep a list of all the values on the A side and go through B matching them, or vice versa. But the query optimizer knows at runtime the number of rows in each table, and the most recent statistics may provide more information about the values themselves. So this decision is obviously better made at runtime, by the optimizer.
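For instance, this declarative request (hypothetical tables) leaves the join strategy entirely to the optimizer, which may pick nested loops, a hash join, or a merge join depending on current row counts and statistics:

```sql
-- We state the result we want; the join order and algorithm are chosen
-- at run time from table sizes and statistics.
SELECT a.id, a.name, b.order_date
FROM   dbo.TableA AS a
INNER JOIN dbo.TableB AS b
        ON b.a_id = a.id;
```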
Finally, note that I have put a number of "generally"s in this post; there will always be times when we know better than the optimizer, and for such times we can provide hints (NOLOCK etc.).
Set-based approaches are declarative: you don't describe the way the work will be done, only what you want the result to look like. The server can decide between several strategies for how to comply with your request, and hopefully chooses one that is efficient.
If you write procedural code, that code will at best be less than optimal in some situations.
Because using a set-based approach to SQL development conforms to the design of the data model. SQL is a thoroughly set-based language, used to build sets, subsets, unions, etc., from data. Keeping that in mind while developing in T-SQL will generally lead to more natural algorithms. T-SQL makes many procedural commands available that don't exist in plain SQL, but don't let that switch you to a procedural methodology.
This makes me think of one of my favorite quotes, from Rob Pike in Notes on Programming in C:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
SQL databases and the way we query them are largely set-based. Thus, so should our algorithms be.
From an even more tangible standpoint, SQL servers are optimized with set-based approaches in mind. Indexing, storage systems, query optimizers, and other optimizations made by the various SQL database implementations will do a much better job if you simply tell them what data you need, through a set-based approach, rather than dictating how to get it procedurally. Let the SQL engine worry about the best way to get you the data; you just worry about telling it what data you want.
As everyone has explained, let the SQL engine help you; believe me, it is very smart.
If you are not used to writing set-based solutions and are used to developing procedural code, you will have to spend some time before you can write well-formed set-based solutions. This is a barrier for most people. A tip if you wish to start coding set-based solutions: stop thinking about what you can do with rows, start thinking about what you can do with columns, and practice functional languages. A small illustration follows.
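A small, hypothetical illustration of the shift: the same price adjustment expressed first row by row, then as one set-based statement.

```sql
-- Row thinking (cursor): visit each row and decide what to do with it.
DECLARE @id int;
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ProductId FROM dbo.Products WHERE Category = 'Toys';
OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Products SET Price = Price * 1.10 WHERE ProductId = @id;
    FETCH NEXT FROM c INTO @id;
END
CLOSE c;
DEALLOCATE c;

-- Set thinking: describe the whole column operation at once.
UPDATE dbo.Products
SET    Price = Price * 1.10
WHERE  Category = 'Toys';
```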