Small methods - Small sprocs - tsql

Uncle Bob recommends having small methods. Do stored procedures have an ideal size, or can they run on for hundreds and hundreds of lines?
Also, does anyone have anything to say about where to place business logic? If it is located in stored procedures, the database is being used as a data-processing tier.
If you read Adam Machanic, his bias is toward the database. Does that imply long stored procedures that only the author understands, leaving maintainers to deal with the mess?
I guess these are two interrelated questions, somehow.
Thanks in advance for responding to a fuzzy question (or two).

Stored procedures are no different from regular functions. They should be of manageable size, regardless. I am biased toward keeping business logic out of stored procedures, but reasonable people may disagree.
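To make that concrete, one way to keep a stored procedure "method-sized" is to compose it from smaller procedures, just as you would extract methods in application code. A minimal T-SQL sketch (all procedure names here are hypothetical):

    -- The outer procedure stays short and readable by delegating
    -- each step to a focused helper procedure.
    CREATE PROCEDURE dbo.ProcessOrder
        @OrderId int
    AS
    BEGIN
        SET NOCOUNT ON;
        EXEC dbo.ValidateOrder @OrderId;  -- raises an error if the order is invalid
        EXEC dbo.ReserveStock  @OrderId;
        EXEC dbo.QueueInvoice  @OrderId;
    END

Each helper can then be read, tested, and maintained on its own, which is exactly the property small methods give you in application code.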

I think stored procedures should NEVER be used as the only access method to the database for a whole system. This is an outdated architecture that, in the long term, causes far more maintenance problems than it brings benefits.
There are much better ways nowadays to handle every data access requirement.
The best use for a stored procedure is the rare case where you want a single, well-defined function to retrieve data that you know will be used in the same way by multiple applications. The stored procedure allows you to stay DRY in this case.
There are also cases where the DBA who handles security needs to protect part of the data (for example, a credit card table) at such a granular level that allowing access only through stored procedures is a good option.
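As a minimal sketch of that locked-down pattern (the table, procedure, and role names are hypothetical): the DBA denies direct access to the sensitive table and grants EXECUTE on a narrow procedure instead.

    -- Application logins may not touch the table directly...
    DENY SELECT, INSERT, UPDATE, DELETE ON dbo.CreditCard TO app_role;

    -- ...but may call a procedure that exposes only the safe columns.
    -- Ownership chaining lets the procedure read the table even though
    -- the role itself cannot.
    GRANT EXECUTE ON dbo.GetCardLastFour TO app_role;

The only path to the card data is then through code the DBA controls and can audit.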
Apart from those cases, avoid stored procedures as much as possible and stick to application code, with all its benefits: inheritance, compiler checks, refactoring tools, enumerations instead of magic strings in queries, source control, easier deployment, etc. The list of benefits of avoiding stored procedures is simply too long to pass up nowadays.
BUT if for some reason you decide to use stored procedures, you might as well put business logic in them: having a layer so close to the data without even allowing it to contain business logic will just complicate your project further, and you will not reap the few positive points of using stored procedures.

I would apply the same recommendations to stored procedures as much as possible, since I consider stored procedures code.
Business logic, in my opinion, belongs in a tier of the code base, not in stored procedures. If the stored procedures hold the business logic, they know too much about what they are supposed to do. Stored procedures should mainly be tasked with retrieving and/or storing data. If the business logic has been applied up the chain of command, in the code, then stored procedures are only called once the business logic has been satisfied.
I doubt Adam Machanic, or most people, would advocate that long stored procedures that are hard to understand and maintain are a good thing.

business logic in stored procedures vs. middle layer

I'd like to use Postgres as a web API storage backend. I certainly need (at least some) glue code to implement my REST interface (and/or WebSocket). I am thinking about two options:
(1) Implement most of the business logic as stored procedures (PL/pgSQL), with a very thin middle layer to handle the REST/WebSocket part.
(2) The middle layer implements most of the business logic and reaches Postgres over its abstract DB interface.
My question is: what are the possible benefits and hindrances of the above designs, compared to each other, regarding flexibility, scalability, maintainability, and availability?
I don't really care about the exact middle layer implementation (it can be PHP, Node.js, Python, or whatever); I'm interested in the benefits and pitfalls of the actual architectural design choice.
I'm aware that I lose some flexibility by choosing (1), since it would be difficult to port the system to anything other than maybe Oracle, and my users will be bound to Postgres. In my case that's not very important; the database is intended to be an integral part of the system anyway.
I'm especially interested in the benefits lost by choosing (2), and the possible pitfalls of either case.
I think both options have their benefits and drawbacks.
Approach (2) is good and well known; most simple applications and web services use it. But sometimes using stored procedures is much better.
Here are some examples which, IMHO, are good to implement with stored procedures (see the sketch after this list):
tracking changes to rows, i.e. you have a table with items that are regularly updated, and you want another table with every change to every item and the date of that change.
custom algorithms, if your functions can be used as expressions for indexing data.
sharing some logic between several micro-services. If the micro-services are implemented in different languages, you have to re-implement parts of the business logic for every language and micro-service. Stored procedures can obviously help avoid this.
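As a sketch of the first example, here is what row-change tracking could look like when it lives in the database. The question is about Postgres, where this would be a PL/pgSQL trigger function; the sketch below uses T-SQL for concreteness, and the table names are hypothetical:

    -- History table: one row per change to an item.
    CREATE TABLE dbo.ItemHistory (
        ItemId    int           NOT NULL,
        Price     decimal(10,2) NOT NULL,
        ChangedAt datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO
    -- The trigger copies every updated row into the history table.
    CREATE TRIGGER trg_Item_Track ON dbo.Item
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.ItemHistory (ItemId, Price)
        SELECT ItemId, Price FROM inserted;
    END

Because the logic lives next to the data, every client that updates dbo.Item gets change history for free, regardless of the language it is written in.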
Some benefits of approach (2) (with some "however"s, of course, to confuse you :D):
You can use your favorite programming language to write business logic.
However: in approach (1) you can also write procedures in your favorite language, using the PL/V8, PL/PHP, PL/Python, or PL/whatever extension.
Maintaining code is easier than maintaining stored procedures.
However: there are good methods to avoid such maintenance headaches, e.g. migrations, which are a good thing for either approach.
Also, you can put your functions into their own namespace (schema), so to re-deploy the procedures into the database you just drop and re-create that schema rather than each function. This can be done with a simple script.
You can use various ORMs to query data and get abstraction layers that can have much more complex logic and inheritance. In (1) it would be hard to use OOP patterns.
I think this is the most powerful argument against approach (1), and I can't add any "however" to it.

Suggest data access design for Entity Framework (stored procedure less)

The plan is to use Entity Framework for data access. We are in a dilemma about whether or not to use stored procedures.
The main idea behind avoiding stored procedures: we don't want anyone tempted into writing business logic at the database level. I believe the database is only for storage.
Is there any performance hit if I write joins and business logic at the data access level? Is this as good as stored procedures? Please provide your recommendations.
Regards,
Ramana Akula.
SP disadvantages:
SP code is "fixed", e.g. you don't have LINQ's flexibility here. This requires an extra level of synchronization.
SPs should be written by a specialist in most cases; historically, most C# developers are not very good at this.
Although you put extra effort into maintaining SPs, you don't get a significant performance improvement.
Even if you don't intend it, SPs will end up containing some business logic.
If you lose some control over SP maintenance (or if you add some premature optimization to the DAL), you'll need more and more SPs as you go: getXbyY, getXbyZ, getSomeFieldByX, getAnotherFieldsByX, etc.
So my opinion is to avoid SPs as much as possible. If you feel you need one, that could indicate an incorrect data/storage structure design.

ORM for large volume database

I am working on a new project that is data-oriented, meaning it has a very large volume of data (increasing day by day). So kindly suggest which type of approach I should use to achieve the desired functionality without any hurdles.
Is the database fully normalized?
Which ORM (linq2sql, entity framework) is suitable for this project?
Should I use stored procedures, db functions, triggers, etc?
Whether or not the database is normalized is something you need to know and need to answer!
As for the ORM: it really depends on the type of data and its structure.
Linq-to-SQL is a very simplistic ORM that basically just does a 1:1 mapping of tables to domain objects. As long as you don't need anything else - that's fine. Linq-to-SQL is no longer being actively developed, so that might be a drawback. Also, stored proc support is a bit limited.
Entity Framework (at least in .NET 4) is great and is the current ORM of choice at Microsoft - it's being actively developed, has a lot of backing, lot of flexibility. It offers database-first, model-first and code-first development styles, it supports POCO objects and self-tracking entities, and is very well integrated with stored procs (you can define a stored proc for INSERT, UPDATE, DELETE on every single entity, if you wish to do so). It would be my first choice.
NHibernate is a great, enterprise-level ORM, well established and actively developed - certainly not a "dead end" like Linq-to-SQL. I used it a few years ago, and while it's great and powerful, it's also a bit harder to learn than EF4 (no visual designer, more manual effort). It's great if you really need all its power and are willing to invest the necessary up-front learning time.
As for the database: stored procs are definitely worth investigating, especially if you need to encapsulate certain database processing into a nice proc to call from your code. I would be rather careful and defensive about overusing triggers and functions - they have their place, but they do carry some problems with them (mostly performance problems and problems of "discoverability": lots of devs don't think about triggers that could be in place, and will not understand what's going on).
@Xulfee, that's a fairly broad question, and a lot depends on the nature of your project. The approaches you reference affect a lot of aspects of the overall architecture. For example:
Is the database fully normalized?
Database normalization generally aids in tackling the complexity of your conceptual model. When properly (note I did not say "fully") normalized, your model should be fairly straightforward, and consumers of the database (developers, your BI team, domain experts, etc.) should be able to get a good idea of the business problems being approached with your database. That having been said, normalization can lead to a fairly large reporting and analysis problem. When writing a query for a report against a large, fairly normalized database, you may introduce performance problems by joining a lot of tables. Enter snowflake schemas. So, to your question: it depends. What are your reporting requirements? How many transactions on average do you need to support? How complex is your conceptual model? Are you able to break the database into smaller models that are associated, rather than one large one?
Which ORM (linq2sql, entity framework) is suitable for this project?
Again, an ORM is a tool. You must ask yourself: what is the specific job you are trying to accomplish? The choice of an ORM (or even of using an ORM in the first place) is a decision I would recommend you make fairly early on, as it can affect everything from performance to development team cohesion. There are a lot of great choices out there:
Linq-To-Sql
NHibernate
Entity Framework
LLBLGen
Each of the above frameworks does a fantastic job of abstracting your persistence layer. Each has its pros and cons - the majority of which come down to infrastructure concerns: performance, configuration, schema/language compatibility, persistence patterns, vendor support. Given the choice, I would ask myself: which of the frameworks is my development team most comfortable with? Which one supports the level of system activity I expect? With which vendor am I willing to "throw in"? I have seen successful systems that use fairly small ORMs (e.g., Stack Overflow uses a modified version of Linq-To-Sql) as well as fairly large systems fail with complex ORMs.
Should I use stored procedures, db functions, triggers, etc?
This question centers on your persistence store and how you use it (as well as how angry you want to make your DBA :) ). The use of sprocs (stored procedures) lends itself to allowing your DBA to provide security at a very granular level. In addition, if the ORM you are using generates dynamic SQL, you might benefit from the database's ability to cache the query plans generated for sprocs. DB functions can be a double-edged sword: they offer the ability to add functionality and intelligence to your model, while at the same time allowing you to take a fairly large performance hit (e.g., table-valued UDFs). Triggers have their own pitfalls and should be used with caution, but that discussion could get rather involved. The bottom line for me in this case is: how much logic do you want the database to hold, and how important are security and performance? Do you have a qualified DBA (not just a developer who knows how to write queries, but a DBA capable of performance tuning and data modeling)? How big is your database? How complex is your data? Think about all of these questions, and more, when determining how you want to manage your data.
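To illustrate the UDF performance point: in SQL Server the hit usually comes from multi-statement table-valued functions, which the optimizer treats as an opaque box with poor row estimates, whereas an inline TVF is expanded into the calling query like a parameterized view. A hedged sketch, with hypothetical table and column names:

    -- Inline TVF: a single SELECT, folded into the calling plan.
    CREATE FUNCTION dbo.ActiveOrders (@CustomerId int)
    RETURNS TABLE
    AS
    RETURN (SELECT OrderId, Total FROM dbo.Orders
            WHERE CustomerId = @CustomerId AND Status = 'Active');
    GO
    -- Multi-statement TVF: same result, but the optimizer cannot
    -- see inside it and must guess at the row count.
    CREATE FUNCTION dbo.ActiveOrdersSlow (@CustomerId int)
    RETURNS @r TABLE (OrderId int, Total decimal(10,2))
    AS
    BEGIN
        INSERT INTO @r
            SELECT OrderId, Total FROM dbo.Orders
            WHERE CustomerId = @CustomerId AND Status = 'Active';
        RETURN;
    END

Preferring the inline form where possible avoids much of the performance hit warned about above.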
In summary, you are asking some good questions. Don't confuse infrastructure needs with implementation needs. Decide on a stack and run with it; don't get bogged down in implementation details to the point where you are unable to successfully complete the project. With the right level of abstraction, you may find it easier to try out new and different technologies without risking the overall success of the project. And remember: there's nothing wrong with experimenting and trying new things, just be prepared to fail gracefully and test, test, test!

Why are “set-based approaches” better than “procedural approaches”?

I am very eager to know the real reason, though I have gained some knowledge from googling.
Thanks in advance.
Because SQL is a really poor language for writing procedural code, and because the SQL engine, storage, and optimizer are designed to make it efficient to assemble and join sets of records.
(Note that this isn't just applicable to SQL Server, but I'll leave your tags as they are)
Because, in general, the hundreds of man-years of development time that have gone into the database engine and optimizer, plus the fact that it has access to real-time statistics about the data, make it better than the user at working out the best way to process the data for a given request.
Therefore, by saying what we want to achieve (with a set-based approach) and letting it decide how to do it, we generally achieve better results than by spelling out exactly how to process the data, line by line.
For example, suppose we have a simple inner join from table A to table B. At design time, we generally don't know 'which way round' will be most efficient to process: keep a list of all the values on the A side, and go through B matching them, or vice versa. But the query optimizer will know at runtime both the numbers of rows in the tables, and also the most recent statistics may provide more information about the values themselves. So this decision is obviously better made at runtime, by the optimizer.
Finally, note that I have put a number of 'generally's in this post - there will always be times when we know better than the optimizer will, and for such times we can provide hints (NOLOCK etc).
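To make the contrast concrete, here is the classic comparison (table and column names are hypothetical). First the procedural version, which dictates the mechanics row by row:

    -- Procedural: one statement per row, one row at a time.
    DECLARE @Id int;
    DECLARE c CURSOR FOR
        SELECT Id FROM dbo.Product WHERE CategoryId = 7;
    OPEN c;
    FETCH NEXT FROM c INTO @Id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Product SET Price = Price * 1.10 WHERE Id = @Id;
        FETCH NEXT FROM c INTO @Id;
    END
    CLOSE c;
    DEALLOCATE c;

    -- Set-based: say what you want; the optimizer picks the mechanics.
    UPDATE dbo.Product SET Price = Price * 1.10 WHERE CategoryId = 7;

The single UPDATE hands the optimizer the whole problem at once, so it can choose the access path and batch the work efficiently, which is exactly the decision-at-runtime point made above.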
Set-based approaches are declarative, so you don't describe the way the work will be done, only what you want the result to look like. The server can decide between several strategies for complying with your request, and hopefully choose one that is efficient.
If you write procedural code, that code will at best be less than optimal in some situations.
Because using a set-based approach to SQL development conforms to the design of the data model. SQL is a very set-based language, used to build sets, subsets, unions, etc, from data. Keeping that in mind while developing in TSQL will generally lead to more natural algorithms. TSQL makes many procedural commands available that don't exist in plain SQL, but don't let that switch you to a procedural methodology.
This makes me think of one of my favorite quotes, from Rob Pike in Notes on Programming in C:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
SQL databases and the way we query them are largely set-based. Thus, so should our algorithms be.
From an even more tangible standpoint, SQL servers are optimized with set-based approaches in mind. Indexing, storage systems, query optimizers, and other optimizations made by various SQL database implementations will do a much better job if you simply tell them what data you need, through a set-based approach, rather than dictating how to get it procedurally. Let the SQL engine worry about the best way to get you the data; you just worry about telling it what data you want.
As everyone has explained, let the SQL engine help you; believe me, it is very smart.
If you are not used to writing set-based solutions and are used to developing procedural code, you will have to spend some time before you can write well-formed set-based solutions. This is a barrier for most people. A tip if you wish to start coding set-based solutions: stop thinking about what you can do with rows, start thinking about what you can do with columns, and practice functional languages.

Manual DAL & BLL vs. ORM

Which approach is better: 1) to use a third-party ORM system or 2) manually write DAL and BLL code to work with the database?
1) In one of our projects, we decided to use the DevExpress XPO ORM system, and we ran across lots of small problems that wasted a lot of our time. And still, from time to time, we encounter problems and exceptions coming from this ORM, and we do not have full understanding of, or control over, this "black box".
2) In another project, we decided to write the DAL and BLL from scratch. Although this meant writing boring code many, many times, this approach proved to be more versatile and flexible: we had full control over how data was held in the database, how it was obtained from it, etc. And all the bugs could be fixed in a direct and easy way.
Which approach is generally better? Maybe the problem is just with the ORM that we used (DevExpress XPO), and maybe other ORMs are better (such as NHibernate)?
Is it worth using the ADO.NET Entity Framework?
I found that the DotNetNuke CMS uses its own DAL and BLL code. What about other projects?
I would like to get info on your personal experience: which approach do you use in your projects, which is preferable?
Thank you.
My personal experience has been that ORM is usually a complete waste of time.
First, consider the history behind this. Back in the 60s and early 70s, we had these DBMSes using the hierarchical and network models. These were a bit of a pain to use, since when querying them you had to deal with all of the mechanics of retrieval: follow links between records all over the place and deal with the situation when the links weren't the links you wanted (e.g., were pointing in the wrong direction for your particular query). So Codd thought up the idea of a relational DBMS: specify the relationships between things, say in your query only what you want, and let the DBMS deal with figuring out the mechanics of retrieving it. Once we had a couple of good implementations of this, the database guys were overjoyed, everybody switched to it, and the world was happy.
Until the OO guys came along into the business world.
The OO guys found this impedance mismatch: the DBMSes used in business programming were relational, but internally the OO guys stored things with links (references), and found things by figuring out the details of which links they had to follow and following them. Yup: this is essentially the hierarchical or network DBMS model. So they put a lot of (often ingenious) effort into layering that hierarchical/network model back onto relational databases, incidentally throwing out many of the advantages given to us by RDBMSes.
My advice is to learn the relational model, design your system around it if it's suitable (it very frequently is), and use the power of your RDBMS. You'll avoid the impedance mismatch, you'll generally find the queries easy to write, and you'll avoid performance problems (such as your ORM layer taking hundreds of queries to do what it ought to be doing in one).
There is a certain amount of "mapping" to be done when it comes to processing the results of a query, but this goes pretty easily if you think about it in the right way: the heading of the result relation maps to a class, and each tuple in the relation is an object. Depending on what further logic you need, it may or may not be worth defining an actual class for this; it may be easy enough just to work through a list of hashes generated from the result. Just go through and process the list, doing what you need to do, and you're done.
Perhaps a little of both is the right fit. You could use a product like SubSonic. That way, you can design your database, generate your DAL code (removing all that boring stuff), use partial classes to extend it with your own code, use Stored Procedures if you want to, and generally get more stuff done.
That's what I do. I find it's the right balance between automation and control.
I'd also point out that I think you're on the right path by trying out different approaches and seeing what works best for you. I think that's ultimately the source for your answer.
Recently I made the decision to use Linq to SQL on a new project, and I really like it. It is lightweight, high-performance, intuitive, and has many gurus at Microsoft (and elsewhere) who blog about it.
Linq to SQL works by creating a data layer of C# objects from your database. DevExpress XPO works in the opposite direction, creating tables for your C# business objects. The Entity Framework is supposed to work either way. I am a database guy, so the idea of a framework designing the database for me doesn't make much sense, although I can see the attractiveness of it.
My Linq to SQL project is a medium-sized project (hundreds, maybe thousands of users). For smaller projects I sometimes just use SqlCommand and SqlConnection objects and talk directly to the database, with good results. I have also used SqlDataSource objects as containers for my CRUD, but they seem clunky.
DALs make more sense the larger your project is. If it is a web application, I always use some sort of DAL because it provides built-in protections against things like SQL injection attacks, better handling of null values, etc.
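The injection protection comes down to parameterization: values travel as typed parameters, never as concatenated SQL text. The same principle applies inside T-SQL itself when you have to build dynamic SQL; a hedged sketch, with a hypothetical table:

    DECLARE @name nvarchar(100) = N'O''Brien';

    -- Unsafe pattern (don't do this): user input spliced into the statement.
    -- SET @sql = N'SELECT * FROM dbo.Users WHERE Name = ''' + @name + N'''';

    -- Safe: the value is bound as a parameter.
    DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.Users WHERE Name = @p';
    EXEC sp_executesql @sql, N'@p nvarchar(100)', @p = @name;

A DAL or ORM does the equivalent binding for you on the client side, which is why it protects against injection by default.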
I debated whether to use the Entity Framework for my project, since Microsoft says this will be their go-to solution for data access in the future. But EF feels immature to me, and if you search Stack Overflow for Entity Framework, you will find several people who are struggling with small, obtuse problems. I suspect version 2 will be much better.
I don't know anything about NHibernate, but there are people out there who love it and would not use anything else.
You might try using NHibernate. Since it's open source, it's not exactly a black box. It is very versatile, and it has many extensibility points for you to plug in your own additional or replacement functionality.
Comment 1:
NHibernate is a true ORM, in that it permits you to create a mapping between your arbitrary domain model (classes) and your arbitrary data model (tables, views, functions, and procedures). You tell it how you want your classes to be mapped to tables, for example, whether this class maps to two joined tables or two classes map to the same table, whether this class property maps to a many-to-many relation, etc. NHibernate expects your data model to be mostly normalized, but it does not require that your data model correspond precisely to your domain model, nor does it generate your data model.
Comment 2:
NHibernate's approach is to permit you to write any classes you like, and then after that to tell NHibernate how to map those classes to tables. There's no special base class to inherit from, no special list class that all your one-to-many properties have to be, etc. NHibernate can do its magic without them. In fact, your business object classes are not supposed to have any dependencies on NHibernate at all. Your business object classes, by themselves, have absolutely no persistence or database code in them.
You will most likely find that you can exercise very fine-grained control over the data-access strategies that NHibernate uses, so much so that NHibernate is likely to be an excellent choice for your complex cases as well. However, in any given context, you are free to use NHibernate or not to use it (in favor of more customized DAL code), as you like, because NHibernate tries not to get in your way when you don't need it. So you can use a custom DAL or DevExpress XPO in one BLL class (or method), and you can use NHibernate in another.
I recently took part in a fairly large and interesting project. I didn't join it from the beginning, and we had to support an already-implemented architecture. Data access to all objects was implemented through stored procedures and automatically generated .NET wrapper methods that returned DataTable objects. The development process in that system was really slow and inefficient. We had to write a huge PL/SQL stored procedure for every query that could have been expressed as a simple LINQ query. If we had used an ORM, we would have implemented the project several times faster. And I don't see any advantage to such an architecture.
I admit that this is just one particular, not very successful, project, but I came to the following conclusion: before refusing to use an ORM, think twice about whether you really need that much flexibility and control over the database. I think in most cases it isn't worth the wasted time and money.
As others have explained, there is a fundamental difficulty with ORMs that means no existing solution does a very good job of doing the right thing most of the time. The blog post The Vietnam of Computer Science echoes some of my feelings about it.
The executive summary is that the assumptions and optimizations of the object and relational models are incompatible: although early returns are good, as the project progresses the abstractions of the ORM fall short, and the extra overhead of working around them tends to cancel out the early successes.
I have used Bold for Delphi for four years now. It is great, but it is no longer for sale, and it lacks some features like data binding. ECO, its successor, has all of that.
No, I'm not selling ECO licenses or anything; I just think it is a pity that so few people realize what MDD (Model-Driven Development) can do: the ability to solve more complex problems in less time and with fewer bugs. This is very hard to measure, but I have heard figures like 5-10x more efficient development. And as I work with it every day, I know this is true.
Maybe some traditional developers who are centered around data and SQL will say:
"But what about performance?"
"I may lose control of what SQL is run!"
Well...
If you want to load 10,000 instances from a table as fast as possible, it may be better to use stored procedures, but most applications don't do this. Both Bold and ECO use simple SQL queries to load data. Performance depends heavily on the number of queries needed to load a certain amount of data. The developer can help by declaring which data belong together, so they can be loaded as efficiently as possible.
The actual queries that are executed can of course be logged to catch any performance problems. And if you really want to use your hyper-optimized SQL query, that is no problem as long as it doesn't update the database.
There are many ORM systems to choose from, especially if you use .NET. But honestly, it is very, very hard to build a good ORM framework. It should be centered on the model: if the model changes, it should be easy to change the database and the code that depends on the model. This makes the system easy to maintain, and the cost of making many small changes is very low. Many developers make the mistake of centering on the database and adapting everything around that. In my opinion this is not the best way to work.
More people should try ECO. It is free to use for an unlimited time as long as the model has no more than 12 classes. You can do a lot with 12 classes!
I suggest you use the CodeSmith tool for creating NetTiers; that is a good option for developers.