I have a web application with:
1 Terabyte DB
200+ tables
At least 50 tables with 1+ million records each
10+ developers
1000s of concurrent users
This project currently uses ad-hoc SQL generated by a custom ORM solution.
Instead of continuing to maintain the custom ORM (which is missing a lot of advanced features), I am thinking of switching to Entity Framework.
I used EF 4.1 (Code-First) on a smaller project and it worked pretty well, but will it scale to a much larger project like the one above?
I strongly agree with marvelTracker's (and Ayende's) thoughts.
Here is some further information though:
Key Strategy
There is a well-known cost to using GUIDs as primary keys. It was described by Jimmy Nilsson and the write-up is publicly available at http://www.informit.com/articles/article.aspx?p=25862. NHibernate supports the GUIDCOMB primary key strategy out of the box; achieving the same in Entity Framework is a little tricky and requires additional steps.
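For illustration, a minimal sketch of the COMB idea, modeled on NHibernate's guid.comb generator (CombGuid is a hypothetical helper name, not an EF API; with EF you would call it yourself when constructing entities):
using System;

public static class CombGuid
{
    // Replace the last six bytes of a random GUID with a timestamp so
    // values sort sequentially, reducing clustered-index fragmentation.
    public static Guid NewComb()
    {
        byte[] guidBytes = Guid.NewGuid().ToByteArray();
        DateTime now = DateTime.UtcNow;
        DateTime baseDate = new DateTime(1900, 1, 1);

        // Days since 1900-01-01 (2 bytes) and time of day in 1/300ths
        // of a second (4 bytes), mirroring SQL Server's DATETIME layout.
        byte[] daysBytes = BitConverter.GetBytes(
            new TimeSpan(now.Ticks - baseDate.Ticks).Days);
        byte[] msecsBytes = BitConverter.GetBytes(
            (long)(now.TimeOfDay.TotalMilliseconds / 3.333333));

        // SQL Server treats the trailing bytes as most significant,
        // so the timestamp goes at the end, big-endian.
        Array.Reverse(daysBytes);
        Array.Reverse(msecsBytes);
        Array.Copy(daysBytes, daysBytes.Length - 2, guidBytes, 10, 2);
        Array.Copy(msecsBytes, msecsBytes.Length - 4, guidBytes, 12, 4);

        return new Guid(guidBytes);
    }
}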
Enums
Entity Framework doesn't support enums natively. Until the June CTP, which adds support for enums (http://blogs.msdn.com/b/adonet/archive/2011/06/30/walkthrough-enums-june-ctp.aspx), the only way to map enumerations was through workarounds. Please look at: How to work with Enums in Entity Framework?
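A minimal sketch of the usual workaround (Order and OrderStatus are hypothetical; EF maps the int column, and the enum property is kept out of the model):
public enum OrderStatus { Pending = 0, Shipped = 1, Cancelled = 2 }

public class Order
{
    // Mapped column: pre-enum EF can only persist the int.
    public int StatusValue { get; set; }

    // Unmapped wrapper exposing the enum to calling code
    // (exclude it from the model, e.g. with [NotMapped] in EF 4.1 Code-First).
    public OrderStatus Status
    {
        get { return (OrderStatus)StatusValue; }
        set { StatusValue = (int)value; }
    }
}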
Queries:
NHibernate offers many ways for querying data:
LINQ (using re-motion's re-linq provider, https://www.re-motion.org/web/)
Named Queries encapsulated in query objects
ICriteria/QueryOver for queries where the criteria are not known in advance
Using QueryOver projections and aggregates, for cases where we only need specific properties of an entity, or the results of an aggregate function such as average or count
PagedQueries: In an effort to avoid overwhelming the user, and increase application responsiveness, large result sets are commonly broken into smaller pages of results.
MultiQueries that combine several ICriteria and QueryOver queries into a single database roundtrip
Detached Queries, which are query objects built in parts of the application that have no access to the NHibernate session; the objects are then executed elsewhere, with a session. This is good because we can avoid complex repositories with many methods.
ISession’s QueryOver:
// Query that depends on a session:
premises = session.QueryOver<Premise>().List();
Detached QueryOver:
// Full reusable query!
var query = QueryOver.Of<Premise>();
// Then later, in some other part of the application:
premises = query.GetExecutableQueryOver(session).List(); // Could pass an IStatelessSession too.
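And, for the paging and multi-query points above, a minimal sketch (Premise as before; assumes NHibernate 3.x QueryOver, where Skip/Take page the results and futures batch queries into one roundtrip):
// Paged query: page 3, with 10 results per page.
var page = session.QueryOver<Premise>()
    .Skip(20)
    .Take(10)
    .List();

// Futures: both queries go to the database in a single roundtrip
// when the first result is enumerated.
var allPremises = session.QueryOver<Premise>().Future();
var total = session.QueryOver<Premise>().ToRowCountQuery().FutureValue<int>();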
Open source
NHibernate has a lot of contribution projects available at http://sourceforge.net/projects/nhcontrib/
This project provides a number of very useful extensions to NHibernate, among others:
Cache Providers (for 2nd-level cache)
Dependency Injection for entities with no default constructor
Full-Text Search (Lucene.NET integration)
Spatial Support (NetTopologySuite integration)
Support
EntityFramework comes with Microsoft support.
NHibernate has an active community:
https://stackoverflow.com/questions/tagged/nhibernate
http://forum.hibernate.org/
http://groups.google.com/group/fluent-nhibernate
Also, have a look at:
http://www.infoq.com/news/2010/01/Comparing-NHibernate-EF-4
NHibernate is the best choice for you because it has good support for complex queries and second-level caching, and great support for optimizations. I think EF is getting there. If you are dealing with legacy systems, NHibernate is the best approach.
http://ayende.com/blog/4351/nhibernate-vs-entity-framework-4-0
Suitable is an interesting term. Is it usable? Yes, and you'll find a number of nice features well suited to rapid application development. That said, it's somewhat of a half-baked technology and lacks many advanced features of its own predecessor, LINQ to SQL (even 3 years after its first release). Here are a few annoyances:
Poor complex LINQ support
No Enum property types
Missing SQL Conversions (parse DateTime, parse int, etc.) (though you can implement these via model defined functions)
Poor SQL readability
Problems keeping multiple ssdl/csdl/msl resources independent for sharding (not really a problem with Code First)
Problems with running multiple concurrent transactions in different ObjectContexts
Problems with Detached entity scenarios
That said, Microsoft has devoted a lot of effort to it, and hopefully it will continue to improve over time. Personally, I would spend the time implementing a well-abstracted Repository/Unit of Work pattern, so your code doesn't know it's using EF at all; then, if necessary, you can switch to another LINQ-to-DB provider in the future.
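For instance, a minimal sketch of such an abstraction (hypothetical interfaces, not an EF API; an EF-backed implementation would sit behind them and call SaveChanges() in Commit()):
using System;
using System.Linq;

public interface IRepository<T> where T : class
{
    T Find(object id);
    IQueryable<T> Query();
    void Add(T entity);
    void Remove(T entity);
}

public interface IUnitOfWork : IDisposable
{
    IRepository<T> Repository<T>() where T : class;
    void Commit(); // SaveChanges() in the EF-backed implementation
}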
Most modern ORMs will be a step up from ad-hoc SQL.
Related
I am looking to implement an ORM in our system. We currently have many tables with lots of horrible data and stored procedures. I've heard that using an ORM can slow the system down. Does anyone know which ORM is better in terms of speed and performance when queries are created in C# code and mapped to stored procedures?
Thanks
EDIT:
The project will use existing tables that are large and contain a lot of data, and it will also use existing stored procedures that carry out complex tasks in a SQL Server DB. The ORM must be able to carry out transactions and must perform well when running the existing stored procedures and querying the current tables. The project is web based and will use WCF web services with DDD. I can see that EF is a lot easier to use and has greater support, but is NHibernate the more suitable option?
Entity Framework is constantly gaining new features, and a great deal is automated for your project. Entity Framework is very easy to use, extend, and refactor around.
Visual Studio integrates it (Code-First, Database-First, Model-First...) like a charm.
Windows Azure makes it easy to deploy and change.
Moreover, Visual Studio can generate all your CRUD pages in 3 clicks.
I would suggest you use EF, but it depends on your project. Can you give us more details about it?
You can find lots of comparison charts on Google, for example this one explains performance and this one differences.
EDIT:
Can you quantify the number of users in your application?
When you use Database-First in Entity Framework it is very easy to import and use stored procedures; for NHibernate it is quite simple as well.
Note that if you use a lot of stored procedures and don't have a lot of simultaneous users, the choice between those two ORMs may not be so crucial.
Also, don't forget that the performance of a tool is often determined by the way it is used. If you misuse the ORM (e.g. async, lazy/eager loading, base classes, ...), performance will drop drastically.
Perhaps you can install both of them, look at how they work, and check their roadmaps (e.g. Entity Framework's) to gauge their evolution and the interest around them.
Check this article:
Some Advantages of NH over EF
NH has 10+ id generator strategies (IDENTITY, sequence, HiLo, manual, increment, several GUIDs, etc), EF only has manual or SQL Server's IDENTITY;
NH has lazy property support (note: not entities, this is for string, XML, etc properties), EF doesn't;
NH has second level cache support (and, believe me, enterprise developers are using it) and EF doesn't;
NH has support for custom types, even complex, with "virtual" properties, which includes querying for these virtual properties, EF doesn't;
NH has formula properties, which can be any SQL, EF doesn't;
NH has automatic filters for entities and collections, EF doesn't;
NH supports collections of primitive types (strings, integers, etc) as well as components (complex types without identity), EF doesn't;
NH supports 6 kinds of collections (list, set, bag, map, array, id array), EF only supports one;
NH includes a proxy generator that can be used to customize the generated proxies, EF doesn't allow that;
NH has 3 mapping options (.HBM.XML, by code, by attributes) and EF only two (by attributes, by code);
NH allows query and insertion batching, EF doesn't (this is because EF only really supports IDENTITY keys);
NH has several optimistic concurrency control strategies (a column on the DB, including Oracle's ORA_ROWSCN, timestamp, integer version, all columns, dirty columns); EF only supports SQL Server's TIMESTAMP or all columns.
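To make the id-generator and batching points concrete, a minimal mapping-by-code sketch (NHibernate 3.2+; Order is a hypothetical entity; hi/lo assigns ids client-side, which is what makes insert batching possible):
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class Order
{
    public virtual long Id { get; set; }
    public virtual string Number { get; set; }
}

public class OrderMap : ClassMapping<Order>
{
    public OrderMap()
    {
        Table("Orders");
        // Hi/lo: NHibernate reserves a block of ids per roundtrip,
        // so inserts don't wait on a database-generated IDENTITY value.
        Id(x => x.Id, m => m.Generator(Generators.HighLow,
            g => g.Params(new { max_lo = 100 })));
        Property(x => x.Number);
    }
}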
Both are great solutions, although I personally think NHibernate is better for an inherited database.
There are some things that are clearly better in NHibernate, such as second-level caching support. Documentation is probably a bit sparser than EF, but if you're willing to go through the learning curve, NHibernate gives you a lot more power.
FluentNHibernate is great for typed mapping of classes to underlying tables, but there are some places where you will just have to revert to XML mappings. There is a new competing API from NHibernate itself, however, and I have not checked it out yet (the above blog post mentions it).
If you want to rely on VS tooling support, EF is better. However, there will be some magic sometimes (e.g. EF can use reflection to populate even the private properties of an object; NHibernate does not do that; this is a strength or a weakness depending on how you see it). EF also works well with other Microsoft-supplied frameworks (e.g. RIA Services). I also like EF's automatic migrations (when you use Code-First).
If you want more power in your hands and want to be able to fine-tune how things work with clear separation of concerns (ORM does only what the ORM should do), NH seems to be better. However, it is a bit irritating to make all the properties virtual for NH to be able to access them.
I have used both, and either way it can sometimes get a bit clunky to generate the SQL you want; in those 5-10% of cases, drop down one more level and use a micro-ORM like Dapper, Massive, or PetaPoco.
EDIT:
NHibernate, it seems, can populate private properties too, so this was just ignorance on my part.
EF 5 or 6 on .NET 4.5 is one sure way to increase performance; don't expect any speed improvement with EF 5 on .NET 4.0, as is documented by Microsoft. (We have another issue with poorly written LINQ statements, too.)
In general, if performance is a high priority, you can't beat ADO.NET with stored procedures. You simply cannot. Adding an ORM, adding an IoC container... how much performance testing do you want to do?
Spin up a few VMs with JMeter, hit one server with EF and another with NHibernate, force JMeter to make enough calls to your URLs while varying the number of concurrent users, and you should see where your bottlenecks are.
It may be worth adding here that Entity Framework has issues when attaching disconnected graphs and is missing functionality such as NHibernate's Merge(). You can address this with other plugins, but it is not there out of the box. There is a feature request on CodePlex asking for better support for working with disconnected entities, with some further discussion there.
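A minimal sketch of the issue (ShopContext and Order are hypothetical; DbContext API, EF6 namespaces):
using System.Data.Entity; // in EF 4.x, EntityState lives in System.Data instead

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public static class DetachedDemo
{
    // Attaching an edited, detached root marks only its own scalar
    // values as modified; EF won't walk the graph and reconcile child
    // collections the way NHibernate's Merge() does.
    public static void SaveDetached(Order edited)
    {
        using (var db = new ShopContext())
        {
            db.Orders.Attach(edited);
            db.Entry(edited).State = EntityState.Modified;
            // Added/removed children would still need manual state fix-up here.
            db.SaveChanges();
        }
    }
}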
I am working on a new project which is data oriented, meaning a very large volume of data (increasing day by day). So kindly suggest which approach I should use to achieve the desired functionality without any hurdles.
Is the database fully normalized?
Which ORM (Linq-to-SQL, Entity Framework) is suitable for this project?
Should I use stored procedures, db functions, triggers, etc?
Whether or not the database is normalized is something you need to know and need to answer!
As for the ORM: it really depends on the type of data and its structure.
Linq-to-SQL is a very simplistic ORM that basically just does a 1:1 mapping of tables to domain objects. As long as you don't need anything else, that's fine. Linq-to-SQL is no longer being actively developed, which might be a drawback. Also, stored proc support is a bit limited.
Entity Framework (at least in .NET 4) is great and is the current ORM of choice at Microsoft: it's being actively developed and has a lot of backing and flexibility. It offers database-first, model-first, and code-first development styles, it supports POCO objects and self-tracking entities, and it is very well integrated with stored procs (you can define a stored proc for INSERT, UPDATE, and DELETE on every single entity, if you wish to do so). It would be my first choice.
NHibernate is a great, enterprise-level ORM, well established and actively developed; certainly not a "dead end" like Linq-to-SQL. I used it a few years ago, and while it's great and powerful, it's also a bit harder to learn than EF4 (no visual designer; it requires more manual effort). It's great if you really need all its power and are willing to invest the necessary up-front learning time.
As for the database: stored procs are definitely worth investigating, especially if you need to encapsulate certain database processing into a nice proc to call from your code. I would be rather careful and defensive about using triggers and functions too much; they have their place, but they shouldn't be overused, since they carry some problems with them (mostly performance problems, and problems of "discoverability": many devs don't think of triggers that could be in place and will not understand what's going on).
@Xulfee, that's a fairly broad question, and a lot depends on the nature of your project. The approaches you reference affect many aspects of the overall architecture. For example:
Is the database fully normalized?
Database normalization generally aids in tackling the complexity of your conceptual model. When properly (note I did not say "fully") normalized, your model should be fairly straightforward, and consumers of the database (developers, your BI team, domain experts, etc.) should be able to get a good idea of the business problems being approached with your database. That having been said, normalization can lead to a fairly large reporting and analysis problem. When writing a query for a report against a large, fairly normalized database, you may introduce performance problems by joining a lot of tables. Enter snowflake schemas. So, to your question: it depends. What are your reporting requirements? How many transactions on average do you need to support? How complex is your conceptual model? Are you able to break the database into smaller associated models, rather than one large one?
Which ORM (linq2sql, entity framework) is suitable for this project?
Again, an ORM is a tool. You must ask yourself: what is the specific job you are trying to accomplish? The choice of an ORM (or even of using an ORM in the first place) is a decision I would recommend you make fairly early on, as it can affect everything from performance to development team cohesion. There are a lot of great choices out there:
Linq-To-Sql
NHibernate
Entity Framework
LLBLGen
Each of the above frameworks does a fantastic job of abstracting your persistence layer. Each has its pros and cons, the majority of which come down to infrastructure concerns: performance, configuration, schema/language compatibility, persistence patterns, vendor support. Given the choice, I would ask myself: which of the frameworks is my development team most comfortable with? Which one supports the level of system activity that I expect? With which vendor am I willing to "throw in"? I have seen fairly successful systems that use fairly small ORMs (e.g. Stack Overflow uses a modified version of Linq-To-Sql) as well as fairly large systems fail with fairly complex ORMs.
Should I use stored procedures, db functions, triggers, etc?
This question centers around your persistence store and how you use it (as well as how angry you want to make your DBA :) ). The use of sprocs (stored procedures) lends itself to allowing your DBA to provide security at a very granular level. In addition, if the ORM you are using generates dynamic SQL, you might benefit from the database's ability to cache queries generated via sprocs. DB functions can be a double-edged sword: they offer the ability to add functionality and intelligence to your model, while at the same time letting you take a fairly large performance hit (e.g. table-valued UDFs). Triggers have their own pitfalls and should be used with caution, but that discussion could get rather involved. The bottom line for me in this case is: how much logic do you want the database to carry, and how important are security and performance? Do you have a qualified DBA (not just a developer who knows how to write queries, but a DBA capable of performance tuning and data modeling)? How big is your database? How complex is your data? Think about all of these questions and more when determining how you want to manage your data.
In summary, you are asking some good questions. Don't confuse infrastructure needs with implementation needs. Decide on a stack and run with it, don't get bogged-down in implementation details to the point at which you are unable to successfully complete the project. With the right level of abstraction, you may find it easier to try out new and different technologies without risking the overall success of the project. And remember: there's nothing wrong with experimenting and trying new things, just be prepared to fail gracefully and test, test, test!
In short, ORMs like Entity Framework provide a fast solution but with many limitations. When should they (ORMs) be avoided?
I want to create the engine of a DMS system, and I wonder how I should create the Business Logic Layer.
I'll discuss the following options:
Use Entity Framework and expose it as the business layer to the engine's clients.
The problem is the lack of control over the properties and over validation, because it's generated code.
Create my own business layer classes manually, without using Entity Framework or any ORM.
The problem is that this is a hard task, something like reinventing the wheel.
Create my own business layer classes on top of Entity Framework (and use it).
The problem seems to be code repetition: creating new classes with the same names, where every property shadows the corresponding one generated by the ORM.
Am I framing the problem in the right way?
In short, ORMs should be avoided when:
your program will perform bulk inserts/updates/deletes (such as insert-selects, and updates/deletes that are conditional on something non-unique). ORMs are not designed to do these kinds of bulk operations efficiently; you will end up deleting each record one at a time (see the sketch at the end of this answer).
you are using highly custom data types or conversions. ORMs are generally bad at dealing with BLOBs, and there are limits to how they can be told how to "map" objects.
you need the absolute highest performance in your communication with SQL Server. ORMs can suffer from N+1 problems and other query inefficiencies, and overall they add a layer of (usually reflective) translation between your request for an object and a SQL statement which will slow you down.
ORMs should instead be used in most cases of application-based record maintenance, where the user is viewing aggregated results and/or updating individual records, consisting of simple data types, one at a time. ORMs have the extreme advantage over raw SQL in their ability to provide compiler-checked queries using Linq providers; virtually all of the popular ORMs (Linq2SQL, EF, NHibernate, Azure) have a Linq query interface that can catch a lot of "fat fingers" and other common mistakes in queries that you don't catch when using "magic strings" to form SQLCommands. ORMs also generally provide database independence. Classic NHibernate HBM mappings are XML files, which can be swapped out as necessary to point the repository at MSS, Oracle, SQLite, Postgres, and other RDBMSes. Even "fluent" mappings, which are classes in code files, can be swapped out if correctly architected. EF has similar functionality.
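To make the bulk-operation bullet concrete, a sketch of the two shapes of a mass delete (Logs and CreatedOn are hypothetical; the commented loop is the ORM shape, the method body the set-based one):
using System;
using System.Data.SqlClient;

public static class LogMaintenance
{
    // ORM shape of the operation - every matching row is fetched,
    // tracked, and removed with its own DELETE on SaveChanges():
    //
    //   foreach (var log in context.Logs.Where(l => l.CreatedOn < cutoff))
    //       context.Logs.Remove(log);
    //   context.SaveChanges();
    //
    // Set-based shape: one statement, no entities materialized.
    public static int PurgeOldLogs(string connectionString, DateTime cutoff)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "DELETE FROM Logs WHERE CreatedOn < @cutoff", conn))
        {
            cmd.Parameters.AddWithValue("@cutoff", cutoff);
            conn.Open();
            return cmd.ExecuteNonQuery();
        }
    }
}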
So are you asking how to do "X" without doing "X"? An ORM is an abstraction, and like any other abstraction it has disadvantages, but not the ones you mentioned.
Code (in EFv4) can be generated by a T4 template, and a T4 template is code that can be modified.
The generated code is a partial class, which can be combined with your own partial part containing your logic (a sketch follows below).
Writing classes manually is a very common case; using the designer available in Entity Framework is rarer.
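A minimal sketch of that partial-class split (Document is a hypothetical entity):
// Generated half (regenerated on every model update - don't edit):
public partial class Document
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Hand-written half, in a separate file: logic and validation survive
// regeneration because both halves compile into a single class.
public partial class Document
{
    public bool IsValid()
    {
        return !string.IsNullOrWhiteSpace(Title);
    }
}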
Disclaimer: I work at Mindscape, which builds the LightSpeed ORM for .NET.
As you don't ask about a specific issue, but about approaches to solving the flexibility problem with an ORM, I thought I'd chime in with some views from a vendor perspective. They may or may not be of use to you, but they might give some food for thought :-)
When designing an O/R mapper it's important to take into consideration what we call "escape hatches". An ORM will inevitably push a certain set of default behaviours, which is one way developers gain productivity.
One of the lessons we have learned with LightSpeed has been seeing where developers need those escape hatches. For example, KeithS here states that ORMs are not good for bulk operations, and in most cases this is true. We had this scenario come up with some customers and added an overload to our Remove() operation that lets you pass in a query which removes all matching records. This saves having to load entities into memory just to delete them. Listening to where developers are having pain and helping solve those problems quickly is important for building solid solutions.
All ORMs should batch queries efficiently. Having said that, we have been surprised to see that many don't. This is strange, given that batching can often be done rather easily: several queries can be bundled up and sent to the database at once to save round trips. This is something we've done since day one for any database that supports it. That's just an aside to the point about batching made in this thread. The quality of those batched queries is the real challenge, and, frankly, there are some TERRIBLE SQL statements being generated by some ORMs.
Overall, you should select an ORM that gives you immediate productivity gains (the almost demo-ware-styled "see, I queried data in 30 seconds!") but has also paid attention to larger-scale solutions, which is where escape hatches and some of the less-demoed but hugely useful features are needed.
I hope this post hasn't come across as too salesy, but I wanted to draw attention to the thought process that goes into any product when you are selecting one. If the philosophy matches the way you need to work, then you're probably going to be happier than with one that does not.
If you're interested, you can learn about our LightSpeed ORM for .NET.
In my experience, you should avoid using an ORM when your application does the following kinds of data manipulation:
1) Bulk deletes: most ORM tools won't truly delete the data; they mark it with a garbage-collection ID (GC record) to keep the database consistent. Worse, the ORM collects all the data you want to delete before it marks it as deleted. That means that if you want to delete 1,000,000 rows, the ORM will first fetch the data, load it into your application, mark it as GC, and then update the database, which I believe is a huge waste of resources.
2) Bulk inserts and data import: most ORM tools create business-layer validations on the business classes. This is good if you want to validate one record, but if you are going to insert/import hundreds or even millions of records, the process could take days.
3) Report generation: ORM tools are good for creating simple list reports or simple table joins, as in an order/order-details scenario. But in most cases the ORM will only slow down the retrieval of the data and will add more joins than you need for a report. That translates into giving the DB engine more work than you would with a plain SQL approach.
I am new to CSLA and Entity Framework. I am creating a new CSLA/Silverlight application that will replace a 12-year-old Win32 C++ system. The old system uses a custom DCOM business object library and uses ODBC to get to SQL Server. The new system will not immediately replace the old system; they must coexist against the same database for years to come.
At first I thought EF was the way to go, since it is the latest and greatest. After making a small EF model and only 2 CSLA editable root objects (I will eventually have hundreds of objects, as my DB has 800+ tables), I am seriously questioning the use of EF.
In the current system I often need to do fine-grained performance tuning of the queries, which I can do because I have 100% control over the generated SQL. But in EF so much seems to happen behind the scenes that I lose that control. Articles like http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html don't help my impression of EF.
People seem to like EF because of LINQ to EF, but since my criteria are passed between client and server as a criteria object, it seems I could build queries just as easily without LINQ. I understand that in WCF RIA there is query projection (or something like that), where I can write client-side LINQ that moves to the server before translation into actual SQL, so in that case I can see the benefit of EF, but not in CSLA.
If I use raw ADO.NET, will I regret my decision 5 years from now?
Has anyone else made this choice recently and which way did you go?
In your case, I would still choose EF over doing it all by hand.
Why? EF, especially in .NET 4, has matured considerably. It will allow you to do most of your database operations a lot more easily and with a lot less code than if you had to hand-code all of your data access.
And in cases where you do need the absolute maximum performance, you can always plug in stored procedures for insert, update, and delete, which EF4 will then use instead of its default behavior of creating the SQL statements on the fly.
EF4 has a much better stored proc integration, and this really opens up the best of both worlds:
use the high productivity of EF for the 80% of cases where performance isn't paramount
fine-tune and handcraft stored procs for the remaining 20%, and plug them into EF4
See some resources:
Using Stored Procedures for Insert, Update and Delete in an Entity Data Model
Practical Entity Framework for C#: Stored Procedures (video)
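If you later move to Code-First, the same idea is available through EF6's MapToStoredProcedures; a minimal sketch (Order, the context, and the procedure names are all hypothetical; the EF4 designer maps the procs in the .edmx instead):
using System.Data.Entity;

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public class OrderContext : DbContext
{
    public DbSet<Order> Orders { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // EF now calls these procs instead of generating
        // INSERT/UPDATE/DELETE SQL on the fly for Order.
        modelBuilder.Entity<Order>()
            .MapToStoredProcedures(s => s
                .Insert(i => i.HasName("Order_Insert"))
                .Update(u => u.HasName("Order_Update"))
                .Delete(d => d.HasName("Order_Delete")));
    }
}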
You seem to have a mix of requirements and a mix of solutions.
I normally rate each requirement as essential, nice to have, or not essential, and then see what works.
I agree with what @marc_s has said: you can have the best of both worlds.
The only other thing I would say is that if this solution is to be around for the next 5 years, have you considered Unit Testing?
There are plenty of examples of how to set this up using EF. (I personally avoid raw ADO.NET, just because the separation of concerns is so complicated for unit tests.)
There is no easy solution. I would pick a feature in your project that would take you a day or so to build, try the different methods (raw SQL, EF, EF + stored procs), and see what works!
Take an objective look at CSLA - invoke the 'DataPortal' and check out the call stack.
Next, put those classes on a CI build server that stores runtime data and provides a scatter plot over a series of runs.
Next, look at the code that gets created. Ask yourself how you can use things like dependency injection in light of classes that rely on static creators with protected/private constructors.
Next, take a look at how many responsibilities the 'CSLA' classes take on.
Finally, ask yourself whether creating objects with different constructors per environment makes sense, and ask yourself how you will unit test them.
We have used Entity Framework on two projects, both with several hundred tables.
Our experience is mainly positive. We have had large productivity gains compared with using Enterprise Library and stored procedures.
However, when I suggest using EF on Stack Overflow, I often get negative comments.
On the negative side, we have found that there is a steep learning curve for certain functionality.
Finally, to the question: what problems have people had with EF, and why do they prefer other ORMs?
Like you, my experience with EF is mostly positive. The biggest problem I've had is that very complex queries can take a long time to compile. The visual designer is also much less stable and has fewer features than the framework itself. I wish the framework would put the GeneratedCode attribute on the code it generates.
I recently used EF and had a relatively good experience with it. I too see a lot of negative feedback around EF, which I think is unfortunate considering all that it offers.
One issue that surprised me was the performance difference between two strategies for fetching data. Initially, I figured that eager loading would be more efficient, since it would pull the data via a single query. In this case, the data was an order, and I was doing an eager load of 5-8 related tables. During development, we found this query to be unreasonably slow. Using SQL Profiler, we watched the traffic and analyzed the resulting queries. The generated SQL statement was huge, and SQL Server didn't seem to like it all that much.
To work around the issue, I reverted to a lazy-loading/on-demand mode, which resulted in more queries to the server but a significant boost in performance. This was not what I initially expected. My takeaway, which IMHO holds true for all data access implementations, is that I really need to performance-test the data access. This is true regardless of whether I use an ORM, SQL procs, parameterized SQL, etc.
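A sketch of the two fetching strategies (a hypothetical Order/Customer model; the lambda form of Include lives in the System.Data.Entity namespace):
using System.Data.Entity;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public virtual Customer Customer { get; set; } // virtual => lazy-loadable
}

public class StoreContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public static class FetchingDemo
{
    public static void Run(int orderId)
    {
        using (var db = new StoreContext())
        {
            // Eager: one joined query. With 5-8 Includes the generated
            // SQL can balloon, which is the slow case described above.
            var eager = db.Orders
                .Include(o => o.Customer)
                .Single(o => o.Id == orderId);

            // Lazy / on-demand: small initial query; touching the
            // navigation property issues a second, simpler query.
            var lazy = db.Orders.Single(o => o.Id == orderId);
            var name = lazy.Customer.Name;
        }
    }
}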
I use Entity Framework too and have found the following disadvantages:
I can't work with Oracle, which is really necessary for me.
The Model Designer for Entity Framework: during an update of the model from the database, the storage part is regenerated too. This is very inconvenient.
There is no support for INSTEAD OF triggers in Entity Framework.