ORM for large volume database - entity-framework

I am working on new project which have data oriented means very large volume of data (increasing day by day). So kindly suggest me which type of approach I should use to achieve desire functionality with out any hurdles.
Is database fully normalized?
Which ORM (linq2sql, entity framework) is suitable for this project?
Should I use stored procedures, db functions, triggers, etc?

Whether or not the database is normalized is something you need to know and need to answer!
As for the ORM: it really depends on the type of data and its structure.
Linq-to-SQL is a very simplistic ORM that basically just does a 1:1 mapping of tables to domain objects. As long as you don't need anything else - that's fine. Linq-to-SQL is no longer being actively developed, so that might be a drawback. Also, stored proc support is a bit limited.
Entity Framework (at least in .NET 4) is great and is the current ORM of choice at Microsoft - it's being actively developed, has a lot of backing, lot of flexibility. It offers database-first, model-first and code-first development styles, it supports POCO objects and self-tracking entities, and is very well integrated with stored procs (you can define a stored proc for INSERT, UPDATE, DELETE on every single entity, if you wish to do so). It would be my first choice.
NHibernate is a great, enterprise-level ORM, well established and being actively developed - certainly not a "dead-end" like Linq-to-SQL. I've used it a few years ago, and while it's great and powerful, it's also a bit harder to learn than EF4 (no visual designer, needs more manual labor, manual effort). It's great if you really need all its power and if you're willing to invest the necessary up-front learning time.
As for the database: stored procs are definitely worth while investigating, especially if you need to encapsulate certain database processing into a nice proc to call from your code. I would be rather careful and defensive about using triggers and functions too much - they have their place, but they shouldn't be overused, since they do carry some problems with them (mostly performance problems and problems of "discoverability" - lots of devs don't think about triggers that could be in place, and will not understand what's going on).

#Xulfee, that's a fairly broad question and a lot depends on the nature of your project. The approaches you reference affect a lot of aspects of the overall architecture. For example:
Is the database fully normalized?
Database normalization generally aids in tackling the problem of complexity of your conceptual model. When properly (note I did not say, "fully") normalized, your model should be fairly straight-forward and consumers of the database (developers, your BI team, domain experts, etc) should be able to get a good idea of the business problems that are being approached with your database. That having been said, normalization can lead to a fairly large reporting and analysis problem. When writing a query for a report against a large, fairly normalized database, you may introduce performance problems by joining a lot of tables. Enter snowflake schemas. So, to your question: it depends. What are you reporting requirements? How many transactions on average do you need to support? How complex is your conceptual model? Are you able to break the database into smaller models that are associated, rather than one large one?
Which ORM (linq2sql, entity framework) is suitable for this project?
Again, an ORM is a tool. You must ask yourself what is the specific job that you are trying to accomplish? The choice of an ORM (or in even using an ORM in the first place) is a decision that I would recommend you make fairly early on as it can affect everything from performance to development team cohesion. There are a lot of great choices out there:
Linq-To-Sql
NHibernate
Entity Framework
LLBLGen
Each of the above frameworks does a fantastic job of abstracting your persistence layer. Each has it's pro's and cons - the majority of which come down to infrastructure concerns: performance, configuration, schema/language compatibility, persistence patterns, vendor support. Given the choice, I would ask myself which of the frameworks is my development team most comfortable with? Which one supports the level of system activity that I expect? With which vendor am I willing to "throw-in"? I have seen fairly successful systems that use fairly small ORM's (i.e. Stackoverflow uses a modified version of Linq-To-Sql) as well as fairly large systems fail with fairly complex ORM's.
Should I use stored procedures, db functions, triggers, etc?
This question centers around your persistence store and how you use it (as well as how angry you want to make your DBA :) ). The use of sprocs (stored procedures) lends itself to allowing your dba to provide security at a very granular level. In addition, if the orm you are using generates dynamic sql, you might benefit from the database's ability to cache queries generated using sprocs. DB functions can be a double-sided blade. They offer the ability to add functionality and intelligence to your model, while at the same time allowing you to take a fairly large hit performance-wise (i.e. table-valued UDF's). Triggers have their own pitfalls and should be used with caution, but that discussion could get rather involved. The bottom-line for me in this case is: how much logic in the database do you want to support, and how important is security and performance? Do you have a qualified dba (not just a developer who knows how to write queries, but a dba who is capable of performance tuning and data modeling)? How big is your database? How complex is your data? Think about all of these questions and more when determining how you want to manage you data.
In summary, you are asking some good questions. Don't confuse infrastructure needs with implementation needs. Decide on a stack and run with it, don't get bogged-down in implementation details to the point at which you are unable to successfully complete the project. With the right level of abstraction, you may find it easier to try out new and different technologies without risking the overall success of the project. And remember: there's nothing wrong with experimenting and trying new things, just be prepared to fail gracefully and test, test, test!

Related

Why would you use NoSQL to build posts/comments over relational? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

Neo4j instead of relational database

I am implementing a sinatra/rails based web portal that might eventually have few many:many relationships between tables/models. This is a one man team and part time but real world app.
I discussed my entity with someone and was advised to try neo4j. Coming from real 'non-sexy' enterprise world, my inclination is to use relational db until it stops scaling or becomes a nightmare because of sharding etc and then think about anything else.
HOWEVER,
I am using postgres for the first time in this project along with datamapper and its taking me time to get started very fast
I am just trying out few things and building more use cases so I consitently have to update my schema (prototyping idea and feedback from beta) . I wont have to do this in neo4j (except changing my queries)
Seems like its very easy to setup search using neo4j . But Postgres can do full text search as well.
Postgres recently announced support for json and javascript. Wondering if I should just stick with PG and invest more time learning PG (which has a good community) instead neo4j.
Looking for usecases where neo4j is better, especially at protyping/initial phase of a project. I understand if the website grows I might end up having multiple persistent technologies like s3, relational (PG), mongo etc.
Also it would be good to know how it plays out with Rails/Ruby ecosystem.
Update1:
I got a lot of good answers and seems like the right thing to do is stick with Postgres for now (especially since I deploy to heroku)
However the idea of being schema-less is tempting. Basically I am thinking of a approach where you don't define a datamodel until you have say 100-150 users and you have yourself figured out a good schema (business use cases) for your product , while you are just demoing the concept and getting feedback with limited signups. Then one can decide a schema and start with relational.
Would be nice to know if there are easy to use schema/less persistence option (based on ease to use/setup for new user) that might give up say scaling etc.
Graph databases should be considered if you have a really chaotic data model. They were needed to express highly complex relationships between entities. To do that, they store relationships at the data level whereas RDBMS use a declarative approach. Storing relationships only makes sense if these relationships are very different, otherwise you'll just end up duplicating data over and over, taking a lot of space for nothing.
To require such variety in relationships you'd have to handle huge amount of data. This is where graph databases shines because instand of doing tons of joins, they just pick a record and follow his relationships. To support my statement : you'll notice that every use cases on Neo4j's website are dealing with very complex data.
In brief, if you don't feel concerned with what I said above, I think you should use another technology. If this is just about scaling, schemalessness or starting fast a project, then look at other NoSQL solutions (more specifically, either column or document oriented databases). Otherwise you should stick with PostgreSQL. You could also, like you said, consider polyglot persistence,
About your update, you might consider hStore. I think it fits your requirements. It's a PostgreSQL module which also works on Heroku.
I don't think I agree that you should only use a graph database when your data model is very complex. I'm sure they could handle a simple data model/relationships as well.
If you have no prior experience with Neo4j or Postgres, then most likely both with take quite a bit of time to learn well.
Some things to keep in mind when picking:
It's not just about development against a database technology. You should consider deployment as well. How easy is it to deploy and scale Postgres/Neo4j?
Consider the community and tools around each technology. Is there a data mapper for Neo4j like there is for Postgres?
Consider that the data models are considerably different between the two. If you can already think relationally, then I'd probably stick with Postgres. If you go with Neo4j you're going to be making a lot of mistakes for several months with your data models.
Over time I've learned to keep it simple when I can. Postgres might be the boring choice compared to Neo4j, but boring doesn't keep you up at night. =)
Also I never see anyone mention it, but you should look at Riak (http://basho.com/riak/) too. It's a document database that also provides relationships (links) between objects. Not as mature as a graph database, but it can connect a few entities quickly.
The most appropriate choice depends on what problem you are trying to solve.
If you just have a few many to many tables, a relational database can be fine. In general, there is better OR-mapper support for relational databases, as they are much older and have a standardized interface and row-column structure. They also have been improved on for a long time, so they are stable and optimized for what they are doing.
A graph database is better if e.g. your problem is more about the connections between entities, especially if you need higher distance connections, like "detect cycles (of unspecified length)", some "what do friends-of-a-friend like". Things like that get unwieldy when restricted to SQL joins. A problem specific language like cypher in case of Neo4j makes that much more concise. On the downside, there are mappers between graph dbs and objects, but not for every framework and language under the sun.
I recently implemented a system prototype using neo4j and it was very useful to be able to talk about the structure and connections of our data and be able to model that one to one in the data storage. Also, adding other connections between data points was easy, neo4j being a schemaless storage. We ended up switching to mongodb due to troubles with write performance, but I don't think we could have finished the prototype with that in the same time.
Other NoSQL datastores like document based, column, key-value also cover specific usecases. Polyglot persistence is definitively something to look at, so keep your choice of backend reasonably separated from your business logic, to allow you to change your technology later if you learned something new.

How to escape from ORMs limitations or should I avoid them?

In short, ORMs like Entity Framework provides a fast solution but with many limitations, When should they (ORMs) be avoided?
I want to create an engine of a DMS system, I wonder that how could I create the Business Logic Layer.
I'll discuss the following options:
Using Entity Framework and provides it as a Business later for the engine's clients.
The problem is that missing the control on the properties and the validation because it's generated code.
Create my own business layer classes manually without using Entity Framework or any ORM:
The problem is that it's a hard mission and something like reinvent the weel.
Create my own business layer classes up on the Entitiy Framework (use it)
The problem Seems to be code repeating by creating new classes with the same names and every property will cover the opposite one which is generated by the ORM.
Am I discuss the problem in a right way?
In short, ORMs should be avoided when:
your program will perform bulk inserts/updates/deletes (such as insert-selects, and updates/deletes that are conditional on something non-unique). ORMs are not designed to do these kinds of bulk operations efficiently; you will end up deleting each record one at a time.
you are using highly custom data types or conversions. ORMs are generally bad at dealing with BLOBs, and there are limits to how they can be told how to "map" objects.
you need the absolute highest performance in your communication with SQL Server. ORMs can suffer from N+1 problems and other query inefficiencies, and overall they add a layer of (usually reflective) translation between your request for an object and a SQL statement which will slow you down.
ORMs should instead be used in most cases of application-based record maintenance, where the user is viewing aggregated results and/or updating individual records, consisting of simple data types, one at a time. ORMs have the extreme advantage over raw SQL in their ability to provide compiler-checked queries using Linq providers; virtually all of the popular ORMs (Linq2SQL, EF, NHibernate, Azure) have a Linq query interface that can catch a lot of "fat fingers" and other common mistakes in queries that you don't catch when using "magic strings" to form SQLCommands. ORMs also generally provide database independence. Classic NHibernate HBM mappings are XML files, which can be swapped out as necessary to point the repository at MSS, Oracle, SQLite, Postgres, and other RDBMSes. Even "fluent" mappings, which are classes in code files, can be swapped out if correctly architected. EF has similar functionality.
So are you asking how to do "X" without doing "X"? ORM is an abstraction and as any other abstraction it has disadvantages but not those you mentioned.
Code (EFv4) can be generated by T4 template and T4 template is a code that can be modified
Generated code is partial class which can be combined with your partial part containing your logic
Writing classes manually is very common case - using designer as available in Entity framework is more rare
Disclaimer: I work at Mindscape that builds the LightSpeed ORM for .NET
As you don't ask about a specific issue, but about approaches to solving the flexibility problem with an ORM I thought I'd chime in with some views from a vendor perspective. It may or may not be of use to you but might give some food for thought :-)
When designing an O/R Mapper it's important to take into consideration what we call "escape hatches". An ORM will inevitably push a certain set of default behaviours which is one way that developer gain productivity gains.
One of the lessons we have learned with LightSpeed has been where developers need those escape hatches. For example, KeithS here states that ORMs are not good for bulk operations - and in most cases this is true. We had this scenario come up with some customers and added an overload to our Remove() operation that allowed you to pass in a query that removes all records that match. This saved having to load entities into memory and delete them. Listening to where developers are having pain and helping solve those problems quickly is important for helping build solid solutions.
All ORMs should efficiently batch queries. Having said that, we have been surprised to see that many ORMs don't. This is strange given that often batching can be done rather easily and several queries can be bundled up and sent to the database at once to save round trips. This is something we've done since day 1 for any database that supports it. That's just an aside to the point of batching made in this thread. The quality of those batches queries is the real challenge and, frankly, there are some TERRIBLE SQL statements being generated by some ORMs.
Overall you should select an ORM that gives you immediate productivity gains (almost demo-ware styled 'see I queried data in 30s!') but has also paid attention to larger scale solutions which is where escape hatches and some of the less demoed, but hugely useful features are needed.
I hope this post hasn't come across too salesy, but I wanted to draw attention to taking into account the thought process that goes behind any product when selecting it. If the philosophy matches the way you need to work then you're probably going to be happier than selecting one that does not.
If you're interested, you can learn about our LightSpeed ORM for .NET.
in my experience you should avoid use ORM when your application do the following data manipulation:
1)Bulk deletes: most of the ORM tools wont truly delete the data, they will mark it with a garbage collect ID (GC record) to keep the database consistency. The worst thing is that the ORM collect all the data you want to delete before it mark it as deleted. That means that if you want to delete 1000000 rows the ORM will first fetch the data, load it in your application, mark it as GC and then update the database;. which I believe is a huge waist of resources.
2)bulk inserts and data import:most of the ORM tools will create business layer validations on the business classes, this is good if you want to validate 1 record but if you are going to insert/import hundreds or even millions of records the process could take days.
3)Report generation: the ORM tools are good to create simple list reports or simple table joins like in a order-order_details scenario. but it most cases the ORM will only slow down the retrieve of the data and will add more joins that you need for a report. that translate in a give more work to the DB engine than you usually do with a SQL approach

Philosophy of correct working with ORM (Entity Framework )

I'm an old-school database programmer. And all my life i've working with database via DAL and stored procedures. Now i got a requirement to use Entity Framework.
Could you tell me your expirience and architecture best practicies how to work with it ?
As I know ORM was made for programmers who don't know SQL expression. And this is only benefit of ORM. Am I right ?
I got architecture document and I don't know clearly what I shoud do with ORM. I think that my steps should be:
1) Create complete database
2) Create high-level entities in model such "Price" which is realy consists from few database tables
3) Map database tables on entities.
An ORM does a lot more than just allow non-SQL programmers to talk to databases!
Instead of having to deal with loads of handwritten DAL code, and getting back a row/column representation of your data, an ORM turns each row of a table into a strongly-typed object.
So you end up with e.g. a Customer, and you can access its phone number as a strongly-typed property:
string customerPhone = MyCustomer.PhoneNumber;
That is a lot better than:
string customerPhone = MyCustomerTable.Rows[5].Column["PhoneNumber"].ToString();
You get no support whatsoever from the IDE in making this work - be aware of mistyping the column name! You won't find out 'til runtime - either you get no data back, or you get an exception.... no very pleasant.
It's first of all much easier to use that Customer object you get back, the properties are nicely available, strongly-typed, and discoverable in Intellisense, and so forth.
So besides possibly saving you from having to hand-craft a lot of boring SQL and DAL code, an ORM also brings a lot of benefits in using the data from the database - discoverability in your code editor, type safety and more.
I agree - the thought of an ORM generating SQL statements on the fly, and executing those, can be scary. But at least in Entity Framework v4 (.NET 4), Microsoft has done an admirable job of optimizing the SQL being used. It might not be perfect in 100% of the cases, but in a large percentage of the time, it's a lot better than any SQL any non-expert SQL programmer would write...
Plus: in EF4, if you really want to and see a need to, you can always define and use your own Stored procs for INSERT, UPDATE, DELETE on any entity.
I can relate to your sentiment of wanting to have complete control over your SQL. I have been researching ORM usage myself, and while I can't state a case nearly as well as marc_s has, I thought I might chime in with a couple more points.
I think the point of ORM is to shift the focus away from writing SQL and DAL code, and instead focus more on the business logic. You can be more agile with an ORM tool, because you don't have to refactor your data model or stored procedures every time you change your object model. In fact, ORM essentially give you a layer of abstraction, so you can potentially make changes to your schema without affecting your code, and vice-versa. ORM might not always generate the most efficient SQL, but you may benefit in faster development time. For small projects however, the benefits of ORM might not be worth the extra time spent configuring the ORM.
I know that doesn't answer your questions though.
To your 2nd question, it seems to me that many developers on S.O. here who are very skilled in SQL still advocate the use of and themselves use ORM tools such as Hibernate, LINQ to SQL, and Entity Framework. In fact, you still need to know SQL sometimes even if you use ORM, and it's typically the more complicated queries, so your theory about ORM being mainly "for programmers who don't know SQL" might be wrong. Plus you get caching from your ORM layer.
Furthermore, Jeff Atwood, who is the lead developer of S.O. (this site here), claims that he loves SQL (and I'd bet he's very good at it), and he also strives to avoid adding extra tenchnologies to his stack, but yet he choose to use LINQ to SQL to build S.O. Years ago already he claimed that, "Stored Procedures should be considered database assembly language: for use in only the most performance critical situations."
To your 1st question, here's another article from Jeff Atwood's blog that talks about varies ways (including using ORM) to deal with the object-relational impedance mistmatch problem, which helped me put things in perspective. It's also interesting because his opinion of ORM must have changed since then. In the article he said you should, "either abandon relational databases, or abandon objects," as well as, "I tend to err on the side of the database-as-model camp." But as I said, some of the bullet points helped put things into perspective for me.

Manual DAL & BLL vs. ORM

Which approach is better: 1) to use a third-party ORM system or 2) manually write DAL and BLL code to work with the database?
1) In one of our projects, we decided using the DevExpress XPO ORM system, and we ran across lots of slight problems that wasted a lot of our time. Amd still from time to time we encounter problems and exceptions that come from this ORM, and we do not have full understanding and control of this "black box".
2) In another project, we decided to write the DAL and BLL from scratch. Although this implied writing boring code many, many times, but this approach proved to be more versatile and flexible: we had full control over the way data was held in the database, how it was obtained from it, etc. And all the bugs could be fixed in a direct and easy way.
Which approach is generally better? Maybe the problem is just with the ORM that we used (DevExpress XPO), and maybe other ORMs are better (such as NHibernate)?
Is it worth using ADO Entiry Framework?
I found that the DotNetNuke CMS uses its own DAL and BLL code. What about other projects?
I would like to get info on your personal experience: which approach do you use in your projects, which is preferable?
Thank you.
My personal experience has been that ORM is usually a complete waste of time.
First, consider the history behind this. Back in the 60s and early 70s, we had these DBMSes using the hierarchical and network models. These were a bit of a pain to use, since when querying them you had to deal with all of the mechanics of retrieval: follow links between records all over the place and deal with the situation when the links weren't the links you wanted (e.g., were pointing in the wrong direction for your particular query). So Codd thought up the idea of a relational DBMS: specify the relationships between things, say in your query only what you want, and let the DBMS deal with figuring out the mechanics of retrieving it. Once we had a couple of good implementations of this, the database guys were overjoyed, everybody switched to it, and the world was happy.
Until the OO guys came along into the business world.
The OO guys found this impedance mismatch: the DBMSes used in business programming were relational, but internally the OO guys stored things with links (references), and found things by figuring out the details of which links they had to follow and following them. Yup: this is essentially the hierarchical or network DBMS model. So they put a lot of (often ingenious) effort into layering that hierarchical/network model back on to relational databases, incidently throwing out many of the advantages given to us by RDBMSes.
My advice is to learn the relational model, design your system around it if it's suitable (it very frequently is), and use the power of your RDBMS. You'll avoid the impedance mismatch, you'll generally find the queries easy to write, and you'll avoid performance problems (such as your ORM layer taking hundreds of queries to do what it ought to be doing in one).
There is a certain amount of "mapping" to be done when it comes to processing the results of a query, but this goes pretty easily if you think about it in the right way: the heading of the result relation maps to a class, and each tuple in the relation is an object. Depending on what further logic you need, it may or may not be worth defining an actual class for this; it may be easy enough just to work through a list of hashes generated from the result. Just go through and process the list, doing what you need to do, and you're done.
Perhaps a little of both is the right fit. You could use a product like SubSonic. That way, you can design your database, generate your DAL code (removing all that boring stuff), use partial classes to extend it with your own code, use Stored Procedures if you want to, and generally get more stuff done.
That's what I do. I find it's the right balance between automation and control.
I'd also point out that I think you're on the right path by trying out different approaches and seeing what works best for you. I think that's ultimately the source for your answer.
Recently I made the decision to use Linq to SQL on a new project, and I really like it. It is lightweight, high-performance, intuitive, and has many gurus at microsoft (and others) that blog about it.
Linq to SQL works by creating a data layer of c# objects from your database. DevExpress XPO works in the opposite direction, creating tables for your C# business objects. The Entity Framework is supposed to work either way. I am a database guy, so the idea of a framework designing the database for me doesn't make much sense, although I can see the attractiveness of that.
My Linq to SQL project is a medium-sized project (hundreds, maybe thousands of users). For smaller projects sometimes I just use SQLCommand and SQLConnection objects, and talk directly to the database, with good results. I have also used SQLDataSource objects as containers for my CRUD, but these seem clunky.
DALs make more sense the larger your project is. If it is a web application, I always use some sort of DAL because they have built-in protections against things like SQL injection attacks, better handling of null values, etc.
I debated whether to use the Entity Framework for my project, since Microsoft says this will be their go-to solution for data access in the future. But EF feels immature to me, and if you search StackOverflow for Entity Framework, you will find several people who are struggling with small, obtuse problems. I suspect version 2 will be much better.
I don't know anything about nHibernate, but there are people out there who love it and would not use anything else.
You might try using NHibernate. Since it's open source, it's not exactly a black box. It is very versatile, and it has many extensibility points for you to plug in your own additional or replacement functionality.
Comment 1:
NHibernate is a true ORM, in that it permits you to create a mapping between your arbitrary domain model (classes) and your arbitrary data model (tables, views, functions, and procedures). You tell it how you want your classes to be mapped to tables, for example, whether this class maps to two joined tables or two classes map to the same table, whether this class property maps to a many-to-many relation, etc. NHibernate expects your data model to be mostly normalized, but it does not require that your data model correspond precisely to your domain model, nor does it generate your data model.
Comment 2:
NHibernate's approach is to permit you to write any classes you like, and then after that to tell NHibernate how to map those classes to tables. There's no special base class to inherit from, no special list class that all your one-to-many properties have to be, etc. NHibernate can do its magic without them. In fact, your business object classes are not supposed to have any dependencies on NHibernate at all. Your business object classes, by themselves, have absolutely no persistence or database code in them.
You will most likely find that you can exercise very fine-grained control over the data-access strategies that NHibernate uses, so much so that NHibernate is likely to be an excellent choice for your complex cases as well. However, in any given context, you are free to use NHibernate or not to use it (in favor of more customized DAL code), as you like, because NHibernate tries not to get in your way when you don't need it. So you can use a custom DAL or DevExpress XPO in one BLL class (or method), and you can use NHibernate in another.
I recently took part in sufficiently large interesting project. I didn't join it from the beginning and we had to support already implemented architecture. Data access to all objects was implemented through stored procedures and automatically generated wrapper-methods on .NET that returned DataTable objects. The development process in such system was really slow and inefficient. We had to write huge stored procedure on PL/SQL for every query, that could be expressed in simple LINQ query. If we had used ORM, we would have implement project several times faster. And I don't see any advantage of such architecture.
I admit, that it is just particular not very successful project, but I made following conclusion: Before refusing to use ORM think twice, do you really need such flexibility and control over database? I think in most cases it isn't worth wasted time and money.
As others explain, there is a fundamental difficulty with ORM's that make it such that no existing solution does a very good job of doing the right thing, most of the time. This Blog Post: The Vietnam Of Computer Science echoes some of my feelings about it.
The executive summary is something along the lines of the assumptions and optimizations that are incompatible between object and relational models. although early returns are good, as the project progresses, the abstractions of the ORM fall short, and the extra overhead of working around it tends to cancel out the successes.
I have used Bold for Delphi four years now. It is great but it is not available anymore for sale and it lacks some features like databinding. ECO the successor has all that.
No I'm not selling ECO-licenses or something but I just think it is a pity that so few people realize what MDD (Model Driven Development) can do. Ability to solve more complex problems in less time and fewer bugs. This is very hard to measure but I have heard something like 5-10 more efficient development. And as I work with it every day I know this is true.
Maybe some traditional developer that is centered around data and SQL say:
"But what about performance?"
"I may loose control of what SQL is run!"
Well...
If you want to load 10000 instances of a table as fast as possible it may be better to use stored procedures, but most application don't do this. Both Bold and ECO use simple SQL queries to load data. Performance is highly dependent of the number of queries to the database to load a certain amount of data. Developer can help by saying this data belong to each other. Load them as effiecent as possible.
The actual queries that is executed can of course be logged to catch any performance problems. And if you really want to use your hyper optimized SQL query this is no problem as long as it don't update the database.
There is many ORM system to choose from, specially if you use dot.net. But honestly it is very very hard to do a good ORM framework. It should be concentrated around the model. If the model change, it should be an easy task to change database and the code dependent of the model. This make it easy to maintain. The cost for making small but many changes is very low. Many developers do the mistake to center around the database and adapt everthing around that. In my opinion this is not the best way to work.
More should try ECO. It is free to use an unlimited time as long the model is no more than 12 classes. You can do a lot with 12 classes!
I suggest you to use Code Smith Tool for creating Nettiers, that is a good option for developer.