Confusion about NoSQL Design - mongodb

I know that NoSQL databases are not relational, so I cannot draw an ERD or use other methods that apply only to relational databases.
My confusion is: what kind of method or diagram should I use to design a NoSQL database?
Thanks.

Here's an abstract from a recent 10gen event presentation suggesting that mind maps are the most logical tool for the job. I expect more specialized tools to emerge, but in general, mind maps align well with non-relational schema design.
"Most of us are visual learners. Often, visual learners will find that information "clicks" when it is explained with the aid of a chart or picture. For MongoDB that picture is a leaf representing a natural approach to databases. In the RDBMS world a database schema is "visualized" through an Entity Relationship (ER) diagram. An ER diagram is the primary communication tool about an RDBMS data model. MongoDB provides a powerful dynamic database schema. However it is sometimes difficult to visualize. An accurate visualization of a MongoDB schema dramatically increases the ability to communicate the flexibility and power MongoDB between developers, architects, DBAs and end users. A mind map is a visual thinking tool that helps structure information, do better analysis, comprehend, synthesize and generate new ideas. Its power lies in its simplicity, much like MongoDB. Using a mind mapping open source tool, a clear and vibrant visualization of a dynamic MongoDB schema can be created that "clicks." Further, it works the other way around - mind maps can be used to create a dynamic schema in MongoDB. The mind mapping process allows non-technical business users to visually develop their requirements on the fly. During design process the mind map provides a flexible visual tool which changes in a fluid manner."

You can fairly easily use the standard tools; however, it depends on your specific scenario and the problem you are looking to solve. I recently had a conversation about this, actually: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/xZCwEm06eU4 which might help, though that conversation is also quite specialised.
I have been thinking ever since that, instead of repeating myself on this, I should actually write a manual for drawing UML diagrams, MongoDB-style.
If you explain which UML diagram you wish to draw, we could give a more detailed answer on how to produce a NoSQL representation of it.

Why would you use NoSQL to build posts/comments over relational?

I've been hearing things about NoSQL and that it may eventually replace SQL database storage methods, since database interaction is often a bottleneck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people talked about databases, they meant relational databases. And when people talked about relational databases, they meant those built on Edgar F. Codd's relational model and controlled with the Structured Query Language. Storing data in some other way? Madness! Anything else is just flat files.
But in the past few years, people started to question this dogma. People wondered whether tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts for how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all of them had in common was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with its own query language. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements may be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we've made clear that all those databases commonly referred to as NoSQL are too different to evaluate together. Each of them needs to be evaluated separately to decide whether it is a good fit for a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which suit different use cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogeneous data, object-oriented development style, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements, and thus your database layout, change constantly, or when you are dealing with datasets which belong together but still look very different. When you have a lot of tables with two columns called "key" and "value", these might be worth looking into.
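To make the "heterogeneous documents" point concrete, here is a minimal sketch using pymongo; the local connection and the database/collection names are assumptions for illustration, not part of the original answer:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumes a local mongod
    products = client.shop.products  # hypothetical database and collection

    # Heterogeneous documents can live side by side in one collection:
    # no ALTER TABLE, no generic "key"/"value" tables.
    products.insert_one({"name": "T-shirt", "size": "L", "color": "blue"})
    products.insert_one({"name": "Hard drive", "capacity_gb": 512, "interface": "SATA"})

    print(products.find_one({"name": "Hard drive"}))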
Graph databases
Examples: Neo4j, GiraffeDB
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is on defining data by its relations to other data. When you have a lot of tables whose primary keys are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
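As a hedged illustration of that last point, here is roughly what a join-table row might become as a graph relationship, written in Cypher and sent through the official Neo4j Python driver (the connection details, labels and relationship type are all hypothetical):

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        # What a join table expresses as a row of two foreign keys,
        # a graph database expresses as a first-class relationship.
        session.run(
            "MERGE (a:Person {name: $a}) "
            "MERGE (b:Person {name: $b}) "
            "MERGE (a)-[:KNOWS]->(b)",
            a="Alice", b="Bob",
        )
    driver.close()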
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately, they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, that will only take microseconds. But what about when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food, and logged in during the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
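A quick sketch with the redis-py client shows both sides of this trade-off (it assumes a local Redis server; the key naming scheme is made up):

    import json
    import redis

    r = redis.Redis()  # assumes a local Redis server on the default port

    # Known key -> near-instant lookup.
    r.set("user:157641", json.dumps({"name": "Ann", "age": 22, "favorite_food": "waffles"}))
    print(json.loads(r.get("user:157641")))

    # "All users aged 16-24 who like waffles and logged in recently"?
    # Without a key that encodes that question (i.e. a manually maintained
    # secondary index), the store can't help: you'd have to scan every key.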
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still has its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, the supplemental tools are still not as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers researching relational databases, and it shows: the literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries. How to build a relational database for your data is a topic so well-researched that it's hard to find a corner case without a generally accepted, by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands automatically become "database operators" as well (think of "ls", "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
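For illustration only, here is a tiny Python sketch of that filesystem-as-database idea, with a directory standing in for a table and the standard library standing in for ls/find/sort (all names are hypothetical):

    import json
    from pathlib import Path

    db = Path("users")           # one directory plays the role of a table
    db.mkdir(exist_ok=True)

    # "INSERT": one record per file; the filename is the primary key.
    (db / "157641.json").write_text(json.dumps({"name": "Ann", "age": 22}))

    # "SELECT": a directory listing plus filters, the way ls/find/sort would do it.
    rows = [json.loads(p.read_text()) for p in sorted(db.glob("*.json"))]
    print([r for r in rows if r["age"] < 24])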
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on the performance of the database and don't need the more advanced features of relational database engines. For example, a key->value store providing a simple query-by-id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper on an OLTP tuple store, which sacrificed transactions for single-threaded processing (no concurrency problem because no concurrency is allowed) and kept all data in memory, achieving 10-100x better performance compared to a similar RDBMS-driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs, etc.) using a key-based access strategy. This is a departure from traditional SQL access, which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and the limitations on the display format restrict traditional SQL. BLOB implementations in traditional relational databases suffer from these restrictions too.
Behind the scenes, it is an indirect admission of the failure of the SQL model to support some forms of OLTP or new data formats. "Support" means not just storing but full access capabilities - programmatic and query-wise, using the standard model.
Relational enthusiasts were quick to modify the definition of NoSQL from Not-SQL to Not-Only-SQL, to keep SQL in the picture! This is not good, especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clear-cut definition; otherwise it will end up like SOA.
The basis of NoSQL systems lies in the random key-value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed random keys (without making use of any index), and they still do. In fact, IDMS already has a keyword NONSQL, where it supports SQL access to its older network database, which it termed NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL, the actual program, appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size", I don't think it is the large-scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottlenecks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string-based SQL queries to fetch data.
Instead, you build queries using an API the database provides; Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
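For example, a hedged sketch of that API style using boto3 might look like the following; the table name and key schema are hypothetical, and configured AWS credentials are assumed:

    import boto3

    # Hypothetical table with a string partition key "user_id".
    table = boto3.resource("dynamodb").Table("users")

    table.put_item(Item={"user_id": "157641", "name": "Ann"})
    response = table.get_item(Key={"user_id": "157641"})
    print(response.get("Item"))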
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How does it work internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on structured and unstructured data. It uses collections instead of tables.
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

Neo4j instead of relational database

I am implementing a Sinatra/Rails based web portal that might eventually have a few many-to-many relationships between tables/models. This is a one-man, part-time team, but a real-world app.
I discussed my entity model with someone and was advised to try Neo4j. Coming from the real, 'non-sexy' enterprise world, my inclination is to use a relational DB until it stops scaling or becomes a nightmare because of sharding etc., and then think about anything else.
HOWEVER,
I am using Postgres for the first time in this project, along with DataMapper, and it's taking me time when I want to get started fast.
I am just trying out a few things and building more use cases, so I constantly have to update my schema (prototyping the idea and acting on beta feedback). I wouldn't have to do this in Neo4j (except for changing my queries).
It seems very easy to set up search using Neo4j, but Postgres can do full-text search as well.
Postgres recently announced support for JSON and JavaScript. I'm wondering if I should just stick with PG and invest more time learning PG (which has a good community) instead of Neo4j.
I'm looking for use cases where Neo4j is better, especially at the prototyping/initial phase of a project. I understand that if the website grows I might end up having multiple persistence technologies like S3, relational (PG), Mongo, etc.
Also it would be good to know how it plays out with Rails/Ruby ecosystem.
Update1:
I got a lot of good answers, and it seems like the right thing to do is to stick with Postgres for now (especially since I deploy to Heroku).
However, the idea of being schemaless is tempting. Basically, I am thinking of an approach where you don't define a data model until you have, say, 100-150 users and have figured out a good schema (business use cases) for your product, while you are just demoing the concept and getting feedback with limited signups. Then one can decide on a schema and start with relational.
It would be nice to know if there are easy-to-use schemaless persistence options (judged on ease of use/setup for a new user) that might give up, say, scaling.
Graph databases should be considered if you have a really chaotic data model. They are made to express highly complex relationships between entities. To do that, they store relationships at the data level, whereas RDBMSs use a declarative approach. Storing relationships only makes sense if these relationships are very different; otherwise you'll just end up duplicating data over and over, taking a lot of space for nothing.
To require such variety in relationships you'd have to handle a huge amount of data. This is where graph databases shine, because instead of doing tons of joins, they just pick a record and follow its relationships. To support my statement: you'll notice that all the use cases on Neo4j's website deal with very complex data.
In brief, if you don't feel concerned by what I said above, I think you should use another technology. If this is just about scaling, schemalessness or starting a project fast, then look at other NoSQL solutions (more specifically, either column- or document-oriented databases). Otherwise you should stick with PostgreSQL. You could also, as you said, consider polyglot persistence.
About your update, you might consider hstore. I think it fits your requirements. It's a PostgreSQL module which also works on Heroku.
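To give a feel for it, a minimal hstore sketch via psycopg2 might look like this (the connection string, table and attributes are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=proto")  # hypothetical database
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
    cur.execute("CREATE TABLE IF NOT EXISTS things (id serial PRIMARY KEY, attrs hstore)")

    # Schemaless attributes inside an ordinary relational table.
    cur.execute("INSERT INTO things (attrs) VALUES ('color => red, size => L')")
    cur.execute("SELECT attrs -> 'color' FROM things WHERE attrs ? 'size'")
    print(cur.fetchall())
    conn.commit()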
I don't think I agree that you should only use a graph database when your data model is very complex. I'm sure they could handle a simple data model/relationships as well.
If you have no prior experience with Neo4j or Postgres, then most likely both will take quite a bit of time to learn well.
Some things to keep in mind when picking:
It's not just about development against a database technology. You should consider deployment as well. How easy is it to deploy and scale Postgres/Neo4j?
Consider the community and tools around each technology. Is there a data mapper for Neo4j like there is for Postgres?
Consider that the data models are considerably different between the two. If you can already think relationally, then I'd probably stick with Postgres. If you go with Neo4j you're going to be making a lot of mistakes for several months with your data models.
Over time I've learned to keep it simple when I can. Postgres might be the boring choice compared to Neo4j, but boring doesn't keep you up at night. =)
Also I never see anyone mention it, but you should look at Riak (http://basho.com/riak/) too. It's a document database that also provides relationships (links) between objects. Not as mature as a graph database, but it can connect a few entities quickly.
The most appropriate choice depends on what problem you are trying to solve.
If you just have a few many to many tables, a relational database can be fine. In general, there is better OR-mapper support for relational databases, as they are much older and have a standardized interface and row-column structure. They also have been improved on for a long time, so they are stable and optimized for what they are doing.
A graph database is better if, e.g., your problem is more about the connections between entities, especially if you need longer-distance connections, like "detect cycles (of unspecified length)" or "what do friends-of-a-friend like". Things like that get unwieldy when restricted to SQL joins. A problem-specific language like Cypher, in the case of Neo4j, makes them much more concise. On the downside, there are mappers between graph DBs and objects, but not for every framework and language under the sun.
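To illustrate the conciseness claim, a friends-of-a-friend query in Cypher, sent through the official Neo4j Python driver, might look like this (labels, relationship types and credentials are all hypothetical):

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        # "What do friends-of-a-friend like?" -- one variable-length pattern
        # instead of a pile of self-joins.
        result = session.run(
            "MATCH (me:Person {name: $name})-[:FRIEND*2..2]->(fof)-[:LIKES]->(thing) "
            "RETURN DISTINCT thing.name AS liked",
            name="Alice",
        )
        print([record["liked"] for record in result])
    driver.close()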
I recently implemented a system prototype using Neo4j, and it was very useful to be able to talk about the structure and connections of our data and to model that one-to-one in the data storage. Also, adding other connections between data points was easy, Neo4j being schemaless storage. We ended up switching to MongoDB due to troubles with write performance, but I don't think we could have finished the prototype as quickly with it.
Other NoSQL datastores, like document-based, column, and key-value stores, also cover specific use cases. Polyglot persistence is definitely something to look at, so keep your choice of backend reasonably separated from your business logic, to allow you to change your technology later if you learn something new.

Recommendations for a DBMS for an EAV system with mostly insert and select needs on a .NET stack

In the project I have been working on, the data modeling requirements are:
A system consists of N clients, each having N events. An event is an entity with a required name and a timestamp at which it occurs. Optionally, an event may have N properties (key/value pairs) defining attributes that a client wants to store with that particular instance of the event.
The system will have mostly:
inserts – events are logged but never updated.
selects – reports/actions will be generated/executed based on events and properties in any possible combination.
The requirements reflect an entity-attribute-value (EAV) data model. After researching for some time, I feel that a relational DBMS like SQL Server might not be a good fit for this. (Correct me if I'm wrong!)
So I'm leaning toward NoSql option like MongoDb/CouchDb/RavenDb etc.
My questions are:
What is the best fit in available NoSql solutions keeping in view of my system's heavy insert/select needs?
I'm also open to the relational option if these requirements can be translated into a relational schema. I personally doubt this, but after reading performance-DBA answers (like the one referenced here), I got curious. However, I couldn't figure out an optimal relational model for my requirements myself, perhaps because the system is rather generic.
thanks!
MongoDB really shines when you write unstructured data to it (like your events). Also, it is able to sustain a pretty heavy write load. However, it's not very good for reporting - at least, for reporting in the traditional sense.
So, if your reporting needs are simple, you might get away with some simple map-reduce jobs. Otherwise you can export data to a relational database (with a nightly job, for example) and report the hell out of it.
Such a hybrid solution is pretty common (in my experience).
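As a rough sketch of that shape with pymongo: events with arbitrary properties go straight in, and a simple report comes out of the aggregation pipeline (used here in place of map-reduce; the database, collection and field names are hypothetical):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    events = MongoClient().logging.events  # hypothetical database/collection

    # Each event carries its own property set - a natural fit for EAV-style data.
    events.insert_one({
        "client": "acme",
        "name": "page_view",
        "ts": datetime.now(timezone.utc),
        "props": {"url": "/pricing", "browser": "Firefox"},
    })

    # A simple report: event counts per name.
    for row in events.aggregate([{"$group": {"_id": "$name", "n": {"$sum": 1}}}]):
        print(row)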

Manual DAL & BLL vs. ORM

Which approach is better: 1) to use a third-party ORM system or 2) manually write DAL and BLL code to work with the database?
1) In one of our projects, we decided to use the DevExpress XPO ORM system, and we ran across lots of small problems that wasted a lot of our time. And still, from time to time, we encounter problems and exceptions that come from this ORM, and we do not have full understanding and control of this "black box".
2) In another project, we decided to write the DAL and BLL from scratch. Although this implied writing boring code many, many times, this approach proved to be more versatile and flexible: we had full control over the way data was held in the database, how it was obtained from it, etc. And all the bugs could be fixed in a direct and easy way.
Which approach is generally better? Maybe the problem is just with the ORM that we used (DevExpress XPO), and maybe other ORMs are better (such as NHibernate)?
Is it worth using the ADO.NET Entity Framework?
I found that the DotNetNuke CMS uses its own DAL and BLL code. What about other projects?
I would like to get info on your personal experience: which approach do you use in your projects, which is preferable?
Thank you.
My personal experience has been that ORM is usually a complete waste of time.
First, consider the history behind this. Back in the 60s and early 70s, we had these DBMSes using the hierarchical and network models. These were a bit of a pain to use, since when querying them you had to deal with all of the mechanics of retrieval: follow links between records all over the place and deal with the situation when the links weren't the links you wanted (e.g., were pointing in the wrong direction for your particular query). So Codd thought up the idea of a relational DBMS: specify the relationships between things, say in your query only what you want, and let the DBMS deal with figuring out the mechanics of retrieving it. Once we had a couple of good implementations of this, the database guys were overjoyed, everybody switched to it, and the world was happy.
Until the OO guys came along into the business world.
The OO guys found an impedance mismatch: the DBMSes used in business programming were relational, but internally the OO guys stored things with links (references), and found things by figuring out which links they had to follow and following them. Yup: this is essentially the hierarchical or network DBMS model. So they put a lot of (often ingenious) effort into layering that hierarchical/network model back on to relational databases, incidentally throwing out many of the advantages given to us by RDBMSes.
My advice is to learn the relational model, design your system around it if it's suitable (it very frequently is), and use the power of your RDBMS. You'll avoid the impedance mismatch, you'll generally find the queries easy to write, and you'll avoid performance problems (such as your ORM layer taking hundreds of queries to do what it ought to be doing in one).
There is a certain amount of "mapping" to be done when it comes to processing the results of a query, but this goes pretty easily if you think about it in the right way: the heading of the result relation maps to a class, and each tuple in the relation is an object. Depending on what further logic you need, it may or may not be worth defining an actual class for this; it may be easy enough just to work through a list of hashes generated from the result. Just go through and process the list, doing what you need to do, and you're done.
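A minimal sketch of that mapping, using Python's built-in sqlite3 module (the table and data are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # each tuple becomes a dict-like object
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('Ann')")

    # The heading of the result relation plays the role of the class;
    # each row is one "object".
    for row in conn.execute("SELECT id, name FROM users"):
        print(dict(row))  # {'id': 1, 'name': 'Ann'}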
Perhaps a little of both is the right fit. You could use a product like SubSonic. That way, you can design your database, generate your DAL code (removing all that boring stuff), use partial classes to extend it with your own code, use Stored Procedures if you want to, and generally get more stuff done.
That's what I do. I find it's the right balance between automation and control.
I'd also point out that I think you're on the right path by trying out different approaches and seeing what works best for you. I think that's ultimately the source for your answer.
Recently I made the decision to use LINQ to SQL on a new project, and I really like it. It is lightweight, high-performance, intuitive, and has many gurus at Microsoft (and elsewhere) who blog about it.
LINQ to SQL works by creating a data layer of C# objects from your database. DevExpress XPO works in the opposite direction, creating tables for your C# business objects. The Entity Framework is supposed to work either way. I am a database guy, so the idea of a framework designing the database for me doesn't make much sense, although I can see the attractiveness of it.
My LINQ to SQL project is a medium-sized project (hundreds, maybe thousands of users). For smaller projects I sometimes just use SqlCommand and SqlConnection objects and talk directly to the database, with good results. I have also used SqlDataSource objects as containers for my CRUD, but these seem clunky.
DALs make more sense the larger your project is. If it is a web application, I always use some sort of DAL because they have built-in protections against things like SQL injection attacks, better handling of null values, etc.
I debated whether to use the Entity Framework for my project, since Microsoft says this will be their go-to solution for data access in the future. But EF feels immature to me, and if you search StackOverflow for Entity Framework, you will find several people who are struggling with small, obtuse problems. I suspect version 2 will be much better.
I don't know anything about NHibernate, but there are people out there who love it and would not use anything else.
You might try using NHibernate. Since it's open source, it's not exactly a black box. It is very versatile, and it has many extensibility points for you to plug in your own additional or replacement functionality.
Comment 1:
NHibernate is a true ORM, in that it permits you to create a mapping between your arbitrary domain model (classes) and your arbitrary data model (tables, views, functions, and procedures). You tell it how you want your classes to be mapped to tables, for example, whether this class maps to two joined tables or two classes map to the same table, whether this class property maps to a many-to-many relation, etc. NHibernate expects your data model to be mostly normalized, but it does not require that your data model correspond precisely to your domain model, nor does it generate your data model.
Comment 2:
NHibernate's approach is to permit you to write any classes you like, and then after that to tell NHibernate how to map those classes to tables. There's no special base class to inherit from, no special list class that all your one-to-many properties have to be, etc. NHibernate can do its magic without them. In fact, your business object classes are not supposed to have any dependencies on NHibernate at all. Your business object classes, by themselves, have absolutely no persistence or database code in them.
You will most likely find that you can exercise very fine-grained control over the data-access strategies that NHibernate uses, so much so that NHibernate is likely to be an excellent choice for your complex cases as well. However, in any given context, you are free to use NHibernate or not to use it (in favor of more customized DAL code), as you like, because NHibernate tries not to get in your way when you don't need it. So you can use a custom DAL or DevExpress XPO in one BLL class (or method), and you can use NHibernate in another.
I recently took part in a sufficiently large and interesting project. I didn't join it from the beginning, and we had to support an already-implemented architecture. Data access to all objects was implemented through stored procedures and automatically generated wrapper methods in .NET that returned DataTable objects. The development process in such a system was really slow and inefficient. We had to write a huge PL/SQL stored procedure for every query that could have been expressed as a simple LINQ query. If we had used an ORM, we would have implemented the project several times faster. And I don't see any advantage to such an architecture.
I admit that this was just one particular, not very successful, project, but I came to the following conclusion: before refusing to use an ORM, think twice about whether you really need that much flexibility and control over the database. In most cases it isn't worth the wasted time and money.
As others have explained, there is a fundamental difficulty with ORMs that means no existing solution does a very good job of doing the right thing most of the time. This blog post, The Vietnam Of Computer Science, echoes some of my feelings about it.
The executive summary is that the object and relational models rest on incompatible assumptions and optimizations. Although early returns are good, as the project progresses the abstractions of the ORM fall short, and the extra overhead of working around them tends to cancel out the successes.
I have used Bold for Delphi for four years now. It is great, but it is no longer available for sale and it lacks some features, like data binding. ECO, its successor, has all that.
No, I'm not selling ECO licenses or anything; I just think it is a pity that so few people realize what MDD (Model Driven Development) can do: solve more complex problems in less time and with fewer bugs. This is very hard to measure, but I have heard figures like 5-10x more efficient development. And as I work with it every day, I know this is true.
Maybe a traditional developer who is centered around data and SQL would say:
"But what about performance?"
"I may lose control of what SQL is run!"
Well...
If you want to load 10,000 instances of a table as fast as possible, it may be better to use stored procedures, but most applications don't do this. Both Bold and ECO use simple SQL queries to load data. Performance is highly dependent on the number of queries sent to the database to load a certain amount of data. The developer can help by declaring which data belong together, so they are loaded as efficiently as possible.
The actual queries that are executed can of course be logged to catch any performance problems. And if you really want to use your hyper-optimized SQL query, that is no problem as long as it doesn't update the database.
There are many ORM systems to choose from, especially if you use .NET. But honestly, it is very, very hard to build a good ORM framework. It should be centered around the model. If the model changes, it should be an easy task to change the database and the code that depends on the model. This makes it easy to maintain: the cost of making many small changes is very low. Many developers make the mistake of centering everything around the database and adapting everything to it. In my opinion this is not the best way to work.
More people should try ECO. It is free to use for an unlimited time as long as the model has no more than 12 classes. You can do a lot with 12 classes!
I suggest using the CodeSmith tool to generate NetTiers code; that is a good option for a developer.

Are there any data warehouse frameworks?

I've got a lot of MySQL data that I need to generate reports from. It's mostly historic data, so it won't be changing much, but it weighs in at 20-30 gigabytes easily and is expected to grow. I currently have a collection of PHP scripts that run some complex queries and output CSV and Excel files. I also use phpMyAdmin with bookmarked queries, which I manually edit to change the parameters. The amount of data is growing, and the number of people who need access to it is also growing, so I'm making the time to improve this situation.
I started reading about data warehousing the other day and it seems that this an area that relates to what I need to do. I've read some good articles and am even waiting on a book. I think I'm getting a handle on what these sorts of systems do and what's possible.
Creating a reporting system for my data has always been on my todo list, but until recently I figured it would be a highly niche programming venture. Since I now know data warehousing is a common thing, I figure there must be some sort of reporting/warehousing frameworks available to ease the development. I'd gladly skip writing interfaces and scripts to schedule and email reports and the like, and stick to writing queries and setting up relations.
I've mostly been a LAMP guy, but I'm not above switching languages or platforms. I just need a more robust solution, as my one-off scripts don't scale well.
So where's a good place to get started?
I'll discuss a few points on the {budget, business utility function, time frame} spectrum out there. For convenience, let's follow the architecture conceptualization you linked to at
WikipediaDataWarehouseArticle
Operational database layer
The source data for the data warehouse - normalized for in-one-place-only data maintenance.
Data access layer
The transformation of your source data into your informational access layer. ETL tools to extract, transform, and load data into the warehouse fall into this layer (see the sketch after this list).
Informational access layer
• Report-facilitating data structure
Data is not maintained here; it is merely a reflection of your source data. Hence, denormalized structures (containing duplicate, but systematically derived, data) are usually most effective here.
• Reporting tools
How do you actually allow your users access to the data?
• pre-canned reports (simple)
• more dynamic slice-and-dice access methods
The data accessed for reporting and analysis, and the tools for reporting and analyzing it, fall into this layer. The Inmon-Kimball differences about design methodology, discussed later in the Wikipedia article, have to do with this layer.
Metadata layer (facilitates automation, organization, etc.)
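As promised above, here is a toy ETL pass for the data access layer, sketched in Python with sqlite3 standing in for both the operational and warehouse databases (the schemas are hypothetical; a real job would be scheduled and would use your actual DBMS):

    import sqlite3

    src = sqlite3.connect("operational.db")  # hypothetical normalized source
    dwh = sqlite3.connect("warehouse.db")    # hypothetical reporting target

    dwh.execute("""CREATE TABLE IF NOT EXISTS sales_report
                   (day TEXT, product TEXT, revenue REAL)""")

    # Extract + transform: denormalize the join once, so reports never repeat it.
    rows = src.execute("""SELECT o.day, p.name, SUM(o.amount)
                          FROM orders o JOIN products p ON p.id = o.product_id
                          GROUP BY o.day, p.name""")

    # Load into the denormalized reporting structure.
    dwh.executemany("INSERT INTO sales_report VALUES (?, ?, ?)", rows)
    dwh.commit()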
Roll your own (low-end)
For very little out-of-pocket cost, just recognizing the need for denormalized structures can buy some efficiencies for those not yet using them.
Get in the ballgame (some outlays required)
You don't need to use all the functionality of a platform right off the bat.
IMO, however, you want to be on a platform that you know will grow, and in the highly competitive and consolidating BI environment, that seems to mean one of the four enterprise mega-vendors (my opinion):
Microsoft (the platform of our 110 employee firm)
SAP
Oracle
IBM
BiMarketStateArticle
My firm is at this stage, using some of the ETL capability offered by SQL Server Integration Services (SSIS), some alternate usage of the open-source (but in practice license-requiring) Talend product in the "Data Access Layer", a denormalized reporting structure (implemented completely in the basic SQL Server database), and SQL Server Reporting Services (SSRS) to largely automate (depending on your skill) the production of pre-specified reports. Note that an SSRS "report" is merely a (scalable) XML configuration/specification that gets rendered at runtime by the SSRS engine. Choices such as exporting to an Excel file are simple options.
Serious Commitment (some significant human commitment required)
Notice above that we have yet to utilize the data mining/dynamic slicing and dicing capabilities of SQL Server Analysis Services. We are working toward that, but are now focused on improving the quality of our data cleansing in the "Data Access Layer".
I hope this helps you to get a sense of where to start looking.
Pentaho has put together a pretty comprehensive suite of products. The products are "free", but be prepared for the usual heavy sell once you fork over your identifying information.
I haven't had a chance to really stretch them as we're a Microsoft shop from one sad end to the other.
I think you should first check out Kimball and Inmon and see if you want to approach your data warehouse in a particular way. Kimball, in particular, lays out a very good framework for the modelling and construction of the warehouse.
There are a number of tools which try to ease the process of designing, implementing and managing/operating a data warehouse; they each have their strengths and weaknesses, and often vastly differing price points. Under the covers, you are always going to be best off if you have a good knowledge of warehousing principles from the Kimball and/or Inmon camps.
As well as tools like Kalido and WhereScape RED (which do a similar thing in very different ways), many of the ETL platforms now have good built-in support for the donkey work of implementation - SCD components, lineage tracking, etc.
Best, though, to view all of these as tools in the hands of you, the craftsman: they make certain easy things even easier (or even trivial) and some hard things easier, but some things they just get in the way of, IMHO ;) Learn the methodology and principles first and get a good understanding of them, and then you will know which tools to apply from your kitbag, and when...
It hasn't been updated in a while but there's a nice Data Warehousing/ETL Ruby package called ActiveWarehouse.
But I would check out the Pentaho products like Nick mentioned in another answer. It should easily handle the volume of data you have and may provide you with more ways to slice and dice your data than you could have ever imagined.
The best framework you can currently get is Anchor Modeling.
It might look quite complex because of its generic structure and built-in capability to historize data.
Also, the modeling technique is quite different from ERD.
But you end up with SQL code to generate all DB objects, including 3NF views, and:
inserts/updates handled by triggers
queries over any point or range in history
your application developers will not see the underlying 6NF anchor model.
The technology is open source and at the moment is unbeatable.
If you have an Anchor Modeling question, you may want to ask it under the anchor-modeling tag.
Kimball is the simpler method for data warehousing.
We use Informatica for moving data around, but it doesn't do DW things like indexing by default.
I like the idea of Wherescape RED, as a DW tool and using MS SQL's Linked Servers to obviate the need for an ETL tool.