MongoDB and SQL for a document versioning system

I'm a computer engineering student and I've got a basically simple assignment, but I'd like to make it a little more interesting :). Basically, I have to create a simple multi-user, online document versioning system using Java or .NET. Because I'm more of a .NET guy, I'm going to use ASP.NET or Silverlight (it's not decided yet).
Anyway, the interesting part: I want to use MongoDB to store the documents (they can be virtually anything: video, audio, simple MS Office files, plain text), each version of them, and the related metadata (which library and folder they're in, etc.). I'd like to put everything else, like users, permissions, etc. (the specs are not yet complete), into an ordinary relational database. The question is: what do you think about this? Does this make any sense, or am I just overcomplicating the whole thing? Would it be simpler to use only MongoDB for everything, or to leave the NoSQL thingy out of it entirely? Are there any conventions for this kind of stuff? :)
BTW, under any other circumstances I'd use some free, battle-tested solution for doc versioning, but I have to design and implement this myself, and I'm trying to do it at least a little unconventionally :).
Thanks for every comment :), any help appreciated :)
Greets

If you were using something like Mongoid on Rails, this would be trivial. http://mongoid.org/docs/extras.html Look for Versioning. The ODM has it built in for you. I've yet to use Mongo in the .NET world, it's always been something like Raven, so I'm not sure if the libraries for it easily work like Mongoid does.
Mongo handles relational data as well. Things you want normalized (e.g. user records) can have references to other documents in the DB. And of course whatever magic document you want to store works as well.
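As a rough illustration of the point about references, here is a sketch using plain Python dicts rather than any real MongoDB driver. The field names (`owner_id`, `versions`, `library`, `folder`) and the helpers `create_document`, `add_version` and `latest_version` are assumptions made up for this example, not part of any library. It keeps a document's metadata and version list together and points at a normalized user record by id:

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for two collections.
users = {"u1": {"_id": "u1", "name": "alice"}}
documents = {}


def create_document(doc_id, library, folder, owner_id):
    """Create a document record that references its owner by id."""
    if owner_id not in users:
        raise KeyError(f"unknown user: {owner_id}")
    documents[doc_id] = {
        "_id": doc_id,
        "library": library,
        "folder": folder,
        "owner_id": owner_id,  # reference to a user record, not a copy
        "versions": [],        # oldest first
    }
    return documents[doc_id]


def add_version(doc_id, content, author_id):
    """Append a new version and return its version number."""
    doc = documents[doc_id]
    number = len(doc["versions"]) + 1
    doc["versions"].append({
        "number": number,
        "author_id": author_id,
        "created_at": datetime.now(timezone.utc),
        "content": content,  # bytes; big files would likely live elsewhere
    })
    return number


def latest_version(doc_id):
    """Return the most recent version, or None if there is none yet."""
    versions = documents[doc_id]["versions"]
    return versions[-1] if versions else None


create_document("d1", library="specs", folder="/drafts", owner_id="u1")
add_version("d1", b"first draft", author_id="u1")
add_version("d1", b"second draft", author_id="u1")
```

MongoDB limits the size of a single document, so large binaries such as video would normally go through GridFS, with only a reference kept in the version entry.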

Doing things unconventionally is usually a good way to discover new (maybe better?) ways, so it's a good idea.
Using MongoDB is a good idea too, because of its schemaless nature.
Using a relational database to store the extra information is quite weird and will add complexity without any advantage (IMO).

Related

NoSQL (e.g. MongoDB) or RDBMS (e.g. PostgreSQL) for new Scala project?

I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations; however, because of some eccentric requirements, neither Play2 nor Lift fits the bill, so I'm going to develop the application from the ground up. This means that Anorm or ScalaQuery become less obvious choices for database integration, and leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL, and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS, or are they special-case application data stores? Also, how does the choice of database affect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience of moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make an RDBMS integration part of the TypeSafe stack, will I be swimming upstream by integrating with MongoDB using Casbah, for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Casbah (and Scalatra). My experience is still very limited, and the project and the amount of data I am working with are pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to be an argument for either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia.
My documents (entries in the database) arise from benchmarking test suites and mainly have a static structure, and I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. I also had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to entries in a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe way, although the data is regularly (de-)serialised to (from) JSON. However, this naturally works best if the structure of your data is mainly static; otherwise, you are left with using strings to access your data (e.g., for expanding the results of a query using Casbah).
If you are mainly concerned about a coherent representation of data on the server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can be executed on both sides. See this page for "multitarget" or "tierless" languages.
Got too long for a comment. Was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out--it's quickly become my go to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit. I have personally found that CouchDB, Riak, Neo4j, and MongoDB all have various pluses and drawbacks and are good for different purposes.
Hope this helps, good luck.

Are ad hoc queries/updates starting to kill your productivity with MongoDB?

I've been developing an ASP MVC website for almost a year now, exclusively on MongoDB. I've loved it for the most part. Development productivity has been great using a C# MongoDB driver and tools like MongoVUE.
However, I've started to reach a point where there are things I really wish I had a SQL Server database for. Simple tasks like updating a record in the DB, and only mildly complex queries to generate some type of report, are becoming a pain.
I read an article somewhere saying that in order for NoSQL to succeed there needs to be a standard query language for it, and tools developed around it. I'm guessing this is far, far away, so right now I'm stuck trying to deal with these things.
I think eventually I will have to have a dual solution with MongoDB and SQL Server. I don't think I will ever get to the point where I am as productive updating and writing queries for MongoDB as I was with SQL Server.
How are you guys dealing with this when using NoSQL like MongoDB? Are you facing the same issues as me?
One solution you may consider is LINQPad. You can set up a template with a reference to 10Gen's drivers and write ad-hoc, C# MongoDB queries like you would in your code. My team and I use this method to address the very problem you mention.
Try it out (it's free) and see if it can help with the simple, day-to-day queries you come up with.
Edit: I also support Chris's suggestion of familiarizing yourself with the native JSON query language. Nothing beats a quick console window for speed, if you know the syntax.
The official C# driver will probably get a LINQ provider some time in the future, so that'd give .NET devs a familiar syntax for querying and maybe help with initial productivity. There're also some nice docs that help relate MongoDB queries back to SQL:
SQL to Mongo Mapping Chart
SQL to MongoDB (PDF)
These are great for learning, but to get the most out of Mongo it's well worth investing time getting used to the native JSON query syntax and Mongo-specific concepts like map-reduce.
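To give a feel for that native query syntax: the SQL `SELECT * FROM users WHERE age > 25 AND status = 'A'` corresponds to the filter document `{ age: { $gt: 25 }, status: "A" }`. The sketch below is not a MongoDB driver. It is a toy, hypothetical `matches` function that evaluates a small subset of such filters (`$gt`, `$lt`, `$in` and plain equality) against Python dicts, only to show how a filter is read:

```python
# Toy evaluator for a small subset of MongoDB-style filter documents.
# Illustration only; this is not how any real driver or server works.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}


def matches(record, query):
    """Return True if `record` satisfies every condition in `query`."""
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 25}}
            for op, arg in condition.items():
                if not OPERATORS[op](value, arg):
                    return False
        elif value != condition:
            # Plain equality, e.g. {"status": "A"}
            return False
    return True


users = [
    {"name": "ann", "age": 31, "status": "A"},
    {"name": "bob", "age": 22, "status": "A"},
    {"name": "cy", "age": 40, "status": "D"},
]

# SQL: SELECT * FROM users WHERE age > 25 AND status = 'A'
query = {"age": {"$gt": 25}, "status": "A"}
selected = [u["name"] for u in users if matches(u, query)]
```

All conditions in one filter document are combined with AND, which is why the two SQL predicates collapse into a single dict.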
Since your question asks,
how are you guys dealing with this when using NOSQL like mongodb?
I thought I'd chime in. I felt your pain when working with another NOSQL database, RavenDB.
I wrote a LINQPad driver specifically for ad hoc interactions with RavenDB.
https://github.com/ronnieoverby/RavenDB-Linqpad-Driver

How to model a multilingual database with Zend (i18n, MySQL)?

I know this topic has been discussed a couple of times, but none of the discussions gives the ultimate solution for me.
Situation
I'm designing a relational MySQL database which should later hold multilingual content. You know this from Wikipedia or the Microsoft Tech Support pages. The content should be the same for every language. E.g., if translations are missing, the site offers you the same content automatically translated, or in the languages in which the information is available. If some values are not set, it should fall back to the second or default browser language, or translate it, e.g. through Google. The development environment is Zend.
My ideas so far for solving the problem:
Composite primary key: (ID, Language)
Advantage: Easy database access through database abstraction layers.
Problem: Foreign keys, relationships, fallbacks
Columns with language suffix:
Advantage: DB Performance, No relational Problems.
Problem: Database abstraction layers cannot handle this?
Has any concept proven itself or is preferable over the other? Has anyone already created something like this and can share his experience with me? Does a modified Zend DB Controller exist for this situation? How do you link this information to a form?
Thank you for your help, hints and suggestions!
Kind regards,
Manuel
The second option would not be maintainable (this should be added on the minus side). To actually add another language you'll need to modify the table and the abstraction layers. Sounds like a nightmare.
The first option seems much more promising, but unfortunately there is a lot to do to make it work. However, in my experience this is a rather typical solution, so I would not reinvent the wheel.
What I have to add is that language fallback should be done on the Zend side, because the database lacks some of the information needed. You may think of some kind of index table that holds information such as the unique id of each piece of content and its available languages. If you need to serve something, you would read such a record, compare it against the Accept-Language header, and ask the database again for the valid content (using the most suitable language). The only problem is that you would need to create such an index table somehow (the best way I see would be a trigger on inserting content into your content table).
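The lookup described above can be sketched roughly as follows. This is an illustration in Python rather than Zend/PHP, and every name in it (`parse_accept_language`, `choose_language`, the `content_languages` dict standing in for the index table) is an assumption for the example. Language tags in the index are assumed to be stored in lower case:

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into language tags, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, available, default="en"):
    """Pick the best available language for one content item."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "de-at" falls back to "de"
        if primary in available:
            return primary
    return default


# Hypothetical index table: content id -> languages with a translation.
content_languages = {42: {"en", "de"}, 43: {"en"}}

header = "fr-CH, fr;q=0.9, de;q=0.8, en;q=0.5"
lang_42 = choose_language(header, content_languages[42])
lang_43 = choose_language(header, content_languages[43])
```

With the chosen language in hand, the second database query fetches the content row for that (id, language) pair.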
It is a lot of work, but the problem is not an easy one.
I am working on the exact same problem right now.
Somehow it does not make sense to me to put everything into the same database. Let's say I want to go to the extreme and support some 50 languages: this would just bloat my DB. So I tend to keep my main DB in my main language and then introduce some Zend_Translate concept into it. Zend_Translate should give you the fallback solution you are looking for. While the main navigation and core design are not much of a problem for my web site, my biggest concern right now is how to store all the main content and how to translate it, because these elements contain HTML, among other things. For the main content I will probably use an alternative approach: a separate DB with tables for each language.
My platform will be a community-driven database, so I'm actually going to rely on humans translating it. You have to store the information anyway, so my first concern is not database size or performance, but ease of use. So far my idea is to implement some structure as described above; I'm not yet sure whether I'll do it in Doctrine or not.
Language decision:
At the start, the application gets the user's preset language, their secondary language, and the article's mother tongue (or English). When fetching the article from the database, I will check the following for every column: 1. Is the primary language available? 2. Is the secondary language available? 3. If neither, display the article in its mother tongue or in English and offer the user the chance to translate it, with suggestions from the Google Translate API. I guess it's going to take quite a bit of coding and manipulating controllers, or building a business model that does this.
@tawfekov, is something like this, or something similar, easily realizable with Doctrine?
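The per-column check in the language decision above could look roughly like this. It is a minimal sketch in Python, not Doctrine or Zend code. The article layout (`original_language`, a `columns` dict of per-language values) and the helper names are assumptions for the example:

```python
def resolve_field(translations, preferred):
    """Return (language, text) for the first language that has a value."""
    for lang in preferred:
        text = translations.get(lang)
        if text:  # skip missing and empty values
            return lang, text
    return None, None


def resolve_article(article, primary, secondary, fallback="en"):
    """Resolve every translatable column of an article independently."""
    order = [primary, secondary, article["original_language"], fallback]
    resolved = {}
    needs_translation = []
    for column, translations in article["columns"].items():
        lang, text = resolve_field(translations, order)
        resolved[column] = text
        if lang not in (primary, secondary):
            # Shown in the original language or English (or missing),
            # so the user can be offered the chance to translate it.
            needs_translation.append(column)
    return resolved, needs_translation


article = {
    "original_language": "es",
    "columns": {
        "title": {"es": "Hola", "de": "Hallo"},
        "body": {"es": "Texto original"},
    },
}
resolved, todo = resolve_article(article, primary="de", secondary="fr")
```

Here the title is available in the user's primary language, while the body falls back to the article's original language and is flagged as a translation candidate.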

MongoDB as the primary database?

I have read a lot about MongoDB.
I like all the features it provides, but I wonder if it's possible to have it as the only database for my application, including storing sensitive information.
I know that it compromises the durability part of ACID, but as a solution I will have 1 master and 2 slaves in different locations.
If I do that, is it possible to use it as the primary database, storing everything?
UPDATE:
Let's put it this way.
I really need document storage rather than a traditional DBMS to be able to create my flexible application. But is MongoDB reliable enough to store sensitive customer information if I have multiple database replicas and master-slave replication? Because as far as I know, one major downside is that it compromises the D in ACID. So I solve it with multiple databases.
With that in place, are there no major problems, such as data loss?
And someone told me that with MongoDB a customer could be billed twice. Could someone enlighten me on this?
Yes, MongoDB can work as an application's only data store. Use replication, and make sure you know about safe writes and the w (write concern) option.
"And someone told me that with MongoDB a customer could be billed twice. Could someone enlighten me on this?"
I'm guessing whoever told you that was talking about eventual consistency. Some NoSQL databases are eventually consistent, meaning that you could have conflicting writes. MongoDB is not, though: it is strongly consistent (just like a relational database).
Whether your application is flexible or not has absolutely nothing to do with whether you use "NoSQL", a "document DB" or a proper RDBMS. Nobody using your application will care either way.
If you need flexibility while coding, you should look into frameworks like ActiveRecord for Ruby, which can make DB interfacing much simpler, more generic and more powerful. At that level you can gain a lot more than by just changing the DB, and you can even become DB-agnostic, meaning you can change DB without changing any code. Indeed, I have found that ActiveRecord boosts my productivity many, many fold by relieving me of tedious and error-prone "code intermixed with SQL".
Indeed, if you feel you need a schemaless database for critical data, you are doing something wrong. You are putting your own convenience over the critical needs of the project, or thinking, in ignorance, that you won't get into problems later. Sooner or later, lack of consistency will bite your ass, hard!
I feel you are hesitant towards RDBMSs because you are not really that comfortable with all the jargon, syntax and sound CS principles.
Believe me, if you're going to create an application of any value, you are a hundred times better off learning SQL, ACID and good database principles in the first place. Just read up on the basics, and build your knowledge from wherever you are now. It's the same for each and every one of us: it takes time, but you learn to do things right from the start.
Low-level tools like MongoDB and equivalent just provide you with infinitely more ammunition to shoot yourself in the foot with. They make it seem easy. In reality however, they leave the hard work for you, the coder, to deal with, while an RDBMS will handle more of the cruft for you once you grok the basics.
Why use computers at all? If you want more work, you can just go back to paper. Design will be a breeze, and implementation can be super-quick. Done. Except it won't be right, of course.
In the real world, we can't afford to ignore consistency, database design and many more sound CS principles. Which is why it's a great idea to study them in the first place, and keep learning more and more.
Don't buy into the hype. You ask a question about MongoDB here and add that you really need its features. With 25 years of computer experience, I simply don't buy it. If you think creatively, an RDBMS can be made to do whatever you want it to, or a framework can be used to save you from errors and wasted time.
Crafting ACID properties onto MongoDB seems like more work to me and, from experience, sounds like an exercise in futility compared with using what is already designed for such purposes.
Is it possible? Sure. Why not? It's possible to store everything as XML in text files if you wanted to.
Is it the best idea? It depends on what you're trying to do, the rest of your system architecture, the scale your site is intended to achieve, and so on.

What SQLite library do you recommend on the iPhone/iPad?

A few people have wrapped the SQLite library or provided alternatives. What are their relative merits?
Core Data
Yes, a bit of a snarky answer.
There are three primary reasons to use SQLite directly or via a wrapper.
You are sharing databases across platforms and can't use Core Data's schema
You really do need raw SQLite performance and have the 17th level SQLite API Mastery required to actually achieve said performance advantage over Core Data.
You know SQLite inside and out, don't like learning new things, and want to re-invent the bits necessary to get between database and screen. (Slightly snarky again).
Core Data buys you a tremendous amount of functionality that is subtly very hard to do. That is, object graph management with full integrity and undo support.
bbum gave the most succinct answer, and you should think carefully before deciding not to use Core Data.
But I thought you also deserved an answer to your actual question.
There are basically two wrapper approaches I know of:
FMDB simply provides a much easier-to-use Cocoa API overlay for SQLite. This can be great if you know SQL and database design well and just want a simple database.
Other approaches are generally more object-relational mapping systems, that attempt to give you an object view of an underlying datastore, and hide some queries from you.
In both cases if you have a really simple data store they can make sense to use if you have a specific reason... but using Core Data gives you an awful lot for free (though I'll admit the learning curve can be steep).
There is a third approach, which lets you build SQLite statements via a set of Objective-C classes. They handle how your data is converted and sanitized, and they also ensure that your statements are correctly constructed. You can try https://github.com/ziminji/objective-c-sql-query-builder for this. That particular library also offers Object Relational Mapping (ORM) that follows the Active Record design pattern.
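The query-builder idea can be sketched in a few lines. The example below uses Python's standard `sqlite3` module rather than Objective-C, and `build_select` is a hypothetical helper written for this illustration. It does not reproduce the API of the library linked above. Values are passed as bound parameters, and identifiers are checked with a deliberately simple rule:

```python
import sqlite3


def build_select(table, columns, where):
    """Build a SELECT statement plus its parameter list.

    Identifiers are checked with a simple rule and values are passed as
    bound parameters, never interpolated into the SQL text.
    """
    if not columns:
        raise ValueError("at least one column is required")
    for name in [table, *columns, *where]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
    return sql, list(where.values())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO notes (author, body) VALUES (?, ?)",
    [("ann", "first"), ("bob", "it's quoted"), ("ann", "second")],
)

sql, params = build_select("notes", ["id", "body"], {"author": "ann"})
rows = conn.execute(sql, params).fetchall()
```

Because the values travel separately from the SQL text, input such as "it's quoted" needs no manual escaping.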