I want to learn more about MongoDB's GridFS so I had a look at the manual.
It says:
files.aliases
Optional. An array of alias strings.
I know that this field may contain an array of strings, but what are the values inside this array used for? Alternative filenames?
Yes. For instance, from the MongoDB csharp driver source code (MongoGridFSFileInfo.cs, ln 474):
// copy all createOptions except Aliases (which are considered alternate filenames)
However, it's rather unclear what the semantics of this field are. The csharp driver, for instance, won't look for aliases when you search by name. As far as I can see, there's not even an index on aliases, so any search on that field would have to scan the whole files collection.
In general, keep in mind that GridFS is a mere concept of how to store large files in MongoDB - the implementation isn't special in any way - it's just regular collections and conventions, plus the command line tools. While the general idea of GridFS is neat, it does come with a lot of assumptions and conventions that you might not want to deal with, and that can be painful to work with in statically typed languages. You can always build your own GridFS with a different fieldset, though.
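Because a GridFS file entry is just a regular document, a lookup that also honours aliases has to be written by hand. Here is a minimal sketch in plain Python, with in-memory dicts standing in for documents in the files collection. The helper name find_by_name_or_alias and the sample data are hypothetical; this is not what any driver does for you.

```python
from typing import Iterable, Optional

# In-memory stand-ins for documents in a GridFS files collection.
# Field names follow the GridFS convention (filename, aliases); values are made up.
FILES = [
    {"_id": 1, "filename": "report-2013.pdf", "aliases": ["annual-report.pdf"]},
    {"_id": 2, "filename": "logo.png"},  # aliases is optional, so it may be absent
]


def find_by_name_or_alias(files: Iterable[dict], name: str) -> Optional[dict]:
    """Return the first file document whose filename or any alias equals name."""
    for doc in files:
        if doc.get("filename") == name or name in doc.get("aliases", []):
            return doc
    return None
```

Against a real server, the equivalent would presumably be a query matching either field, e.g. {"$or": [{"filename": name}, {"aliases": name}]}, ideally backed by an index on aliases that you create yourself.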
Related
I have read a few articles recently on the combination of mongodb for storage and elasticsearch for indexing/search. I feel like I'm missing something though. Why would you go this route as opposed to just using mongo to index the data? What benefits does elasticsearch bring and is it worth the added complexity?
ElasticSearch implements a lot more features, such as custom splitting of text into words, custom stemming, faceted search and a whole lot more. While MongoDB's (rather simple) text search does some of this, it is not nearly as powerful as ElasticSearch.
If all you ever do is look for a single string in a single field, then MongoDB's normal query system will work excellently for that. If you need to look for words in multiple fields, then MongoDB's text search will work. If you need anything more than that, ElasticSearch is the way to go.
A search engine and a database do some fundamentally different things. A good search engine (like ElasticSearch) supports far more elaborate and complex indexing, facets, highlighting etc. In the case of ElasticSearch, you also get your replies 'real-time'. On the other hand, a search engine doesn't return every single document that matches your query. Instead, it will score documents according to how much they match, and return the top scoring ones. When you query a database such as MongoDB, you should expect it to return everything that matches your query.
You can store the entire document in ElasticSearch, but it is usually not the optimal solution. Normally you will have it configured to return the document IDs, which you then use to fetch the documents from a database. MongoDB is a database optimized for document-based storage. This is why you hear about people using them together.
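The division of labour described above can be sketched in plain Python. This is only a toy model: the dict DOCUMENTS stands in for the database, the word-count score stands in for a real relevance score, and all names and sample data are made up for illustration.

```python
from collections import Counter
from typing import List

# Toy "database": full documents keyed by id (sample data is made up).
DOCUMENTS = {
    1: {"title": "MongoDB basics", "body": "mongodb stores documents"},
    2: {"title": "Search engines", "body": "search engines score and rank documents"},
    3: {"title": "Cooking", "body": "how to cook rice"},
}


def score(query: str, text: str) -> int:
    """Count how often the query terms occur in the text (a crude relevance score)."""
    words = Counter(text.lower().split())
    return sum(words[term] for term in query.lower().split())


def search_ids(query: str, limit: int = 2) -> List[int]:
    """Search-engine side: return only the ids of the top-scoring documents."""
    scored = [
        (score(query, doc["title"] + " " + doc["body"]), doc_id)
        for doc_id, doc in DOCUMENTS.items()
    ]
    # Drop non-matching documents, then sort by descending score (id breaks ties).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:limit]]


def fetch(ids: List[int]) -> List[dict]:
    """Database side: return the full documents for the given ids."""
    return [DOCUMENTS[doc_id] for doc_id in ids]
```

Note that search_ids cuts the result off at limit, whereas a database query would be expected to return every matching document.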
edit:
When this was posted, it matched the recommendations, but this may no longer be the case.
Derick's answer pretty much nails it. The question behind all this is:
What are the features you want to implement in your application?
If you rely on heavy searching capabilities in large chunks of text, ElasticSearch is probably a good thing to use. If you want to have a flexible datastore that can cope with complex ad-hoc queries, Mongo might be a good fit. If you have different requirements for a datastore, it is often a good thing to combine two tools instead of implementing all kinds of workarounds to make it work with just one datastore.
Choose the right tool for the job.
I am still new to the whole area of MongoDB systems.
I was wondering whether any of you know if MongoDB is declarative or navigational when it comes to accessing objects within a document?
What I mean is:
-> Declarative: a pattern is given and the system works out the result. In other words, it works in the same way as SPJ queries
-> Navigational: it always starts from the beginning of a document and continues from there
SPJ (Select-Project-Join) is more than just accessing documents/rows; it is the entire process of forming a result set for queries.
I am unsure how the two: "Declarative" and "Navigational" are compatible. You talk about "Declarative" being the formation of a result set but then "Navigational" being related to accessing a document, i.e. reading the beginning of it.
I will answer what I believe your question to be about, the access patterns of documents.
I believe MongoDB leaves the actual reading up to the OS itself (it might use some C++ library to do the work for it), so it starts from the beginning of a pointer (i.e. "Navigational"). However, how it actually reads a document does not really matter.
Here is why: MongoDB does not split a document up into many pieces to be stored on the hard disk; instead, it stores a document as a single "block" of space. You would probably need to look at the code to be sure exactly how it reads, though I don't see the point. Here is a presentation that will help you understand more about the internals of MongoDB: http://www.mongodb.com/presentations/storage-engine-internals
MongoDB's dot notation feature is quite cool, but I find that it does not seem to work on referenced documents.
So dot notation can only be used in embedded arrays or subdocuments, am I right?
Thank you.
Dot notation may only be used to query within the fields of a single document (or sub-document). It cannot be used to refer/query on other documents.
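To illustrate the scope of a dotted path, here is a rough plain-Python sketch that resolves one against a single nested document. The resolve helper is hypothetical and only loosely mirrors the matching rules; the point is that the path walks subdocuments and arrays, and stops at a field that merely holds a reference to another document.

```python
from typing import Any, List


def resolve(doc: Any, path: str) -> List[Any]:
    """Collect the values a dotted path reaches inside a single nested document."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for value in current:
            if isinstance(value, list):
                if key.isdigit():
                    # A numeric segment indexes into the array.
                    if int(key) < len(value):
                        next_level.append(value[int(key)])
                    continue
                # Any other segment is applied to each array element.
                candidates = value
            else:
                candidates = [value]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    return current


user = {
    "name": "Ann",
    "address": {"city": "Oslo"},
    "orders": [{"sku": "a"}, {"sku": "b"}],
    "employer_id": 42,  # a reference to a document stored elsewhere
}
```

Here resolve(user, "address.city") and resolve(user, "orders.sku") find values, while resolve(user, "employer_id.name") returns an empty list: the referenced document is not part of this one, so the path has nothing to walk into.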
While there are times where I thought it would be an interesting feature to add, it's generally easy to create an alternative schema design that doesn't require it for "queryability" or performance.
I would really like to use the dot notation to create namespaces for my MongoDB collections' names.
For example:
users
users.admins
users.developers
Is this a bad idea ?
Are there any potential problems, drawbacks or limitations when doing this ?
There is nothing wrong with this. In fact, this book recommends the use of dot notation in collections on page 8. It refers to these as subcollections; however, that term doesn't seem to be in broad use.
It's important to realize that the 3 collections you've listed in your question are 3 distinct collections with no relationship to each other except for their naming. The dot notation does not do anything in terms of MongoDB functionality.
It is useful for organization though, and the collections list nicely when sorted alphabetically. So in summary, there are no drawbacks or potential problems, and you gain an advantage as your collection names are better organized.
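Since the dots carry no meaning for the server, any grouping by "namespace" happens on the client side, on plain strings. A small sketch in Python (the helper group_by_prefix and the extra collection name orders are made up for illustration):

```python
from typing import Dict, List


def group_by_prefix(names: List[str]) -> Dict[str, List[str]]:
    """Group collection names by the part before the first dot."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(names):
        prefix = name.split(".", 1)[0]
        groups.setdefault(prefix, []).append(name)
    return groups


names = ["users.developers", "orders", "users", "users.admins"]
groups = group_by_prefix(names)
# groups["users"] is ["users", "users.admins", "users.developers"]
```

The three users collections end up next to each other only because of their names; nothing else links them.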
The documents in my database have names and descriptions among other fields. I would like to allow the users to search for those documents by providing some keywords. The keywords should be used to look up both the name and the description field. I've read the MongoDB documentation on full text search and it looks really nice and easy if I want to search for keywords in the name field of my documents. However, the description field contains free-form text and can take up to 2000 characters, so potentially there are a few hundred words per document. I could treat them the same way as names and just split the whole description into separate words and store it as another tag-like array (as per the Mongo example), but it seems like a terrible idea: each document's size could be almost doubled, plus there are characters like dots, commas, etc.
I know there are specialized solutions for exactly this kind of problem. I was just looking at Lucene.Net, and I also saw Solr mentioned here and there.
Should I be looking to implement this search feature in MongoDB or should I use a specialized solution? Currently I just have one instance of mongod and one instance of a web server. We might need to scale later, but for now that is all I use. I'd appreciate any suggestions on how to implement this feature.
If storing the text split out into an array per the documented approach is not viable (I can understand your concerns), then I think you should look into a specialised solution.
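For reference, the splitting step itself is small. Below is a rough sketch in Python of turning a description into a keyword array; the function name keywords and the min_length cut-off are assumptions for illustration, and stemming or stop-word handling is deliberately left out.

```python
import re
from typing import List


def keywords(text: str, min_length: int = 3) -> List[str]:
    """Split free-form text into a sorted list of unique lowercase keywords."""
    # Keep runs of letters and digits only, which drops dots, commas, etc.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # A set removes repeated words, so the array is smaller than the text itself.
    return sorted({word for word in words if len(word) >= min_length})


tags = keywords("Fast, reliable storage. Fast to set up!")
# tags is ["fast", "reliable", "set", "storage"]
```

Deduplicating and dropping very short words takes care of the punctuation and eases the size concern somewhat, but each document still carries a second copy of most of its vocabulary.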
Quote from the MongoDB documentation:
"MongoDB has interesting functionality that makes certain search functions easy. That said, it is not a dedicated full text search engine."
So, for more advanced full text search functionality I think a dedicated engine would be more suited. I have no experience in this area so I can't offer much in the way of suggestions from here, other than what my thoughts would be if I was in the same boat:
How much work is involved in using a dedicated full-text search engine instead of MongoDB's functionality?
Does that add more complexity, and is it worth it?
Would it be quicker/simpler to use MongoDB and just take the hit on the extra disk space?
Maybe MongoDB will support better full-text functionality in future (it is rapidly evolving after all).
Full-text search support is planned for the future. However, right now you have to go with Solr & friends. The built-in "fulltext" functionality is not really suitable for real-world usage.