Our database stores photo albums and photos.
Each album has a title, tags, and a description.
Each photo has a title, tags, and a description.
All I want is the ability to show 5 search results as soon as the user types a word in a search box.
Then show 50 search results per page, and so on.
Which fields should I index: only title, only tags (an embedded array), or both?
What should I use for the best search experience: a MongoDB index on a field, or some other type of index?
Solution must scale as the data grows.
If anyone can give me some pointers on how to proceed, that would be great.
I am still using an older version of MongoDB (1.8).
Thanks
If you require only the title to be searchable as per your last comment, then you could simply use the $regex operator:
http://docs.mongodb.org/manual/reference/operator/regex/#op._S_regex
If you anchor the regular expression (e.g. /^something/) and keep it case-sensitive, it can even use your index on that field, which will be very fast.
The performance of this on a huge database is not going to be fantastic though.
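To see why the anchor matters, here is an in-memory analogy rather than MongoDB code: a sorted set stands in for the index on title (an assumption for illustration; a real index also keeps document ids, so duplicate titles are not collapsed the way they are here). An anchored prefix can be answered by seeking to the first key at or after the prefix and walking forward until keys stop matching, which also fits returning only the first 5 suggestions while the user types. An unanchored pattern has no such starting point and has to examine every key. PrefixLookup and startsWith are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Analogy only, not MongoDB code: a sorted set plays the role of an index on "title".
final class PrefixLookup {
    private final NavigableSet<String> titleIndex = new TreeSet<>();

    void add(String title) {
        titleIndex.add(title);
    }

    // Like an anchored, case-sensitive /^prefix/ on an indexed field: seek to the
    // first key >= prefix, then scan forward only while keys still match.
    List<String> startsWith(String prefix, int limit) {
        List<String> hits = new ArrayList<>();
        for (String key : titleIndex.tailSet(prefix, true)) {
            if (hits.size() == limit || !key.startsWith(prefix)) {
                break; // past the matching range, or enough results for the UI
            }
            hits.add(key);
        }
        return hits;
    }
}
```

With the titles "Alps", "Beach", "Bergen", "Berlin trip" and "Birthday", startsWith("Be", 5) returns "Beach", "Bergen" and "Berlin trip", and startsWith("be", 5) returns nothing, just as a regex without the i flag would.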
Otherwise, as WiredPrairie suggests, look into the keyword search:
http://docs.mongodb.org/manual/tutorial/model-data-for-keyword-search/
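The keyword-search approach in the linked tutorial stores an array of keywords on each document and indexes that array (a multikey index), so one index can cover words from the title, the description and the tags. Below is a hedged sketch of building that array in application code before saving an album or photo; the tokenizing rule (split on anything that is not a letter or digit, lowercase everything) and the names Keywords and extract are assumptions, not something the tutorial prescribes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: builds the lowercase keyword array to store on an album
// or photo document (field name is up to you) and cover with a multikey index.
final class Keywords {
    static Set<String> extract(String title, String description, List<String> tags) {
        Set<String> keywords = new LinkedHashSet<>(); // drops duplicates, keeps order
        for (String text : new String[] {title, description}) {
            if (text == null) {
                continue;
            }
            // Assumed tokenizing rule: split on anything that is not a letter or digit.
            for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) {
                    keywords.add(word);
                }
            }
        }
        for (String tag : tags) {
            keywords.add(tag.toLowerCase(Locale.ROOT)); // tags are kept whole
        }
        return keywords;
    }
}
```

You would then query that array with an exact, lowercased word typed by the user; because every keyword is a separate index entry, the lookup does not depend on where the word appears in the title.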
I have a collection of documents of the form {_id: 'xxx', text: 'abc'}.
What is the best way to get a list of entities with the same text, also taking spelling mistakes into account (for example, 'gogle' and 'google'), ordered by the number of similar entities?
MongoDB doesn't have the capability to find misspelled items. There are some third-party libraries/plugins that offer this feature by storing Double Metaphone key codes along with the original version of the text. Here's an example program in C# that gets the result you want. See this page for a brief explainer on how it works. If you're not coding in C#, there's this plugin for Mongoose.
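The idea behind those libraries is to store a phonetic key next to each text, so that misspellings which sound alike share a key, and then group and count by that key. Double Metaphone is too long to reproduce here, so this sketch swaps in American Soundex, a much simpler and cruder phonetic code; treat it as an illustration of the key-and-group technique, not of Double Metaphone itself. PhoneticGroups, soundex and groupBySound are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Soundex stands in for Double Metaphone here; the grouping idea is the same.
final class PhoneticGroups {
    // American Soundex: first letter plus three digits, e.g. "google" -> "G240".
    static String soundex(String text) {
        String word = text.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (word.isEmpty()) {
            return "";
        }
        StringBuilder key = new StringBuilder().append(word.charAt(0));
        char prev = code(word.charAt(0));
        for (int i = 1; i < word.length() && key.length() < 4; i++) {
            char c = word.charAt(i);
            char digit = code(c);
            if (digit != '0' && digit != prev) {
                key.append(digit);
            }
            // Vowels separate repeated codes; H and W do not.
            if (c != 'H' && c != 'W') {
                prev = digit;
            }
        }
        while (key.length() < 4) {
            key.append('0');
        }
        return key.toString();
    }

    private static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0'; // vowels, H, W, Y
        }
    }

    // Groups texts by phonetic key, largest group first.
    static List<List<String>> groupBySound(List<String> texts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String t : texts) {
            groups.computeIfAbsent(soundex(t), k -> new ArrayList<>()).add(t);
        }
        List<List<String>> result = new ArrayList<>(groups.values());
        result.sort(Comparator.comparingInt((List<String> g) -> g.size()).reversed());
        return result;
    }
}
```

Both "google" and "gogle" map to G240, so grouping "yahoo", "google", "gogle", "google" puts the three Google spellings in the first (largest) group. In MongoDB you would compute the key when saving each document, store it in its own field, and group on that field.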
I have a use case where I need to get a list of objects from MongoDB based on a query. To improve performance, I am adding pagination.
So, on the first call I get a list of, say, 10 objects; on the next call I need 10 more. But I cannot use offset and pageSize directly, because the first 10 objects displayed on the page may since have been modified (deleted).
The solution is to find the ObjectId of the last object returned and retrieve the next 10 objects after that ObjectId.
How can I do this efficiently using Morphia?
Using Morphia, you can do this with the following query:
datastore.find(YourClass.class).field("_id").lessThan(lastId).limit(10).order("-ts");
Since you are querying for the items after the last retrieved id, you don't have to deal with deleted items.
One thing I have thought of is that you will have the same problem here as with skip(), unless you intend to change how your interface works.
Using ranged queries like this demands a different kind of interface, since it is now much harder to tell exactly which page you are on and how many pages exist ahead, especially if you are doing this to avoid the problems of conventional paging.
The default type of interface to arise from this type of paging is simply an infinitely scrolling page: think of YouTube video comments, the Facebook wall feed, or even Google+. There is no physical pagination or "pages"; instead you have a "get more" button.
This is the type of interface you will need for ranged paging to work well.
As for the query, @cubbuk gives a good example:
datastore.find(YourClass.class).field("_id").lessThan(lastId).limit(10).order("-ts");
Except it should be greaterThan(lastId), since you want to find everything after that last _id. I would also sort by _id, unless you create your ObjectIds some time before you insert a record; in that case, you can use a specific timestamp set on insert instead.
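The difference between skip()-based paging and range-based paging can be shown without a database. In this sketch a TreeMap keyed by id stands in for a collection sorted ascending by _id (an assumption; with Morphia the equivalent is the greaterThan(lastId) filter plus a sort on _id suggested above). pageAfter also fetches one extra item so the UI knows whether to show the "get more" button without counting pages; that trick, and the names RangePaging, offsetPage and pageAfter, are illustrative additions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// In-memory stand-in for a collection ordered by _id; not Morphia code.
final class RangePaging {
    record Page(List<Long> ids, boolean hasMore) {}

    // Offset paging: what skip(offset).limit(limit) does.
    static List<Long> offsetPage(NavigableMap<Long, String> coll, int offset, int limit) {
        return coll.keySet().stream().skip(offset).limit(limit).toList();
    }

    // Range paging: items with id strictly greater than lastId (null = first page).
    // Reads limit + 1 items so the caller knows whether another page exists.
    static Page pageAfter(NavigableMap<Long, String> coll, Long lastId, int limit) {
        Map<Long, String> tail = (lastId == null) ? coll : coll.tailMap(lastId, false);
        List<Long> ids = new ArrayList<>();
        for (Long id : tail.keySet()) {
            if (ids.size() == limit + 1) {
                break;
            }
            ids.add(id);
        }
        boolean hasMore = ids.size() > limit;
        if (hasMore) {
            ids.remove(ids.size() - 1); // drop the look-ahead item
        }
        return new Page(ids, hasMore);
    }
}
```

Starting from ids 1 to 6 with a page size of 3, the first page is 1, 2, 3. If item 2 is then deleted, offsetPage(coll, 3, 3) returns 5 and 6 and silently skips 4, while pageAfter(coll, 3L, 3) returns 4, 5 and 6 with hasMore false.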
I'm looking for some recommendations on how to structure the tags part of this data model:
Here's a simplified version of it:
A Site has many Posts (a relational association; references_many in Mongoid speak). A Site has a tree of tags.
A Post has an array of tags (a subset of the Site's tags; order doesn't matter).
The use cases I care about are:
Quickly saving & retrieving the Site's tags in tree form (i.e. to be able to display them as a tree in the UI)
Quickly querying which of a Site's posts have a certain tag.
Without the tree structure, http://github.com/wilkerlucio/mongoid_taggable solves my use cases. I've seen some of the acts_as_tree ports for Mongoid, like:
http://github.com/benedikt/mongoid-tree
http://github.com/saks/mongoid_acts_as_tree
http://github.com/ticktricktrack/mongoid_tree
They all seem to take a relational approach to storing the hierarchy, as opposed to an embedded one, which would mean that both of the use cases above would be slow (likely requiring a map/reduce).
Has anyone done anything similar, or have any advice? Ideally I'd love a Mongoid solution, but I'm happy to drop down to the Ruby driver as well.
Do you need to update the structure of the tree (i.e. move a tag to another parent)? If that is possible, the embedded approach would become difficult, and the relational/normalized approach makes more sense.
I would probably store the tags themselves in the document (embedded), but if there is any chance that I need to move tree nodes around online, I'd store the hierarchy in another document. Queries need not be slow if you first flatten the search query according to the current tree (expand a tag into itself plus all of its descendants) and then search for those tags. This approach probably does not scale too well if the flattened search query ends up with hundreds of tags in it (how tall is your tree?).
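Here is a rough sketch of flattening the search query: expand the requested tag into itself plus all of its descendants in the Site's current tree, then match any post whose tag array contains one of them (in MongoDB that last step would be an $in query on the tags array). Plain Java maps stand in for the stored hierarchy and the posts; TagQuery, flatten and postsTagged are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: plain maps stand in for the stored hierarchy and the posts.
final class TagQuery {
    // children: tag -> its direct child tags in the Site's current tree.
    static Set<String> flatten(String tag, Map<String, List<String>> children) {
        Set<String> expanded = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(tag);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (expanded.add(current)) { // skip tags already seen
                for (String child : children.getOrDefault(current, List.of())) {
                    pending.push(child);
                }
            }
        }
        return expanded;
    }

    // posts: post id -> that post's tag array; same idea as $in on the tags field.
    static List<String> postsTagged(String tag, Map<String, List<String>> children,
                                    Map<String, List<String>> posts) {
        Set<String> wanted = flatten(tag, children);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> post : posts.entrySet()) {
            if (post.getValue().stream().anyMatch(wanted::contains)) {
                hits.add(post.getKey());
            }
        }
        return hits;
    }
}
```

With music → rock, jazz and rock → punk, asking for music also matches posts tagged only punk or jazz. The size of the flattened set is the scaling concern mentioned above: a tag near the root of a large tree expands into a long $in list.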
If tags cannot be moved to new parents (or only by you, during scheduled maintenance), go ahead and embed the whole hierarchy.
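If the whole hierarchy is embedded in the Site document, the first use case becomes a single read followed by a walk over the nested structure. This sketch assumes a hypothetical nested shape for that embedded document (a TagNode with a name and a list of children) and renders it with indentation, roughly what a UI tree needs; TagTree, TagNode and render are illustrative names.

```java
import java.util.List;

// Hypothetical shape of the embedded tag tree stored inside a Site document.
final class TagTree {
    record TagNode(String name, List<TagNode> children) {
        static TagNode leaf(String name) {
            return new TagNode(name, List.of());
        }
    }

    // Depth-first walk that indents each tag by two spaces per level.
    static String render(List<TagNode> roots) {
        StringBuilder out = new StringBuilder();
        for (TagNode root : roots) {
            render(root, 0, out);
        }
        return out.toString();
    }

    private static void render(TagNode node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.name()).append('\n');
        for (TagNode child : node.children()) {
            render(child, depth + 1, out);
        }
    }
}
```

Because the whole tree lives in one document, saving it is also a single write; the trade-off, as noted above, is that moving a node means rewriting the nested structure.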
There are two implemented patterns of MongoDB tree structures.
I've implemented Lucene.NET on my site, using it to index my products, which are theatre shows, tours and attractions around London.
I want to implement a "Did you mean?" feature for when users misspell product names, one that takes the whole product titles into account and not just single words. For example,
If the user typed:
Lodnon Eye
I would like to auto-suggest:
London
London Eye
I assume I need to have the analyzer index the titles as if they were single entities, so that the SpellChecker can nearest-match on the phrase as well as on the individual words.
How would I do this?
There is an excellent blog series here:
Lucene.NET
Introduction to Lucene
Indexing basics
Search basics
Did you mean..
Faceted Search
Class Reference
I have also found another project called SimpleLucene, which you can use to maintain your Lucene indexes whenever you need to update or delete a document. Read about it here.
I've just recently implemented a phrase autosuggest system in Lucene.NET.
Basically, the Java version of Lucene has a ShingleFilter in one of the contrib folders, which breaks a sentence down into shingles, i.e. every run of adjacent words up to a configurable length. Unfortunately, Lucene.NET's contrib filters aren't quite there yet, so we don't have a shingle filter.
But a Lucene index written in Java can be read by Lucene.NET as long as the versions are the same, so here is what I did:
Created a spelling index in Lucene.NET using the spellcheck.IndexDictionary method, as laid out in the "Did you mean.." section of Jake Scott's link. Please note that this only creates a spelling index of single words, not phrases.
I then created a Java app that uses the shingle filter to create phrases from the text I'm searching and saves them in a temporary index.
I then wrote another method in .NET to open this temporary index and add each of the phrases as a line, or document, to my spelling index, which already contains the single words. The trick is to make sure the documents you're adding have the same form as the rest of the spelling documents, so I ripped out the methods used in the SpellChecker code in the Lucene.NET project and edited those.
Once you've done that, you can call the spellcheck.SuggestSimilar method, pass it a misspelled phrase, and it will return a valid suggestion.
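The pipeline above can be sketched without Lucene to show the moving parts: generate shingles from each title, add them to a dictionary alongside the single words, then rank dictionary entries by edit distance to the user's input. This is a simplification, not the Lucene or Lucene.NET API: as I understand it, the real SpellChecker first shortlists candidates through character n-gram fields and then ranks them with a string distance, whereas this sketch simply scans every entry. PhraseSpelling, shingles, buildDictionary, levenshtein and suggest are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: not the Lucene / Lucene.NET API.
final class PhraseSpelling {
    // Word n-grams of up to maxSize words per start position, single words included.
    static List<String> shingles(String text, int maxSize) {
        List<String> out = new ArrayList<>();
        if (text.isBlank()) {
            return out;
        }
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < words.length; i++) {
            StringBuilder phrase = new StringBuilder(words[i]);
            out.add(phrase.toString());
            for (int n = 1; n < maxSize && i + n < words.length; n++) {
                phrase.append(' ').append(words[i + n]);
                out.add(phrase.toString());
            }
        }
        return out;
    }

    // The "spelling index": every single word and every shingle from every title.
    static Set<String> buildDictionary(List<String> titles, int maxSize) {
        Set<String> dictionary = new LinkedHashSet<>();
        for (String title : titles) {
            dictionary.addAll(shingles(title, maxSize));
        }
        return dictionary;
    }

    // Classic dynamic-programming edit distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Linear scan ranked by edit distance; closest entries first.
    static List<String> suggest(String input, Set<String> dictionary, int max) {
        String query = input.trim().toLowerCase(Locale.ROOT);
        return dictionary.stream()
                .sorted(Comparator.comparingInt((String entry) -> levenshtein(query, entry)))
                .limit(max)
                .toList();
    }
}
```

Building the dictionary from the single title "London Eye" with a maximum shingle size of 2 gives "london", "london eye" and "eye", and suggest("Lodnon Eye", dictionary, 2) returns "london eye" followed by "london", which matches the two suggestions the question asks for.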
This is probably not the best solution, and I would definitely use the answer suggested by spaceman, but here is another possible solution. Use the KeywordAnalyzer or the KeywordTokenizer on each title; this will not break the title down into separate tokens but keep it as one token. The SuggestSimilar method would then return whole titles as suggestions.
I found a great tutorial on performing a faceted search.
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
This article does not explain how to retrieve the narrowed set of available attributes to filter on (for further drill-down).
Let's say I am looking for planners that are red. When I perform the faceted search, I want to return all the attributes still available for filtering among the red planners. Then, when I add a "weekly format" filter, I want the attribute list to get even smaller, containing only the filters available for that narrowed group.
I would love to use Solr/SolrNET, but I am in a shared hosting situation with limited access to the actual server.
I am fairly new to Lucene.NET, so examples are much appreciated.
IIUC, you get a BitArray containing the list of the filtered results. In the tutorial's example, combinedResults is this list. If you want to narrow it down further, you need to repeat the process: run another searchQuery and intersect its results with the BitArray you have for combinedResults.
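Here is a small sketch of that drill-down loop, using Java's BitSet in the role of .NET's BitArray (the tutorial's code is C#, so this translates the idea, not its API). Each facet value has a bit set of matching document numbers; narrowing means intersecting the current result bits with a filter's bits, and the attributes still available are the facet values whose intersection with the current results is non-empty, which also yields the counts to display. FacetDrillDown, narrow, available, bits and the "attribute=value" keys are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of facet drill-down with bit sets; keys like "color=red" are hypothetical.
final class FacetDrillDown {
    // Intersect without mutating the caller's bits (BitSet.and works in place).
    static BitSet narrow(BitSet current, BitSet filter) {
        BitSet result = (BitSet) current.clone();
        result.and(filter);
        return result;
    }

    // Facet values that still match at least one current document, with counts.
    static Map<String, Integer> available(BitSet current, Map<String, BitSet> facets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> facet : facets.entrySet()) {
            int n = narrow(current, facet.getValue()).cardinality();
            if (n > 0) {
                counts.put(facet.getKey(), n);
            }
        }
        return counts;
    }

    // Convenience: build a bit set from document numbers.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }
}
```

With five planners where documents 0 to 2 are red, 3 and 4 are blue, 0 and 3 are weekly, and 1, 2 and 4 are daily, selecting red leaves color=red (3), format=weekly (1) and format=daily (2); adding weekly narrows the results to document 0, so format=daily drops out of the list.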
I would love to use Solr/SolrNET, but I am in a shared hosting situation with limited access to the actual server.
You can always use an off-site, hosted Solr solution. See this question for more information.