When to use array or not to use them in mongodb

When to use array or not to use them in mongodb - mongodb

I am working to my very first application in Symfony2/mongodb, I have to store articles and these articles have tags, keywords and related images. At the moment I am storing these information like that:
"category" : [
"category1",
" category2",
" category3"
],
but also I saw a few examples saying to do
"category" : "category1, category2, category3",
so I was guessing which one is the best way to do it?

It's a very bad idea to use string when you actually need an array. If you want to search documents by tag, you definitely need an array. But strings are usefull, when you need text search (for example, searching a word with it forms in sentences).

If you use array, then you will have the following advantages:
You can access each item directly by index.
You can perform queries directly on the array using operators like $in, $nin and $elemMatch
If you use a string, then you will have to:
Split by , in order to do any looping
User text based searching in query, which is slow
One thing you need to keep in mind regarding arrays inside a MongoDB document is that it should not be too large. Arrays can get very large, and if it pushes the size of the document beyond 16 MB, it will cause issues, as 16 MB is the maximum allowed size for a single document.
In that use case, you can split off the contents of your array into a separate collection and created references.

Related

MongoDB fulltext search + workaround for partial word match

Since it is not possible to find "blueberry" by the word "blue" by using a mongodb full text search, I want to help my users to complete the word "blue" to "blueberry". To do so, is it possible to query all the words in a mongodb full text index -> that I can use the words as suggestions i.e. for typeahead.js?

Language stemming in text search uses an algorithm to try to relate words derived from a common base (eg. "running" should match "run"). This is different from the prefix match (eg. "blue" matching "blueberry") that you want to implement for an autocomplete feature.
To most effectively use typeahead.js with MongoDB text search I would suggest focusing on the prefetch support in typeahead:
Create a keywords collection which has the common words (perhaps with usage frequency count) used in your collection. You could create this collection by running a Map/Reduce across the collection you have the text search index on, and keep the word list up to date using a periodic Incremental Map/Reduce as new documents are added.
Have your application generate a JSON document from the keywords collection with the unique keywords (perhaps limited to "popular" keywords based on word frequency to keep the list manageable/relevant).
You can then use the generated keywords JSON for client-side autocomplete with typeahead's prefetch feature:
$('.mysearch .typeahead').typeahead({
name: 'mysearch',
prefetch: '/data/keywords.json'
});
typeahead.js will cache the prefetch JSON data in localStorage for client-side searches. When the search form is submitted, your application can use the server-side MongoDB text search to return the full results in relevance order.

A simple workaround I am doing right now is to break the text into individual chars stored as a text indexed array.
Then when you do the $search query you simply break up the query into chars again.
Please note that this only works for short strings say length smaller than 32 otherwise the indexing building process will take really long thus performance will be down significantly when inserting new records.

You can not query for all the words in the index, but you can of course query the original document's fields. The words in the search index are also not always the full words, but are stemmed anyway. So you probably wouldn't find "blueberry" in the index, but just "blueberri".

Don't know if this might be useful to some new people facing this problem.
Depending on the size of your collection and how much RAM you have available, you can make a search by $regex, by creating the proper index. E.g:
db.collection.find( {query : {$regex: /querywords/}}).sort({'criteria': -1}).limit(limit)
You would need an index as follows:
db.collection.ensureIndex( { "query": 1, "criteria" : -1 } )
This could be really fast if you have enough memory.
Hope this helps.

For those who have not yet started implementing any database architecture and are here for a solution, go for Elasticsearch. Its a json document driven database similar to mongodb structurally. It has "edge-ngram" analyzer which is really really efficient and quick in giving you did you mean for mis-spelled searches. You can also search partially.

In MongoDB, is one big $or search faster than multiple single searches?

I have a list of about 50 tags in an array, and want to search through my documents to find records that match these tags.
Because they're user-submitted and mongoDB is case-sensitive, I'm using /wildcard/i as a means of searching. I know this is not the fastest way to do a search but I can't think of a better solution.
I can do my query in two ways. The first is to run a for loop over my tags array, and for each result, perform:
db.collection.find({tags: /<tag[x]>/i})
Or, I can collect all of the tags and run one single lookup using $or, like so:
db.collection.find({$or:[{tags:/<tag1>/i},{tags:/<tag2>/i},{tags:/<tag3>/i}, ... {tags:/<tag50>/i}]});
I have tried both, and found using $or to be significantly faster - but because of the work-in-progress state of my application, it's very difficult to tell whether this is because it's actually faster or whether my app is causing significant overhead in other areas (it is).
So for clarification, in MongoDB is a big query performed once faster than small queries performed many times?
EDIT: Another example would be whether looking up 3 individual records based on _id is faster than doing one lookup using {$or:[{_id: ObjectId([id1])},{_id: ObjectId([id2])},{_id: ObjectId([id3])}]}. Is less more?

I recommend you adjust your schema so it keeps a normalized array of tags. When you insert a new document, do it like this:
tags : [ "business", "Computing", "PayPal" ],
lowercaseTags : [ "business", "computing", "paypal" ]
Similarly when you update the tags, update both arrays.
Create an index on lowercaseTags, and then when you want to query them, use a single query with the $in operator, and the normalized form of the search terms.
For example, to search for business iTunes YouTube, use this query:
db.collection.find( { tags : $in: [ "business", "itunes", "youtube" ] } )
This answer gives an example of this approach. It should be loads faster than what you have.
An alternate approach you can take is to create a text index and use the text command.
Both of these approaches are geared toward index optimization, and designing your schema to work well with Mongo. The payoff should be a lot higher than whatever difference there is between a single $or query and 50 simpler queries.

How can I filter by the length of an embedded document in MongoDB?

For example given the BlogPost/Comments schema here:
http://mongoosejs.com/
How would I find all posts with more than five comments? I have tried something along the lines of
where('comments').size.gte(5)
But I'm getting tripped up with the syntax

MongoDb doesn't support range queries with size operator (Link). They recommend you to create a separate field to contain the size of the list that you increment yourself.
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements.

Note that for some queries, it may be feasible to just list all the counts you want in or excluded using (n)or conditions.
In your example, the following query will give all documents with more than 5 comments (using standard mongodb syntax, not mongoose):
db.col.find({"comments":{"$exists"=>true}, "$nor":[{"comments":{"$size"=>4}}, {"comments":{"$size"=>3}}, {"comments":{"$size"=>2}}, {"comments":{"$size"=>1}}, {"comments":{"$size"=>0}}]})
Obviously, this is very repetitive, so it only makes sense for small boundaries, if at all. Keeping a separate count variable, as recommended in the mongodb docs, is usually the better solution.

It's slow, but you could also use the $where clause:
db.Blog.find({$where:"this.comments.length > 5"}).exec(...);

Mongo query for number of items in a sub collection

This seems like it should be very simple but I can't get it to work. I want to select all documents A where there are one or more B elements in a sub collection.
Like if a Store document had a collection of Employees. I just want to find Stores with 1 or more Employees in it.
I tried something like:
{Store.Employees:{$size:{$ne:0}}}
or
{Store.Employees:{$size:{$gt:0}}}
Just can't get it to work.

This isn't supported. You basically only can get documents in which array size is equal to the value. Range searches you can't do.
What people normally do is that they cache array length in a separate field in the same document. Then they index that field and make very efficient queries.
Of course, this requires a little bit more work from you (not forgetting to keep that length field current).

How to search in a Collection with unknown structure?

I spent several hours reading through docs and forums, trying to find a solution for the following problem:
In A Mongo database, I have a collection with some unstructured data:
{"data" : "some data" , "_id" : "497ce96f395f2f052a494fd4"}
{"more_data" : "more data here" ,"recursive_data": {"some_data": "even more data here", "_id" : "497ce96f395f2f052a4323"}
{"more_unknown_data" : "string or even dictionaries" , "_id" : "497ce96f395f2f052a494fsd2"}
...
The catch is that the elements in this collections don't have a predefined structure and they can be unlimited levels.
My goal is to create a query, that searches through the collection and finds all the elements that match a regular expression( in both the keys and the values ).
For example, if I have a regex: '^even more' - It should return all the elements that have the string "even more" somewhere in the structure. In this case - that will be the second one.

Simply add an array to each object and populate it with the strings you want to be able to search on. Typically I'd lowercase those values to make case-insensitive search easy.
e.g. Tags : ["copy of string 1", "copy of string 2", ...]
You can extend this technique to index every word of every element. Sometimes I also add the field with an identifier in front of it, e.g. "genre:rock" which allows searches for values in specific fields (choose the ':' character carefully).
Add an index on this array and now you have the ability to search for any word or phrase in any document in the collection and you can search for "genre:rock" to search for that value in a specific field.

Ever if you will find a way to do this you will still face problem of slow searches, becouse there are no indexes
I had similar problem and solution was to create additional database(on same engine or any other more suitable for search) and to fill it with mongo keys and combined to one text field data. And to update it whenever mongodb data updates.
If it suitable you can also try to go this way...At least search works very fast. (I used postgresql as search backend)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

When to use array or not to use them in mongodb - mongodb

It's a very bad idea to use string when you actually need an array. If you want to search documents by tag, you definitely need an array. But strings are usefull, when you need text search (for example, searching a word with it forms in sentences).

Related

MongoDB fulltext search + workaround for partial word match

In MongoDB, is one big $or search faster than multiple single searches?

How can I filter by the length of an embedded document in MongoDB?

Mongo query for number of items in a sub collection

How to search in a Collection with unknown structure?

Categories

Resources