mongodb retrieve slice of a text field - mongodb

Is there something like "$slice" in MongoDB that retrieves a slice of a text field instead of an array?
I mean, just as we can get a slice of the comments this way:
db.posts.find({}, {comments:{$slice: 5}}) // first 5 comments
can we get a slice of the description in some way like this:
db.posts.find({}, {description:{$slice: 100}}) // first 100 chars
Thanks

MongoDB's $slice operator only applies to arrays, not text fields. If you want to trim strings you'll have to take care of this in the display code for your application (or possibly save a "trimmed" version of the field for display).
Note that if you are truncating text like a comment or description, the usual practice is to truncate to the nearest whole word (so the logic is a bit more involved than a simple character count).
See, for example: How to Truncate a string in PHP to the word closest to a certain number of characters?
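As a rough sketch of what that display-side logic could look like in JavaScript (the helper name truncateAtWord and the "..." suffix are my own assumptions, not anything MongoDB provides), this cuts at the last space at or before the limit and falls back to a hard cut when there is no space:

```javascript
// Hypothetical helper: shorten `text` to roughly `maxLen` characters,
// cutting at the last space at or before the limit when there is one.
function truncateAtWord(text, maxLen, suffix = "...") {
  if (text.length <= maxLen) return text;
  // Look one character past the limit so a word ending exactly at maxLen is kept whole.
  const head = text.slice(0, maxLen + 1);
  const lastSpace = head.lastIndexOf(" ");
  // No usable space found: fall back to a hard cut at maxLen.
  const end = lastSpace > 0 ? lastSpace : maxLen;
  return text.slice(0, end).trimEnd() + suffix;
}

console.log(truncateAtWord("The quick brown fox jumps", 12));
```

Note that the suffix is appended after the cut, so the returned string can be a few characters longer than maxLen.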

Related

How to ignore hyphens in mongodb docs values

Hi, my problem is that I have a collection whose data contains string values with hyphens between words,
for example: "item:'e-commerce'"
My question is whether there is any option to make Mongo ignore the hyphens when I query a string,
for example: searching for the value "e commerce" should return "item:'e-commerce'". The worst solution would be to normalize the collection to remove the hyphens.
Normalizing this field sounds like the way to go here.
Another (likely bad) tack would be using a collation with alternate set to "shifted", so that whitespace and punctuation are ignored during comparisons.
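A minimal sketch of the normalization route, assuming you store the result in an extra field next to the original (the function name normalizeForSearch and the field name item_normalized are hypothetical):

```javascript
// Hypothetical helper: collapse runs of hyphens and whitespace into a single
// space so that "e-commerce" and "e commerce" produce the same search key.
function normalizeForSearch(value) {
  return value.replace(/[-\s]+/g, " ").trim();
}

console.log(normalizeForSearch("e-commerce"));
```

You would apply the same function to the value when writing the document and to the user's input when querying, for example { item_normalized: normalizeForSearch(input) }, while keeping the original item value untouched for display.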

Difference between wildcard search and individual text search

Is there a difference between a wildcard search index like $** and text indexes that I create for each of the fields in the collection?
I do see a small difference in response time when I create the text indexes individually: the individual indexes give a better response time. I am not able to post an example now, but will try to.
A wildcard text index will index every field that contains string data for each document in the collection (https://docs.mongodb.com/manual/core/index-text/#wildcard-text-indexes).
Because a wildcard text index essentially increases the number of fields indexed, queries against it would take longer to run than queries against a text index that targets specific fields.
Since you can only have one text index per collection (https://docs.mongodb.com/manual/core/index-text/#create-text-index), it's worth considering beforehand which fields you plan on querying against.

When to use array or not to use them in mongodb

I am working on my very first application in Symfony2/MongoDB. I have to store articles, and these articles have tags, keywords and related images. At the moment I am storing this information like this:
"category" : [
"category1",
" category2",
" category3"
],
but I also saw a few examples that do
"category" : "category1, category2, category3",
so I was wondering which one is the best way to do it?
It's a very bad idea to use a string when you actually need an array. If you want to search documents by tag, you definitely need an array. Strings are useful, however, when you need text search (for example, searching for a word and its forms in sentences).
If you use an array, you get the following advantages:
You can access each item directly by index.
You can perform queries directly on the array using operators like $in, $nin and $elemMatch
If you use a string, then you will have to:
Split by , in order to do any looping
Use text-based searching in queries, which is slow
One thing you need to keep in mind regarding arrays inside a MongoDB document is that they should not grow too large. Arrays can get very large, and if one pushes the size of the document beyond 16 MB, it will cause issues, as 16 MB is the maximum allowed size for a single document.
In that case, you can split off the contents of your array into a separate collection and create references.
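If you already have comma-separated strings stored, converting them into arrays is straightforward. A small sketch (the helper name splitTags is made up for illustration); note that it also trims the stray leading spaces you have in " category2":

```javascript
// Hypothetical helper: turn "category1, category2" into ["category1", "category2"],
// trimming whitespace and dropping empty entries.
function splitTags(csv) {
  return csv
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}

console.log(splitTags("category1, category2, category3"));
```

With the array stored, a query such as find({ category: "category2" }) matches any document whose category array contains that value, which is not possible with the single-string form without resorting to pattern matching.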

MongoDB - Difference between index on text field and text index?

For a MongoDB field that contains strings (for example, state or province names), what (if any) difference is there between creating an index on a string-type field:
db.collection.ensureIndex( { field: 1 } )
and creating a text index on that field:
db.collection.ensureIndex( { field: "text" } )
where, in both cases, field is of string type.
I'm looking for a way to do a case-insensitive search on a text field which would contain a single word (maybe more). Being new to Mongo, I'm having trouble distinguishing between using the above two index methods, and even something like a $regex search.
The two index options are very different.
When you create a regular index on a string field it indexes the
entire value in the string. Mostly useful for single word strings
(like a username for logins) where you can match exactly.
A text index on the other hand will tokenize and stem the content of
the field. So it will break the string into individual words or
tokens, and will further reduce them to their stems so that variants
of the same word will match ("talk" matching "talks", "talked" and
"talking" for example, as "talk" is a stem of all three). Mostly
useful for true text (sentences, paragraphs, etc).
Text Search
Text search supports the search of string content in documents of a
collection. MongoDB provides the $text operator to perform text search
in queries and in aggregation pipelines.
The text search process:
tokenizes and stems the search term(s) during both the index creation and the text command execution.
assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
The $text operator can search for words and phrases. The query matches
on the complete stemmed words. For example, if a document field
contains the word blueberry, a search on the term blue will not match
the document. However, a search on either blueberry or blueberries
will match.
$regex searches can be used with regular indexes on string fields, to
provide some pattern matching and wildcard search. Not a terribly
effective user of indexes but it will use indexes where it can:
If an index exists for the field, then MongoDB matches the regular
expression against the values in the index, which can be faster than a
collection scan. Further optimization can occur if the regular
expression is a “prefix expression”, which means that all potential
matches start with the same string. This allows MongoDB to construct a
“range” from that prefix and only match against those values from the
index that fall within that range.
http://docs.mongodb.org/manual/core/index-text/
http://docs.mongodb.org/manual/reference/operator/query/regex/
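To make the "prefix expression" point concrete, here is a small sketch of building an anchored $regex filter from user input. The helper names escapeRegex and prefixFilter are my own, and the field name is just an example:

```javascript
// Escape characters that have a special meaning in regular expressions,
// so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a filter that matches values starting with `prefix`.
// The leading "^" is what makes this a prefix expression.
function prefixFilter(field, prefix) {
  return { [field]: { $regex: "^" + escapeRegex(prefix) } };
}

console.log(JSON.stringify(prefixFilter("state", "New")));
```

The resulting object can be passed to find() as-is. As far as I know, adding the case-insensitive option ("i") generally prevents the index from being used this efficiently, so for case-insensitive lookups it may be worth storing a lowercased copy of the field and running the prefix match against that.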
Text indexes allow you to search for words inside text. You can do the same using a regex on a field without a text index, but it would be much slower.
Prior to MongoDB 2.6, text search operations had to be run with their own command, which was a big drawback because you couldn't combine it with other filters, nor treat the result as a common cursor. As of now, text search is just another operator for the typical find method, and that's super nice.
So, why are a text index and its subsequent searches faster than a regex on a non-indexed text field? It's because text indexes work as a dictionary, a clever one that's capable of discarding words on a per-language basis (defaults to English). When you run a text search query, you run it against the dictionary, saving yourself the time that would otherwise be spent iterating over the whole collection.
Keep in mind that the text index will grow along with your collection, and it can use a lot of space. I learnt this the hard way when using capped collections. There's no way to cap text indexes.
A regular index on a text field, such as
db.collection.ensureIndex( { field: 1 } )
will be useful only if you search for the whole text. It's used, for example, to look up alphanumeric hashes. It doesn't make sense to apply this kind of index when storing text paragraphs, phrases, etc.

How can I filter by the length of an embedded document in MongoDB?

For example given the BlogPost/Comments schema here:
http://mongoosejs.com/
How would I find all posts with more than five comments? I have tried something along the lines of
where('comments').size.gte(5)
But I'm getting tripped up with the syntax.
MongoDB doesn't support range queries with the $size operator. The docs recommend creating a separate field that holds the size of the list and that you increment yourself:
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements.
Note that for some queries, it may be feasible to just list all the counts you want included or excluded using $or/$nor conditions.
In your example, the following query returns all documents with more than 5 comments (using standard MongoDB shell syntax, not Mongoose):
db.col.find({"comments":{"$exists":true}, "$nor":[{"comments":{"$size":5}}, {"comments":{"$size":4}}, {"comments":{"$size":3}}, {"comments":{"$size":2}}, {"comments":{"$size":1}}, {"comments":{"$size":0}}]})
Obviously, this is very repetitive, so it only makes sense for small boundaries, if at all. Keeping a separate count variable, as recommended in the mongodb docs, is usually the better solution.
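If you do go the $nor route, you can at least generate the repetitive part. A sketch, with a made-up helper name (moreThanFilter); it excludes every size from 0 through n, so it matches arrays with more than n elements:

```javascript
// Hypothetical helper: build a filter matching documents whose array `field`
// has more than `n` elements, by excluding every size from 0 to n via $nor.
function moreThanFilter(field, n) {
  const excluded = [];
  for (let size = 0; size <= n; size++) {
    excluded.push({ [field]: { $size: size } });
  }
  return { [field]: { $exists: true }, $nor: excluded };
}

console.log(JSON.stringify(moreThanFilter("comments", 5)));
```

The returned object can be passed straight to find(). It still only makes sense for small n, since the filter grows by one clause per excluded size.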
It's slow, but you could also use the $where clause:
db.Blog.find({$where:"this.comments.length > 5"}).exec(...);