Mongo query with regex fails when a backslash/newline is present - MongoDB

I'm using MongoDB with Sails.
db.post.find( { body: {"$regex" : /^.*twitter.*$/i }})
This query is supposed to find only posts that contain 'twitter' in their body field.
For some reason it won't find a match if the field contains a backslash escape such as a newline (\n).
Is this known? What can I do about it?
Here are two examples to showcase the problem:
This document is returned by the find command above:
{
"_id" : ObjectId("54e0d7eac2280519ac14cfda"),
"title" : "Cool Post",
"body" : "Twitter Twitter twitter twittor"
}
This document is not returned by the find command above:
{
"_id" : ObjectId("54e0d7eac2280519ac14cfdb"),
"title" : "Cool Post with Linebreaks",
"body" : "This is a cool post with Twitter mentioned into it \n Foo bar"
}

Is this known?
No. It's not an issue with the MongoDB search engine.
What can I do about it?
You could change your regular expression to:
var regex = new RegExp("twitter", "i");
db.post.find({ body: regex });
The problem with your pattern is the dot (.).
According to the docs,
. (the decimal point) matches any single character except the newline character.
Because your pattern is anchored with ^ and $, the .* parts have to cover the whole body, and they cannot get past the \n. That is why the string with a newline character is not matched.
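A minimal sketch in plain JavaScript (Node.js, not the mongo shell) that reproduces the behaviour with the two body values from the question. It assumes JavaScript's RegExp behaves like MongoDB's PCRE-based $regex for these particular strings, which it should for `.`, `^` and `$` without the `m` flag. The `s` (dotAll) flag in the last pattern is an alternative the answer does not mention; MongoDB's $regex accepts the same `s` in $options.

```javascript
// The two "body" values from the question.
const plain = "Twitter Twitter twitter twittor";
const withNewline = "This is a cool post with Twitter mentioned into it \n Foo bar";

// Original pattern: "." cannot cross the "\n", so "^...$" cannot span the whole string.
const anchored = /^.*twitter.*$/i;
// Unanchored pattern from the answer: only "twitter" itself has to match.
const unanchored = /twitter/i;
// Same anchored pattern with the "s" (dotAll) flag: "." also matches "\n".
const dotAll = /^.*twitter.*$/is;

for (const [name, re] of [["anchored", anchored], ["unanchored", unanchored], ["dotAll", dotAll]]) {
  console.log(name, re.test(plain), re.test(withNewline));
}
// anchored true false
// unanchored true true
// dotAll true true
```

In the mongo shell the dotAll variant would be written as db.post.find({ body: { $regex: "^.*twitter.*$", $options: "is" } }), but the unanchored /twitter/i from the answer is simpler and does the same job.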

Related

Find documents whose field contains all specified words

I'm having trouble finding the right query to find all documents whose 'name' field contains all of the specified values.
I have this document:
{
"_id" : ObjectId("607c1caa4b2964d0185301ff"),
"nb" : 1,
"name" : "mini computer 24GB"
}
When I run the following find request...
db.getCollection('test').find({"$text":{$search:'computer dummy'}})
... the document is returned, even though it doesn't contain 'dummy'. An OR is performed, but I want an AND. Should I use a list of $and?
Many thanks
OK, I found it. Each word must be wrapped in quotes, like this:
db.getCollection('test').find({$text:{$search:"\"computer\" \"24GB\""}})
Not really intuitive!
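When the words come from user input, the quoting can be done programmatically. A minimal sketch; `allWordsSearch` is a hypothetical helper, and it relies on the behaviour observed above, namely that every quoted term in $search must be present. Stripping embedded double quotes is an assumption about how you want to sanitize the input.

```javascript
// Hypothetical helper: turn a list of words into a $search string in which
// every word is quoted, so $text requires all of them (AND) instead of any (OR).
function allWordsSearch(words) {
  return words
    .map((w) => w.replace(/"/g, "")) // drop embedded quotes so they cannot break the phrase syntax
    .filter((w) => w.length > 0) // skip words that were empty or only quotes
    .map((w) => `"${w}"`)
    .join(" ");
}

const query = { $text: { $search: allWordsSearch(["computer", "24GB"]) } };
console.log(JSON.stringify(query));
// {"$text":{"$search":"\"computer\" \"24GB\""}}
```

The resulting object is what you would pass to db.getCollection('test').find(...).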

MongoDB aggregation: remove accents

Is there any way to remove accents from a specific field in a $project stage or any other aggregation stage?
My input document looks like this:
{
"title" : "Está comprometido",
"value" : "3"
}
And I need this output:
{
"title" : "Esta_comprometido",
"value" : "3"
}
Following @WernfriedDomscheit's suggestion, I finally solved the issue with client-side post-processing. I'm using JavaScript on the client.
Here is a JS Bin example of how I solved it:
https://jsbin.com/kecikomuja/edit?html,js,output
I'm using normalize("NFD").replace(/[\u0300-\u036f]/g, "") to remove the accents.
This Stack Overflow answer gave me the idea: https://stackoverflow.com/a/37511463/6329540
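A sketch of that post-processing step, applied to documents returned by the query. `normalizeTitle` is a hypothetical name, and the final whitespace-to-underscore replacement is an assumption added to produce the "Esta_comprometido" output requested in the question; the accent removal itself is the normalize/replace call quoted above.

```javascript
// Hypothetical client-side post-processing for the "title" field:
// 1. NFD splits "á" into "a" + U+0301 (combining acute accent).
// 2. The regex removes combining diacritical marks (U+0300 to U+036F).
// 3. Runs of whitespace become "_", matching the desired "Esta_comprometido".
function normalizeTitle(title) {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/\s+/g, "_");
}

const docs = [{ title: "Está comprometido", value: "3" }];
const output = docs.map((d) => ({ ...d, title: normalizeTitle(d.title) }));
console.log(output); // title becomes "Esta_comprometido", value stays "3"
```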

Mongo full text search doesn't find

I'm trying to implement full-text search in my Mongo database. It's a database of audio track metadata. I want to search by the artistName and title of a track. I have these records in the tracks collection (showing only the relevant fields):
db.tracks.find({},{artistName: 1, title: 1})
{ "_id" : "A10328E00047516670", "artistName" : "Tapani Kansa", "title" : "Tuulia" }
{ "_id" : "A10328E00047516661", "artistName" : "Tapani Kansa", "title" : "Rakkautemme valssi" }
{ "_id" : "A10328E0004751669W", "artistName" : "Tapani Kansa", "title" : "Täysikuu" }
{ "_id" : "A10328E0004751668Y", "artistName" : "Tapani Kansa", "title" : "Muista minua" }
I've created the text index on this collection:
db.tracks.createIndex({artistName: 'text', title: 'text', lyrics: 'text'})
But when I try to search the tracks, no results are returned:
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Tapani'}}).size()
0
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Rakkautemme valssi'}}).size()
0
I noticed by accident that when I cut some letters off the end of the search word, I start getting results... so full-text search works somehow, just not in the way I would like and expect.
db.tracks.find({$text: {$search: 'Tapa'}}).size()
12
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Rakkaute'}}).size()
1
Could someone please tell me how I can search the database using full words, or what I'm doing wrong?
I've tried that on MongoDB versions 3.0.8 and 3.2.1
According to the docs:
For case insensitive and diacritic insensitive text searches, the
$text operator matches on the complete stemmed word. So if a document
field contains the word blueberry, a search on the term blue will not
match. However, blueberry or blueberries will match.
What I would suggest is a normal index and a regex search:
db.tracks.createIndex({"artistName": 1})
db.tracks.createIndex({ "title" : 1})
db.tracks.createIndex({ "lyrics": 1})
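db.tracks.find({ artistName: /^Tap/ }).explain()
A regex anchored at the start (a prefix expression such as /^Tap/) can use the index (IXSCAN) instead of a collection scan (COLLSCAN). Note that a quoted string such as "/Tap/" is an exact-match query, not a regex.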
I was testing on 3.0.6 and 3.2.3, with no luck :(
So, the problem was in the documents stored in the database. I hadn't noticed that they contain a field named language, which changes full-text search behaviour, even though I had tried to disable word stemming by setting language: 'none' in the index and in the queries.
When I renamed the language field to something else, full-text search started to work exactly as I expect.

Exact word based search in Solarnet & Mongodb

We have an application where solar-net is integrated with MongoDB for searching, and full text search is working fine. Now we have to change full text search to exact-word search. For example, if "DELL" is entered in the search field, it should only bring up results that are "DELL", and not "DELL Inspiron". Please let us know how to change full text search to exact-word search. Is there a regular expression that does this? The search covers multiple fields. Please help me.
Thanks
Tarlok
You can use a regex as follows:
To search for all names that start with "DELL":
db.collectionName.find( { "name" : { $regex : "^DELL.*", $options : "i"} } )
To search for all names that contain "DELL":
db.collectionName.find( { "name" : { $regex : "DELL.*", $options : "i"} } )
Here $options : "i" makes the match case-insensitive.
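The two queries above match prefixes and substrings, so "DELL Inspiron" would still be returned. For an exact, case-insensitive match the pattern has to be anchored at both ends (^DELL$), and with several fields the per-field conditions can be combined with $or. A minimal sketch that only builds the filter object; `escapeRegExp`, `exactMatchFilter` and the `brand` field are hypothetical. A case-insensitive regex generally cannot use an index efficiently, so if an exact-case match is acceptable, a plain equality query such as { name: "DELL" } is simpler.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match documents where ANY of the given fields is exactly
// the search term (case-insensitive), e.g. "DELL" but not "DELL Inspiron".
function exactMatchFilter(term, fields) {
  const pattern = `^${escapeRegExp(term)}$`;
  return { $or: fields.map((field) => ({ [field]: { $regex: pattern, $options: "i" } })) };
}

const filter = exactMatchFilter("DELL", ["name", "brand"]);
console.log(JSON.stringify(filter));

// The same pattern checked locally with a JavaScript RegExp:
const re = new RegExp(filter.$or[0].name.$regex, "i");
console.log(re.test("DELL"), re.test("dell"), re.test("DELL Inspiron")); // true true false
```

The filter would then be passed to db.collectionName.find(filter).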

Mongodb returns capitalized strings first when sorting

When I try to sort a collection by a string field (here Title), the sort doesn't work as expected. Please see below:
db.SomeCollection.find().limit(50).sort({ "Title" : -1 });
Actual result order:
"Title" : "geog.3 students' book"
"Title" : "geog.2 students' book"
"Title" : "geog.1 students' book"
"Title" : "Zoe and Swift"
"Title" : "Zip at the Theme Park"
"Title" : "Zip at the Supermarket"
Expected result order:
"Title" : "Zoe and Swift"
"Title" : "Zip at the Theme Park"
"Title" : "Zip at the Supermarket"
"Title" : "geog.3 students' book"
"Title" : "geog.2 students' book"
"Title" : "geog.1 students' book"
The same issue occurs when I sort by a Date field.
Any suggestions?
Update: MongoDB 3.4 added case-insensitive indexes (through collation).
This is a known issue. MongoDB doesn't support lexical sorting for strings (JIRA: String lexicographical ordering). You should sort the results in your application code, or sort using a numeric field. It should sort date fields reliably though. Can you give an example where sorting by date doesn't work?
What exactly surprises you?
It sorts based on the numerical representation of each character. If you look at an ASCII table (I know that MongoDB stores strings as UTF-8, so this is just for educational purposes), you will see that uppercase letters have lower numbers than lowercase letters. Thus they come first.
MongoDB (before 3.4) cannot sort letters in a locale-aware or case-insensitive way.
In your case, g has a higher number than Z, so it comes first (you are sorting in descending order). Then 3 has a higher number than 2 and 1. So basically everything is working correctly.
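The code-point explanation can be checked in plain JavaScript. This sketch assumes that JavaScript's default Array.prototype.sort (which compares UTF-16 code units) gives the same order as MongoDB's default binary string comparison, which holds for these ASCII-only titles; sorting and then reversing reproduces the actual result order from the question.

```javascript
const titles = [
  "Zoe and Swift",
  "geog.1 students' book",
  "Zip at the Supermarket",
  "geog.3 students' book",
  "Zip at the Theme Park",
  "geog.2 students' book",
];

// Default sort compares code units: every uppercase letter ("A" to "Z" = 65 to 90)
// sorts before every lowercase letter ("a" to "z" = 97 to 122).
const descending = [...titles].sort().reverse();
console.log(descending);

console.log("g".charCodeAt(0), "Z".charCodeAt(0)); // 103 90
```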
If you use aggregation, the expected output is possible; see below:
db.collection.aggregate([
{
"$project": {
"Title": 1,
"output": { "$toLower": "$Title" }
}},
{ "$sort": { "output":-1 } },
{"$project": {"Title": 1, "_id":0}}
])
It will give you the expected output:
{
"result" : [
{
"Title" : "Zoe and Swift"
},
{
"Title" : "Zip at the Theme Park"
},
{
"Title" : "Zip at the Supermarket"
},
{
"Title" : "geog.3 students' book"
},
{
"Title" : "geog.2 students' book"
},
{
"Title" : "geog.1 students' book"
}
],
"ok" : 1
}
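The same lowercase-key idea can also be applied in application code, as the earlier answer suggests. A minimal sketch (`sortTitlesCaseInsensitiveDesc` is a hypothetical name); it uses String.prototype.toLowerCase as a stand-in for $toLower, which the MongoDB docs describe as well defined only for ASCII strings.

```javascript
const titles = [
  "geog.3 students' book",
  "geog.2 students' book",
  "geog.1 students' book",
  "Zoe and Swift",
  "Zip at the Theme Park",
  "Zip at the Supermarket",
];

// Sort descending on a lowercased key, but return the original Title for display.
function sortTitlesCaseInsensitiveDesc(list) {
  return [...list]
    .map((title) => ({ title, key: title.toLowerCase() }))
    .sort((a, b) => (a.key < b.key ? 1 : a.key > b.key ? -1 : 0))
    .map((entry) => entry.title);
}

console.log(sortTitlesCaseInsensitiveDesc(titles));
```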
Starting with the dates not sorting correctly:
If you're storing a date as a string, it needs to be sortable as a string. It's quite simple:
2013-11-08 // yyyy-mm-dd (the dashes would be optional)
As long as every piece of the date string is zero-padded correctly, the strings will all sort naturally and in the order you would expect.
A full date-time is typically stored in UTC:
2013-11-23T10:46:01.914Z
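A quick check of the padding rule in plain JavaScript: sorting zero-padded date strings as strings gives chronological order, while unpadded months break it. The sample dates are made up for illustration.

```javascript
// Zero-padded yyyy-mm-dd strings: plain string order is chronological order.
const padded = ["2013-11-08", "2013-02-15", "2012-12-31"];
const paddedSorted = [...padded].sort();
console.log(paddedSorted.join(", ")); // 2012-12-31, 2013-02-15, 2013-11-08

// Without padding, "2013-2-15" sorts after "2013-11-8" because the character "2" > "1",
// so February ends up after November.
const unpadded = ["2013-11-8", "2013-2-15", "2012-12-31"];
const unpaddedSorted = [...unpadded].sort();
console.log(unpaddedSorted.join(", ")); // 2012-12-31, 2013-11-8, 2013-2-15

// Full ISO 8601 UTC timestamps with the same fixed width also sort chronologically.
const timestamps = ["2013-11-23T10:46:01.914Z", "2013-11-23T09:05:00.000Z"];
console.log([...timestamps].sort()[0]); // 2013-11-23T09:05:00.000Z
```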
However, instead of storing a date value as a string, I'd suggest you consider whether a native MongoDB Date would make more sense. If you look at MongoDB's aggregation framework, you'll find many operators that can manipulate these dates, while a string is very limited.
As for the string sorting, it's been pointed out that it sorts the way a computer stores the data rather than the way a person would sort. If you consider that the string is stored as its ASCII/UTF-8 representation, you should see why the sorting works the way it does:
Zoe = [90, 111, 101]
geo = [103, 101, 111]
If you were to sort those in descending order as you've specified, you should see how "geo"'s internal byte representation is larger than that of the string "Zoe" (with 103 sorting higher than 90 in this case).
Typically, the recommendation when using MongoDB is to store the string twice if you need to sort a string that has mixed case:
1. The original string ("Title").
2. A normalized string: for example, all lowercase, possibly with accented characters also converted to a common base character. So you'd end up with a new field named, for example, "SortedTitle", which your code would use for sorting while displaying the actual "Title" to users.
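A sketch of computing the second, normalized field before saving a document. `sortKey` and `withSortedTitle` are hypothetical helpers, and reducing accents via NFD plus stripping combining marks is one possible normalization, not the only one.

```javascript
// Hypothetical helper: normalized sort key, lowercase with accents reduced to base letters.
function sortKey(title) {
  return title
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}

// Hypothetical helper: keep the original Title for display, add SortedTitle for sorting.
function withSortedTitle(doc) {
  return { ...doc, SortedTitle: sortKey(doc.Title) };
}

const doc = withSortedTitle({ Title: "Éclair Recipes" });
console.log(doc); // SortedTitle is "eclair recipes", Title is unchanged
```

Queries would then sort on the normalized field, e.g. db.SomeCollection.find().sort({ SortedTitle: -1 }), ideally with an index on SortedTitle.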
If you are using Ruby on Rails and MongoMapper, follow the steps below.
I have used a model named Abc and fetched the results for Title:
@test_abc_details_array_full = Abc.collection.aggregate([
{"$project"=> {
"Title"=> 1,
"output"=> { "$toLower"=> "$Title" }
}},
{ "$sort"=> { "output"=>1 } },
{"$project"=> {Title: 1, _id:0}},
]);