MongoDB Text search fails for stop words - mongodb

I'm trying to run a query against my collection, but it's not returning anything.
Here's my query:
{'$match': {'$text': {'$search': 'a'}}},
{'$group': {'_id': {'texto': '$texto'},
'somanumero': {'$sum': '$numero'}}}
My collection:
{ "_id" : ObjectId("555cdc4fe13823315537042d"), "texto" : ObjectId("555cdc4fe13823315537042c"), "numero" : ObjectId("555cdc4fe13823315537042e") }
{ "_id" : ObjectId("555cdc5ee13823315537042f"), "numero" : 5, "texto" : "a", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
{ "_id" : ObjectId("555cdc6ae138233155370430"), "numero" : 10, "texto" : "a", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
{ "_id" : ObjectId("555cdc73e138233155370431"), "numero" : 3, "texto" : "b", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
And here's my text index:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "texto_text",
"ns" : "OSA.teste_texto",
"default_language" : "portuguese",
"weights" : {
"texto" : 1
},
"language_override" : "language",
"textIndexVersion" : 2
}
When I use $group or $match alone, it works.
Am I doing something wrong?

From the docs:
MongoDB supports text search for various languages. text indexes drop
language-specific stop words (e.g. in English, “the”, “an”, “a”,
“and”, etc.) and uses simple language-specific suffix stemming.
The problem with your data is that some of the records have a as their text, and a is a stop word in Portuguese too. The Portuguese stop word list begins as follows, with a at the top:
a
ao
aos
aquela
aquelas
aquele
aqueles
aquilo
as
até
com
como
These words are never indexed, so a query for a stop word returns no results.
If you query for b instead, you get results, since b is not a stop word and is indexed.
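The effect can be illustrated with a small in-memory model. This is only a sketch of the idea that stop words are dropped at index time, not MongoDB's actual implementation; the names STOP_WORDS, buildIndex and search are hypothetical, and the stop word list is just the excerpt quoted above.

```javascript
// Excerpt of the Portuguese stop word list quoted above (not the full list).
const STOP_WORDS = new Set(["a", "ao", "aos", "aquela", "aquelas", "aquele",
  "aqueles", "aquilo", "as", "até", "com", "como"]);

// Build a toy inverted index: term -> array of document ids.
// Stop words are skipped, so they never become index keys.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    const terms = String(doc.texto).toLowerCase().split(/\s+/).filter(Boolean);
    for (const term of terms) {
      if (STOP_WORDS.has(term)) continue; // dropped at index time
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(doc._id);
    }
  }
  return index;
}

// Look a single term up in the toy index.
function search(index, term) {
  return index.get(term.toLowerCase()) || [];
}

// Simplified copies of the documents from the question (ids shortened).
const docs = [
  { _id: 1, numero: 5, texto: "a" },
  { _id: 2, numero: 10, texto: "a" },
  { _id: 3, numero: 3, texto: "b" },
];
const index = buildIndex(docs);
console.log(search(index, "a")); // no index entry exists for the stop word
console.log(search(index, "b")); // ids of the documents containing "b"
```

In this model the $match stage has nothing to hand to $group when the search term is a stop word, which matches the behaviour described in the question.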

Related

Insert document into mongodb from existing table

I am trying to write a query in mongo that will create a new table, loop through my data set, and insert the TopExecutiveTitle into the new table. I also would like it to keep count of each position and only insert a position into the table when it is new.
This is what I have so far. This code loops through my table and inserts the TopExecutiveTitle into a new table. However, it does not group them together or keep count. How do I write my query so that it does?
db.car.find().forEach( function (x) {
db.TopExecutiveTable.insert({Topexecutivetitle: x.Topexecutivetitle})
});
Here is a sample of a document in my database.
{
"_id" : ObjectId("5a22c8e562c2e489c5df70fa"),
"2016rank" : 1,
"Dealershipgroupname" : "AutoNation Inc.?",
"Address" : "200 S.W. 1st Ave.",
"City/State/Zip" : "Fort Lauderdale, FL 33301",
"Phone" : "(954) 769-7000",
"Companywebsite" : "www.autonation.com",
"Topexecutive" : "Mike Jackson",
"Topexecutivetitle" : "chairman & CEO",
"Totalnewretailunits" : "337,622",
"Totalusedunits" : "225,713",
"Totalfleetunits" : 3,
"Totalwholesaleunits" : "82,342",
"Total_units" : "649,415",
"Total_number_of _dealerships" : 260,
"Grouprevenuealldepartments*" : "$21,609,000,000",
"2015rank" : 1
}
The result I would like is something like this
"Topexecutivetitle" : "chairman & CEO"
"Count" : 3
"Topexecutivetitle" : "president"
"Count" : 7
}
To do this you need to use the aggregate function of mongo, something like this:
db.car.aggregate([
{
$group:{
_id:"$Topexecutivetitle",
count:{$sum:1}
}
},
{
$project:{
Topexecutivetitle:"$_id",
count:1,
_id:0
}
},
{
$out:"result"
}])
This stores the counts in a new collection "result". The $group stage produces the documents below; the $project stage then renames _id to Topexecutivetitle before $out writes them:
{
"_id" : "president",
"count" : 1.0
},
{
"_id" : "chairman & CEO",
"count" : 3.0
}
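For readers unfamiliar with $group, the counting it performs can be sketched in plain JavaScript. This illustrates the semantics only, not how the server executes the pipeline; countByTitle and the sample array are hypothetical.

```javascript
// Count documents per Topexecutivetitle, like $group with {$sum: 1},
// then shape each result like the $project stage above.
function countByTitle(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const title = doc.Topexecutivetitle;
    counts.set(title, (counts.get(title) || 0) + 1);
  }
  return Array.from(counts, ([title, count]) => ({ Topexecutivetitle: title, count }));
}

// Hypothetical sample data, reduced to the one field that matters here.
const sample = [
  { Topexecutivetitle: "chairman & CEO" },
  { Topexecutivetitle: "president" },
  { Topexecutivetitle: "chairman & CEO" },
];
console.log(countByTitle(sample));
```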

MongoDB count occurrences of a substring in a collection

Hello, I'm a MongoDB beginner. I have a database of an IRC chat log. The document structure is very simple:
{
"_id" : ObjectId("000"),
"user" : "username",
"message" : "foobar foobar potato idontknow",
"time" : NumberLong(1451775601469)
}
I have thousands of these and I want to count the number of occurrences of the string "foobar". I have googled this and found something about aggregations. It looks very complicated, and I haven't found any question this "simple". I'd be glad if someone pointed me in the right direction on what to research, and I wouldn't mind an example command that does exactly what I want. Thank you.
There is no built-in operator for this.
You can try this query, but it performs very poorly:
db.chat.find().forEach(function(doc){
print(doc["user"] + " > " + ((doc["message"].match(/foobar/g) || []).length))
})
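If a single total across all messages is wanted rather than a per-message printout, the same regular-expression counting can be wrapped in a helper. A minimal sketch in plain JavaScript, assuming the messages have already been loaded into an array; escapeRegExp, countOccurrences and totalOccurrences are hypothetical helper names.

```javascript
// Escape regex metacharacters so the needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count non-overlapping occurrences of needle in one string.
function countOccurrences(haystack, needle) {
  if (needle === "") return 0;
  const matches = String(haystack).match(new RegExp(escapeRegExp(needle), "g"));
  return matches ? matches.length : 0;
}

// Sum the counts over the message field of every document.
function totalOccurrences(messages, needle) {
  return messages.reduce((sum, doc) => sum + countOccurrences(doc.message, needle), 0);
}

const chat = [
  { user: "username", message: "foobar foobar potato idontknow" },
  { user: "other", message: "no match here" },
];
console.log(totalOccurrences(chat, "foobar"));
```

Inside the shell, countOccurrences could be called from the forEach callback above while accumulating a running total.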
If you could change your message field to an array, we could apply aggregation...
EDIT:
If you add an array of the split words to each entry, we can apply aggregation.
Sample:
{
"_id" : ObjectId("569bb7040586bcb40f7d2539"),
"user" : "username",
"fullmessage" : "foobar foobar potato idontknow",
"message" : [
"foobar",
"foobar",
"potato",
"idontknow"
],
"time" : NumberLong(1451775601469)
}
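The message array in the sample can be derived from the original string by splitting on whitespace. A small sketch, assuming words are separated by whitespace only; withWordArray is a hypothetical helper.

```javascript
// Return a copy of the document in which fullmessage holds the original
// string and message holds the array of whitespace-separated words.
function withWordArray(doc) {
  const words = doc.message.trim().split(/\s+/).filter(Boolean);
  return { ...doc, fullmessage: doc.message, message: words };
}

const original = { user: "username", message: "foobar foobar potato idontknow" };
console.log(withWordArray(original));
```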
Aggregation: we create a new entry for each array element, match the given word (foobar in this case), and then count the matched results.
db.chat.aggregate([
{"$unwind" : "$message"},
{"$match" : {"message" : {"$regex" : "foobar", "$options" : "i"}}},
{"$group" : {_id:{"_id" : "$_id", "user" : "$user", "time" : "$time", "fullmessage" : "$fullmessage"}, "count" : {$sum:1}}},
{"$project" : {_id:"$_id._id", "user" : "$_id.user", "time" : "$_id.time", "fullmessage" : "$_id.fullmessage", "count" : "$count"}}
])
Result:
[
{
"_id" : ObjectId("569bb7040586bcb40f7d2539"),
"count" : 2,
"user" : "username",
"time" : NumberLong(1451775601469),
"fullmessage" : "foobar foobar potato idontknow"
}
]

How to see the "queryDebugString" on MongoDB's Text Search?

I'm trying the Text Search functionality from MongoDB, and out there on the internet I can see lots of information where people do this:
db.posts.runCommand("text", {search: '"robots are crazy"'})
And get this:
{
"queryDebugString" : "robot||||robots are||",
"language" : "english",
"results" : [
{
"score" : 0.6666666666666666,
"obj" : {
"_id" : ObjectId("50ebc482214a1e88aaa4ad9e"),
"txt" : "Robots are superior to humans"
}
}
],
"stats" : {
"nscanned" : 2,
"nscannedObjects" : 0,
"n" : 1,
"timeMicros" : 185
},
"ok" : 1
}
I know runCommand("text", ... is deprecated, but I've tried the db.posts.find({ $text: { $search: '"robots are crazy"' } }) approach as well, and nothing there.
How can I see this "queryDebugString" attribute? I've looked for some kind of debug flags to use when starting up mongod, but couldn't find anything.
For more recent versions of Mongo (2.6 at least), use .explain(true) for the verbose output, which will contain a parsedTextQuery field with more, and more readable, information than queryDebugString:
> db.test.find({ "$text" : { "$search" : "cows are lovely" } }).explain(true)
{
"cursor" : "TextCursor",
...
"stats" : {
"type" : "TEXT",
...
"parsedTextQuery" : {
"terms" : ["cow", "love"],
"negatedTerms" : [],
"phrases" : [],
"negatedPhrases" : []
}
}
...
}
Just try this:
db.posts.find({ "text" : '"robots are crazy"' })
In current versions of MongoDB, basic operations take a form something like:
db.collection.action({query})
I think it's enough to start, but try to go to the MongoDB documentation:
http://docs.mongodb.org/manual/

Mongodb + Mongoose: trying to add a sub-sub-item

Does this make any sense when trying to add a sub-sub-item? (I'm new to mongo - be merciful :-))
question = db.questions.findOne({_id: ObjectId("529c5d44211c9a8c11000006")})
question.answers[0].votes.insert(...)
When I run this from the mongo console the result is an error saying [object object] does not have the method insert.
I have the following mongoDB Question Schema.
{
"__v" : 2,
"_creator" : ObjectId("529c5d2d211c9a8c11000005"),
"_id" : ObjectId("529c5d44211c9a8c11000006"),
"answers" : [
{
"postDate" : ISODate("2013-12-02T10:14:19.060Z"),
"postDateText" : "15min ago",
"authorEmail" : "guys#pix.com",
"authorName" : "guys#pix.com",
"body" : "You need magic powder",
"isWinner" : false,
"_creator" : ObjectId("529c5d2d211c9a8c11000005"),
"_id" : ObjectId("529c5d7b211c9a8c11000008"),
"votes" : [
{
"voteType" : "up",
"_creator" : ObjectId("529c5d2d211c9a8c11000005"),
"_id" : ObjectId("529c5d5b211c9a8c11000007")
}
]
}
],
"authorEmail" : "guys#wix.com",
"authorName" : "guys#wix.com",
"body" : "I'm trying to fly...\n\n<pre class=\"brush: js;\">\nfunction logName(name) {\n console.log(name);\n}\n</pre>",
"isResolved" : false,
"postDate" : ISODate("2013-12-02T10:13:24.235Z"),
"tags" : [
"fly"
],
"title" : "How do I fly?",
"views" : [],
"votes" : [
{
"voteType" : "up",
"_creator" : ObjectId("529c5d2d211c9a8c11000005"),
"_id" : ObjectId("529c5d5b211c9a8c11000007")
}
]
}
I'm trying, given a questionId and an answerId, to add a vote to the votes array (which is inside the answer). I can't seem to do it. Help?
insert is for adding whole new documents; when you just want to add a new element to an array field of an existing document, you can use update along with an operator like $push.
So, in the shell you would use something like this:
db.questions.update(
{_id: ObjectId("529c5d44211c9a8c11000006")},
{$push: {'answers.0.votes': voteToPush}})
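Since the question asks about a given questionId and answerId rather than a fixed array index, the positional $ operator may be a better fit than answers.0: when the query matches an element of answers by its _id, answers.$.votes refers to that matched element. The sketch below only builds the filter and update documents; buildVotePush is a hypothetical helper, and plain strings stand in for ObjectId values so the snippet runs outside the shell.

```javascript
// Build the two arguments for db.questions.update(filter, update).
// In the shell the ids would be ObjectId(...) values, not strings.
function buildVotePush(questionId, answerId, vote) {
  const filter = { _id: questionId, "answers._id": answerId };
  // "$" stands for the first answers element matched by the filter.
  const update = { $push: { "answers.$.votes": vote } };
  return { filter, update };
}

const { filter, update } = buildVotePush(
  "529c5d44211c9a8c11000006",
  "529c5d7b211c9a8c11000008",
  { voteType: "up" }
);
console.log(JSON.stringify(filter), JSON.stringify(update));
// In the shell: db.questions.update(filter, update)
```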

In MongoDB, how to filter by location, sort by another field and paginate the results correctly?

When I don't use pagination, everything works fine (I have only 3 records in this collection, so all of them are listed here):
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1})
{ "_id" : ObjectId("4f33ff549112b9b84f000070"), "badge" : 3, "name" : "Dedetizadora Alvorada" }
{ "_id" : ObjectId("4f33ff019112b9b84f00005b"), "badge" : 2, "name" : "Sampex Desentupidora e Dedetizadora" }
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
But when I try to paginate from the first to the second page, one record doesn't show up and one is repeated:
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1}).skip(0).limit(2)
{ "_id" : ObjectId("4f33ff549112b9b84f000070"), "badge" : 3, "name" : "Dedetizadora Alvorada" }
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1}).skip(2).limit(2)
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
Am I doing something wrong, or is this some kind of bug?
edit:
There is a workaround for this: basically, you shouldn't mix $near queries with sorting; use $within instead.
There is an open issue for the same problem, "Geospatial result paging fails when sorting with additional keys". Please have a look and vote for it.
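For reference, the result the question expects can be reproduced in plain JavaScript by sorting once and slicing pages from the sorted array. This only illustrates the intended skip/limit semantics on the three records shown in the question; it does not involve the geospatial index, and paginate is a hypothetical helper.

```javascript
// Sort by badge descending, then return the requested page.
function paginate(records, skip, limit) {
  const sorted = [...records].sort((x, y) => y.badge - x.badge);
  return sorted.slice(skip, skip + limit);
}

const suppliers = [
  { name: "Higitec Desentupimento e Dedetização", badge: 1 },
  { name: "Dedetizadora Alvorada", badge: 3 },
  { name: "Sampex Desentupidora e Dedetizadora", badge: 2 },
];
console.log(paginate(suppliers, 0, 2).map((s) => s.badge)); // first page
console.log(paginate(suppliers, 2, 2).map((s) => s.badge)); // second page
```

Each record appears on exactly one page here, which is what a correctly paged query should also return.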