Stop mongodb from ignoring special characters? - mongodb

Model.find({ $text : {$search: "#text"} })
returns everything that includes "text", not only the documents containing "#text". I've tried putting a \ before the #, to no avail. How do I stop MongoDB from ignoring my special characters? Thanks.

Tomalak's description of how text indexing works is correct, but you can actually use a text index for an exact phrase match on a phrase with a special character:
> db.test.drop()
> db.test.insert({ "_id" : 0, "t" : "hey look at all this #text" })
> db.test.insert({ "_id" : 1, "t" : "text is the best" })
> db.test.ensureIndex({ "t" : "text" })
> db.test.count({ "$text" : { "$search" : "text" } })
2
> db.test.count({ "$text" : { "$search" : "#text" } })
2
> db.test.find({ "$text" : { "$search" : "\"#text\"" } })
{ "_id" : 0, "t" : "hey look at all this #text" }
Exact phrase matches are indicated by surrounding the phrase in double quotes, which need to be escaped in the shell like "\"#text\"".
Text indexes are larger than normal indexes, but if you do a lot of case-insensitive exact phrase matches, they can be a better option than a standard index because they will perform better. For example, on a field t with an index { "t" : 1 }, a case-sensitive regex match on the exact string
> db.test.find({ "t" : /#text/ })
performs a full index scan. The analogous (but not equivalent) text query
> db.test.find({ "$text" : { "$search" : "\"#text\"" } })
will use the text index to locate documents containing the term "text", then scan all those documents to see if they contain the full phrase "#text".
Be careful because text indexes aren't case sensitive. Continuing the example above:
> db.test.insert({ "_id" : 2, "t" : "Never seen so much #TEXT" })
> db.test.find({ "t" : /#text/ })
{ "_id" : 0, "t" : "hey look at all this #text" }
> db.test.find({ "$text" : { "$search" : "\"#text\"" } })
{ "_id" : 0, "t" : "hey look at all this #text" }
{ "_id" : 2, "t" : "Never seen so much #TEXT" }
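The two-step behaviour described above (find candidates by term, then check the stored string for the full phrase) can be imitated in plain JavaScript. This is only a simplified model under stated assumptions, not MongoDB's implementation: the functions tokenize, termSearch and phraseSearch are hypothetical names, and stemming and stop words are left out.

```javascript
// Simplified model of text search; NOT MongoDB's actual implementation.
// Assumptions: ASCII-only tokenizer, no stemming, no stop words.
function tokenize(text) {
  // Lowercase, then split on anything that is not a letter or digit.
  // This is why "#text" and "text" end up as the same term.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Term search: a document matches if it contains any of the search terms.
function termSearch(docs, search) {
  const wanted = tokenize(search);
  return docs.filter((doc) => {
    const terms = new Set(tokenize(doc.t));
    return wanted.some((w) => terms.has(w));
  });
}

// Phrase search: narrow by terms first, then check the stored string
// for the literal phrase, ignoring case.
function phraseSearch(docs, phrase) {
  const needle = phrase.toLowerCase();
  return termSearch(docs, phrase).filter((doc) =>
    doc.t.toLowerCase().includes(needle)
  );
}

const docs = [
  { _id: 0, t: "hey look at all this #text" },
  { _id: 1, t: "text is the best" },
  { _id: 2, t: "Never seen so much #TEXT" },
];

console.log(termSearch(docs, "#text").map((d) => d._id));   // ids 0, 1, 2
console.log(phraseSearch(docs, "#text").map((d) => d._id)); // ids 0, 2
```

The phrase result contains documents 0 and 2, matching the shell output above, because the final phrase check in this model ignores case.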

Related

$and operator on multiple $text search in mongo

Is it possible to have $and operator on multiple $text index search in mongo?
I have documents in tp collection of my db
> db.tp.find()
{ "_id" : ObjectId("...."), "name" : "tp", "dict" : { "item1" : "random", "item2" : "some" } }
{ "_id" : ObjectId("...."), "name" : "tp", "dict" : { "item3" : "rom", "item4" : "tttt" } }
Then I do
> db.tp.createIndex({ "$**": "text" })
> db.tp.find({ $and: [{$text : { $search: "random" } }, {$text : { $search: "redruth" } }]})
And it fails with
Error: error: {
"waitedMS" : NumberLong(0),
"ok" : 0,
"errmsg" : "Too many text expressions",
"code" : 2
}
A text search with a single $text expression works, so is it not possible to combine multiple text searches with the $and operator? By the way, I am using the wildcard specifier $** for the index because I want to search over the entire document.
Based on the MongoDB docs, you can express AND directly in the search string by wrapping each phrase in double quotes and separating the phrases with spaces. For example, to search for "ssl certificate" AND "authority key", the query should look like:
> db.tp.find({'$text': {'$search': '"ssl certificate" "authority key"'}})
A query can specify at most one $text expression. See:
https://docs.mongodb.com/manual/reference/operator/query/text/
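Since only one $text expression is allowed, the words from the original $and attempt have to go into a single $search string. A minimal sketch of building that string follows; buildPhraseSearch is a hypothetical helper, not part of any MongoDB driver, and it assumes that quoted phrases in one $search string are ANDed as the answer above describes (check the $text documentation for your server version).

```javascript
// Hypothetical helper: wraps each phrase in double quotes and joins them
// with spaces, producing one $search string for a single $text expression.
function buildPhraseSearch(phrases) {
  return phrases
    .map((p) => '"' + p.replace(/"/g, "") + '"') // drop embedded quotes
    .join(" ");
}

const search = buildPhraseSearch(["random", "redruth"]);
const query = { $text: { $search: search } };

console.log(JSON.stringify(query));
// {"$text":{"$search":"\"random\" \"redruth\""}}
```

In the shell the resulting object would be passed as db.tp.find(query).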

MongoDB count occurrences of a substring in a collection

Hello, I'm a MongoDB beginner. I have a database of an IRC chat log. The document structure is very simple:
{
"_id" : ObjectId("000"),
"user" : "username",
"message" : "foobar foobar potato idontknow",
"time" : NumberLong(1451775601469)
}
I have thousands of these and I want to count the number of occurrences of the string "foobar". I have googled this issue and found something about aggregations. It looks very complicated and I haven't really found any question this "simple". I'd be glad if someone pointed me in the right direction on what to research, and I wouldn't mind an example command that does exactly what I want. Thank you.
There is no built-in operator that does this.
You can try this query, but it has very poor performance:
db.chat.find().forEach(function(doc){
print(doc["user"] + " > " + ((doc["message"].match(/foobar/g) || []).length))
})
If you could change your message field to an array, then we could apply aggregation...
EDIT:
If you add an array of the split words to each document, we can apply aggregation.
Sample:
{
"_id" : ObjectId("569bb7040586bcb40f7d2539"),
"user" : "username",
"fullmessage" : "foobar foobar potato idontknow",
"message" : [
"foobar",
"foobar",
"potato",
"idontknow"
],
"time" : NumberLong(1451775601469)
}
Aggregation: we create a new document for each array element, match the given word (foobar, in this case), and then count the matches per original document.
db.chat.aggregate([
{"$unwind" : "$message"},
{"$match" : {"message" : {"$regex" : "foobar", "$options" : "i"}}},
{"$group" : {_id:{"_id" : "$_id", "user" : "$user", "time" : "$time", "fullmessage" : "$fullmessage"}, "count" : {$sum:1}}},
{"$project" : {_id:"$_id._id", "user" : "$_id.user", "time" : "$_id.time", "fullmessage" : "$_id.fullmessage", "count" : "$count"}}
])
Result:
[
{
"_id" : ObjectId("569bb7040586bcb40f7d2539"),
"count" : 2,
"user" : "username",
"time" : NumberLong(1451775601469),
"fullmessage" : "foobar foobar potato idontknow"
}
]
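The question asks for a total across all messages, while the forEach version above prints one count per document. A minimal sketch of the counting step in plain JavaScript follows; countOccurrences is a hypothetical helper, and the second and third sample messages are invented for illustration only.

```javascript
// Hypothetical helper: counts non-overlapping, case-sensitive occurrences
// of needle inside text.
function countOccurrences(text, needle) {
  if (needle === "") return 0; // avoid an endless loop on an empty needle
  let count = 0;
  let pos = text.indexOf(needle);
  while (pos !== -1) {
    count += 1;
    pos = text.indexOf(needle, pos + needle.length);
  }
  return count;
}

// Sample data: the first message is from the question,
// the other two are made up for this sketch.
const messages = [
  { user: "username", message: "foobar foobar potato idontknow" },
  { user: "other", message: "no match here" },
  { user: "third", message: "xfoobarx" },
];

let total = 0;
for (const doc of messages) {
  total += countOccurrences(doc.message, "foobar");
}
console.log(total); // 3
```

In the shell, the same helper could be called inside db.chat.find().forEach(...) to accumulate the total, with the same performance caveat as above since every document is still scanned.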

MongoDB Text search fails for stop words

I'm trying to run a query on my collection, but it's not returning anything.
Here's my query:
{'$match': {'$text': {'$search': 'a'}}},
{'$group': {'_id': {'texto': '$texto'},
'somanumero': {'$sum': '$numero'}}}
My collection:
{ "_id" : ObjectId("555cdc4fe13823315537042d"), "texto" : ObjectId("555cdc4fe13823315537042c"), "numero" : ObjectId("555cdc4fe13823315537042e") }
{ "_id" : ObjectId("555cdc5ee13823315537042f"), "numero" : 5, "texto" : "a", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
{ "_id" : ObjectId("555cdc6ae138233155370430"), "numero" : 10, "texto" : "a", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
{ "_id" : ObjectId("555cdc73e138233155370431"), "numero" : 3, "texto" : "b", "lattexto" : "-15.79506", "lontexto" : "-47.88322" }
And here's my text index:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "texto_text",
"ns" : "OSA.teste_texto",
"default_language" : "portuguese",
"weights" : {
"texto" : 1
},
"language_override" : "language",
"textIndexVersion" : 2
}
When I use $group or $match alone, it works.
Am I doing something wrong?
From the docs:
MongoDB supports text search for various languages. text indexes drop
language-specific stop words (e.g. in English, “the”, “an”, “a”,
“and”, etc.) and uses simple language-specific suffix stemming.
The problem with your data is that the term you are searching for, a, is a stop word in Portuguese as well. It is at the top of the Portuguese stop word list, which begins:
a
ao
aos
aquela
aquelas
aquele
aqueles
aquilo
as
até
com
como
These words are never indexed, so whenever you query for a stop word, you get no results.
If you query for b instead, you get results, since it is not a stop word and is therefore indexed.
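The effect can be imitated with a small model in plain JavaScript. It is a sketch under stated assumptions, not how MongoDB is implemented: the stop word set contains only the words listed above, stemming is ignored, and indexedTerms and search are hypothetical names.

```javascript
// Only the stop words quoted above; the real Portuguese list is longer.
const STOP_WORDS = new Set([
  "a", "ao", "aos", "aquela", "aquelas", "aquele",
  "aqueles", "aquilo", "as", "até", "com", "como",
]);

// Terms that would survive indexing in this simplified model.
function indexedTerms(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w !== "" && !STOP_WORDS.has(w));
}

// The string-valued documents from the question.
const docs = [
  { numero: 5, texto: "a" },
  { numero: 10, texto: "a" },
  { numero: 3, texto: "b" },
];

// The search string goes through the same filtering as the documents.
function search(term) {
  const wanted = indexedTerms(term);
  return docs.filter((d) =>
    indexedTerms(d.texto).some((t) => wanted.includes(t))
  );
}

console.log(search("a").length); // 0, because "a" is dropped everywhere
console.log(search("b").length); // 1
```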

MongoDB full text search - matching words and exact phrases

I'm currently having some issues with the full text search functionality in MongoDB. Specifically when trying to match exact phrases.
I'm testing out the functionality in the mongo shell, but ultimately I'll be using Spring Data MongoDB with Java.
So I first tried running this command to search for the words "delay", "late" and the phrase "on time"
db.mycollection.find( { $text: { $search: "delay late \"on time\"" } }).explain(true);
And the resulting explain query told me:
"parsedTextQuery" : {
"terms" : [
"delay",
"late",
"time"
],
"negatedTerms" : [ ],
"phrases" : [
"on time"
],
"negatedPhrases" : [ ] },
The issue here is that I don't want to search for the word "time", but rather for the phrase "on time". I do want to search for delay and late, and ideally I don't want to prevent stemming.
I tried a few different permutations e.g.
db.mycollection.find( { $text: { $search: "delay late \"'on time'\"" } }).explain(true);
db.mycollection.find( { $text: { $search: "delay late \"on\" \"time\"" } }).explain(true);
But couldn't seem to get the right results. I can't see anything obvious in the documentation about this.
For my purposes should I use the full text search for individual words and the regex search functionality for phrases?
Currently working with MongoDB version 2.6.5. Thanks.
Did you try the text search to see if it didn't behave correctly? It works as expected for me on MongoDB 2.6.7:
> db.test.drop()
> db.test.insert({ "t" : "I'm on time, not late or delayed" })
> db.test.insert({ "t" : "I'm either late or delayed" })
> db.test.insert({ "t" : "Time flies like a banana" })
> db.test.ensureIndex({ "t" : "text" })
> db.test.find({ "$text" : { "$search" : "time late delay" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }
{ "t" : "Time flies like a banana" }
{ "t" : "I'm either late or delayed" }
> db.test.find({ "$text" : { "$search" : "late delay" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }
{ "t" : "I'm either late or delayed" }
> db.test.find({ "$text" : { "$search" : "late delay \"on time\"" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }
Why is "time" in the terms array in the explain? Because if the phrase "on time" occurs in a document, the term time must occur in it too. MongoDB uses the text index as far as it can to help locate the phrase, and then checks the index results to see which of them actually match the full phrase and not just the terms in the phrase.
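To make the explain output less surprising, here is a rough sketch of how a search string could be split into terms and phrases. It is an assumption-laden model, not MongoDB's parser: parseSearch is a hypothetical name, the stop word set is a tiny illustrative list, and stemming and negation are ignored.

```javascript
// Tiny illustrative stop word list; the real English list is much longer.
const STOP_WORDS = new Set(["on", "a", "the", "or", "not"]);

function parseSearch(search) {
  // Phrases: everything between pairs of double quotes.
  const phrases = [...search.matchAll(/"([^"]+)"/g)].map((m) => m[1]);

  // Terms: every word in the string, including words inside phrases,
  // minus stop words. This is why "time" shows up as a term.
  const terms = search
    .replace(/"/g, " ")
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w !== "" && !STOP_WORDS.has(w));

  return { terms, phrases };
}

console.log(parseSearch('delay late "on time"'));
// terms: delay, late, time; phrases: on time
```

In this model "on" is dropped as a stop word while "time" is kept as a term, which lines up with the parsedTextQuery output shown in the question.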

How to see the "queryDebugString" on MongoDB's Text Search?

I'm trying the Text Search functionality from MongoDB, and out there on the internet I can see lots of information where people do this:
db.posts.runCommand("text", {search: '"robots are crazy"'})
And get this:
{
"queryDebugString" : "robot||||robots are||",
"language" : "english",
"results" : [
{
"score" : 0.6666666666666666,
"obj" : {
"_id" : ObjectId("50ebc482214a1e88aaa4ad9e"),
"txt" : "Robots are superior to humans"
}
}
],
"stats" : {
"nscanned" : 2,
"nscannedObjects" : 0,
"n" : 1,
"timeMicros" : 185
},
"ok" : 1
}
I know runCommand("text", ... is deprecated, but I've tried the db.posts.find({ $text: { $search: '"robots are crazy"' } }) approach as well, and nothing there.
How can I see this "queryDebugString" attribute? I've looked for some kind of debug flags to use when starting up mongod, but couldn't find anything.
For more recent versions of Mongo (2.6 at least), use .explain(true) for the verbose output, which will contain a parsedTextQuery field with more, and more readable, information than queryDebugString:
> db.test.find({ "$text" : { "$search" : "cows are lovely" } }).explain(true)
{
"cursor" : "TextCursor",
...
"stats" : {
"type" : "TEXT",
...
"parsedTextQuery" : {
"terms" : ["cow", "love"],
"negatedTerms" : [],
"phrases" : [],
"negatedPhrases" : []
}
}
...
}
Just try this:
db.posts.find({ "text" : '"robots are crazy"' })
In current versions of Mongo, operations basically take the form:
db.collection.action({query})
I think that's enough to get started, but have a look at the MongoDB documentation as well:
http://docs.mongodb.org/manual/