mongoDB prefix wildcard: fulltext-search ($text) find part with search-string

I have mongodb with a $text-Index and elements like this:
{
foo: "my super cool item"
}
{
foo: "your not so cool item"
}
If I search with
mycoll.find({ $text: { $search: "super"} })
I get the first item (correct).
But I also want to search with "uper" and get the first item. However, if I try:
mycoll.find({ $text: { $search: "uper"} })
I don't get any results.
My question:
Is there a way to use $text so that it finds results containing only part of the search string (e.g. like '%uper%' in MySQL)?
Attention: I am not asking for a regex-only search; I am asking for a regex search within a $text search!

It's not possible to do it with $text operator.
Text indexes are built from the terms (whole words) contained in the string value or array of strings, and the search works on those terms.
You can group terms into a phrase, but you cannot match only part of a term.
Read $text operator reference and text indexes description.
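To see why "uper" finds nothing, it can help to picture a text index as a map from whole terms to documents. The following is a minimal plain-JavaScript sketch of that idea, not MongoDB's actual implementation: buildTermIndex and searchTerm are hypothetical names, and stemming, stop words and scoring are left out.

```javascript
// Simplified model of a term-based index: each whole word maps to the
// documents that contain it. Hypothetical helper names; no stemming,
// stop words or scoring, unlike a real MongoDB text index.
function buildTermIndex(docs) {
  const index = new Map();
  docs.forEach((doc, position) => {
    for (const term of doc.foo.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(position);
    }
  });
  return index;
}

// A lookup can only hit keys that exist in the map, i.e. whole terms.
function searchTerm(index, term) {
  return [...(index.get(term.toLowerCase()) || [])];
}

const docs = [
  { foo: "my super cool item" },
  { foo: "your not so cool item" },
];
const index = buildTermIndex(docs);
console.log(searchTerm(index, "super")); // one hit: the first document
console.log(searchTerm(index, "uper"));  // no hits: "uper" is not a stored term
```

A substring such as "uper" is simply not a key in this map, which is the same reason the $text query comes back empty.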

The best solution is to use both a text index and a regex.
The text index is fast but matches only whole terms, so it finds fewer documents than a regex.
The regex acts as a fallback for the cases the text index misses. Every clause of an $or that contains $text must be backed by an index, hence the second, regular index on foo:
db.mycoll.createIndex({ foo: 'text' });
db.mycoll.createIndex({ foo: 1 });
db.mycoll.find({
$or: [
{ $text: { $search: 'uper' } },
{ foo: { $regex: 'uper' } }
]
});
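For better performance (but different results), anchor the regex with ^. An anchored, case-sensitive pattern can use the regular index on foo, but it only matches values that begin with the search string, so '^uper' does not match "my super cool item":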
db.mycoll.find({
$or: [
{ $text: { $search: 'uper' } },
{ foo: { $regex: '^uper' } }
]
});

What you are trying to do in your second example is a wildcard search in your collection mycoll on field foo, i.e. matching a token by only part of it. The text search feature is not designed for this, and it is not possible with the $text operator: it does not support wildcards on any token in the indexed field. You can, however, perform a regex search instead, as others suggested. Here is my walkthrough:
>db.mycoll.find()
{ "_id" : ObjectId("53add9364dfbffa0471c6e8e"), "foo" : "my super cool item" }
{ "_id" : ObjectId("53add9674dfbffa0471c6e8f"), "foo" : "your not so cool item" }
> db.mycoll.find({ $text: { $search: "super"} })
{ "_id" : ObjectId("53add9364dfbffa0471c6e8e"), "foo" : "my super cool item" }
> db.mycoll.count({ $text: { $search: "uper"} })
0
The $text operator supports searching for a single word, for one or more words, or for a phrase. The kind of search you want is not supported.
The regex solution:
> db.mycoll.find({foo:/uper/})
{ "_id" : ObjectId("53add9364dfbffa0471c6e8e"), "foo" : "my super cool item" }
>
The answer to your final question: to do a MySQL-style '%super%' search in MongoDB you would most likely have to do:
db.mycoll.find( { foo : /.*super.*/ } );
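When the search string comes from user input, it is usually worth escaping it before turning it into a pattern. The sketch below is an illustration only: escapeRegex and containsFilter are hypothetical helper names, and the resulting object is the kind of filter one could pass to find().

```javascript
// Hypothetical helper: escape characters that have a special meaning in a
// regular expression so that user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the filter for a "contains" search on one field.
// /uper/ already matches anywhere in the string, so no ".*" is needed.
function containsFilter(field, text) {
  return { [field]: { $regex: escapeRegex(text), $options: "i" } };
}

const filter = containsFilter("foo", "uper");
console.log(JSON.stringify(filter));
// The same pattern can be checked locally against a sample value:
console.log(new RegExp(filter.foo.$regex, "i").test("my super cool item"));
```

In the shell this would be used as db.mycoll.find(containsFilter("foo", "uper")).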

It should work with /uper/.
See http://docs.mongodb.org/manual/reference/operator/query/regex/ for details.
Edit:
As per request in the comments:
The solution wasn't necessarily meant to actually give what the OP requested, but what he needed to solve the problem.
Since $regex searches don't work with text indices, a simple regex search over an indexed field should give the expected result, though not using the requested means.
Actually, it is pretty easy to do this:
db.collection.insert( {foo: "my super cool item"} )
db.collection.insert( {foo: "your not so cool item"})
db.collection.createIndex({ foo: 1 }) // called ensureIndex in older shells
db.collection.find({'foo': /uper/})
gives us the expected result:
{ "_id" : ObjectId("557f3ba4c1664dadf9fcfe47"), "foo" : "my super cool item" }
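An added explain shows us that the index was used (stage IXSCAN). Note that the bounds cover the whole string range, so the regex is applied as a filter to every index key: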
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"foo" : /uper/
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"foo" : /uper/
},
"keyPattern" : {
"foo" : 1
},
"indexName" : "foo_1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"foo" : [
"[\"\", {})",
"[/uper/, /uper/]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
// skipped
},
"ok" : 1
}
To make a long story short: no, you cannot reuse a $text index, but you can run the query against a regular index. As described in "Implement auto-complete feature using MongoDB search", one could probably be even more efficient by using a map/reduce approach that eliminates redundancy and unnecessary stop words from the indices, at the cost of no longer being real time.

As francadaval said, a text index searches by terms, but if you combine a regex with the text index you should be fine.
mycoll.find({$or: [
{
$text: {
$search: "super"
}
},
{
'column-name': {
$regex: 'uper',
$options: 'i'
}
}
]})
Also, make sure that the field has a regular index in addition to the text index.

If you go with a regex alone, you can match "super cool" but not "super item", because the regex needs the words to be adjacent in the field. To support both, use an $or query that combines $text and $regex for the search term.
Make sure the field has both a text index and a regular index for this to work.

You could have achieved it like this:
db.mycoll.find( {foo: { $regex : /uper/i } })
Here 'i' is an option that makes the search case-insensitive.

Related

Mongodb regex in aggregation using reference to field value

Note: I'm using MongoDB 4 and I must use aggregation, because this is one step of a bigger aggregation.
Problem
How can I find the documents in a collection where one field starts with the value of another field in the same document?
Let's start with this collection:
db.regextest.insert([
{"first":"Pizza", "second" : "Pizza"},
{"first":"Pizza", "second" : "not pizza"},
{"first":"Pizza", "second" : "not pizza"}
])
and an example query for exact match:
db.regextest.aggregate([
{
$match : { $expr: { $eq: [ "$first" ,"$second" ] } } }
])
I will get a single document
{
"_id" : ObjectId("5c49d44329ea754dc48b5ace"),
"first" : "Pizza", "second" : "Pizza"
}
And this is good.
But how do I do the same with startsWith? My plan was to use a regex, but that does not seem to be supported in aggregation so far.
A find with a custom JavaScript function works fine:
db.regextest.find().forEach(
function(obj){
if (obj.first.startsWith(obj.second)){
print(obj);
}
}
)
And returns correctly:
{
"_id" : ObjectId("5c49d44329ea754dc48b5ace"),
"first" : "Pizza",
"second" : "Pizza"
}
How is it possible to get the same result with the aggregation framework?
One idea is to use the existing aggregation pipeline, write its output to a temporary collection and then run the find above to get the match I'm looking for. This seems to be a workaround; I hope someone has a better idea.
Edit: here is the solution
db.regextest.aggregate([{
$project : {
"first" : 1,
"second" : 1,
fieldExists : {
$indexOfBytes : ['$first', '$second' , 0]
}
}
}, {
$match : {
fieldExists : {
$eq : 0 // "second" must be found at position 0 of "first"; $gt : -1 would also accept later positions
}
}
}
]);

The simplest way is to use $expr, first available in 3.6, like this:
{
$match: {
$expr: {
$eq: [
'$second',
{
$substrCP: ['$first', 0, { $strLenCP: '$second' }] // $substrCP so that the length is counted in code points, like $strLenCP
}
]
}
}
}
This compares the string in field second with the first N characters of first where N is the length of second string. If they are equal, then first starts with second.
4.2 adds regular expression support to aggregation expressions ($regexMatch, $regexFind and $regexFindAll), but a starts-with check is much simpler and doesn't need regular expressions.
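The comparison described above (take the first N characters of first and compare them with second) can be checked outside the database. This is a plain-JavaScript sketch under the assumption that lengths are counted in code points; startsWithCodePoints and the third sample row are made up for illustration.

```javascript
// Plain-JavaScript counterpart of the substring comparison.
// Array.from splits a string into code points, so the length is counted
// the same way $strLenCP counts it.
function startsWithCodePoints(first, second) {
  const firstPoints = Array.from(first);
  const secondPoints = Array.from(second);
  return firstPoints.slice(0, secondPoints.length).join("") === second;
}

const rows = [
  { first: "Pizza", second: "Pizza" },
  { first: "Pizza", second: "not pizza" },
  { first: "Pizzeria", second: "Pizz" }, // extra row, not in the question
];
const matches = rows.filter((row) => startsWithCodePoints(row.first, row.second));
console.log(matches);

// Byte length and code point length differ for non-ASCII text, which is why
// byte-based and code-point-based operators should not be mixed.
console.log(Array.from("é").length, new TextEncoder().encode("é").length);
```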

MongoDB 3.4.5: Hyphen-Minus does not work with $text searches

I use MongoDB version 3.4.5 and I tried to exclude a term with - (minus).
For some reason it does not work.
These are my attempts:
db.Product.find()
{ "_id" : ObjectId("59cbfcd01889a9fd89a3565c"), "name" : "Produkt Neu", ...
{ "_id" : ObjectId("59cc7d941889a4f4c2f43b14"), "name" : "Produkt2", ...
db.Product.find( { $text: { $search: 'Produkt -Neu' } } );
db.Product.find( { $text: { $search: "Produkt -Neu" } } );
db.Product.find( { $text: { $search: "Produkt2" } } );
{ "_id" : ObjectId("59cc7d941889a4f4c2f43b14"), "name" : "Produkt2", ...
db.Product.dropIndexes()
db.Product.createIndex({ name: "text" })
{
"nIndexesWas" : 2,
"msg" : "non-_id indexes dropped for collection",
"ok" : 1
}
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.Product.find( { $text: { $search: "Produkt -Neu" } } );
db.Product.find( { $text: { $search: "Produkt Neu" } } );
{ "_id" : ObjectId("59cbfcd01889a9fd89a3565c"), "name" : "Produkt Neu", ...
Does anyone know what I have to do to get it to work with - (minus)?

I created a collection: Product with the following documents ...
{
"_id" : ObjectId("59d0ada3c26584cd8b79fc51"),
"name" : "Produkt Neu"
}
{
"_id" : ObjectId("59d0adafc26584cd8b79fc54"),
"name" : "Produkt2"
}
... and I declared a text index on this collection as follows:
db.Product.createIndex({ name: "text" })
I ran the following queries which faithfully reproduce the situation described in your question:
// returns one document since there is one document
// which has the text indexed value: "Produkt Neu"
db.Product.find( { $text: { $search: "Produkt Neu" } } );
// returns no documents: the only document containing the term "Produkt"
// also contains the negated term "Neu", and "Produkt2" is a different term
db.Product.find( { $text: { $search: "Produkt -Neu" } } )
You are, I think, expecting this query ...
db.Product.find( { $text: { $search: "Produkt -Neu" } } )
... to return the second document, on the grounds that excluding Neu should allow a match on the document having name=Produkt2, but this is not how MongoDB $text searches work. MongoDB $text searches do not support partial matching, so the search term Produkt -Neu (which evaluates as Produkt) will not match Produkt2. To verify this, I ran the following query:
db.Product.find( { $text: { $search: "Produkt2 -Neu" } } )
This query returns the second document (i.e. the one with name=Produkt2) which proves that the hyphen-minus (-) successfully negated the term: Neu.
On a side note: MongoDB text indexes do support language stemming. To verify this behaviour I added the following document...
{
"_id" : ObjectId("59d0b2b4c26584cd8b79fd7c"),
"name" : "Produkts"
}
...and then ran this query ...
db.Product.find( { $text: { $search: "Produkt -Neu" } } );
This query returns the document with name=Produkts because Produkt is a stem of Produkts.
In summary, a $text search will find matches where each search term either (a) matches a whole word in the text index or (b) is a recognised stem of a whole word in the text index. Note: there are also phrase matches, but those are not relevant to the examples in your question. Use of the hyphen-minus changes which search terms are included or excluded, but it does not change how a search term is evaluated.
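The rules in this summary can be played through with a small simulation. It is only a sketch of the matching logic under loud assumptions: whole-word comparison, no stemming, no phrases, and hypothetical helper names (tokenize, textSearch). It is not how MongoDB implements $text.

```javascript
// Simplified model of $text term matching with negation. Assumptions: no
// stemming, no phrases, case-insensitive whole-word comparison only.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function textSearch(docs, search) {
  const wanted = [];
  const excluded = [];
  for (const part of search.split(/\s+/).filter(Boolean)) {
    if (part.startsWith("-")) excluded.push(...tokenize(part.slice(1)));
    else wanted.push(...tokenize(part));
  }
  return docs.filter((doc) => {
    const terms = new Set(tokenize(doc.name));
    const hasWanted = wanted.some((term) => terms.has(term));
    const hasExcluded = excluded.some((term) => terms.has(term));
    return hasWanted && !hasExcluded;
  });
}

const products = [{ name: "Produkt Neu" }, { name: "Produkt2" }];
console.log(textSearch(products, "Produkt -Neu"));  // nothing: "produkt2" is a different term
console.log(textSearch(products, "Produkt2 -Neu")); // the Produkt2 document
console.log(textSearch(products, "Produkt Neu"));   // the "Produkt Neu" document
```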
More details in the docs and there is an open issue with MongoDB relating to supporting partial matching on text indexes.
If you really need to support partial matching then you'll probably want to discard the text index and use the $regex operator instead. It's worth noting, though, that index usage with the $regex operator is probably not what you expect. The brief summary is this: if your pattern is case-sensitive and anchored to the start of the string (e.g. /^Produk/ rather than /rodukt/), MongoDB can use an index to narrow the scan to a range of keys; otherwise it cannot.
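Why anchoring matters can be illustrated with a sorted array standing in for an index. This is a conceptual sketch only, with made-up sample keys and hypothetical helpers (lowerBound, prefixScan); a real B-tree index is more involved.

```javascript
// Sketch of why an anchored, case-sensitive prefix pattern can use an index:
// in a sorted list of keys, all values starting with the prefix are adjacent,
// so a binary search finds where they begin. An unanchored pattern gives no
// such starting point and every key has to be tested.
function lowerBound(sortedKeys, target) {
  let low = 0;
  let high = sortedKeys.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (sortedKeys[mid] < target) low = mid + 1;
    else high = mid;
  }
  return low;
}

function prefixScan(sortedKeys, prefix) {
  const result = [];
  for (let i = lowerBound(sortedKeys, prefix); i < sortedKeys.length; i++) {
    if (!sortedKeys[i].startsWith(prefix)) break; // left the prefix range
    result.push(sortedKeys[i]);
  }
  return result;
}

const keys = ["Produkt Neu", "Produkt2", "Produkts", "Zubehör"].sort();
console.log(prefixScan(keys, "Produk"));               // range scan over adjacent keys
console.log(keys.filter((key) => /rodukt/.test(key))); // has to test every key
```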

How to search document with condition of not having exact object in array of objects?

I have a collection of persons whose schema looks like the collection of following documents.
Document: {
name:
age:
educations:[{
title:xyz,
passed_year:2005,
univercity:abc},
{
title:asd
passed_year:2007,
univercity:mno
}],
current_city:ghi
}
Now I want to show all the persons who have not completed the xyz education at the abc university in 2005.
I can think of two possible queries for this, but I am not sure which one to use, as both of them return results.
Query 1:
db.persons.find({"education":{$ne:{$elemMatch:{"title":"xyz","passed_year":2005,"univercity":"abc"}}}})
Query 2:
db.persons.find({"education":{$not:{$elemMatch:{"title":"xyz","passed_year":2005,"univercity":"abc"}}}})
I'm quite confused about the operators $ne and $not. Which one should I use with $elemMatch, given that both of them return results?

Given this $elemMatch: {"title":"xyz","passed_year":2005,"univercity":"abc"} I think you want to exclude any documents which contain a subdocument in the educations array that contains all of these pairs:
"title" : "xyz"
"passed_year" : 2005
"univercity" : "abc"
This query will achieve that:
db.persons.find({
"educations": {
$not: {
$elemMatch:{"title": "xyz", "passed_year": 2005, "univercity": "abc"}
}
}
})
In your question you wrote:
both of them are giving me the output
I suspect this is because your query specifies education whereas the correct attribute name is educations. Since no document has an education attribute, the predicate can never find the element to exclude, so regardless of whether it uses $ne or $not it matches every document.
In answer to the question of which operator to use: $not or $ne: if you run the above query with .explain(true) you'll notice that the parsed query produced by Mongo is very different for each of these operators.
Using $ne
"parsedQuery" : {
"$not" : {
"educations" : {
"$eq" : {
"$elemMatch" : {
"title" : "xyz",
"passed_year" : 2005,
"univercity" : "abc"
}
}
}
}
}
Using $not:
"parsedQuery" : {
"$not" : {
"educations" : {
"$elemMatch" : {
"$and" : [
{
"passed_year" : {
"$eq" : 2005
}
},
{
"title" : {
"$eq" : "xyz"
}
},
{
"univercity" : {
"$eq" : "abc"
}
}
]
}
}
}
}
So, it looks like use of $ne causes Mongo to do something like this pseudocode ...
not educations equalTo "$elemMatch" : {"title" : "xyz", "passed_year" : 2005, "univercity" : "abc"}
... i.e. it treats the elemMatch clause as if it is the RHS of an equality operation whereas use of $not causes Mongo to actually evaluate the elemMatch clause.
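The difference between the two parsed queries can be sketched in plain JavaScript. The helpers notElemMatch and neLiteral and the three sample persons are made up for illustration, and the $ne reading is simplified to comparing array elements with the literal document.

```javascript
// Plain-JavaScript reading of the two parsed queries. Hypothetical helpers,
// not MongoDB code.
const wanted = { title: "xyz", passed_year: 2005, univercity: "abc" };

// $not + $elemMatch: keep the person unless some array element has all pairs.
function notElemMatch(person) {
  const educations = person.educations || [];
  return !educations.some((entry) =>
    Object.keys(wanted).every((key) => entry[key] === wanted[key])
  );
}

// $ne + $elemMatch: the operand is treated as a literal document, so the test
// is "no element equals { $elemMatch: {...} }". Real data never contains such
// a value, so every person passes.
function neLiteral(person) {
  const literal = JSON.stringify({ $elemMatch: wanted });
  const educations = person.educations || [];
  return !educations.some((entry) => JSON.stringify(entry) === literal);
}

const persons = [
  { name: "A", educations: [{ title: "xyz", passed_year: 2005, univercity: "abc" }] },
  { name: "B", educations: [{ title: "asd", passed_year: 2007, univercity: "mno" }] },
  { name: "C" }, // no educations array at all
];
console.log(persons.filter(notElemMatch).map((p) => p.name)); // A is excluded
console.log(persons.filter(neLiteral).map((p) => p.name));    // nobody is excluded
```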

Perform a search on main collection field and array of objects simultaneously

I have my document structure as below:
{
"codeId" : 8.7628945723895E13, // long numeric value stored in scientific notation by Mongodb
"problemName" : "Hardware Problem",
"problemErrorCode" : "97695686856",
"status" : "active",
"problemDescription" : "ghdsojgnhsdjgh sdojghsdjoghdghd i0dhgjodshgddsgsdsdfghsdfg",
"subProblems" : [
{
"codeId" : 8.76289457238896E14,
"problemName" : "Some problem",
"problemErrorCode" : "57790389503490249640",
"problemDescription" : "This is edited",
"status" : "active",
"_id" : ObjectId("589476eeae39b20b1c15535b")
},
...
]
}
I have a search field which should search by codeId, which basically serves as parentCodeID in the search fields.
Now, along with parentIdCode I want to search for codeId, problemCode, problemName and problemDescription as well.
How do I query the submodules with a regex search and at the same time include some parent field with an "$or" clause etc. to achieve this?

You can try something like this.
query = {
'$or': [{
"codeId":somevalue
}, {
"subProblems.codeId": {
"$regex": searchValue,
"$options": "i"
}
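}
// add one clause per remaining sub-problem field here, e.g.
// { "subProblems.problemName": { "$regex": searchValue, "$options": "i" } }
// (an empty {} clause would make the $or match every document)
]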
};
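One way to avoid writing each clause by hand is to build the $or array from a list of field names. This is a sketch under assumptions: buildSearchQuery and escapeRegex are hypothetical helpers, the field names come from the sample document, and the two inputs stand for whatever the search form provides.

```javascript
// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper that assembles the $or filter: one equality clause for
// the parent codeId plus one case-insensitive regex clause per sub-problem field.
function buildSearchQuery(parentCodeId, searchValue, subFields) {
  const pattern = { $regex: escapeRegex(searchValue), $options: "i" };
  const clauses = [{ codeId: parentCodeId }];
  for (const field of subFields) {
    clauses.push({ ["subProblems." + field]: pattern });
  }
  return { $or: clauses };
}

const query = buildSearchQuery(87628945723895, "edited", [
  "problemName",
  "problemErrorCode",
  "problemDescription",
]);
console.log(JSON.stringify(query, null, 2));
```

$regex only matches string values, so a numeric field such as subProblems.codeId would need an equality clause instead of a regex clause.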

Append a string to the end of an existing field in MongoDB

I have a document with a field containing a very long string. I need to concatenate another string to the end of the string already contained in the field.
The way I do it now is that, from Java, I fetch the document, extract the string in the field, append the string to the end and finally update the document with the new string.
The problem: The string contained in the field is very long, which means that it takes time and resources to retrieve and work with this string in Java. Furthermore, this is an operation that is done several times per second.
My question: Is there a way to concatenate a string to an existing field, without having to fetch (db.<doc>.find()) the contents of the field first? In reality all I want is (field.contents += new_string).
I already made this work using Javascript and eval, but as I found out, MongoDB locks the database when it executes javascript, which makes the overall application even slower.

Starting Mongo 4.2, db.collection.updateMany() can accept an aggregation pipeline, finally allowing the update of a field based on its current value:
// { a: "Hello" }
db.collection.updateMany(
{},
[{ $set: { a: { $concat: [ "$a", "World" ] } } }]
)
// { a: "HelloWorld" }
The first part {} is the match query, filtering which documents to update (in this case all documents).
The second part [{ $set: { a: { $concat: [ "$a", "World" ] } } }] is the update aggregation pipeline (note the square brackets, which signify the use of an aggregation pipeline). $set (an alias of $addFields) is a new aggregation operator which in this case replaces the field's value by concatenating a itself with the suffix "World". Note how a is modified directly based on its own value ($a).
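If the field may be missing or null in some documents, $concat returns null for them. A possible guard is to wrap the field in $ifNull, as in the sketch below; appendUpdate is a hypothetical helper that only builds the pipeline object.

```javascript
// Hypothetical helper: build an update pipeline that appends `suffix` to
// `field`. $ifNull substitutes an empty string when the field is null or
// missing, because $concat would otherwise produce null.
function appendUpdate(field, suffix) {
  return [
    { $set: { [field]: { $concat: [{ $ifNull: ["$" + field, ""] }, suffix] } } },
  ];
}

console.log(JSON.stringify(appendUpdate("a", "World")));
```

The result would be passed as the second argument, e.g. db.collection.updateMany({}, appendUpdate("a", "World")).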

For example (this prepends to the start, but appending works the same way):
before:
{ "_id" : ObjectId("56993251e843bb7e0447829d"), "name" : "London City", "city" : "London" }
db.airports
.find( { $text: { $search: "City" } })
.forEach(
function(e, i){
e.name='Big ' + e.name;
db.airports.save(e);
}
)
after:
{ "_id" : ObjectId("56993251e843bb7e0447829d"), "name" : "Big London City", "city" : "London" }

Old topic, but I had the same problem.
Since MongoDB 2.4, you can use $concat from the aggregation framework.
Example
Consider these documents :
{
"_id" : ObjectId("5941003d5e785b5c0b2ac78d"),
"title" : "cov"
}
{
"_id" : ObjectId("594109b45e785b5c0b2ac97d"),
"title" : "fefe"
}
Append fefe to title field :
db.getCollection('test_append_string').aggregate(
[
{ $project: { title: { $concat: [ "$title", "fefe"] } } }
]
)
The result of aggregation will be :
{
"_id" : ObjectId("5941003d5e785b5c0b2ac78d"),
"title" : "covfefe"
}
{
"_id" : ObjectId("594109b45e785b5c0b2ac97d"),
"title" : "fefefefe"
}
You can then save the results with a bulk, see this answer for that.

This is a sample of one of my documents:
{
"_id" : 1,
"s" : 1,
"ser" : 2,
"p" : "9919871172",
"d" : ISODate("2018-05-30T05:00:38.057Z"),
"per" : "10"
}
To add a string to any field you can run a forEach loop through all documents and then update the desired field:
db.getCollection('jafar').find({}).forEach(function(el){
db.getCollection('jafar').update(
{_id:el._id}, // match on _id so that exactly this document is updated
{$set:{p:'98'+el.p}})
})

This would not be possible.
One optimization you can do is to create batches of updates, i.e. fetch 10K documents, append the relevant strings to each of their keys, and then save them as a single batch.
Most mongodb drivers support batch operations.
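A batch of this kind could be prepared as follows. This is a sketch, not a complete program: chunk and toBulkOps are hypothetical helpers, the documents are assumed to have been fetched already, and only the construction of the operations is shown.

```javascript
// Hypothetical helper: split an array into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: turn fetched documents into updateOne operations that
// write the already concatenated value back, keyed by _id.
function toBulkOps(docs, field, suffix) {
  return docs.map((doc) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { [field]: String(doc[field]) + suffix } },
    },
  }));
}

const fetched = [
  { _id: 1, title: "cov" },
  { _id: 2, title: "fefe" },
  { _id: 3, title: "abc" },
];
const batches = chunk(toBulkOps(fetched, "title", "fefe"), 2);
console.log(JSON.stringify(batches, null, 2));
```

Each batch could then be sent in one round trip, for example with db.collection.bulkWrite(batch) in the shell.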

db.getCollection('<collection>').update(
// query
{},
// update
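// an aggregation pipeline (MongoDB 4.2+); a plain $set cannot read the current value through `this`
[{
$set: {<field>: {$concat: ["$<field>", "<new string>"]}}
}],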
// options
{
"multi" : true, // update all matching documents, not only the first one
"upsert" : false // do not insert a new document if no existing document matches the query
});