MongoDB - find if a field value is contained in a given string

Is it possible to query documents where a specific field is contained in a given string?
for example if I have these documents:
{a: 'test blabla'},
{a: 'test not'}
I would like to find all documents whose field a is fully contained in the string "test blabla test", so only the first document would be returned.
I know I can do it with aggregation using $indexOfCP, and it is also possible with $where and mapReduce. I was wondering if it's possible to do it in a find query using the standard MongoDB operators (e.g., $gt, $in).
thanks.

I can think of 2 ways you could do this:
Option 1
Using $where:
db.someCol.find( { $where: function() { return "test blabla test".indexOf(this.a) > -1; } } )
Explained: Find all documents whose value of field "a" is found WITHIN some given string.
This approach is universal, as you can run any code you like, but less recommended from a performance perspective. For instance, it cannot take advantage of indexes. Read full $where considerations here: https://docs.mongodb.com/manual/reference/operator/query/where/#considerations
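As a rough illustration of what the $where function above checks for each document, here is the same predicate in plain JavaScript, run against the two sample documents from the question. The names isContainedIn and sampleDocs are hypothetical, and the typeof guard is an added assumption so that documents without a string field a are skipped.

```javascript
// The string that the field value must be contained in.
const target = "test blabla test";

// Same test as the $where body, written against a plain object instead of `this`.
// The typeof guard (an assumption, not in the original) skips non-string values.
function isContainedIn(doc) {
  return typeof doc.a === "string" && target.indexOf(doc.a) > -1;
}

// Sample documents from the question.
const sampleDocs = [{ a: "test blabla" }, { a: "test not" }];

// Only the first document passes the predicate.
const matches = sampleDocs.filter(isContainedIn);
```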
Option 2
Using regex matching trickery, ONLY under certain circumstances. The example below only matches when the field value is a prefix of the given string:
db.someCol.find( { a : /^(t(e(s(t( (b(l(a(b(l(a( (t(e(s(t)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?$/ } )
Explained: Break up the components of your "should-be-contained-within" string and match against all sub-possibilities of that with regex.
For your case, this option is pretty much insane, but it's worth noting as there may be specific cases (such as limited namespace matching) where you would not have to break up each letter, but only some very finite set of predetermined parts. In that case, this option could still make use of indexes and not suffer the $where performance penalties (as long as the complexity of the regex doesn't outweigh that benefit :)
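Writing the nested pattern by hand is error-prone, so it could be generated instead. The sketch below is one possible way to build the same kind of "optional prefix" regex from an arbitrary string; buildPrefixRegex and escapeRegex are hypothetical helper names, not MongoDB functions. Like the hand-written pattern, the result also matches the empty string.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build /^(t(e(s(t ... )?)?)?)?$/ for the given target string:
// each character opens an optional group that contains the rest of the string.
function buildPrefixRegex(target) {
  let body = "";
  // Work from the last character outwards so the groups nest correctly.
  for (const ch of Array.from(target).reverse()) {
    body = "(" + escapeRegex(ch) + body + ")?";
  }
  return new RegExp("^" + body + "$");
}

const prefixRegex = buildPrefixRegex("test blabla test");
```

The resulting regex could then be passed as the value for field a in a find filter, in the same way as the literal pattern above.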

You can use a regex to search.
db.company.findOne({"companyname" : {$regex : ".*hello.*"}});

If you are using MongoDB v3.6+, you can use $expr.
As you mentioned, $indexOfCP can be used to get the index of the field value within the string. The find filter would be:
{
"$expr": {
"$ne": [{"$indexOfCP": ["test blabla test", "$a"]}, -1]
}
}
The field name must be prefixed with a dollar sign ($), because $expr accepts aggregation pipeline expressions.
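If the same check is needed for different strings or fields, the filter can be built by a small helper. This is only a sketch: containedInFilter is a hypothetical name, and it assumes the field holds a string in every document.

```javascript
// Build a find() filter that keeps documents whose `field` value
// occurs somewhere inside `target` (via $expr and $indexOfCP).
function containedInFilter(target, field) {
  return {
    $expr: {
      // $indexOfCP yields -1 when the substring is not found.
      $ne: [{ $indexOfCP: [target, "$" + field] }, -1]
    }
  };
}

const filter = containedInFilter("test blabla test", "a");
```

In the shell this would be used as db.someCol.find(filter).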

Related

How to evaluate simple expressions

When developing complex aggregations, I want the ability to test out simpler expressions as a sanity check. So I'm wondering if mongo shell has the ability to evaluate simple expressions.
For example, I want to do simple things like:
> { $hour: ISODate("2016-01-01T12:30:00Z") }
ISODate("2016-01-01T12:30:00Z")
In the example above it seems the shell isn't evaluating and returning the hour component as desired.
Is it possible to do what I want here?

If you're willing to use something other than Mongo Shell, NoSQLBooster will evaluate partial query operations. Just highlight the relevant section and click Run.
This is particularly useful for constructing pipelines with multiple stages. You can evaluate your pipeline one stage a time to verify which documents are passed to the next stage.

Not an ideal solution, but you can do something like this:
var exprr = { "$hour": ISODate("2016-01-01T12:30:00Z") };
var tempCollection = "tempCollection";
db.getCollection(tempCollection).insert({});
db.getCollection(tempCollection).aggregate([
{"$project" : {"_id" : 0, "result" : exprr}}
]);
db.getCollection(tempCollection).drop();
Or you can wrap the last part in a function and save it for reuse. The idea is to create a temporary collection, insert a blank document into it, and evaluate the expression the aggregation way. The downside is that you can only evaluate expressions that are supported in the $project aggregation stage.
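One way the reusable wrapper could look is sketched below. evalExpression is a hypothetical name; it takes the shell's db object as a parameter and uses the same insert, aggregate and drop calls as the snippet above. It assumes the temporary collection name is not already in use, since the collection is dropped at the end.

```javascript
// Evaluate a single aggregation expression by projecting it over one empty document.
// `db` is the shell's database object; `tempName` must not clash with a real collection.
function evalExpression(db, expr, tempName = "tempCollection") {
  const coll = db.getCollection(tempName);
  coll.insert({}); // one blank document so $project has something to run on
  try {
    return coll
      .aggregate([{ $project: { _id: 0, result: expr } }])
      .toArray();
  } finally {
    coll.drop(); // always clean up, even if the expression is invalid
  }
}
```

In the shell it would be called as evalExpression(db, { $hour: ISODate("2016-01-01T12:30:00Z") }).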

Mongodb text index - better scoring?

I have an index in MongoDB which covers name and email.
This works, and I can query it with:
const c = await Contact
.find({ $text: { $search: search } }, { score: { $meta: "textScore" } })
.sort({ score: { $meta: "textScore" } })
.skip(skip)
.limit(20);
But, the results are somewhat odd, yet logical.
e.g.
if I search for "Roger Johan"
It will start listing both people called Roger and Johan, which is logical.
but, it would have been less odd if it ranked "Roger Johansson" highest as that is a match on both Roger and Johan%
Is there any way to tune this?
I know I can regex match on partial, but that instead fails on things like:
data: Roger T. Johansson
query: Roger Johansson
Is there any fancy trick to combine parts of these two options?

If you search by phrase, it will be able to find Roger Johan, but it won't work if you try to search for Rog or Johan.
To make it work with partial matches on the first word, we created an additional field with prefixes of the word, e.g. ["Rog", "Roge"], and included this field in the text index.
With that implemented, the search will find matches for Rog as well as Roger Johan.
If you need to search for the last name Johan, you can also include another property with prefixes ["Joh", "Joha", "Johan", "Johans", "Johanss", "Johansso"] and give it a lower (or higher, depending on how you want results to appear) weight. Or you can include all prefixes in the same array property if the weight should be the same.
Just to be clear, you do need to use phrase search, i.e.: "\"Roger Johan\"".
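A minimal sketch of how the prefix arrays and the phrase query string described above might be generated. wordPrefixes and toPhraseSearch are hypothetical helper names, and the minimum prefix length of 3 is an assumption taken from the examples in this answer.

```javascript
// Return all proper prefixes of `word` that are at least `minLength` characters long.
// The full word itself is left out, matching the example arrays in the answer.
function wordPrefixes(word, minLength = 3) {
  const prefixes = [];
  for (let len = minLength; len < word.length; len++) {
    prefixes.push(word.slice(0, len));
  }
  return prefixes;
}

// Wrap a query in double quotes so that $text treats it as a phrase.
function toPhraseSearch(query) {
  return '"' + query + '"';
}

const firstNamePrefixes = wordPrefixes("Roger");
const lastNamePrefixes = wordPrefixes("Johansson");
```

The arrays would be stored on the document when it is saved, and the result of toPhraseSearch would be passed as the $search value.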

I haven't tried it myself, but maybe you need to do the search as a phrase: https://docs.mongodb.com/manual/reference/operator/query/text/#phrases
If not I think it will split your search term and then search.

significance of $ and "" in mongodb

I am learning MongoDB. Getting confused on usage of "$"
I have collection as below schema:
{
_id: 1,
"name": "test",
"city": "gr",
"sector": "IT",
"salary":1000
}
I get the following results when running these queries:
Query | Result
db.user.find({salary:2000}); | Works
db.user.find({$salary:2000}); | Does not work (unknown top level operator: $salary)
db.user.aggregate({$group:{_id:null,avg:{$avg:"$salary"}}}); | Works
db.user.aggregate({$group:{_id:null,avg:{$avg:$salary}}}); | Does not work ($salary is not defined)
db.user.aggregate({$group:{_id:null,avg:{$avg:"salary"}}}); | Gives wrong output
Can anyone please explain what the syntactical significance of "" and $ is in MongoDB?

Hi, let's look at these queries:
1- db.user.find({salary:2000});
2- db.user.find({$salary:2000});
Take a look at the documentation for find.
According to it, find takes {field: value}. Your first query works because salary is a valid field.
Your second query doesn't work because a top-level key that starts with $ is parsed as a query operator, and there is no operator called $salary.
3- db.user.aggregate({$group:{_id:null,avg:{$avg:"$salary"}}});
4- db.user.aggregate({$group:{_id:null,avg:{$avg:$salary}}});
5- db.user.aggregate({$group:{_id:null,avg:{$avg:"salary"}}});
For aggregation, let's take a look at the documentation for $avg.
It says that $avg takes {$avg: expression}. So what you put there is an expression, not a field.
Now take a look at the documentation for expressions.
Expressions can be field paths and system variables, literals, expression objects, and expression operators.
The operands in queries 3, 4 and 5 aren't expression objects or expression operators, so let's eliminate these options.
Now let's take a look at $literal.
It states that literals can be of any type, however MongoDB parses literals that start with a dollar sign as a path to a field.
Finally take a look at Field Path and System variables.
It states "To specify a field path, use a string that prefixes with a dollar sign $ ... For example, "$user" to specify the field path for the user field or "$user.name" to specify the field path to "user.name" field."
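That means you are specifying "$salary" as the path to the field in $avg:"$salary", and query number 3 works.
Query number 4 doesn't work because, without quotes, $salary is a JavaScript identifier. The shell tries to evaluate it as a variable before the query is sent, and no such variable is defined.
This should explain the significance of "".
Query number 5 gives the wrong output because "salary" without the $ is a plain string literal, not a field path, so there are no numeric values to average. It is still a valid query, so it runs but simply returns null.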
You could have had
db.user.aggregate({$group:{_id:null,avg:{$avg:"some_non_existent_field"}}});
And the query will still run fine but you will get null for your results.
I hope this helps, this was a lot of fun to gather.
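The rule for quoted strings described above can be summarised in a few lines of JavaScript. This is only a simplified illustration of the rule, not the server's actual parser, and classifyStringOperand is a hypothetical name.

```javascript
// Simplified illustration of how a string operand in an aggregation expression is read.
function classifyStringOperand(value) {
  if (typeof value !== "string") return "not a string";
  if (value.startsWith("$$")) return "variable";   // e.g. "$$ROOT"
  if (value.startsWith("$")) return "field path";  // e.g. "$salary"
  return "string literal";                         // e.g. "salary"
}

const quotedWithDollar = classifyStringOperand("$salary");
const quotedWithoutDollar = classifyStringOperand("salary");
```

An unquoted $salary never reaches this point: the shell fails first because it treats it as an undefined JavaScript variable.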

Mongo find by regex: return only matching string

My application has the following stack:
Sinatra on Ruby -> MongoMapper -> MongoDB
The application puts several entries in the database. In order to crosslink to other pages, I've added some sort of syntax. e.g.:
Coffee is a black, caffeinated liquid made from beans. {Tea} is made from leaves. Both drinks are sometimes enjoyed with {milk}
In this example {Tea} will link to another DB entry about tea.
I'm trying to query my mongoDB about all 'linked terms'. Usually in ruby I would do something like this: /{([a-zA-Z0-9])+}/ where the () will return a matched string. In mongo however I get the whole record.
How can I get mongo to return me only the matched parts of the record I'm looking for. So for the example above it would return:
["Tea", "milk"]
I'm trying to avoid pulling the entire record into Ruby and processing them there

I don't know if I understand.
db.yourColl.aggregate([
{
$match:{"yourKey":{$regex:'[a-zA-Z0-9]', "$options" : "i"}}
},
{
$group:{
_id:null,
tot:{$push:"$yourKey"}
}
}])
If you don't want duplicates in tot, use $addToSet instead of $push.

The way I solved this problem was with the string aggregation operators: $indexOfCP to find the starting and ending code point indices, and $substrCP to extract the string I wanted. Since you could have several of these {} terms, you need one projection to identify the indices in one shot and another projection to extract the words you need. Hope this helps.
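The index-based logic described above (find the opening brace, find the closing brace, take the substring in between, repeat) can be sketched in plain JavaScript. This only illustrates the logic and is not an aggregation pipeline; extractBracedTerms is a hypothetical name.

```javascript
// Collect every term enclosed in {...} using only indexOf and substring,
// mirroring what $indexOfCP and $substrCP would do inside a pipeline.
function extractBracedTerms(text) {
  const terms = [];
  let start = text.indexOf("{");
  while (start !== -1) {
    const end = text.indexOf("}", start + 1);
    if (end === -1) break; // unmatched opening brace: stop
    terms.push(text.substring(start + 1, end));
    start = text.indexOf("{", end + 1);
  }
  return terms;
}

const entry = "Coffee is a black, caffeinated liquid made from beans. {Tea} is made from leaves. Both drinks are sometimes enjoyed with {milk}";
const linkedTerms = extractBracedTerms(entry);
```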

Mongodb $strcasecmp. Strange behaviour when the field content has dollar signs

I'm trying to compare two strings in the MongoDB Aggregation Framework. This is the query I'm using:
db.people.aggregate({
$project:{
name:1,
balance:1,
compareBalance:{$strcasecmp:["$balance","$2,500.00"]}
}
});
My problem is that each "balance" value has a dollar sign at the beginning of the string, and the results returned by the query seem to be incorrect. For example:
{
"_id" : ObjectId("5257e2e7834a87e7ea509665"),
"balance" : "$1,416.00",
"name" : "Houston Monroe",
"compareBalance" : 1
}
As you can see in the results, the comparison value is 1, but it should be -1 because $2,500.00 is higher than $1,416.00. In fact, all comparisons have a value of 1.
There is a workaround using $substr to remove the dollar sign at the beginning of all fields, but I want to know who is doing this wrong, MongoDB or me.
Thanks in advance.

It sounds like you are trying to use the "balance" field as a numeric, for example might want to compare $10 to $100.
The best way to do this is to store the actual value, and add the formatting, the $ the , etc when displaying to the user.
So, you would have - balance: 2500
Slightly unrelated...
Not sure if you are doing much calculation on the value, but using binary floating point numbers for currency is a bad idea (can't accurately represent all numbers), so, it's often better to store an integer with the cents (or if high precision is required, an integer for hundredths of cents)
This could give: balanceCents: 250000 or balanceFourDec: 25000000
Then you can use $gt $lt and arithmetic
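As a rough sketch of the conversion suggested above, the helper below turns a formatted string such as "$2,500.00" into an integer number of cents without going through floating point. toCents is a hypothetical name, and the accepted format (optional $, comma thousands separators, at most two decimals) is an assumption based on the sample data.

```javascript
// Convert a formatted amount like "$2,500.00" to integer cents (250000).
function toCents(formatted) {
  // Drop the currency sign, thousands separators and whitespace.
  const cleaned = formatted.replace(/[$,\s]/g, "");
  const parts = /^(\d+)(?:\.(\d{1,2}))?$/.exec(cleaned);
  if (!parts) {
    throw new Error("Unrecognized amount: " + formatted);
  }
  const whole = parseInt(parts[1], 10);
  // Pad "5" to "50" so that "1.5" means 50 cents, and "" to "00".
  const cents = parseInt((parts[2] || "").padEnd(2, "0"), 10);
  return whole * 100 + cents;
}

const balanceCents = toCents("$2,500.00");
```

Values stored this way can be compared with $gt and $lt directly.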

The $ is used as a field reference operator. So, the aggregation pipeline is trying to do a comparison between a field called "balance" and a field called "2,500.00":
{
"balance": "$5,000.00",
"2,500.00": undefined
}
Of course, that's not what you are looking for.
You shouldn't start with the $ in the data. Also, unless you've got fixed-length strings, sorting and comparisons aren't going to work the way you would expect if you're trying to store numbers as strings. If you're just doing this as an example, I'd suggest you use the actual math operators for numbers, and leave $strcasecmp to actual strings.

You can use the { $literal: <value> } expression operator so that the string starting with a dollar sign is treated as a literal value rather than as a field path.
https://docs.mongodb.com/manual/reference/operator/aggregation/literal/
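Applied to the query from the question, the $project stage might look like the sketch below. compareToLiteral is a hypothetical helper name. Note that this still compares the values as strings, so it only orders amounts correctly when they have the same length and format.

```javascript
// Build a $strcasecmp expression that compares a field against a literal string,
// wrapping the literal in $literal so a leading "$" is not read as a field path.
function compareToLiteral(fieldName, literalString) {
  return { $strcasecmp: ["$" + fieldName, { $literal: literalString }] };
}

const projectStage = {
  $project: {
    name: 1,
    balance: 1,
    compareBalance: compareToLiteral("balance", "$2,500.00")
  }
};
```

In the shell this would be run as db.people.aggregate([projectStage]).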