Query to find a collection with a number in it - mongodb

So I have a query in which we are supposed to find all the numbers which contain "9".
One of my entries is as follows:
`
{
"_id": {
"$oid": "63945fb591a4f6443e8edbeb"
},
"customer_id": {
"$oid": "63945f6191a4f6443e8edbea"
},
"customer_name": [
"Spike",
"Takahashi"
],
"customer_age": {
"$numberLong": "23"
},
"mobile_no.": [
"7898654324",
"9232111456"
]
}
`
Now I know $regex is used to do the same for strings, but I can't seem to get the output I want, probably because I am using an array and storing the numbers as strings. Can anyone help me figure this out?

Actually I don't think the problem is the array here. MongoDB has pretty intuitive syntax for specifying that you want to match any item in the array. In general, something like this would work:
> db.foo.find()
[
{ _id: 1, x: [ '123', '456' ] },
{ _id: 2, x: [ '789', '000' ] }
]
> db.foo.find({x:/9/})
[
{ _id: 2, x: [ '789', '000' ] }
]
Why do I say "in general" and why didn't I use your exact sample document? It's because the field name of interest currently ends in a "." character. Continuing on the theme of MongoDB syntax, the "." character is what the database typically uses to denote nested fields:
> db.foo.find({'x.y':'abc'})
[
{ _id: 3, x: { y: 'abc', z: 0 } }
]
You can read more about querying nested fields using dot notation here.
The trailing dot in the field name (mobile_no.) is sort of confusing the database. We can see in this playground example that a regex similar to the one above (just using the slightly more verbose syntax due to a limitation of the playground) fails to return any results:
db.collection.find({
"mobile_no.": {
$regex: "9"
}
})
no document found
But if we remove the "." character from the document and the query, the document is retrieved as expected:
db.collection.find({
"mobile_no": {
$regex: "9"
}
})
[
{
"_id": ObjectId("63945fb591a4f6443e8edbeb"),
"customer_age": NumberLong(23),
"customer_id": ObjectId("63945f6191a4f6443e8edbea"),
"customer_name": [
"Spike",
"Takahashi"
],
"mobile_no": [
"7898654324",
"9232111456"
]
}
]
Playground example here.
MongoDB improved the way that we can interact with field names containing periods and dollar signs last year as outlined on this page. While you technically could query this document using those new techniques, it is quite convoluted and will not perform efficiently. The query would look something like this:
db.collection.find({
$expr: {
$reduce: {
input: {
$getField: "mobile_no."
},
initialValue: false,
in: {
$or: [
"$$value",
{
$regexMatch: {
input: "$$this",
regex: "9"
}
}
]
}
}
}
})
Playground example here
Personally I think it is much more straightforward to remove the trailing "." from the field name as it does not provide any particular value being stored in the backend database anyway.
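To make the logic of the $reduce query above easier to follow, here is a rough plain-JavaScript sketch of what it computes for a single document. It does not talk to MongoDB at all; the helper name anyElementMatches and the trimmed-down sample document are assumptions made purely for illustration.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of what the $reduce expression computes.
// anyElementMatches is a hypothetical helper name, not a MongoDB API.
function anyElementMatches(doc, fieldName, pattern) {
  // Bracket notation reads the literal key, even when it ends in ".".
  const values = doc[fieldName] || [];
  // Start from false and OR in the regex result for each element,
  // mirroring initialValue: false and $or: ["$$value", { $regexMatch: ... }].
  return values.reduce((value, item) => value || pattern.test(item), false);
}

const sampleDoc = { "mobile_no.": ["7898654324", "9232111456"] };
console.log(anyElementMatches(sampleDoc, "mobile_no.", /9/)); // true
console.log(anyElementMatches(sampleDoc, "mobile_no.", /0/)); // false
```

The accumulator only ever flips from false to true, which is why the query behaves like "match if any element contains 9".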

Related

Find document with only expected values allowed in nested array field in MongoDB

I'll start with the example as it's easier to explain for me.
[
{
"_id": 100,
"narr": [
{
"field": 1
}
]
},
{
"_id": 101,
"narr": [
{
"field": 1,
},
{
"field": 2
}
]
}
]
The goal is to find the document whose values for a field exactly match the ones I specify.
Example:
for lookup = [1] find document with _id=100.
for lookup = [1,2] find document with _id=101.
So far I came up with (for second example with [1,2]):
db.col.find(
{
"narr": {
"$all": [
{
"$elemMatch": {
"field": {
"$in": [1, 2]
}
}
}
]
}
}
)
But it also includes the document with _id=100. How can I make it perform a strict match?
Building whole arrays won't work as there are multiple fields with unknown values in each nested structure.
If neither the field values nor your input contain duplicates, you can simply compare against narr.field. The path "$narr.field" resolves to an array of the field values, so this is an ordinary array comparison (which is sensitive to order).
db.collection.find({
$expr: {
$eq: [
"$narr.field",
[
1,
2
]
]
}
})
Here is the Mongo playground for your reference.
If duplicates may occur, use $setEquals instead, which ignores duplicates and order.
db.collection.find({
$expr: {
"$setEquals": [
"$narr.field",
[
1,
2
]
]
}
})
Here is the Mongo playground for your reference.
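The difference between the two comparisons can be sketched in plain JavaScript. This is only an illustration of the semantics, not MongoDB code; the helpers arraysEqual and setEquals and the extra document with _id 102 are assumptions added for the example.

```javascript
// Plain-JavaScript sketch of the two comparisons; these helpers are illustrative, not MongoDB APIs.
function arraysEqual(a, b) {
  // Like $eq on arrays: same length, same values, same order.
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

function setEquals(a, b) {
  // Like $setEquals: duplicates and order are ignored.
  const setA = new Set(a);
  const setB = new Set(b);
  return setA.size === setB.size && [...setA].every((v) => setB.has(v));
}

const docs = [
  { _id: 100, narr: [{ field: 1 }] },
  { _id: 101, narr: [{ field: 1 }, { field: 2 }] },
  { _id: 102, narr: [{ field: 2 }, { field: 1 }, { field: 1 }] }, // hypothetical extra document
];

// "$narr.field" resolves to the array of field values for each document.
const fieldValues = (doc) => doc.narr.map((item) => item.field);

const strictIds = docs.filter((d) => arraysEqual(fieldValues(d), [1, 2])).map((d) => d._id);
const setIds = docs.filter((d) => setEquals(fieldValues(d), [1, 2])).map((d) => d._id);
console.log(strictIds); // [ 101 ]
console.log(setIds); // [ 101, 102 ]
```

Neither comparison returns the document with _id 100, which is the strict behavior the question asks for.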

Dynamic field path for $set in mongo db aggregation

I have documents containing multilingual data. A simplified version looks like:
{
languages: [
"en",
"fr"
],
title: {
en: "Potato Gratin",
fr: "Gratin de pomme de terre"
},
}
Important parts are:
title contains translations in the shape <lang> : <text>
languages contains the list of supported languages, the first one being the default language.
not all documents support the same languages
What I would like to do is querying that document for a specific language and either
replace the title object by the correct translation if the language is supported
replace the title object by the default translation if the language is not
I.e., querying the above document in French should return {"title": "Gratin de pomme de terre"}, and if queried in Chinese, it should return {"title": "Potato Gratin"}
I have a playground setup: https://mongoplayground.net/p/CP0Z20dTpgy. I have it so that it sets a lang property that the output should be in. I would then like to have a stage that looks like "$set": {"title": "$title.$lang"} but it complains that a field path component should not start with $, which I am guessing means that mongo does not support dynamic field paths ?
Any idea on how to achieve something like that?
Some notes:
In the actual documents I have a lot of these fields so a solution with "$unwind" would be costly.
the reason it is structured with languages as keys is that it helps with indexing with Mongo Atlas. Changing the structure to have an array of translations would hurt other parts of the app.
You have to transform your object into an array with $objectToArray, filter this array, and take element 0 of it. Then you can transform the result back into your value.
db.collection.aggregate([
{
"$match": {
"_id": BinData(0, "3BByrilZQ2GTdlXG0nrGXw=="),
},
},
{
"$set": {
"lang": {
"$cond": [
{
"$in": [
"zh",
"$languages"
]
},
"zh",
{
"$first": "$languages"
}
]
}
}
},
{
$addFields: {
title: {
"$arrayElemAt": [
{
"$filter": {
"input": {
"$objectToArray": "$title"
},
"as": "title",
"cond": {
$eq: [
"$$title.k",
"$lang"
]
}
}
},
0
]
}
}
},
{
$addFields: {
title: "$title.v"
}
}
])
Of course, you have to pass your 'zh' as a parameter in your code, in both places.
You can test it here
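As a rough illustration of what the stages above do to one document, here is the same logic in plain JavaScript. It runs without MongoDB; the helper name pickTitle is an assumption, and Object.entries merely stands in for $objectToArray.

```javascript
// Plain-JavaScript sketch of the pipeline's logic; pickTitle is a hypothetical helper.
function pickTitle(doc, requestedLang) {
  // Fall back to the first (default) language when the requested one is unsupported.
  const lang = doc.languages.includes(requestedLang) ? requestedLang : doc.languages[0];
  // Object.entries plays the role of $objectToArray: [[key, value], ...].
  const match = Object.entries(doc.title).filter(([k]) => k === lang)[0];
  // Take the value, like reading "$title.v"; undefined if the key is missing.
  return { title: match ? match[1] : undefined };
}

const recipe = {
  languages: ["en", "fr"],
  title: { en: "Potato Gratin", fr: "Gratin de pomme de terre" },
};
console.log(pickTitle(recipe, "fr")); // { title: 'Gratin de pomme de terre' }
console.log(pickTitle(recipe, "zh")); // { title: 'Potato Gratin' }
```

The requested language appears only once here because it is a function argument; in the pipeline it has to be substituted into both the $in check and the $cond result.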

Mongo Sort by Count of Matches in Array

Let's say my test data is:
db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})
To get an individual fruit like carrot I do:
db.multiArr.find({'Keys':{$in:['carrot']}})
When I do an $or query for carrot and banana, I see both records, fruit1 and then fruit2:
db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})
The output should be fruit2 and then fruit1, because fruit2 has both carrot and banana.
To actually answer this first, you need to "calculate" the number of matches to the given condition in order to "sort" the results to return with the preference to the most matches on top.
For this you need the aggregation framework, which is what you use for "calculation" and "manipulation" of data in MongoDB:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$project": {
"ID": 1,
"Keys": 1,
"order": {
"$size": {
"$setIntersection": [ ["carrot", "banana"], "$Keys" ]
}
}
}},
{ "$sort": { "order": -1 } }
])
On a MongoDB version older than 3, you can use the longer form:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$unwind": "$Keys" },
{ "$group": {
"_id": "$_id",
"ID": { "$first": "$ID" },
"Keys": { "$push": "$Keys" },
"order": {
"$sum": {
"$cond": [
{ "$or": [
{ "$eq": [ "$Keys", "carrot" ] },
{ "$eq": [ "$Keys", "banana" ] }
]},
1,
0
]
}
}
}},
{ "$sort": { "order": -1 } }
])
In either case the function here is to first match the possible documents to the conditions by providing a "list" of arguments with $in. Once the results are obtained you want to "count" the number of matching elements in the array to the "list" of possible values provided.
In the modern form the $setIntersection operator compares the two "lists", returning a new array that only contains the "unique" matching members. Since we want to know how many matches there were, we simply return the $size of that list.
In older versions, you pull apart the document array with $unwind in order to perform operations on it since older versions lacked the newer operators that worked with arrays without alteration. The process then looks at each value individually and if either expression in $or matches the possible values then the $cond ternary returns a value of 1 to the $sum accumulator, otherwise 0. The net result is the same "count of matches" as shown for the modern version.
The final thing is simply to $sort the results based on the "count of matches" that was returned so the most matches are on "top". This is "descending order" and therefore you supply the -1 to indicate that.
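The match, count, and sort steps described above can be sketched in plain JavaScript. This is an illustration of the logic only and does not use MongoDB; the function name sortByMatchCount is an assumption.

```javascript
// Plain-JavaScript sketch of the match / count / sort logic; sortByMatchCount is hypothetical.
function sortByMatchCount(docs, wanted) {
  const wantedSet = new Set(wanted);
  return docs
    // $match with $in: keep documents that contain at least one wanted value.
    .filter((doc) => doc.Keys.some((k) => wantedSet.has(k)))
    // $setIntersection + $size: count the distinct wanted values present.
    .map((doc) => ({ ...doc, order: new Set(doc.Keys.filter((k) => wantedSet.has(k))).size }))
    // $sort with -1: most matches first.
    .sort((a, b) => b.order - a.order);
}

const fruits = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] },
];
const ranked = sortByMatchCount(fruits, ["carrot", "banana"]);
console.log(ranked.map((d) => [d.ID, d.order])); // [ [ 'fruit2', 2 ], [ 'fruit1', 1 ] ]
```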
Addendum concerning $in and arrays
You are misunderstanding a couple of things about MongoDB queries for starters. The $in operator is actually intended for a "list" of arguments like this:
{ "Keys": { "$in": [ "carrot", "banana" ] } }
Which is essentially the shorthand way of saying "Match either 'carrot' or 'banana' in the property 'Keys'". And could even be written in long form like this:
{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }
Which should lead you to the point that for a "singular" match condition you simply supply the value to match against the property:
{ "Keys": "carrot" }
So that should cover the misconception that you use $in to match a property that is an array within a document. Rather the "reverse" case is the intended usage where instead you supply a "list of arguments" to match a given property, be that property an array or just a single value.
The MongoDB query engine makes no distinction between a single value or an array of values in an equality or similar operation.
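A simplified plain-JavaScript sketch of that behavior, limited to scalar match values: the same check works whether the stored property is a single value or an array. The helpers fieldMatches and fieldMatchesIn are assumptions for illustration and do not cover every case the real query engine handles (for example, matching a whole array exactly).

```javascript
// fieldMatches and fieldMatchesIn are hypothetical helpers, not MongoDB APIs.
function fieldMatches(fieldValue, target) {
  // An array field matches if any element equals the target; a scalar must equal it.
  return Array.isArray(fieldValue) ? fieldValue.includes(target) : fieldValue === target;
}

function fieldMatchesIn(fieldValue, candidates) {
  // $in: true if the field matches at least one value from the argument list.
  return candidates.some((candidate) => fieldMatches(fieldValue, candidate));
}

console.log(fieldMatches(["apple", "carrot", "banana"], "carrot")); // true
console.log(fieldMatches("carrot", "carrot")); // true
console.log(fieldMatchesIn(["apple", "orange", "banana"], ["carrot", "banana"])); // true
console.log(fieldMatchesIn("apple", ["carrot", "banana"])); // false
```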

Mongodb find inside sub array

I have a document that's set up like this:
{
_id : ObjectId(),
info : [
[
1399583281000,
20.13
],
[
1399583282000,
20.13
],
[
1399583283000,
20.13
],
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
This data could be spread across multiple documents. In general, each document contains data in the info for 59 periods (seconds).
What I would like to do is get all of the info data where the timestamp is greater than a specific time.
Any ideas how I would go about doing this?
Thank you
EDIT:
So, I've found that this seems to return all of the documents:
db.infos.find({
info:{
$elemMatch:{
0:{
$gt:1399583306000
}
}
}
})
But maybe I need to use this in an aggregate query? so that it will return just all the values?
You're on the right track, but there are a few things to note here. Nested arrays (especially with anonymous keys) are not exactly a great way to store things, but as long as you consistently know the position, that should be reasonably okay.
There is a distinct difference between matching documents and matching "elements of an array". Though your current value would actually not match (your search value is not within the bounds of the document), if the value actually was valid your query correctly matches the "document" here, which contains a matching element in the array.
The "document" contains all of the array elements, even those that do not match, but the condition says the "document" does match, so it is returned. If you just want the matching "elements" then use .aggregate() instead:
db.infos.aggregate([
// Still match the document
{ "$match": {
"info": {
"$elemMatch": { "0": {"$gte": 1399583285000} }
}
}},
// unwind the array for the matched documents
{ "$unwind": "$info" },
// Match only the elements
{ "$match": { "info.0": { "$gte": 1399583285000 } } },
// Group back to the original form if you want
{ "$group": {
"_id": "$_id",
"info": { "$push": "$info" }
}}
])
And that returns just the elements that matched the condition:
{
"_id" : ObjectId("536c1145e99dc11e65ed07ce"),
"info" : [
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
Of course, if you only ever expect one element to match, then you could simply use projection with .find():
db.infos.find(
{
"info":{
"$elemMatch":{
"0": {
"$gt": 1399583285000
}
}
}
},
{
"info.$": 1
}
)
But with a term like $gt you are likely to get multiple hits within a document so the aggregate approach is going to be safer considering that the positional $ operator is only going to return the first match.
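The difference between "first match only" and "all matches" can be sketched in plain JavaScript on the sample data. This only mirrors the behavior conceptually and does not use MongoDB; the threshold value is an assumption chosen so that more than one element matches.

```javascript
// Plain-JavaScript sketch contrasting "first match" with "all matches"; no MongoDB involved.
const infoDoc = {
  info: [
    [1399583281000, 20.13],
    [1399583282000, 20.13],
    [1399583283000, 20.13],
    [1399583285000, 20.13],
    [1399583286000, 20.13],
  ],
};
const threshold = 1399583283000; // hypothetical cutoff chosen for illustration

// Like the positional $ projection: only the first matching element.
const firstMatch = infoDoc.info.find((pair) => pair[0] > threshold);
// Like $unwind + $match + $group: every matching element.
const allMatches = infoDoc.info.filter((pair) => pair[0] > threshold);

console.log(firstMatch); // [ 1399583285000, 20.13 ]
console.log(allMatches.length); // 2
```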

How to concatenate all values and find specific substring in Mongodb?

I have json document like this:
{
"A": [
{
"C": "abc",
"D": "de"
},
{
"C": "fg",
"D": "hi"
}
]
}
I would like to check whether "A" contains the string ef or not: first concatenate all values into abcdefghi, then search for ef.
In XML, XPATH it would be something like:
//A[contains(., 'ef')]
Is there any similar query in Mongodb?
All options are pretty horrible for this type of search, but there are a few approaches you can take. Please note though that the end case here is likely the best solution, but I present the options in order to illustrate the problem.
If the keys of the elements in the array "A" are consistently defined and "A" always contains an array, you would be searching like this:
db.collection.aggregate([
// Filter the documents containing your parts
{ "$match": {
"$and": [
{ "$or": [
{ "A.C": /e/ },
{ "A.D": /e/ }
]},
{"$or": [
{ "A.C": /f/ },
{ "A.D": /f/ }
]}
]
}},
// Keep the original form and a copy of the array
{ "$project": {
"_id": {
"_id": "$_id",
"A": "$A"
},
"A": 1
}},
// Unwind the array
{ "$unwind": "$A" },
// Join the two fields and push to a single array
{ "$group": {
"_id": "$_id",
"joined": { "$push": {
"$concat": [ "$A.C", "$A.D" ]
}}
}},
// Copy the array
{ "$project": {
"C": "$joined",
"D": "$joined"
}},
// Unwind both arrays
{ "$unwind": "$C" },
{ "$unwind": "$D" },
// Join the copies and test if they are the same
{ "$project": {
"joined": { "$concat": [ "$C", "$D" ] },
"same": { "$eq": [ "$C", "$D" ] },
}},
// Discard the "same" elements and search for the required string
{ "$match": {
"same": false,
"joined": { "$regex": "ef" }
}},
// Project the original form of the matching documents
{ "$project": {
"_id": "$_id._id",
"A": "$_id.A"
}}
])
So apart from the horrible $regex matching there are a few hoops to go through in order to get the fields "joined" in order to again search for the string in sequence. Also note the reverse joining that is possible here that could possibly produce a false positive. Currently there would be no simple way to avoid that reverse join or otherwise filter it, so there is that to consider.
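To see where the false positive comes from, here is a plain-JavaScript sketch of the joining that the pipeline performs on the sample document. It is not MongoDB code; the function name pairwiseJoins and the search string "ia" are assumptions picked for the demonstration.

```javascript
// Plain-JavaScript sketch of the pipeline's joining; pairwiseJoins is a hypothetical helper.
function pairwiseJoins(doc) {
  // $unwind + $group with $concat: one joined string per array element.
  const joined = doc.A.map((item) => item.C + item.D);
  const results = [];
  // Unwinding two copies of the array yields every ordered pair.
  for (const c of joined) {
    for (const d of joined) {
      // The pipeline drops pairs whose two values are equal ("same": false).
      if (c !== d) results.push(c + d);
    }
  }
  return results;
}

const sample = { A: [{ C: "abc", D: "de" }, { C: "fg", D: "hi" }] };
const joins = pairwiseJoins(sample);
console.log(joins); // [ 'abcdefghi', 'fghiabcde' ]
console.log(joins.some((s) => s.includes("ef"))); // true, from the forward join
console.log(joins.some((s) => s.includes("ia"))); // true, but only from the reverse join
```

The string "ia" never occurs in the true concatenation abcdefghi, so the last line shows the kind of false positive the reverse join can introduce.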
Another approach is to basically run everything through arbitrary JavaScript. The mapReduce method can be your vehicle for this. Here you can be a bit looser with the types of data that can be contained in "A" and try to tie in some more conditional matching to attempt to reduce the set of documents you are working on:
db.collection.mapReduce(
function () {
var joined = "";
if ( Object.prototype.toString.call( this.A ) === '[object Array]' ) {
this.A.forEach(function(doc) {
for ( var k in doc ) {
joined += doc[k];
}
});
} else {
joined = this.A; // presuming this is just a string
}
var id = this._id;
delete this["_id"];
if ( joined.match(/ef/) )
emit( id, this );
},
function(){}, // will not reduce
{
"query": {
"$or": [
{ "A": /ef/ },
{ "$and": [
{ "$or": [
{ "A.C": /e/ },
{ "A.D": /e/ }
]},
{"$or": [
{ "A.C": /f/ },
{ "A.D": /f/ }
]}
] }
]
},
"out": { "inline": 1 }
}
);
So you can use that with whatever arbitrary logic to search the contained objects. This one just differentiates between "arrays" and presumes otherwise a string, allowing the additional part of the query to just search for the matching "string" element first, and which is a "short circuit" evaluation.
But really at the end of the day, the best approach is to simply have the data present in your document, and you would have to maintain this yourself as you update the document contents:
{
"A": [
{
"C": "abc",
"D": "de"
},
{
"C": "fg",
"D": "hi"
}
],
"search": "abcdefghi"
}
So that is still going to invoke a horrible usage of $regex type queries but at least this avoids ( or rather shifts to writing the document ) the overhead of "joining" the elements in order to effect the search for your desired string.
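One way the "search" field could be kept up to date is to recompute it in application code whenever the document is written. The following is a minimal plain-JavaScript sketch under that assumption; withSearchField is a hypothetical helper, and it assumes every value inside "A" is a string.

```javascript
// Plain-JavaScript sketch; withSearchField is a hypothetical application-side helper.
function withSearchField(doc) {
  // Concatenate the values of each element in key order, element by element.
  const search = doc.A.map((item) => Object.values(item).join("")).join("");
  // Return a copy of the document with the precomputed field attached.
  return { ...doc, search };
}

const stored = withSearchField({ A: [{ C: "abc", D: "de" }, { C: "fg", D: "hi" }] });
console.log(stored.search); // abcdefghi
console.log(/ef/.test(stored.search)); // true
```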
Where this eventually leads is that a "full blown" text search solution, and that means an external one at this time as opposed to the text search facilities in MongoDB, is probably going to be your best performance option.
Either use the "pre-stored" approach of creating your "joined" field or, where supported (Solr is one solution that can do this), have a "computed field" in the text index that is created when indexing document content.
At any rate, those are the approaches and the general point of the problem. This is not XPath searching, nor is there some "XPath like" view of an entire collection in this sense, so you are best suited to structuring your data towards the methods that are going to give you the best performance.
With all of that said, your sample here is a fairly contrived example, and if you had an actual use case for something "like" this, then that actual case may make a very interesting question indeed. Actual cases generally have different solutions than the contrived ones. But now you have something to consider.