I have a MongoDB database in which some entries have a field whose name is "rx" followed by the value of another field in the same entry (let's call it rxid), so the final name of this field could look like "rx1234" or "rx2836". This field holds a list of dictionaries which MAY contain fields named "A" and "B". My task is to find how many entries in this database have at least one non-empty field named A or B, no matter where it is nested. I looked into nested queries, but they always require specifying the "parent" field, which in my case is a combination of "rx" and the value of another field, so its name is not constant.
My database schema looks something like:
{
"_id" : 1,
"rxid" : 1234,
"rx1234" : [
{
"A" : "somevalue",
"B" : "someothervalue",
},
{
"A" : "somevalue2",
}
]
},
{
"_id" : 2,
"rxid" : 2345
}
In this case I expect to count how many objects are structured like _id = 1 and how many like _id = 2 (there may be other fields inside but they are irrelevant)
The pipeline below builds the "rx" + rxid field name for each document, looks for a field with that name, and then projects A/B inside that field. You can simply add a $count stage to get the counts.
db.collection.aggregate([
{
"$project": {
"concatValue": { // Build the dynamic field name ("rx" + rxid)
"$concat": [
"rx",
{
"$toString": "$rxid"
}
]
},
"data": {
"$objectToArray": "$$ROOT"
}
}
},
{//Restructure the field
$unwind: "$data"
},
{
$match: {
$and: [//Check for the match
{
$expr: {
"$eq": [
"$concatValue",
"$data.k"
]
}
}
]
}
},
{ // Project A and B from the matched field
"$project": {
"data.v.A": 1,
"data.v.B": 1
}
}
])
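The "add a count" suggestion above could be sketched as follows. This is one possible way to finish the pipeline, not necessarily what the answer's author had in mind: it keeps the first three stages, swaps the final $project for a $match that treats null and "" as empty (an assumption about what "non-empty" means), and ends with $count. The names buildCountPipeline and withAOrB are made up for illustration, and $toString needs MongoDB 4.0 or later.

```javascript
// Hypothetical helper: returns a pipeline that counts documents whose
// "rx" + rxid field contains at least one element with a non-empty A or B.
function buildCountPipeline() {
  // Assumption: "non-empty" means present, not null and not "".
  const nonEmpty = { $nin: [null, ""] };
  return [
    // Build the dynamic field name and turn the document into k/v pairs
    { $project: {
        concatValue: { $concat: ["rx", { $toString: "$rxid" }] },
        data: { $objectToArray: "$$ROOT" }
    } },
    { $unwind: "$data" },
    // Keep only the k/v pair whose key is the dynamic field name
    { $match: { $expr: { $eq: ["$concatValue", "$data.k"] } } },
    // Require at least one array element with a non-empty A or B
    { $match: { $or: [
        { "data.v": { $elemMatch: { A: nonEmpty } } },
        { "data.v": { $elemMatch: { B: nonEmpty } } }
    ] } },
    { $count: "withAOrB" }
  ];
}

// Usage in the shell: db.collection.aggregate(buildCountPipeline())
```

Note that $count emits no document at all when nothing matches. The number of documents without A/B is then the collection's total document count minus withAOrB.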
I have many documents, but I want to figure out how to get only documents that have ALL FIELDS non null.
Suppose I have these documents:
[
{
'a': 1,
'b': 2,
'c': 3
},
{
'a': 9,
'b': 12
},
{
'a': 5
}
]
Of these, only the first has ALL FIELDS non-null, so after filtering I should get only the first document. How can I do this?
If you want only the documents that have ALL FIELDS, listing every field in the filter query, like { a: {$exists : true}, b : {$exists : true}, c : {$exists : true}}, is not a great idea once a document has tens of fields. If you don't want to list them all, you can try this hack, provided it performs well enough. It assumes a fixed schema: every document may contain only the fields a, b & c (_id is the default exception) and nothing apart from those.
If you know the total number of fields, you can check the field count to confirm that all fields exist, something like below:
db.collection.aggregate([
/** add a new field which counts no.of fields in the document */
{
$addFields: { count: { $size: { $objectToArray: "$$ROOT" } } }
},
{
$match: { count: { $eq: 4 } } // we've 4 as 3 fields + _id
},
{
$project: { count: 0 }
}
])
Test : mongoplayground
Note : We're only checking for field existence but not checking for false values like null or [] or '' on fields. Also this might not work for nested fields.
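The limitation in the note above can be worked around when the field names are known. A minimal sketch, assuming the names are passed in and that "empty" means null or '' (empty arrays are not covered here): the hypothetical helper buildNonEmptyMatch generates a $match stage so the conditions do not have to be written out by hand.

```javascript
// Hypothetical helper: builds a $match stage requiring every listed field
// to be present and neither null nor "".
function buildNonEmptyMatch(fieldNames) {
  const conditions = {};
  for (const name of fieldNames) {
    // { $nin: [null, ""] } also rejects documents where the field is missing,
    // because a missing field matches null in a query.
    conditions[name] = { $nin: [null, ""] };
  }
  return { $match: conditions };
}

const stage = buildNonEmptyMatch(["a", "b", "c"]);
// Usage in the shell: db.collection.aggregate([stage])
```

As with the count-based query, this only looks at top-level fields.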
If instead you want to check by name that all fields exist in the document, and you can pass all the field names as input, then try the query below:
db.collection.aggregate([
/** create a field with all keys/field names in the document */
{
$addFields: {
data: {
$let: {
vars: { data: { $objectToArray: "$$ROOT" } },
in: "$$data.k"
}
}
}
},
{
$match: { data: { $all: [ "b", "c", "a" ] } } /** List down all the field names from schema */
},
{
$project: { data: 0 }
}
])
Test : mongoplayground
Ref : aggregation-pipeline
You can use explain to check the performance of your queries.
Match documents if a value in an array of sub-documents is greater than some value only if the same document contains a field that is equal to some value
I have a collection that contains documents with an array of sub-documents. This array of sub-documents contains a field that dictates whether or not I can filter the documents in the collection based on another field in the sub-document. This'll make more sense when you see an example of the document.
{
"_id":"ObjectId('XXX')",
"Data":{
"A":"",
"B":"-25.78562 ; 28.35629",
"C":"165"
},
"SubDocuments":[
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"XXX",
"DataFieldId":"B"
}
},
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"",
"DataFieldId":"A"
}
},
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"105",
"DataFieldId":"Z"
}
}
]
}
I only want to match documents that contain sub-documents with a DataFieldId that is equal to Z but also filter for Values that are greater than 105 only if Data Field Id is equal to Z.
Try as below:
db.collection.aggregate([
{
$project: {
_id:1,
Data:1,
filteredSubDocuments: {
$filter: {
input: "$SubDocuments",
as: "subDoc",
cond: {
$and: [
{ $eq: ["$$subDoc.Data.DataFieldId", "Z"] },
{ $gte: ["$$subDoc.Data.Value", 105] }
]
}
}
}
}
}
])
The resulting response will be:
{
"_id" : ObjectId("5cb09659952e3a179190d998"),
"Data" : {
"A" : "",
"B" : "-25.78562 ; 28.35629",
"C" : "165"
},
"filteredSubDocuments" : [
{
"_id" : "ObjectId('XXX')",
"Data" : {
"Value" : 105,
"DataFieldId" : "Z"
}
}
]
}
This can be done by using the $elemMatch operator on the sub-documents. For your problem you can try the query below, which uses $elemMatch and is much simpler than aggregation:
db.collectionName.find({
"SubDocuments": {
$elemMatch: {
"Data.DataFieldId": "Z" ,
"Data.Value" : {$gte: 105}
}
} })
It works fine; I have verified it locally. The one modification required is that the value of SubDocuments.Data.Value must be stored as a Number or Long, as per your requirements.
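If changing the stored type is not an option, the conversion could instead happen inside the pipeline. The sketch below is a variant of the $filter answer above under stated assumptions, not a tested drop-in: it uses $convert (MongoDB 4.0+) to turn Data.Value into a double, maps values that cannot be converted (such as "" or "XXX") to null, and then drops documents whose filtered array is empty. The helper name buildFilterPipeline and its parameters are hypothetical.

```javascript
// Hypothetical helper: filter sub-documents by DataFieldId and a numeric
// lower bound, converting string values to numbers on the fly.
function buildFilterPipeline(fieldId, minValue) {
  const numericValue = {
    $convert: { input: "$$subDoc.Data.Value", to: "double", onError: null, onNull: null }
  };
  return [
    { $project: {
        _id: 1,
        Data: 1,
        filteredSubDocuments: {
          $filter: {
            input: "$SubDocuments",
            as: "subDoc",
            cond: { $and: [
              { $eq: ["$$subDoc.Data.DataFieldId", fieldId] },
              // null sorts below every number, so unconvertible values fail here
              { $gte: [numericValue, minValue] }
            ] }
          }
        }
    } },
    // Keep only documents that still have at least one matching sub-document
    { $match: { "filteredSubDocuments.0": { $exists: true } } }
  ];
}

const pipeline = buildFilterPipeline("Z", 105);
// Usage in the shell: db.collection.aggregate(pipeline)
```

Converting at query time means the comparison cannot use an index, so storing the value as a number remains the cleaner fix.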
I have a mongoDB orders collection, the documents of which look as follows:
[{
"_id" : ObjectId("59537df80ab10c0001ba8767"),
"shipments" : {
"products" : [
{
"orderDetails" : {
"id" : ObjectId("59537df80ab10c0001ba8767")
}
},
{
"orderDetails" : {
"id" : ObjectId("59537df80ab10c0001ba8767")
}
}
]
},
},
{
"_id" : ObjectId("5953831367ae0c0001bc87e1"),
"shipments" : {
"products" : [
{
"orderDetails" : {
"id" : ObjectId("5953831367ae0c0001bc87e1")
}
}
]
},
}]
Now, from this collection, I want to keep only the documents in which any of the values at the shipments.products.orderDetails.id path is the same as the value at the _id path.
I tried:
db.orders.aggregate([{
"$addFields": {
"same": {
"$eq": ["$shipments.products.orderDetails.id", "$_id"]
}
}
}])
to add a field, same, as a flag indicating whether the values are equal, but the value of same comes out as false for all documents.
EDIT
What I want to do is compare the _id field of the documents with all shipments.products.orderDetails.id values in the array.
If even 1 of the shipments.products.orderDetails.ids match the value of the _id field, I want that document to be present in the final result.
PS I am using MongoDB 3.4, and have to use the aggregation pipeline.
Your current attempt fails because the path "$shipments.products.orderDetails.id" resolves to an "array", which is then compared with a "single value".
So instead either use $in where available, which can compare to see if one value is "in" an array:
db.orders.aggregate([
{ "$addFields": {
"same": {
"$in": [ "$_id", "$shipments.products.orderDetails.id" ]
}
}}
])
Or notate both as arrays using $setIsSubset:
db.orders.aggregate([
{ "$addFields": {
"same": {
"$setIsSubset": [ ["$_id"], "$shipments.products.orderDetails.id" ]
}
}}
])
In that case it compares two "sets" to see whether the single-element set containing _id is a "subset" of the array of values. $setIsSubset returns true when its first argument is a subset of its second, so the wrapped _id goes first.
Either case will return true when "any" of the id properties within the array entries at the specified path are a match for the _id property of the document.
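The flag on its own only marks documents; to have only the matching documents in the final result, it can be followed by a $match. A minimal sketch using stages available in MongoDB 3.4, assuming shipments.products is always an array when present (the $ifNull guard is my addition so that documents without the path do not make $in fail; buildSameIdPipeline is a hypothetical name):

```javascript
// Hypothetical helper: keep only orders whose _id appears among the
// shipments.products.orderDetails.id values.
function buildSameIdPipeline() {
  return [
    { $addFields: {
        same: {
          $in: [
            "$_id",
            // Fall back to an empty array when the path is missing
            { $ifNull: ["$shipments.products.orderDetails.id", []] }
          ]
        }
    } },
    // Keep only the documents flagged as matching
    { $match: { same: true } },
    // Drop the helper flag from the output
    { $project: { same: 0 } }
  ];
}

// Usage in the shell: db.orders.aggregate(buildSameIdPipeline())
```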
I have a document that's setup like this:
{
_id : ObjectId(),
info : [
[
1399583281000,
20.13
],
[
1399583282000,
20.13
],
[
1399583283000,
20.13
],
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
This data could be spread across multiple documents. In general, each document contains data in the info for 59 periods (seconds).
What I would like to do is get all of the info data where the timestamp is greater than a specific time.
Any ideas how I would go about doing this?
Thank you
EDIT:
So, I've found that this seems to return all of the documents:
db.infos.find({
info:{
$elemMatch:{
0:{
$gt:1399583306000
}
}
}
})
But maybe I need to use this in an aggregate query, so that it will return just the matching values?
You're on the right track, but there are a few things to note here. Nested arrays (especially with anonymous keys) are not exactly a great way to store things, but as long as you consistently know the position, that should be reasonably okay.
There is a distinct difference between matching documents and matching "elements of an array". Though your current value would actually not match (your search value is not within the bounds of the document), if the value actually was valid your query correctly matches the "document" here, which contains a matching element in the array.
The "document" contains all of the array elements, even those that do not match, but the condition says the "document" does match, so it is returned. If you just want the matching "elements" then use .aggregate() instead:
db.infos.aggregate([
// Still match the document
{ "$match": {
"info": {
"$elemMatch": { "0": {"$gte": 1399583285000} }
}
}},
// unwind the array for the matched documents
{ "$unwind": "$info" },
// Match only the elements
{ "$match": { "info.0": { "$gte": 1399583285000 } } },
// Group back to the original form if you want
{ "$group": {
"_id": "$_id",
"info": { "$push": "$info" }
}}
])
And that returns just the elements that matched the condition:
{
"_id" : ObjectId("536c1145e99dc11e65ed07ce"),
"info" : [
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
Of course, if you only ever expected one element to match, then you could simply use projection with .find():
db.infos.find(
{
"info":{
"$elemMatch":{
"0": {
"$gt": 1399583285000
}
}
}
},
{
"info.$": 1
}
)
But with a term like $gt you are likely to get multiple hits within a document so the aggregate approach is going to be safer considering that the positional $ operator is only going to return the first match.
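On MongoDB 3.2 or later, the same element-level filtering could also be done without $unwind and $group. This is a sketch of an alternative technique rather than the approach shown above: it uses $filter with $arrayElemAt to test the first item of each inner array. The helper name buildInfoFilterPipeline and its parameter are hypothetical.

```javascript
// Hypothetical helper: return only the [timestamp, value] pairs whose
// timestamp is at or after minTimestamp.
function buildInfoFilterPipeline(minTimestamp) {
  return [
    // Still match the document first, so documents without hits are skipped
    { $match: { info: { $elemMatch: { "0": { $gte: minTimestamp } } } } },
    // Keep only the matching inner arrays, preserving their order
    { $project: {
        info: {
          $filter: {
            input: "$info",
            as: "pair",
            cond: { $gte: [{ $arrayElemAt: ["$$pair", 0] }, minTimestamp] }
          }
        }
    } }
  ];
}

const infoPipeline = buildInfoFilterPipeline(1399583285000);
// Usage in the shell: db.infos.aggregate(infoPipeline)
```

Because the array is never unwound, the elements keep their original order without relying on $group.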
I have a mongodb collection, let's call it rows containing documents with the following general structure:
{
"setid" : 154421,
"date" : ISODate("2014-02-22T14:06:48.229Z"),
"version" : 2,
"data" : [
{
"k" : "name",
"v" : "ryan"
},
{
"k" : "points",
"v" : "375"
},
{
"k" : "email",
"v" : "ryan#123.com"
}
],
}
There is no guarantee what values of k and v might populate the "data" property for any particular document (e.g. other documents might have 5 k-v pairs with different key names in it). The only rule is that documents with the same setid have the same k-v pairs (i.e. the rows collection might hold 100 other documents with setid = 154421 that have the same set of 3 keys in the data property: "name", "points", "email", with their own respective values).
How would one, with this setup, construct a query to retrieve all rows with a particular setid sorted by points? I need, in effect, some way of saying 'sort by the field data.v where the value of k == points', or something like that.
Something like this:
db.rows.find({setid:154421},{$sort:{'data.v',-1}, {$where: k:'points'}}})
I know this is the incorrect syntax, but I'm just taking a stab at it to illustrate my point.
Is it possible?
Assuming that what you want would be all the documents that have the "points" value as a "key" in the array, and then sort on the "value" for that "key", then this is a little out of scope for the .find() method.
The reason is that if you did something like this:
db.collection.find({
"setid": 154421, "data.k": "points" }
).sort({ "data.v" : -1 })
The problem is that even though the matched documents do contain an element with the key "points", there is no way of telling which data.v you are referring to for the sort. Also, a sort on .find() results will not do something like this:
db.collection.find({
"setid": 154421, "data.k": "points" }
).sort({ "data.$.v" : -1 })
That would be trying to use a positional operator within a sort, essentially telling it which element's v to use. But this is not supported and not likely to be; the most likely explanation is that the "index" value would probably be different in every document.
But this kind of selective sorting can be done with the use of .aggregate().
db.collection.aggregate([
// Actually shouldn't need the setid
{ "$match": { "data": {"$elemMatch": { "k": "points" } } } },
// Saving the original document before you filter
{ "$project": {
"doc": {
"_id": "$_id",
"setid": "$setid",
"date": "$date",
"version": "$version",
"data": "$data"
},
"data": "$data"
}},
// Unwind the array
{ "$unwind": "$data" },
// Match the "points" entries, so filtering to only these
{ "$match": { "data.k": "points" } },
// Sort on the value, presuming you want the highest
{ "$sort": { "data.v": -1 } },
// Restore the document
{ "$project": {
"setid": "$doc.setid",
"date": "$doc.date",
"version": "$doc.version",
"data": "$doc.data"
}}
])
Of course that presumes the data array only has the one element that has the key points. If there were more than one, you would need to $group before the sort like this:
// Group to remove the duplicates and get highest
{ "$group": {
"_id": "$doc",
"value": { "$max": "$data.v" }
}},
// Sort on the value
{ "$sort": { "value": -1 } },
// Restore the document
{ "$project": {
"_id": "$_id._id",
"setid": "$_id.setid",
"date": "$_id.date",
"version": "$_id.version",
"data": "$_id.data"
}}
So there is one usage of .aggregate() in order to do some complex sorting on documents and still return the original document result in full.
Do some more reading on aggregation operators and the general framework. It's a useful tool to learn that takes you beyond .find().
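On MongoDB 3.4 or later, a shorter variant of the same idea is possible. This is a sketch under stated assumptions, not the pipeline from the answer above: it copies the matching k/v element into a temporary field with $filter and $arrayElemAt, sorts on it, and removes it again. The helper name buildSortByKeyPipeline, its parameters and the sortEntry field are hypothetical.

```javascript
// Hypothetical helper: sort documents of one setid by the value stored
// under a given key in the "data" array, highest first.
function buildSortByKeyPipeline(setid, key) {
  return [
    { $match: { setid: setid, "data.k": key } },
    // Copy the first element whose k equals the key into a temporary field
    { $addFields: {
        sortEntry: {
          $arrayElemAt: [
            { $filter: { input: "$data", as: "kv", cond: { $eq: ["$$kv.k", key] } } },
            0
          ]
        }
    } },
    { $sort: { "sortEntry.v": -1 } },
    // Remove the temporary field so the original document shape is returned
    { $project: { sortEntry: 0 } }
  ];
}

const sortPipeline = buildSortByKeyPipeline(154421, "points");
// Usage in the shell: db.rows.aggregate(sortPipeline)
```

In the sample document v is the string "375", so either pipeline compares strings rather than numbers; storing points as a number avoids surprising orderings.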