Mongo - Querying inside array - mongodb

I have this db structure
{
"_id": 107,
"standard": {"name": "building",
"item": [{"code": 151001,
"quantity": 10,
"delivered": 8,
"um": "kg" },
{"code": 151001,
"quantity": 20,
"delivered": 6,
"um": "kg" }]
}
}
I would like to find all the items that have code 151001 and show only the delivered field.
For example it would show something like this:
{delivered: 8}
{delivered: 6}
So far I have this query, but it does not show exactly what I want:
db.test.find(
{
"standard.item.code": 151001
}
).pretty()

Since your items are in an array, your best approach will be to use the Aggregation Framework for this.
Example code:
db.test.aggregate([
// Find matching documents (could take advantage of an index)
{ $match: {
"standard.item.code" : 151001
}},
// Unpack the item array into a stream of documents
{ $unwind: "$standard.item" },
// Filter to the items that match the code
{ $match: {
"standard.item.code" : 151001
}},
// Only show the delivered amounts
{ $project: {
_id: 0,
delivered: "$standard.item.delivered"
}}
])
Results:
{
"result" : [
{
"delivered" : 8
},
{
"delivered" : 6
}
],
"ok" : 1
}
You'll notice there are two $match steps in the aggregation. The first is to match the documents including that item code. After using $unwind on the array, the second $match limits to the items with that code.
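To make the role of each stage concrete, here is a minimal plain-JavaScript sketch that mimics the pipeline on an in-memory copy of the sample document. It does not talk to MongoDB, and the helper name deliveredForCode is made up for illustration.

```javascript
// Plain-JavaScript illustration (not MongoDB API): mimics
// $match -> $unwind -> $match -> $project on in-memory documents.
// deliveredForCode is a hypothetical helper name.
function deliveredForCode(docs, code) {
  return docs
    // first $match: keep documents with at least one matching item
    .filter((doc) => doc.standard.item.some((it) => it.code === code))
    // $unwind: one output entry per array element
    .flatMap((doc) => doc.standard.item)
    // second $match: keep only the elements with the requested code
    .filter((it) => it.code === code)
    // $project: expose only the delivered amount
    .map((it) => ({ delivered: it.delivered }));
}

const sampleDocs = [
  {
    _id: 107,
    standard: {
      name: "building",
      item: [
        { code: 151001, quantity: 10, delivered: 8, um: "kg" },
        { code: 151001, quantity: 20, delivered: 6, um: "kg" },
      ],
    },
  },
];

console.log(deliveredForCode(sampleDocs, 151001));
// [ { delivered: 8 }, { delivered: 6 } ]
```

Without the second filter, a document that mixes matching and non-matching item codes would leak the non-matching items into the output, which is why the pipeline repeats $match after $unwind.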

Related

How do I get a sum of the occurrence of each item in an array across all documents?

I want to get an aggregation/count of the occurrence of all items in an array across all documents. I've tried looking up examples but none of them seem to cover this scenario exactly or go about it in a very obtuse way.
Here's a simple idea of the document model i'm working with. The itemIds array within each object is always unique (no repeated values):
[{
_id:1,
itemIds:[3, 4, 6, 12]
},
{
_id:2,
itemIds:[4, 12]
},
{
_id:3,
itemIds:[3, 4, 8, 9, 12]
}]
I need the counts of each of these summed up (doesn't have to be this exact format but just giving a general idea of what I need):
{
itemsCount:[
{
itemId:3,
count:2
},
{
itemId:4,
count:3
},
{
itemId:6,
count:1
},
{
itemId:8,
count:1
},
{
itemId:9,
count:1
},
{
itemId:12,
count:3
}
]
}
Please try this :
db.yourCollection.aggregate([
{$project : {'itemIds' : 1, _id :0}},
{$unwind : '$itemIds'},
{$group : {'_id': '$itemIds', count :{$sum :1}}}
])
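As a rough check of what $unwind followed by $group with $sum: 1 computes, the same counting can be sketched in plain JavaScript on the sample documents. This runs in memory, not against MongoDB, and countItemIds is a hypothetical helper name.

```javascript
// Plain-JavaScript illustration (not MongoDB API) of $unwind + $group.
// countItemIds is a hypothetical helper name.
function countItemIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const itemId of doc.itemIds) {                   // $unwind
      counts.set(itemId, (counts.get(itemId) || 0) + 1);  // $group with $sum: 1
    }
  }
  // Shape the result like the pipeline output; sorted here only for readability
  return [...counts.entries()]
    .map(([itemId, count]) => ({ _id: itemId, count }))
    .sort((a, b) => a._id - b._id);
}

const itemDocs = [
  { _id: 1, itemIds: [3, 4, 6, 12] },
  { _id: 2, itemIds: [4, 12] },
  { _id: 3, itemIds: [3, 4, 8, 9, 12] },
];

console.log(countItemIds(itemDocs));
```

The sketch sorts its output, whereas $group makes no ordering promise, so a $sort stage on _id would be needed in the pipeline if the order matters.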

Move data from inside nested array

I have inserted multiple documents in my Mongo database incorrectly. I have accidentally nested the data inside another data object:
{
"_id": "5cdfda8ddc5cf00031fd3949",
"payload": {
"timestamp": "2019-05-18T10:12:29.896Z",
"data": {
"data": {
"name": 10,
"age": 10,
}
}
},
"__v": 0
}
I would like the document to not have the extra data object. So I would like it to look like this:
{
"_id": "5cdfda8ddc5cf00031fd3949",
"payload": {
"timestamp": "2019-05-18T10:12:29.896Z",
"data": {
"name": 10,
"age": 10,
}
},
"__v": 0
}
Is there a way in Mongo for me to update all the documents that have 2 data objects to just have one like shown above?
Alas, you cannot do this with one database request. You have to loop over all documents programmatically, set the new data and update them in the database.
You could use the aggregation framework, which won't let you update in place, but you could use the $out operator to write the results to a new collection, if that's an option.
db.collection.aggregate([
{
$project: {
__v : 1,
"payload.timestamp" : 1,
"payload.data" : "$payload.data.data"
},
},
{
"$out": "newCollection"
}
])
Or if you have a mixture of docs with correct format and docs with incorrect format, you can use the $cond operator to determine the correct output:
db.collection.aggregate([
{
$project: {
__v : 1,
"payload.timestamp" : 1,
"payload.data" : {
$cond: [
{ $ne : [ "$payload.data.data", undefined]},
"$payload.data.data",
"$payload.data"
]}
}
},
{
"$out": "newCollection"
}
])
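The branching done by $cond can be illustrated with a small plain-JavaScript function applied to one document at a time. This is only a sketch of the logic, not MongoDB API, and flattenPayload is a hypothetical helper name. It assumes that correctly shaped documents never have a real field called data inside payload.data.

```javascript
// Plain-JavaScript illustration (not MongoDB API) of the $cond branch.
// flattenPayload is a hypothetical helper name.
function flattenPayload(doc) {
  const data = doc.payload.data;
  // "incorrect" documents carry a second data object inside payload.data
  const isNested =
    data !== null && typeof data === "object" && data.data !== undefined;
  return {
    _id: doc._id,
    __v: doc.__v,
    payload: {
      timestamp: doc.payload.timestamp,
      data: isNested ? data.data : data,
    },
  };
}

const wrongDoc = {
  _id: "5cdfda8ddc5cf00031fd3949",
  payload: {
    timestamp: "2019-05-18T10:12:29.896Z",
    data: { data: { name: 10, age: 10 } },
  },
  __v: 0,
};

console.log(JSON.stringify(flattenPayload(wrongDoc).payload.data));
// {"name":10,"age":10}
```

Applying the function to an already flattened document leaves it unchanged, which is the property that makes the $cond variant safe for a collection with mixed formats.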

MongoDB project the documents with count greater than 2

I have a collection like
{
"_id": "201503110040020021",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
{
"Secuence": 183,
"Title": 613,
},
...
],
} {
"_id": "201503110040020022",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
],
}
SSCEXPEND is an array. I am trying to count the size of the SSCEXPEND array and project the document if the count is greater than or equal to 2. My query is something like this:
db.entity.aggregate(
[
{
$project: {
SSCEXPEND_count: {$size: "$SSCEXPEND"}
}
},
{
$match: {
"SSCEXPEND_count2": {$gte: ["$SSCEXPEND_count",2]}
}
}
]
)
I am expecting the output to contain only the first document, whose array size is at least 2.
The $project part is working fine and I am able to get the counts, but I only want the documents whose count is greater than or equal to two, and my $match part is not working. Can anyone point out where I am going wrong?
You need to project the other fields and your $match pipeline will just need to do a query on the newly-created field to filter the documents based on the array size. Something like the following should work:
db.entity.aggregate([
{
"$project": {
"Line": 1,
"LineStart": 1, "SSCEXPEND": 1,
"SSCEXPEND_count": { "$size": "$SSCEXPEND" }
}
},
{
"$match": {
"SSCEXPEND_count": { "$gte": 2 }
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : "201503110040020021",
"Line" : "1",
"LineStart" : ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND" : [
{
"Secuence" : 10,
"Title" : 1
},
{
"Secuence" : 183,
"Title" : 613
}
],
"SSCEXPEND_count" : 2
}
],
"ok" : 1
}
This is actually a very simple query, where the trick is to use a property of "dot notation" in order to test the array. All you really need to ask for is documents where the array index of 2 $exists, which means the array must contain 3 elements or more (for "greater than or equal to 2", test index 1 with "SSCEXPEND.1" instead):
db.entity.find({ "SSCEXPEND.2": { "$exists": true } })
It's the fastest way to do it and can even use indexes. No need for calculations in aggregation operations.
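The relationship between "index i exists" and "the array has more than i elements" can be sketched in plain JavaScript. This is an in-memory illustration, not MongoDB API. hasElementAt is a hypothetical helper name, and the first sample document is given just the two elements shown in the sample output, since the question elides the rest.

```javascript
// Plain-JavaScript illustration (not MongoDB API).
// hasElementAt is a hypothetical helper mirroring
// { "SSCEXPEND.<index>": { $exists: true } } for array fields.
function hasElementAt(arr, index) {
  return Array.isArray(arr) && arr.length > index;
}

const lineDocs = [
  {
    _id: "201503110040020021",
    SSCEXPEND: [{ Secuence: 10, Title: 1 }, { Secuence: 183, Title: 613 }],
  },
  { _id: "201503110040020022", SSCEXPEND: [{ Secuence: 10, Title: 1 }] },
];

// index 1 exists <=> at least 2 elements
const atLeastTwo = lineDocs
  .filter((d) => hasElementAt(d.SSCEXPEND, 1))
  .map((d) => d._id);

// index 2 exists <=> at least 3 elements
const atLeastThree = lineDocs
  .filter((d) => hasElementAt(d.SSCEXPEND, 2))
  .map((d) => d._id);

console.log(atLeastTwo);   // [ '201503110040020021' ]
console.log(atLeastThree); // []
```

So the index to test is always one less than the minimum number of elements required.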

MongoDB lists - get every Nth item

I have a Mongodb schema that looks roughly like:
[
{
"name" : "name1",
"instances" : [
{
"value" : 1,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 2,
"date" : ISODate("2015-04-01T00:00:00.000Z")
},
{
"value" : 2.5,
"date" : ISODate("2015-03-05T00:00:00.000Z")
},
...
]
},
{
"name" : "name2",
"instances" : [
...
]
}
]
where the number of instances for each element can be quite big.
I sometimes want to get only a sample of the data, that is, get every 3rd instance, or every 10th instance... you get the picture.
I can achieve this goal by getting all instances and filtering them in my server code, but I was wondering if there's a way to do it by using some aggregation query.
Any ideas?
Updated
Assuming the data structure was flat as @SylvainLeroux suggested below, that is:
[
{"name": "name1", "value": 1, "date": ISODate("2015-03-04T00:00:00.000Z")},
{"name": "name2", "value": 5, "date": ISODate("2015-04-04T00:00:00.000Z")},
{"name": "name1", "value": 2, "date": ISODate("2015-04-01T00:00:00.000Z")},
{"name": "name1", "value": 2.5, "date": ISODate("2015-03-05T00:00:00.000Z")},
...
]
will the task of getting every Nth item (of specific name) be easier?
It seems that your question clearly asked "get every nth instance" which does seem like a pretty clear question.
Query operations like .find() can really only return the document "as is" with the exception of general field "selection" in projection and operators such as the positional $ match operator or $elemMatch that allow a singular matched array element.
Of course there is $slice, but that just allows a "range selection" on the array, so again does not apply.
The "only" things that can modify a result on the server are .aggregate() and .mapReduce(). The former does not "play very well" with "slicing" arrays in any way, at least not by "n" elements. However since the "function()" arguments of mapReduce are JavaScript based logic, then you have a little more room to play with.
For analytical purposes "only", you can just filter the array contents via mapReduce using .filter():
db.collection.mapReduce(
function() {
var id = this._id;
delete this._id;
// filter the content of "instances" to every 3rd item only
this.instances = this.instances.filter(function(el,idx) {
return ((idx+1) % 3) == 0;
});
emit(id,this);
},
function() {},
{ "out": { "inline": 1 } } // or output to collection as required
)
It's really just a "JavaScript runner" at this point, but if this is just for analysis/testing then there is nothing generally wrong with the concept. Of course the output is not "exactly" how your document is structured, but it's as near a facsimile as mapReduce can get.
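The predicate passed to .filter() in the map function is ordinary JavaScript, so it can be tried outside the database first. A minimal sketch follows; the helper name everyNth and the generalisation from 3 to n are additions for illustration.

```javascript
// Plain JavaScript: the same position test used inside the map function,
// generalised from 3 to n. everyNth is a hypothetical helper name.
function everyNth(items, n) {
  // idx is zero-based, so (idx + 1) is the 1-based position in the array
  return items.filter((el, idx) => (idx + 1) % n === 0);
}

console.log(everyNth([1, 2, 2.5, 4, 5, 6, 7], 3));
// [ 2.5, 6 ]
```

Because the test uses idx + 1, the first element kept is the one at position n (index n - 1), not the first element of the array.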
The other suggestion I see here requires creating a new collection with all the items "denormalized" and inserting the "index" from the array as part of the unique _id key. That may produce something you can query directly, but for the "every nth item" you would still have to do:
db.resultCollection.find({
"_id.index": { "$in": [2,5,8,11,14] } // and so on ....
})
So work out and provide the index value of "every nth item" in order to get "every nth item". So that doesn't really seem to solve the problem that was asked.
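If that route is taken anyway, the list of index values does not have to be typed by hand. A minimal sketch of generating it on the client, assuming the maximum array length is known in advance; nthIndexes is a hypothetical helper name.

```javascript
// nthIndexes is a hypothetical helper: zero-based positions of every nth item
// for arrays of up to `length` elements.
function nthIndexes(n, length) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const out = [];
  for (let i = n - 1; i < length; i += n) {
    out.push(i);
  }
  return out;
}

console.log(nthIndexes(3, 15));
// [ 2, 5, 8, 11, 14 ]

// The array could then be used as the $in argument, e.g.
// { "_id.index": { "$in": nthIndexes(3, 15) } }
```

This only moves the work to the client, so the objection above still stands: the query itself has no notion of "every nth".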
If the output form seemed more desirable for your "testing" purposes, then a better subsequent query on those results would be using the aggregation pipeline, with $redact:
db.newCollection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$mod": [ { "$add": [ "$_id.index", 1] }, 3 ] },
0 ]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
That at least uses a "logical condition" much the same as what was applied with .filter() before to just select the "nth index" items without listing all possible index values as a query argument.
No $unwind is needed here. You can use $push with $arrayElemAt to project the array value at requested index inside $group aggregation.
Something like
db.colname.aggregate(
[
{"$group":{
"_id":null,
"valuesatNthindex":{"$push":{"$arrayElemAt":["$instances",N]}
}}
},
{"$project":{"valuesatNthindex":1}}
])
You might like this approach using the $lookup aggregation stage. It is probably the most convenient and fastest way, without any aggregation trick.
Create a collection Names with the following schema
[
{ "_id": 1, "name": "name1" },
{ "_id": 2, "name": "name2" }
]
and then Instances collection having the parent id as "nameId"
[
{ "nameId": 1, "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 1, "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 1, "value" : 3, "date" : ISODate("2015-03-05T00:00:00.000Z") },
{ "nameId": 2, "value" : 7, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 2, "value" : 8, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 2, "value" : 4, "date" : ISODate("2015-03-05T00:00:00.000Z") }
]
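Now, with the $lookup pipeline syntax available since MongoDB 3.6, you can use $sample inside the $lookup pipeline to get N randomly chosen instances per name (a random sample rather than strictly every Nth element).
db.Names.aggregate([
{ "$lookup": {
"from": "Instances", // name of the instances collection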
"let": { "nameId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$nameId", "$$nameId"] }}},
{ "$sample": { "size": N }}
],
"as": "instances"
}}
])
Unfortunately, this is not possible with the aggregation framework, because it would require $unwind to emit the array index/position, which aggregation currently can't do. There is an open JIRA ticket for this: SERVER-4588.
However, a workaround would be to use MapReduce but this comes at a huge performance cost since the actual calculations of getting the array index are performed using the embedded JavaScript engine (which is slow), and there still is a single global JavaScript lock, which only allows a single JavaScript thread to run at a single time.
With mapReduce, you could try something like this:
Mapping function:
var map = function(){
for(var i=0; i < this.instances.length; i++){
emit(
{ "_id": this._id, "index": i },
{ "index": i, "value": this.instances[i] }
);
}
};
Reduce function:
var reduce = function(){}
You can then run the following mapReduce function on your collection:
db.collection.mapReduce( map, reduce, { out : "resultCollection" } );
You can then query the result collection to get a list/array of every Nth item of the instances array by using the map() cursor method:
var thirdInstances = db.resultCollection.find({"_id.index": N})
.map(function(doc){return doc.value.value})
You can use below aggregation:
db.col.aggregate([
{
$project: {
instances: {
$map: {
input: { $range: [ 0, { $size: "$instances" }, N ] },
as: "index",
in: { $arrayElemAt: [ "$instances", "$$index" ] }
}
}
}
}
])
$range generates a list of indexes. The third parameter is the non-zero step. For N = 2 it will be [0,2,4,6...], for N = 3 it will return [0,3,6,9...] and so on. Then you can use $map to get the corresponding items from the instances array.
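To see which elements this selects, the $range / $map / $arrayElemAt combination can be mimicked in plain JavaScript. This is an in-memory sketch, not MongoDB API; range and pickEveryNth are hypothetical helper names, and only the positive-step case is covered.

```javascript
// Plain-JavaScript illustration (not MongoDB API) of $range + $map + $arrayElemAt.
// range and pickEveryNth are hypothetical helper names.
function range(start, end, step) {
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError("this sketch only handles a positive integer step");
  }
  const out = [];
  for (let i = start; i < end; i += step) {
    out.push(i); // end is exclusive, as in $range
  }
  return out;
}

function pickEveryNth(instances, n) {
  // $map over the generated indexes, $arrayElemAt for each one
  return range(0, instances.length, n).map((index) => instances[index]);
}

console.log(range(0, 7, 3));
// [ 0, 3, 6 ]
console.log(pickEveryNth(["a", "b", "c", "d", "e", "f", "g"], 3));
// [ 'a', 'd', 'g' ]
```

Note that this selection starts at index 0, so it keeps the 1st, 4th, 7th... elements. If the 3rd, 6th, 9th... elements are wanted instead, starting the range at N - 1 rather than 0 should give that.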
Or with just a find block:
db.Collection.find({}).then(function(data) {
var ret = [];
for (var i = 0, len = data.length; i < len; i++) {
if (i % 3 === 0 ) {
ret.push(data[i]);
}
}
return ret;
});
Returns a promise whose then() you can invoke to fetch the Nth modulo'ed data.

MongoDB : limit query to a field and array projection

I have a collection that contains following information
{
"_id" : 1,
"info" : { "createdby" : "xyz" },
"states" : [ 11, 10, 9, 3, 2, 1 ]
}
I project only states by using query
db.jobs.find({},{states:1})
Then I get only states (and the whole array of state values). Alternatively, I can select only one state in that array with
db.jobs.find({},{states : {$slice : 1} })
And then I get only one state value, but along with all other fields in the document as well.
Is there a way to select only the "states" field and, at the same time, slice only one element of the array? Of course, I can exclude fields, but I would like a solution in which I can specify both conditions.
You can do this in two ways:
1> Using mongo projection like
<field>: <1 or true> Specify the inclusion of a field
and
<field>: <0 or false> Specify the suppression of the field
so your query as
db.jobs.find({},{states : {$slice : 1} ,"info":0,"_id":0})
2> Other way using mongo aggregation as
db.jobs.aggregate([{
"$unwind": "$states"
}, {
"$match": {
"states": 11
}
}, // match states (optional)
{
"$group": {
"_id": "$_id",
"states": {
"$first": "$states"
}
}
}, {
"$project": {
"_id": 0,
"states": 1
}
}])
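For reference, the effect of the first option (a $slice projection with the other fields excluded) on the sample document can be sketched in plain JavaScript. This is an in-memory illustration, not MongoDB API. projectStates is a hypothetical helper name, and it assumes the document has no fields other than _id, info and states.

```javascript
// Plain-JavaScript illustration (not MongoDB API).
// projectStates is a hypothetical helper mirroring
// find({}, { states: { $slice: n }, info: 0, _id: 0 })
// for documents that only contain _id, info and states.
function projectStates(doc, n) {
  // positive n keeps the first n elements, negative n keeps the last |n|
  const states = n >= 0 ? doc.states.slice(0, n) : doc.states.slice(n);
  return { states };
}

const job = {
  _id: 1,
  info: { createdby: "xyz" },
  states: [11, 10, 9, 3, 2, 1],
};

console.log(projectStates(job, 1));  // { states: [ 11 ] }
console.log(projectStates(job, -2)); // { states: [ 2, 1 ] }
```

With the exclusion style, any field added to the documents later would show up in the result unless it is excluded as well, which is the trade-off compared with the aggregation variant.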