I'm trying to run a $graphLookup like the one demonstrated in the snippet below.
The objective is, given a specific record (see the commented $match there), to retrieve its full "path" through the immediateAncestors property. As you can see, it's not happening.
I introduced $convert here to handle the collection's _id as a string, believing it would then be possible to "match" it against the _id values in the immediateAncestors records list (which are strings).
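Since the original pipeline isn't reproduced here, this is a minimal sketch of the kind of attempt being described (the collection name nodes is hypothetical, and the exact original stages may differ):

db.nodes.aggregate([
  // { $match: { _id: ObjectId("...") } },
  { $addFields: { "_id": { $convert: { input: "$_id", to: "string" } } } },
  { $graphLookup: {
    from: "nodes",
    startWith: "$immediateAncestors._id",
    connectFromField: "immediateAncestors._id",
    connectToField: "_id",
    as: "ANCESTORS_FROM_BEGINNING"
  }}
])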
So, I did run another test with different data (no ObjectIds involved):
db.nodos.insert({"id":5,"name":"cinco","children":[{"id":4}]})
db.nodos.insert({"id":4,"name":"quatro","ancestors":[{"id":5}],"children":[{"id":3}]})
db.nodos.insert({"id":6,"name":"seis","children":[{"id":3}]})
db.nodos.insert({"id":1,"name":"um","children":[{"id":2}]})
db.nodos.insert({"id":2,"name":"dois","ancestors":[{"id":1}],"children":[{"id":3}]})
db.nodos.insert({"id":3,"name":"três","ancestors":[{"id":2},{"id":4},{"id":6}]})
db.nodos.insert({"id":7,"name":"sete","children":[{"id":5}]})
And the query:
db.nodos.aggregate( [
{ $match: { "id": 3 } },
{ $graphLookup: {
from: "nodos",
startWith: "$ancestors.id",
connectFromField: "ancestors.id",
connectToField: "id",
as: "ANCESTORS_FROM_BEGINNING"
}
},
{ $project: {
"name": 1,
"id": 1,
"ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING.id"
}
}
] )
...which outputs what I was expecting (the five records directly and indirectly connected to the one with id 3):
{
"_id" : ObjectId("5afe270fb4719112b613f1b4"),
"id" : 3.0,
"name" : "três",
"ANCESTORS_FROM_BEGINNING" : [
1.0,
4.0,
6.0,
5.0,
2.0
]
}
The question is: is there a way to achieve the objective I mentioned at the beginning?
I'm running Mongo 3.7.9 (from the official Docker image).
Thanks in advance!
You are currently using a development version of MongoDB, which has some features enabled that are expected to be released with MongoDB 4.0 as an official release. Note that some features may be subject to change before the final release, so you should be aware of this before committing production code to it.
Why $convert fails here
Probably the best way to explain this is to look at your altered sample, but replacing the _id values with ObjectId and keeping "strings" for those under the arrays:
{
"_id" : ObjectId("5afe5763419503c46544e272"),
"name" : "cinco",
"children" : [ { "_id" : "5afe5763419503c46544e273" } ]
},
{
"_id" : ObjectId("5afe5763419503c46544e273"),
"name" : "quatro",
"ancestors" : [ { "_id" : "5afe5763419503c46544e272" } ],
"children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
"_id" : ObjectId("5afe5763419503c46544e274"),
"name" : "seis",
"children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
"_id" : ObjectId("5afe5763419503c46544e275"),
"name" : "um",
"children" : [ { "_id" : "5afe5763419503c46544e276" } ]
}
{
"_id" : ObjectId("5afe5763419503c46544e276"),
"name" : "dois",
"ancestors" : [ { "_id" : "5afe5763419503c46544e275" } ],
"children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
"_id" : ObjectId("5afe5763419503c46544e277"),
"name" : "três",
"ancestors" : [
{ "_id" : "5afe5763419503c46544e273" },
{ "_id" : "5afe5763419503c46544e274" },
{ "_id" : "5afe5763419503c46544e276" }
]
},
{
"_id" : ObjectId("5afe5764419503c46544e278"),
"name" : "sete",
"children" : [ { "_id" : "5afe5763419503c46544e272" } ]
}
That should give a general simulation of what you were trying to work with.
What you attempted was to convert the _id value into a "string" via $project before entering the $graphLookup stage. The reason this fails is that whilst you did an initial $project "within" this pipeline, the source for $graphLookup in the "from" option is still the unaltered collection, and therefore you don't get the correct details on the subsequent "lookup" iterations.
db.strcoll.aggregate([
{ "$match": { "name": "três" } },
{ "$addFields": {
"_id": { "$toString": "$_id" }
}},
{ "$graphLookup": {
"from": "strcoll",
"startWith": "$ancestors._id",
"connectFromField": "ancestors._id",
"connectToField": "_id",
"as": "ANCESTORS_FROM_BEGINNING"
}},
{ "$project": {
"name": 1,
"ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
}}
])
It therefore does not match on the "lookup":
{
"_id" : "5afe5763419503c46544e277",
"name" : "três",
"ANCESTORS_FROM_BEGINNING" : [ ]
}
"Patching" the problem
However that is the core problem, and not a failing of $convert or its aliases. In order to make this actually work we can instead create a "view" which presents itself as a collection for the sake of input.
I'll do this the other way around and convert the "strings" to ObjectId via $toObjectId:
db.createView("idview","strcoll",[
{ "$addFields": {
"ancestors": {
"$ifNull": [
{ "$map": {
"input": "$ancestors",
"in": { "_id": { "$toObjectId": "$$this._id" } }
}},
"$$REMOVE"
]
},
"children": {
"$ifNull": [
{ "$map": {
"input": "$children",
"in": { "_id": { "$toObjectId": "$$this._id" } }
}},
"$$REMOVE"
]
}
}}
])
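Inspecting the view confirms the conversion; for example, with the sample _id values shown earlier:

db.idview.findOne({ "name": "quatro" })
{
  "_id" : ObjectId("5afe5763419503c46544e273"),
  "name" : "quatro",
  "ancestors" : [ { "_id" : ObjectId("5afe5763419503c46544e272") } ],
  "children" : [ { "_id" : ObjectId("5afe5763419503c46544e277") } ]
}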
Using the "view" however means that the data is consistently seen with the values converted. So the following aggregation using the view:
db.idview.aggregate([
{ "$match": { "name": "três" } },
{ "$graphLookup": {
"from": "idview",
"startWith": "$ancestors._id",
"connectFromField": "ancestors._id",
"connectToField": "_id",
"as": "ANCESTORS_FROM_BEGINNING"
}},
{ "$project": {
"name": 1,
"ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
}}
])
Returns the expected output:
{
"_id" : ObjectId("5afe5763419503c46544e277"),
"name" : "três",
"ANCESTORS_FROM_BEGINNING" : [
ObjectId("5afe5763419503c46544e275"),
ObjectId("5afe5763419503c46544e273"),
ObjectId("5afe5763419503c46544e274"),
ObjectId("5afe5763419503c46544e276"),
ObjectId("5afe5763419503c46544e272")
]
}
Fixing the problem
With all of that said, the real issue here is that you have some data which "looks like" an ObjectId value, and is in fact valid as an ObjectId, however it has been recorded as a "string". The basic obstacle to everything working as it should is that the two "types" are not the same, and this results in an equality mismatch as the "joins" are attempted.
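You can see that mismatch with a trivial query against the sample documents, where the string form of an ObjectId _id matches nothing:

db.strcoll.find({ "_id": "5afe5763419503c46544e277" }).count()
0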
So the real fix is still the same as it always has been, which is to instead go through the data and fix it so that the "strings" are actually also ObjectId values. These will then match the _id keys which they are meant to refer to, and you save a considerable amount of storage space, since an ObjectId takes up a lot less space to store than its string representation in hexadecimal characters.
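As a rough illustration of that difference from the mongo shell (the exact byte counts for this one-field example depend on the field name):

Object.bsonsize({ "a": ObjectId("5afe5763419503c46544e272") })   // 20
Object.bsonsize({ "a": "5afe5763419503c46544e272" })             // 37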
Using MongoDB 4.0 methods, you "could" actually use "$toObjectId" in order to write a new collection, in much the same manner as we created the "view" earlier:
db.strcoll.aggregate([
{ "$addFields": {
"ancestors": {
"$ifNull": [
{ "$map": {
"input": "$ancestors",
"in": { "_id": { "$toObjectId": "$$this._id" } }
}},
"$$REMOVE"
]
},
"children": {
"$ifNull": [
{ "$map": {
"input": "$children",
"in": { "_id": { "$toObjectId": "$$this._id" } }
}},
"$$REMOVE"
]
}
}},
{ "$out": "fixedcol" }
])
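With the new collection written, the same $graphLookup then works against it directly, with no view required:

db.fixedcol.aggregate([
  { "$match": { "name": "três" } },
  { "$graphLookup": {
    "from": "fixedcol",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])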
Or of course where you "need" to keep the same collection, then the traditional "loop and update" remains the same as what has always been required:
var updates = [];
db.strcoll.find().forEach(doc => {
var update = { '$set': {} };
if ( doc.hasOwnProperty('children') )
update.$set.children = doc.children.map(e => ({ _id: new ObjectId(e._id) }));
if ( doc.hasOwnProperty('ancestors') )
update.$set.ancestors = doc.ancestors.map(e => ({ _id: new ObjectId(e._id) }));
updates.push({
"updateOne": {
"filter": { "_id": doc._id },
update
}
});
if ( updates.length > 1000 ) {
db.strcoll.bulkWrite(updates);
updates = [];
}
})
if ( updates.length > 0 ) {
db.strcoll.bulkWrite(updates);
updates = [];
}
Which is a bit of a "sledgehammer", since it overwrites the entire array in a single go. Not a great idea for a production environment, but enough as a demonstration for the purposes of this exercise.
Conclusion
So whilst MongoDB 4.0 will add these "casting" features, which can indeed be very useful, their actual intent is not really for cases such as this. They are in fact far more useful in the "conversion" to a new collection using an aggregation pipeline, as demonstrated, than in most other possible uses.
Whilst we "can" create a "view" which transforms the data types to enable things like $lookup and $graphLookup to work where the actual collection data differs, this really is only a "band-aid" on the real problem as the data types really should not differ, and should in fact be permanently converted.
Using a "view" actually means that the aggregation pipeline for construction needs to effectively run every time the "collection" ( actually a "view" ) is accessed, which creates a real overhead.
Avoiding overhead is usually a design goal, therefore correcting such data storage mistakes is imperative to getting real performance out of your application, rather than just working with "brute force" that will only slow things down.
A much safer "conversion" script instead applies "matched" updates to each array element. The code here requires NodeJS v10.x and a recent 3.1.x release of the MongoDB Node driver:
const { MongoClient, ObjectID: ObjectId } = require('mongodb');
const EJSON = require('mongodb-extended-json');
const uri = 'mongodb://localhost/';
const log = data => console.log(EJSON.stringify(data, undefined, 2));
(async function() {
try {
const client = await MongoClient.connect(uri);
let db = client.db('test');
let coll = db.collection('strcoll');
let fields = ["ancestors", "children"];
let cursor = coll.find({
$or: fields.map(f => ({ [`${f}._id`]: { "$type": "string" } }))
}).project(fields.reduce((o,f) => ({ ...o, [f]: 1 }),{}));
let batch = [];
for await ( let { _id, ...doc } of cursor ) {
let $set = {};
let arrayFilters = [];
for ( const f of fields ) {
if ( doc.hasOwnProperty(f) ) {
$set = { ...$set,
...doc[f].reduce((o,{ _id },i) =>
({ ...o, [`${f}.$[${f.substr(0,1)}${i}]._id`]: ObjectId(_id) }),
{})
};
arrayFilters = [ ...arrayFilters,
...doc[f].map(({ _id },i) =>
({ [`${f.substr(0,1)}${i}._id`]: _id }))
];
}
}
if (arrayFilters.length > 0)
batch = [ ...batch,
{ updateOne: { filter: { _id }, update: { $set }, arrayFilters } }
];
if ( batch.length > 1000 ) {
let result = await coll.bulkWrite(batch);
batch = [];
}
}
if ( batch.length > 0 ) {
log({ batch });
let result = await coll.bulkWrite(batch);
log({ result });
}
await client.close();
} catch(e) {
console.error(e)
} finally {
process.exit()
}
})()
Produces and executes bulk operations like these for the seven documents:
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e272"
}
},
"update": {
"$set": {
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e273"
}
}
},
"arrayFilters": [
{
"c0._id": "5afe5763419503c46544e273"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e273"
}
},
"update": {
"$set": {
"ancestors.$[a0]._id": {
"$oid": "5afe5763419503c46544e272"
},
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e277"
}
}
},
"arrayFilters": [
{
"a0._id": "5afe5763419503c46544e272"
},
{
"c0._id": "5afe5763419503c46544e277"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e274"
}
},
"update": {
"$set": {
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e277"
}
}
},
"arrayFilters": [
{
"c0._id": "5afe5763419503c46544e277"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e275"
}
},
"update": {
"$set": {
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e276"
}
}
},
"arrayFilters": [
{
"c0._id": "5afe5763419503c46544e276"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e276"
}
},
"update": {
"$set": {
"ancestors.$[a0]._id": {
"$oid": "5afe5763419503c46544e275"
},
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e277"
}
}
},
"arrayFilters": [
{
"a0._id": "5afe5763419503c46544e275"
},
{
"c0._id": "5afe5763419503c46544e277"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5763419503c46544e277"
}
},
"update": {
"$set": {
"ancestors.$[a0]._id": {
"$oid": "5afe5763419503c46544e273"
},
"ancestors.$[a1]._id": {
"$oid": "5afe5763419503c46544e274"
},
"ancestors.$[a2]._id": {
"$oid": "5afe5763419503c46544e276"
}
}
},
"arrayFilters": [
{
"a0._id": "5afe5763419503c46544e273"
},
{
"a1._id": "5afe5763419503c46544e274"
},
{
"a2._id": "5afe5763419503c46544e276"
}
]
}
},
{
"updateOne": {
"filter": {
"_id": {
"$oid": "5afe5764419503c46544e278"
}
},
"update": {
"$set": {
"children.$[c0]._id": {
"$oid": "5afe5763419503c46544e272"
}
}
},
"arrayFilters": [
{
"c0._id": "5afe5763419503c46544e272"
}
]
}
}
Related
There are 15,000 documents in the collection.
This is the old collection:
[
{
"_id" : ObjectId("611f0b9f9964fea718ccea5f"),
"quotationNO" : "Q-000001",
"note": "21-8-2021<->send to DC<->John<#>21-8-2021<->OK<->Bob"
},
{
"_id" : ObjectId("611f2afa9964fea718ccea9c"),
"quotationNO" : "Q-000002",
"note": "22-8-2021<->send to DC<->Bob"
}
]
This is the new collection. I want to convert the note field from a string to an array of objects like this. What is the best solution?
[
{
"_id" : ObjectId("611f0b9f9964fea718ccea5f"),
"quotationNO" : "Q-000001",
"note": [
{
"data": "21-8-2021",
"message": "send to DC",
"user": "John"
},
{
"data": "21-8-2021",
"message": "OK",
"user": "Bob"
}
]
},
{
"_id" : ObjectId("611f2afa9964fea718ccea9c"),
"quotationNO" : "Q-000002",
"note": [
{
"data": "22-8-2021",
"message": "send to DC",
"user": "Bob"
}
]
}
]
Chain up $split and $map to split your note string and create the desired object. Finally do a $merge to upsert into new_collection.
db.collection.aggregate([
{
"$addFields": {
"note": {
"$split": [
"$note",
"<#>"
]
}
}
},
{
"$addFields": {
"note": {
"$map": {
"input": "$note",
"as": "n",
"in": {
$split: [
"$$n",
"<->"
]
}
}
}
}
},
{
"$addFields": {
"note": {
"$map": {
"input": "$note",
"as": "n",
"in": {
"data": {
"$arrayElemAt": [
"$$n",
0
]
},
"message": {
"$arrayElemAt": [
"$$n",
1
]
},
"user": {
"$arrayElemAt": [
"$$n",
2
]
}
}
}
}
}
},
{
"$merge": {
"into": "new_collection",
"on": "_id",
"whenMatched": "replace",
"whenNotMatched": "insert"
}
}
])
Here is the Mongo Playground for your reference.
You can try following these steps:
$project required fields and $split note by <#>
Afterwards, using a JS $function, build new objects from the obtained arrays by splitting each element on the <-> separator, and assign the function result to the new field note;
function(new_note){
let result = [];
for(let i = 0; i < new_note.length; i++){
const nested = new_note[i].split('<->');
result.push( {data:nested[0], message:nested[1],user:nested[2]});
}
return result
}
Afterwards $project required fields
Use MongoDb $merge to save data in new collection.
db.collection.aggregate([
{
$project: {
new_note: {
$split: [
"$note",
"<#>"
]
},
quotationNO: 1
}
},
{
$addFields: {
note: {
$function: {
body: "function(new_note){let result = []; for(let i = 0; i < new_note.length; i++){ const nested = new_note[i].split('<->'); result.push( {data:nested[0], message:nested[1],user:nested[2]}); } return result}",
args: [
"$new_note"
],
lang: "js"
}
}
}
},
{
$project: {
note: 1,
quotationNO: 1
}
},
{
$merge: {
into: "new_collection",
on: "_id",
whenMatched: "replace",
whenNotMatched: "insert"
}
}
])
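Note that $function requires MongoDB 4.4 or later, and the deployment must allow server-side JavaScript for it to run.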
I have a collection AccountSupport. It has an array property, supports. I want to filter the records on a parent property and a property of the array elements.
db = {
"AccountSupport": [
{
"_id" : ObjectId("5e9c6170b38c373530c5b00a"),
"accountName" : "domestic",
"supports" : [
{
"subject" : "Traverse",
"desc" : "Travers support consolidation",
},
{
"subject" : "Non Traverse",
"desc" : "Non Travers support consolidation",
},
{
"subject" : "Domestic Traverse",
"desc" : "Domestic Travers support consolidation",
}
      ]
    }
  ]
}
I want to filter on accountName and supports.subject.
Below is my query
db.AccountSupport.aggregate([
{
"$match": {
"$and": [
{
"supports.subject": "Traverse"
},
{
"accountName": "domestic"
}
]
}
},
{
"$unwind": "$supports"
},
{
"$project": {
"SupportName": "$supports.subject",
"desc": "$supports.desc"
}
}
])
The above query returns me all the supports of a particular accountName, whereas I only want the single object of the matched subject. Is the above the simplest approach to do it?
Took a little bit of experimenting but try this:
db.AccountSupport.aggregate([
{
$match: {
"accountName": "domestic",
"supports.subject": "Traverse"
}
},
{
$project: {
accountName: "$accountName",
subject: {
$filter: {
input: "$supports",
as: "supports",
cond: {
$eq: [
"$$supports.subject",
"Traverse"
]
}
}
},
_id: 0
}
}
])
Which returns:
[
{
"accountName": "domestic",
"subject": [
{
"desc": "Travers support consolidation",
"subject": "Traverse"
}
]
}
]
One option is to just add another $match step to your aggregation query. Since the $unwind breaks your doc into 3 docs, you can then match the individual doc you are interested in returning. The $filter solution by james looks worth investigating and could be the simpler approach for you.
Query:
db.AccountSupport.aggregate([
{
"$match": {
"$and": [
{
"supports.subject": "Traverse"
},
{
"accountName": "domestic"
}
]
}
},
{
"$unwind": "$supports"
},
{
"$match": {
"supports.subject": "Traverse"
}
},
{
"$project": {
"SupportName": "$supports.subject",
"desc": "$supports.desc"
}
}
])
Result:
[
{
"SupportName": "Traverse",
"_id": ObjectId("5e9c6170b38c373530c5b00a"),
"desc": "Travers support consolidation"
}
]
Below is the query to get my needed result. Not sure if this is the best practice though, as the other answers posted work fine as well for the mentioned requirement.
db.AccountSupport.find({'accountName': 'domestic' },
{
'supports':
{
'$elemMatch': { 'subject': 'Traverse'}
}
})
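Note that an $elemMatch projection returns only the first matching element of the array, which here is exactly the single matched object wanted.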
I have a document with a group of elements in MongoDB as given below:
{
"_id": ObjectId("5942643ea2042e12245de00c"),
"user": NumberInt(1),
"name": {
"value": "roy",
"time": NumberInt(121)
},
"lname": {
"value": "roy s",
"time": NumberInt(122)
},
"sname": {
"value": "roy 9",
"time": NumberInt(123)
}
}
but when I execute the query below
db.temp.find({
$or: [{
'name.time': {
$gte: 123
}
}, {
'lname.time': {
$gte: 123
}
}, {
'sname.time': {
$gte: 123
}
}]
})
it is returning the whole document which is correct.
Is there any way to fetch only the specific object in which the condition matched? For example, in my document, if the condition were lname.time equal to 122, then only the lname object should be returned and the rest ignored.
The type of thing you are asking for is only really "practical" with MongoDB 3.4 in order to return this from the server.
Summary
The general case here is that the "projection" of fields by logical conditions is not straightforward. Whilst it would be nice if MongoDB had such a DSL for projection, this is basically delegated either to:
Do your manipulation "after" the results are returned from the server
Use the aggregation pipeline in order to manipulate the documents.
Therefore, in "CASE B" being "aggregation pipeline", this is really only a practical excercise if the steps involved "mimic" the standard .find() behavior of "query" and "project". Introducing other pipeline stages beyond that will only introduce performance problems greatly outweighing any gain from "trimming" the documents to return.
Thus the summary here is $match then $replaceRoot to "project", following that pattern. It is also, I think, a good "rule of thumb" to consider that the aggregation approach "should only" be applied where there is a significant reduction in the size of data returned. To expand by example: "if" the size of the keys to "trim" were actually in the megabytes range on the returned result, then it is a worthwhile exercise to remove them "on the server".
In the case where such a saving would really only constitute "bytes" in comparison, then the most logical course is to simply allow the documents to return in the cursor "un-altered", and only then in "post processing" would you bother removing unwanted keys that did not meet the logical condition.
That said, on with the actual methods.
Aggregation Case
db.temp.aggregate([
{ "$match": {
"$or": [
{ "name.time": { "$gte": 123 } },
{ "lname.time": { "$gte": 123 } },
{ "sname.time": { "$gte": 123 } }
]
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$concatArrays": [
[
{ "k": "_id", "v": "$_id" },
{ "k": "user", "v": "$user" },
],
{ "$filter": {
"input": [
{ "$cond": [
{ "$gte": [ "$name.time", 123 ] },
{ "k": "name", "v": "$name" },
false
]},
{ "$cond": [
{ "$gte": [ "$lname.time", 123 ] },
{ "k": "lname", "v": "$lname" },
false
]},
{ "$cond": [
{ "$gte": [ "$sname.time", 123 ] },
{ "k": "sname", "v": "$sname" },
false
]}
],
"as": "el",
"cond": "$$el"
}}
]
}
}
}}
])
It's a pretty fancy statement that relies on $arrayToObject and $replaceRoot to achieve the dynamic structure. At its core the "keys" are all represented in array form, where the "array" only contains those keys that actually pass the conditions.
Once fully constructed after the conditions are filtered, we turn the array into a document and return the projection as the newRoot.
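For clarity, $arrayToObject simply turns an array of { "k", "v" } pairs into a document. A tiny standalone illustration ( $literal is needed here to pass a literal array ):

db.temp.aggregate([
  { "$limit": 1 },
  { "$project": {
    "_id": 0,
    "demo": {
      "$arrayToObject": {
        "$literal": [ { "k": "a", "v": 1 }, { "k": "b", "v": 2 } ]
      }
    }
  }}
])
{ "demo" : { "a" : 1, "b" : 2 } }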
Cursor Processing Case
You can actually do this in the client code with ease though. For example in JavaScript:
db.temp.find({
"$or": [
{ "name.time": { "$gte": 123 } },
{ "lname.time": { "$gte": 123 } },
{ "sname.time": { "$gte": 123 } }
]
}).map(doc => {
if ( doc.name.time < 123 )
delete doc.name;
if ( doc.lname.time < 123 )
delete doc.lname;
if ( doc.sname.time < 123 )
delete doc.sname;
return doc;
})
In both cases you get the same desired result:
{
"_id" : ObjectId("5942643ea2042e12245de00c"),
"user" : 1,
"sname" : {
"value" : "roy 9",
"time" : 123
}
}
Where sname was the only field to meet the condition in the document and therefore the only one returned.
Dynamic Generation and DSL Re-use
Addressing Sergio's question, then, I suppose you can actually re-use the DSL from the $or condition to generate the logic in both cases.
Consider the variable defined:
var orlogic = [
{
"name.time" : {
"$gte" : 123
}
},
{
"lname.time" : {
"$gte" : 123
}
},
{
"sname.time" : {
"$gte" : 123
}
}
];
Then with cursor iteration:
db.temp.find({
"$or": orlogic
}).map(doc => {
orlogic.forEach(cond => {
Object.keys(cond).forEach(k => {
var split = k.split(".");
var op = Object.keys(cond[k])[0];
if ( op === "$gte" && doc[split[0]][split[1]] < cond[k][op] )
delete doc[split[0]];
else if ( op === "$lte" && doc[split[0]][split[1]] > cond[k][op] )
delete doc[split[0]];
})
});
return doc;
})
Which evaluates against the DSL to actually perform the operations without ( somewhat ) "hardcoded" if statements.
Then the aggregation approach would also be:
var pipeline = [
{ "$match": { "$or": orlogic } },
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$concatArrays": [
[
{ "k": "_id", "v": "$_id" },
{ "k": "user", "v": "$user" }
],
{ "$filter": {
"input": orlogic.map(cond => {
var obj = {
"$cond": {
"if": { },
"then": { },
"else": false
}
};
Object.keys(cond).forEach(k => {
var split = k.split(".");
var op = Object.keys(cond[k])[0];
obj.$cond.if[op] = [ `$${k}`, cond[k][op] ];
obj.$cond.then = { "k": split[0], "v": `$${split[0]}` };
});
return obj;
}),
"as": "el",
"cond": "$$el"
}}
]
}
}
}}
];
db.test.aggregate(pipeline);
So the same basic conditions where we re-use existing $or DSL to generate the required pipeline parts as opposed to hard coding them in.
The second argument to find specifies the fields to return (projection)
db.collection.find(query, projection)
https://docs.mongodb.com/manual/reference/method/db.collection.find/
as in this example:
db.bios.find( { }, { name: 1, contribs: 1 } )
db.temp.find({
  "$or": [
    { 'name.time': { $gte: 123 } },
    { 'lname.time': { $gte: 123 } },
    { 'sname.time': { $gte: 123 } }
  ]
},
{
  "name.time": 1,
  "lname.time": 1,
  "sname.time": 1
})
My approach using the aggregation pipeline:
$project - used to create a key for each of the documents name, sname and lname
Initial project Query
db.collection.aggregate([{$project: {_id:1, "tempname.name": "$name", "templname.lname":"$lname", "tempsname.sname":"$sname"}}]);
The result of this query is:
{"_id":ObjectId("5942643ea2042e12245de00c"),"tempname":{"name":{"value":"roy","time":121}},"templname":{"lname":{"value":"roy s","time":122}},"tempsname":{"sname":{"value":"roy 9","time":123}}}
Use $project one more time to add the documents into an array
db.collection.aggregate([{$project: {_id:1, "tempname.name": "$name", "templname.lname":"$lname", "tempsname.sname":"$sname"}},
{$project: {names: ["$tempname", "$templname", "$tempsname"]}}])
Our document will look like this after the execution of the second project:
{"_id":ObjectId("5942643ea2042e12245de00c"),"names":[{"name":{"value":"roy","time":121}},{"lname":{"value":"roy s","time":122}},{"sname":{"value":"roy 9","time":123}}]}
Then use $unwind to break the array into separate documents.
After breaking the documents apart, use $match with $or to get the desired result.
Final Query
db.collection.aggregate([
{
$project: {
_id: 1,
"tempname.name": "$name",
"templname.lname": "$lname",
"tempsname.sname": "$sname"
}
},
{
$project: {
names: [
"$tempname",
"$templname",
"$tempsname"
]
}
},
{
$unwind: "$names"
},
{
$match: {
$or: [
{
"names.name.time": {
$gte: 123
}
},
{
"names.lname.time": {
$gte: 123
}
},
{
"names.sname.time": {
$gte: 123
}
}
]
}
}
])
The final result of the query is closer to your expected result (with an additional key):
{
"_id" : ObjectId("5942643ea2042e12245de00c"),
"names" : {
"sname" : {
"value" : "roy 9",
"time" : 123
}
}
}
This is a long question. If you bother answering, I will be extra grateful.
I have some time series data that I am trying to query to create various charts. The data format isn't the most simple, but I think my aggregation pipeline is getting a bit out of hand. I am planning to use charts.js to visualise the data on the client.
I will post a sample of my data below as well as my pipeline, with the desired output.
My question is in two parts - answering either one could solve the problem.
Does charts.js accept data formats other than an array of numbers per row? This would mean my pipeline could try to do less.
My pipeline doesn't quite get to the result I need. Can you recommend any alterations to get the correct result from my pipeline? Is there is a simpler way to get my desired output format?
Sample data
Here is a real data sample - a brand with one facebook account and one twitter account. There is some data for some dates in June. Lots of null day and month fields have been omitted.
Brand
[{
"_id": "5943f427e7c11ac3ad3652b0",
"name": "Brand1",
"facebookAccounts": [
"5943f427e7c11ac3ad3652ac",
],
"twitterAccounts": [
"5943f427e7c11ac3ad3652aa",
],
}]
FacebookAccounts
[
{
"_id" : "5943f427e7c11ac3ad3652ac"
"name": "Brand 1 Name",
"years": [
{
"date": "2017-01-01T00:00:00.000Z",
"months": [
{
"date": "2017-06-01T00:00:00.000Z",
"days": [
{
"date": "2017-06-16T00:00:00.000Z",
"likes": 904025,
},
{
"date": "2017-06-17T00:00:00.000Z",
"likes": null,
},
{
"date": "2017-06-18T00:00:00.000Z",
"likes": 904345,
},
],
},
],
}
]
}
]
Twitter accounts
[
{
"_id": "5943f427e7c11ac3ad3652aa",
"name": "Brand 1 Name",
"vendorId": "twitterhandle",
"years": [
{
"date": "2017-01-01T00:00:00.000Z",
"months": [
{
"date": "2017-06-01T00:00:00.000Z",
"days": [
{
"date": "2017-06-16T00:00:00.000Z",
"followers": 69390,
},
{
"date": "2017-06-17T00:00:00.000Z",
"followers": 69397,
{
"date": "2017-06-18T00:00:00.000Z",
"followers": 69428,
},
{
"date": "2017-06-19T00:00:00.000Z",
"followers": 69457,
},
]
},
],
}
]
}
]
The query
For this example, I want, for each brand, a daily sum of facebook likes and twitter followers between June 16th and June 18th. So here, the required format is:
{
brand: Brand1,
date: ["2017-06-16T00:00:00.000Z", "2017-06-17T00:00:00.000Z", "2017-06-18T00:00:00.000Z"],
stat: [973415, 69397, 973773]
}
The pipeline
The pipeline seems more convoluted due to the population, but I accept that complexity and it is necessary. Here are the steps:
db.getCollection('brands').aggregate([
{ $match: { _id: { $in: [ObjectId("5943f427e7c11ac3ad3652b0") ] } } },
// Unwind all relevant account types. Make one row per account
{ $project: {
accounts: { $setUnion: [ '$facebookAccounts', '$twitterAccounts' ] } ,
name: '$name'
}
},
{ $unwind: '$accounts' },
// populate the accounts.
// These transform the arrays of facebookAccount ObjectIds into the objects described above.
{ $lookup: { from: 'facebookaccounts', localField: 'accounts', foreignField: '_id', as: 'facebookAccounts' } },
{ $lookup: { from: 'twitteraccounts', localField: 'accounts', foreignField: '_id', as: 'twitterAccounts' } },
// unwind the populated accounts. Back to one record per account.
{ $unwind: { path: '$facebookAccounts', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$twitterAccounts', preserveNullAndEmptyArrays: true } },
// unwind to the granularity we want. Here it is one record per day per account per brand.
{ $unwind: { path: '$facebookAccounts.years', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$facebookAccounts.years.months', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$facebookAccounts.years.months.days', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$twitterAccounts.years', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$twitterAccounts.years.months', preserveNullAndEmptyArrays: true } },
{ $unwind: { path: '$twitterAccounts.years.months.days', preserveNullAndEmptyArrays: true } },
// Filter each one between dates
{ $match: { $or: [
{ $and: [
{ 'facebookAccounts.years.months.days.date': { $gte: new Date('2017-06-16') } } ,
{ 'facebookAccounts.years.months.days.date': { $lte: new Date('2017-06-18') } }
]},
{ $and: [
{ 'twitterAccounts.years.months.days.date': { $gte: new Date('2017-06-16') } } ,
{ 'twitterAccounts.years.months.days.date': { $lte: new Date('2017-06-18') } }
]}
] }},
// Build stats and date arrays for each account
{ $group: {
_id: '$accounts',
brandId: { $first: '$_id' },
brandName: { $first: '$name' },
stat: {
$push: {
$sum: {
$add: [
{ $ifNull: ['$facebookAccounts.years.months.days.likes', 0] },
{ $ifNull: ['$twitterAccounts.years.months.days.followers', 0] }
]
}
}
},
date: { $push: { $ifNull: ['$facebookAccounts.years.months.days.date', '$twitterAccounts.years.months.days.date'] } } ,
}}
])
This gives me the output format
[{
_id: accountId, // facebook
brandName: 'Brand1'
date: ["2017-06-16T00:00:00.000Z", "2017-06-17T00:00:00.000Z", "2017-06-18T00:00:00.000Z"],
stat: [904025, null, 904345]
},
{
_id: accountId // twitter
brandName: 'Brand1',
date: ["2017-06-16T00:00:00.000Z", "2017-06-17T00:00:00.000Z", "2017-06-18T00:00:00.000Z"],
stat: [69457, 69390, 69397]
}]
So I now need to perform column-wise addition on my stat properties. And then I am stuck - I feel like there should be a more pipeline-friendly way to sum these rather than column-wise addition.
Note I accept the extra work that the population required and am happy with that. Most of the repetition is done programmatically.
Thank you if you've gotten this far.
I can trim a lot of fat out of this and keep it compatible with the operators available in MongoDB 3.2 ( which you must be using at least, due to preserveNullAndEmptyArrays ) with a few simple actions. Mostly by simply joining the arrays immediately after $lookup, which is the best place to do it:
Short Optimize
db.brands.aggregate([
{ "$lookup": {
"from": "facebookaccounts",
"localField": "facebookAccounts",
"foreignField": "_id",
"as": "facebookAccounts"
}},
{ "$lookup": {
"from": "twitteraccounts",
"localField": "twitterAccounts",
"foreignField": "_id",
"as": "twitterAccounts"
}},
{ "$project": {
"name": 1,
"all": {
"$concatArrays": [ "$facebookAccounts", "$twitterAccounts" ]
}
}},
{ "$match": {
"all.years.months.days.date": {
"$gte": new Date("2017-06-16"), "$lte": new Date("2017-06-18")
}
}},
{ "$unwind": "$all" },
{ "$unwind": "$all.years" },
{ "$unwind": "$all.years.months" },
{ "$unwind": "$all.years.months.days" },
{ "$match": {
"all.years.months.days.date": {
"$gte": new Date("2017-06-16"), "$lte": new Date("2017-06-18")
}
}},
{ "$group": {
"_id": {
"brand": "$name",
"date": "$all.years.months.days.date"
},
"total": {
"$sum": {
"$sum": [
{ "$ifNull": [ "$all.years.months.days.likes", 0 ] },
{ "$ifNull": [ "$all.years.months.days.followers", 0 ] }
]
}
}
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.brand",
"date": { "$push": "$_id.date" },
"stat": { "$push": "$total" }
}}
])
This gives the result:
{
"_id" : "Brand1",
"date" : [
ISODate("2017-06-16T00:00:00Z"),
ISODate("2017-06-17T00:00:00Z"),
ISODate("2017-06-18T00:00:00Z")
],
"stat" : [
973415,
69397,
973773
]
}
With MongoDB 3.4 we could probably speed it up a "little" more by filtering the arrays and breaking them down before we eventually $unwind to make this work across documents, or maybe even not worry about going across documents at all if the "name" from "brands" is unique. The pipeline operations to compact down the arrays "in place" though are quite cumbersome to code, if a "little" better on performance.
You seem to be doing this "per brand" or for a small sample, so it's likely of little consequence.
As for the chart.js data format, I wasn't able to confirm that it accepts a data format other than the array format used here, but again this should have little bearing.
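For what it's worth, the { date, stat } output shape maps directly onto a Chart.js labels/dataset pair. A minimal sketch, assuming doc holds one aggregated result and ctx is a canvas 2d context:

new Chart(ctx, {
  type: 'line',
  data: {
    // one label per date, one data point per stat entry
    labels: doc.date.map(d => d.toISOString().slice(0, 10)),
    datasets: [{ label: String(doc._id), data: doc.stat }]
  }
});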
The main point I see addressed is we can easily move away from your previous output that separated the "facebook" and "twitter" data, and simply aggregate by date moving all the data together "before" the arrays are constructed.
That last point then obviates the need for further "convoluted" operations to attempt to "merge" those two documents and the arrays produced.
Alternate Optimize
As an alternate approach, where this does in fact not aggregate across documents, you can essentially do the "filter" on the array in place and then simply sum and reshape the received result in client code.
db.brands.aggregate([
{ "$lookup": {
"from": "facebookaccounts",
"localField": "facebookAccounts",
"foreignField": "_id",
"as": "facebookAccounts"
}},
{ "$lookup": {
"from": "twitteraccounts",
"localField": "twitterAccounts",
"foreignField": "_id",
"as": "twitterAccounts"
}},
{ "$project": {
"name": 1,
"all": {
"$map": {
"input": { "$concatArrays": [ "$facebookAccounts", "$twitterAccounts" ] },
"as": "all",
"in": {
"years": {
"$map": {
"input": "$$all.years",
"as": "year",
"in": {
"months": {
"$map": {
"input": "$$year.months",
"as": "month",
"in": {
"days": {
"$filter": {
"input": "$$month.days",
"as": "day",
"cond": {
"$and": [
{ "$gte": [ "$$day.date", new Date("2017-06-16") ] },
{ "$lte": [ "$$day.date", new Date("2017-06-18") ] }
]
}
}
}
}
}
}
}
}
}
}
}
}
}}
]).map(doc => {
doc.all = [].concat.apply([],[].concat.apply([],[].concat.apply([],doc.all.map(d => d.years)).map(d => d.months)).map(d => d.days));
doc.all = doc.all.reduce((a,b) => {
if ( a.findIndex( d => d.date.valueOf() == b.date.valueOf() ) != -1 ) {
a[a.findIndex( d => d.date.valueOf() == b.date.valueOf() )].stat += (b.hasOwnProperty('likes')) ? (b.likes || 0) : (b.followers || 0);
} else {
a = a.concat([{ date: b.date, stat: (b.hasOwnProperty('likes')) ? (b.likes || 0) : (b.followers || 0) }]);
}
return a;
},[]);
doc.date = doc.all.map(d => d.date);
doc.stat = doc.all.map(d => d.stat);
delete doc.all;
return doc;
})
This really leaves all the things that "need" to happen on the server, on the server. And it's then a fairly trivial task to "flatten" the array and process to "sum up" and reshape it. This would mean less load on the server, and the data returned is not really that much greater per document.
Gives the same result of course:
[
{
"_id" : ObjectId("5943f427e7c11ac3ad3652b0"),
"name" : "Brand1",
"date" : [
ISODate("2017-06-16T00:00:00Z"),
ISODate("2017-06-17T00:00:00Z"),
ISODate("2017-06-18T00:00:00Z")
],
"stat" : [
973415,
69397,
973773
]
}
]
Committing to the Diet
The biggest problem you really have is with the multiple collections and the heavily nested documents. Neither of these is doing you any favors here and will with larger results cause real performance problems.
The nesting in particular is completely unnecessary as well as not being very maintainable since there are limitations to "update" where you have nested arrays. See the positional $ operator documentation, as well as many posts about this.
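To illustrate, the positional $ operator may appear only once in an update path, so with the years/months/days nesting there is no way to address a single day entry directly. An attempt such as the following ( the value here is hypothetical ) is simply rejected by the server:

// Invalid: multiple positional "$" operators in one path are not allowed
db.facebookaccounts.updateOne(
  { "years.months.days.date": ISODate("2017-06-17T00:00:00Z") },
  { "$set": { "years.$.months.$.days.$.likes": 904100 } }
)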
Instead you really want a single collection with all those "days" entries in it. You can always work with that source easily for query as well as aggregation purposes and it should look something like this:
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac38097"),
"date" : ISODate("2017-06-16T00:00:00Z"),
"likes" : 904025,
"__t" : "Facebook",
"account" : ObjectId("5943f427e7c11ac3ad3652ac")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac38098"),
"date" : ISODate("2017-06-17T00:00:00Z"),
"likes" : null,
"__t" : "Facebook",
"account" : ObjectId("5943f427e7c11ac3ad3652ac")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac38099"),
"date" : ISODate("2017-06-18T00:00:00Z"),
"likes" : 904345,
"__t" : "Facebook",
"account" : ObjectId("5943f427e7c11ac3ad3652ac")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac3809a"),
"date" : ISODate("2017-06-16T00:00:00Z"),
"followers" : 69390,
"__t" : "Twitter",
"account" : ObjectId("5943f427e7c11ac3ad3652aa")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac3809b"),
"date" : ISODate("2017-06-17T00:00:00Z"),
"followers" : 69397,
"__t" : "Twitter",
"account" : ObjectId("5943f427e7c11ac3ad3652aa")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac3809c"),
"date" : ISODate("2017-06-18T00:00:00Z"),
"followers" : 69428,
"__t" : "Twitter",
"account" : ObjectId("5943f427e7c11ac3ad3652aa")
}
{
"_id" : ObjectId("5948cd5cd6eb0b7d6ac3809d"),
"date" : ISODate("2017-06-19T00:00:00Z"),
"followers" : 69457,
"__t" : "Twitter",
"account" : ObjectId("5943f427e7c11ac3ad3652aa")
}
Combining those referenced in the brands collection as well:
{
"_id" : ObjectId("5943f427e7c11ac3ad3652b0"),
"name" : "Brand1",
"accounts" : [
ObjectId("5943f427e7c11ac3ad3652ac"),
ObjectId("5943f427e7c11ac3ad3652aa")
]
}
Then you simply aggregate like this:
db.brands.aggregate([
{ "$lookup": {
"from": "social",
"localField": "accounts",
"foreignField": "account",
"as": "accounts"
}},
{ "$unwind": "$accounts" },
{ "$match": {
"accounts.date": {
"$gte": new Date("2017-06-16"), "$lte": new Date("2017-06-18")
}
}},
{ "$group": {
"_id": {
"brand": "$name",
"date": "$accounts.date"
},
"stat": {
"$sum": {
"$sum": [
{ "$ifNull": [ "$accounts.likes", 0 ] },
{ "$ifNull": [ "$accounts.followers", 0 ] }
]
}
}
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.brand",
"date": { "$push": "$_id.date" },
"stat": { "$push": "$stat" }
}}
])
This is actually the most efficient thing you can do, and it's mostly because of what actually happens on the server. We need to look at the "explain" output to see what happens to the pipeline here:
{
"$lookup" : {
"from" : "social",
"as" : "accounts",
"localField" : "accounts",
"foreignField" : "account",
"unwinding" : {
"preserveNullAndEmptyArrays" : false
},
"matching" : {
"$and" : [
{
"date" : {
"$gte" : ISODate("2017-06-16T00:00:00Z")
}
},
{
"date" : {
"$lte" : ISODate("2017-06-18T00:00:00Z")
}
}
]
}
}
}
This is what happens when you send $lookup -> $unwind -> $match to the server as the latter two stages are "hoisted" into the $lookup itself. This reduces the results in the actual "query" run on the collection to be joined.
Without that sequence, then $lookup potentially pulls in "a lot of data" with no constraint, and would break the 16MB BSON limit under most normal loads.
So not only is the process a lot more simple in the altered form, it actually "scales" where the present structure will not. This is something that you seriously should consider.
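If you want to verify that coalescing on your own deployment, the shell exposes it via the explain option, with pipeline being the array of stages shown above:

db.brands.aggregate(pipeline, { "explain": true })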
I have a doc that looks like the one below:
{
"contents": [
{
"translationId": "MENU",
},
{
"translationId": "PAGETITLE"
}
],
"slides": [
{
"translationId": "SLIDE1",
"imageUrl": "assets/img/room/1.jpg",
"desc": {
"translationId": "DESC",
}
},
{
"translationId": "SLIDE2",
"imageUrl": "assets/img/aa/2.jpg"
}
]}
I would like to aggregate the translationId values no matter which subdocument the data is in. My current query, below, does not give me the expected result.
db.cursor.find({"contents.translationId": { $exists: true }},
{"contents.translationId":1,'slides.translationId':1,"slides.desc.translationId":1,'_id':0})
I expect a result like the one below. Is there a good approach to retrieve such a result directly from a mongodb query?
[
{
"translationId": "MENU"
},
{
"translationId": "PAGETITLE"
},
{
"translationId": "SLIDE1"
},
{
"translationId": "SLIDE2"
},
{
"translationId": "DESC"
}
]
Additionally, I might not know in which element translationId might exist. In this case it resides in contents, slides and slides.desc, but it might also be under some other elements. Is that possible?
Thanks!
As long as the items are unique, you can use the $setUnion operator in modern MongoDB releases of 2.6 and over, as well as the $map operator for translation of just the required element from the other array:
db.cursor.aggregate([
{ "$project": {
"joined": {
"$setDifference": [
{ "$setUnion": [
"$contents",
{ "$map": {
"input": "$slides",
"as": "slide",
"in": {
"translationId": "$$slide.translationId"
}
}},
{ "$map": {
"input": "$slides",
"as": "slide",
"in": {
"$cond": [
{ "$ifNull": [ "$$slide.desc.translationId", false] },
{ "translationId": "$$slide.desc.translationId" },
false
]
}
}}
]},
[false]
]
}
}}
])
You also need $setDifference to filter out any false values returned where the "desc" field is not present.
It produces:
{
"_id" : ObjectId("55f13f444db9bc30de351c84"),
"joined" : [
{
"translationId" : "DESC"
},
{
"translationId" : "SLIDE2"
},
{
"translationId" : "SLIDE1"
},
{
"translationId" : "PAGETITLE"
},
{
"translationId" : "MENU"
}
]
}
Of course if you have no idea of the structure "at all", then you need a recursive function with mapReduce instead:
db.cursor.mapReduce(
function() {
var tags = [];
function walkObj(obj) {
Object.keys(obj).forEach(function(key) {
if ( typeof(obj[key]) == "object" ) {
walkObj(obj[key]);
} else if ( key == "translationId" ) {
tags.push({ "translationId": obj[key] })
}
});
}
walkObj(this);
emit(this._id,{ "joined": tags})
},
function(){},
{ "out": { "inline": 1 } }
)
Which gives basically the same output as before, but of course does not need to be aware of the structure.