How to project fields using another field's value in MongoDB?

I have a mongo document like this:
{"_id": {"$oid":"xx"} ,"start": "a", "elements": {"a":"large object", "b": "large object"}
My expected query result is to project only the element named by "start"; in this case that is { "elements": { "a": "large object" } }. But since the value of "start" is unknown before the query, I don't know how to write the query.
Two undesirable alternatives:
One way I could figure out is to query twice: first project "start" by _id to get "a", then query again for elements.a (sketched below).
The other is to fetch the whole document and pick out the "start" element in code, but I don't want to do that because the document may be very large.
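For illustration, the first alternative would look something like this in the shell (the ObjectId filter is just a placeholder):
// Round trip 1: fetch only the "start" value
var doc = db.collection.findOne({ _id: ObjectId("...") }, { start: 1 });
// Round trip 2: project only the element named by "start"
var projection = {};
projection["elements." + doc.start] = 1;
var result = db.collection.findOne({ _id: doc._id }, projection);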

You can make use of the $objectToArray, $arrayToObject, and $filter operators.
The query below should help:
db.collection.aggregate([
  {
    $project: {
      elements: {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$elements" },
            as: "e",
            cond: { $eq: ["$$e.k", "$start"] }
          }
        }
      }
    }
  }
])
Output:
[
  {
    "_id": 1,
    "elements": {
      "a": "large object"
    }
  }
]
Mongo Playground link
I hope this is what you want.

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for each group I query.
Sample data:
[
  { "_id": "42.abc", "ts_utc": "2019-05-27T23:43:16.963Z" },
  { "_id": "42.def", "ts_utc": "2019-05-27T23:43:17.055Z" },
  { "_id": "69.abc", "ts_utc": "2019-05-27T23:43:17.147Z" },
  { "_id": "69.def", "ts_utc": "2019-05-27T23:44:02.427Z" }
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName")
  .find({ "_id": /^42\..*/ })
  .sort({ "ts_utc": -1.0 })
  .limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the max timestamp per first (numeric) part.
That way you can do it in one shot, instead of iterating per group.
db.foo.aggregate([
  { $project: { id_parts: { $split: ["$_id", "."] }, ts_utc: 1 } },
  { $group: { _id: { $arrayElemAt: ["$id_parts", 0] }, max: { $max: "$ts_utc" } } }
])
As @danh mentioned in the comments, the best thing you can do is probably to add an auxiliary field to indicate the grouping. You may further index the auxiliary field to boost performance.
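If you do persist it, here is a minimal sketch using a pipeline-style update (MongoDB 4.2+; collection and field names are illustrative):
// One-time backfill; new writes should set "group" as well
db.collection.updateMany(
  {},
  [{ $set: { group: { $arrayElemAt: [{ $split: ["$_id", "."] }, 0] } } }]
);
// A compound index then serves "latest per group" queries efficiently
db.collection.createIndex({ group: 1, ts_utc: -1 });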
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
  {
    "$addFields": {
      "group": {
        "$arrayElemAt": [{ "$split": ["$_id", "."] }, 0]
      }
    }
  },
  { $sort: { ts_utc: -1 } },
  {
    "$group": {
      "_id": "$group",
      "doc": { "$first": "$$ROOT" }
    }
  },
  { "$replaceRoot": { "newRoot": "$doc" } }
])
Here is the Mongo playground for your reference.

Is there a way to give order field to the result of MongoDB aggregation?

Is there any way to give order or rankings to MongoDB aggregation results?
My result is:
{ "score": 100, "name": "John" },
{ "score": 80, "name": "Jane" },
{ "score": 60, "name": "Lee" }
My wanted result is:
{ "score": 100, "name": "John", "rank": 1 },
{ "score": 80, "name": "Jane", "rank": 2 },
{ "score": 60, "name": "Lee", "rank": 3 }
I know there is an option called includeArrayIndex, but it only works with the $unwind stage.
Is there any way to assign a rank without using $unwind?
Using $unwind would require grouping my collection first, and I'm afraid that grouping pipeline would be too huge to process.
The other way is to use $map to add the rank to each document from its index, without an $unwind stage; the result is a single-field array that you can access directly by its key name, as mentioned in the last line of code:
$group by null and push all documents into a root array;
$map to iterate over the root array: get the index of the current object from the root array using $indexOfArray, increment the returned index using $add (because indexes start from 0) to create the rank field, and merge the rank field into the current element's object using $mergeObjects.
let result = await db.collection.aggregate([
  {
    $group: {
      _id: null,
      root: { $push: "$$ROOT" }
    }
  },
  {
    $project: {
      _id: 0,
      root: {
        $map: {
          input: "$root",
          in: {
            $mergeObjects: [
              "$$this",
              { rank: { $add: [{ $indexOfArray: ["$root", "$$this"] }, 1] } }
            ]
          }
        }
      }
    }
  }
]);
// you can access the result using the root key
let finalResult = result[0]['root'];
Playground
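If you are on MongoDB 5.0 or newer, $setWindowFields can assign the rank directly, with no $group or $unwind at all. A sketch (note that $rank gives tied scores the same rank; use $documentNumber for strictly sequential numbering):
db.collection.aggregate([
  {
    $setWindowFields: {
      sortBy: { score: -1 },          // rank by descending score
      output: { rank: { $rank: {} } } // or { $documentNumber: {} }
    }
  }
]);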

Link each element of array in a document to the corresponding element in an array of another document with MongoDB

Using MongoDB 4.2 and MongoDB Atlas to test aggregation pipelines.
I've got this products collection, containing documents with this schema:
{
  "name": "TestProduct",
  "relatedList": [
    { id: ObjectId("someId") },
    { id: ObjectId("anotherId") }
  ]
}
Then there's this cities collection, containing documents with this schema:
{
  "name": "TestCity",
  "instructionList": [
    { related_id: ObjectId("anotherId"), foo: bar },
    { related_id: ObjectId("someId"), foo: bar },
    { related_id: ObjectId("notUsefulId"), foo: bar },
    ...
  ]
}
My objective is to join both collections to output something like this (the operation picks each related object from the instructionList in the city document and puts it into the relatedList of the product document):
{
  "name": "TestProduct",
  "relatedList": [
    { related_id: ObjectId("someId"), foo: bar },
    { related_id: ObjectId("anotherId"), foo: bar }
  ]
}
I tried using the $lookup operator for aggregation like this:
$lookup: {
  from: 'cities',
  let: { rId: '$relatedList._id' },
  pipeline: [
    { $match: {
      $expr: {
        $eq: ["$instructionList.related_id", "$$rId"]
      }
    }}
  ]
}
But it's not working, I'm a bit lost with this complex pipeline syntax.
Edit
By using $unwind on both arrays:
[
  { $unwind: "$relatedList" },
  { $lookup: {
    from: "cities",
    let: { "rId": "$relatedList.id" },
    pipeline: [
      { $unwind: "$instructionList" },
      { $match: { $expr: { $eq: ["$instructionList.related_id", "$$rId"] } } }
    ],
    as: "instructionList"
  }},
  { $group: {
    _id: "$_id",
    instructionList: { $addToSet: "$instructionList" }
  }}
]
I am able to achieve what I want; however, I'm not getting a clean result at all:
{
  "name": "TestProduct",
  "instructionList": [
    [
      {
        "name": "TestCity",
        "instructionList": { "related_id": ObjectId("someId") }
      }
    ],
    [
      {
        "name": "TestCity",
        "instructionList": { "related_id": ObjectId("anotherId") }
      }
    ]
  ]
}
How can I group everything to be as clean as stated in my original question?
Again, I'm completely lost with the aggregation framework.
the operation picks each related object from the instructionList in the city document and puts it into the relatedList of the product document
Given an example document on cities collection:
{"_id": ObjectId("5e4a22a08c54c8e2380b853b"),
"name": "TestCity",
"instructionList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"},
{"related_id": "c", "foo": "z"}
]}
and an example document on products collection:
{"_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"id": "a"},
{"id": "b"}
]}
You can try using the following aggregation pipeline:
db.products.aggregate([
  { "$lookup": {
    "from": "cities",
    "let": { "rId": "$relatedList.id" },
    "pipeline": [
      { "$unwind": "$instructionList" },
      { "$match": {
        "$expr": {
          "$in": ["$instructionList.related_id", "$$rId"]
        }
      }}
    ],
    "as": "relatedList"
  }},
  { "$project": {
    "name": "$name",
    "relatedList": {
      "$map": {
        "input": "$relatedList",
        "as": "x",
        "in": {
          "related_id": "$$x.instructionList.related_id",
          "foo": "$$x.instructionList.foo"
        }
      }
    }
  }}
]);
To get a result as the following:
{ "_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"}
]}
The above is tested in MongoDB v4.2.x.
But it's not working, I'm a bit lost with this complex pipeline syntax.
The reason why it's slightly complex here is that you have an array relatedList and also an array of subdocuments instructionList. When you refer to instructionList.related_id (which could mean multiple values) with the $eq operator, the pipeline doesn't know which one to match.
In the pipeline above, I've added an $unwind stage to turn instructionList into multiple single documents. Afterwards, $in expresses a match of the single value of instructionList.related_id against the array relatedList.
I believe you just need to $unwind the arrays in order to look up the relation, then $group to recollect them. Perhaps something like:
.aggregate([
  { $unwind: "$relatedList" },
  { $lookup: {
    from: "cities",
    let: { rId: "$relatedList.id" },
    pipeline: [
      { $match: { $expr: { $in: ["$$rId", "$instructionList.related_id"] } } },
      { $unwind: "$instructionList" },
      { $match: { $expr: { $eq: ["$instructionList.related_id", "$$rId"] } } },
      { $project: { _id: 0, instruction: "$instructionList" } }
    ],
    as: "lookedup"
  }},
  { $addFields: { "relatedList.foo": { $arrayElemAt: ["$lookedup.instruction.foo", 0] } } },
  { $group: {
    _id: "$_id",
    root: { $first: "$$ROOT" },
    relatedList: { $push: "$relatedList" }
  }},
  { $addFields: { "root.relatedList": "$relatedList" } },
  { $replaceRoot: { newRoot: "$root" } }
])
A little about each stage:
$unwind duplicates the entire document for each element of the array, replacing the array with the single element
$lookup can then consider each element separately. The stages in $lookup.pipeline:
a. $match (using $in) so we only process documents whose instructionList contains a matching ID
b. $unwind the array so we can consider individual elements
c. repeat the $match so we are only left with matching elements (hopefully just 1)
$addFields assigns the foo field retrieved from the lookup (the first matched instruction) to the object from relatedList
$group collects together all of the documents with the same _id (i.e. that were unwound from a single original document), stores the first as 'root', and pushes all of the relatedList elements back into an array
$addFields moves the relatedList in to root
$replaceRoot returns the root, which should now be the original document with the matching foo added to each relatedList element

Mongo aggregation vs Java for loop and performance

I have the below Mongo document stored:
{
  "Field1": "ABC",
  "Field2": [
    { "Field3": "ABC1", "Field4": [{ "id": "123" }, { "id": "234" }, { "id": "345" }] },
    { "Field3": "ABC2", "Field4": [{ "id": "123" }, { "id": "234" }, { "id": "345" }] },
    { "Field3": "ABC3", "Field4": [{ "id": "345" }] }
  ]
}
From the above, I want to fetch the subdocuments which have id "123", i.e.:
{
  "Field3": "ABC1",
  "Field4": [{ "id": "123" }]
},
{
  "Field3": "ABC2",
  "Field4": [{ "id": "123" }]
}
1. Java way
A. Use the Mongo find method to get the ABC document from MongoDB
B. For loop to iterate over the Field2 JSON array
C. Nested for loop to iterate over the Field4 JSON array
D. Inside the nested for loop, an if condition to match the id value against "123"
E. Store the matching subdocuments in a List
2. Mongo way
A. Use an aggregation query to get the desired output from the DB; no loops or conditions on the Java side.
B. The aggregation query has the below stages:
I) $match - match the ABC document
II) $unwind - Field2
III) $unwind - Field4
IV) $match - match on the id (value "123")
V) $group - group the documents based on Field3 ("ABC1" or "ABC2")
VI) Execute the aggregation and return the results
Both work and return proper results.
The question is which one is better to follow, and why? I used the aggregation in a RESTful service GET method, so will executing aggregation queries 1000 or more times in parallel cause any performance problems?
With aggregation, the whole query executes as a single process on the MongoDB server - the application program just receives the results cursor.
With the Java program you also get a cursor from the database server as input to the processing in the application, but that cursor represents a larger set of data and will use more network bandwidth. Then there is the processing in the application program, which adds more steps to complete the query.
I think the aggregation option is the better choice, as all the processing (the initial match and filtering of the array) happens on the database server as a single process.
Also, note that the aggregation query steps you posted can be written more efficiently. Instead of multiple stages (2, 3, 4 and 5) you can do those operations in two stages: use $addFields with a $map over the outer array and a $filter on the inner array, then $filter the outer array.
The aggregation:
db.test.aggregate([
  {
    $addFields: {
      Field2: {
        $map: {
          input: "$Field2",
          as: "fld2",
          in: {
            Field3: "$$fld2.Field3",
            Field4: {
              $filter: {
                input: "$$fld2.Field4",
                as: "fld4",
                cond: { $eq: ["$$fld4.id", "123"] }
              }
            }
          }
        }
      }
    }
  },
  {
    $addFields: {
      Field2: {
        $filter: {
          input: "$Field2",
          as: "f2",
          cond: { $gt: [{ $size: "$$f2.Field4" }, 0] }
        }
      }
    }
  }
])
The second way is probably better because it returns a smaller result from the datastore; schlepping bits over the wire is expensive.

How to concatenate all values and find specific substring in Mongodb?

I have a JSON document like this:
{
  "A": [
    { "C": "abc", "D": "de" },
    { "C": "fg", "D": "hi" }
  ]
}
I would like to check whether "A" contains the string "ef" or not:
first concatenate all values (abcdefghi), then search for "ef".
In XML/XPath it would be something like:
//A[contains(., 'ef')]
Is there any similar query in MongoDB?
All options are pretty horrible for this type of search, but there are a few approaches you can take. Note, though, that the final case here is likely the best solution; I present the other options in order to illustrate the problem.
If the keys in the array "A" are consistently defined and "A" always contains an array, you would be searching like this:
db.collection.aggregate([
  // Filter the documents containing your parts
  { "$match": {
    "$and": [
      { "$or": [
        { "A.C": /e/ },
        { "A.D": /e/ }
      ]},
      { "$or": [
        { "A.C": /f/ },
        { "A.D": /f/ }
      ]}
    ]
  }},
  // Keep the original form and a copy of the array
  { "$project": {
    "_id": {
      "_id": "$_id",
      "A": "$A"
    },
    "A": 1
  }},
  // Unwind the array
  { "$unwind": "$A" },
  // Join the two fields and push to a single array
  { "$group": {
    "_id": "$_id",
    "joined": { "$push": {
      "$concat": ["$A.C", "$A.D"]
    }}
  }},
  // Copy the array
  { "$project": {
    "C": "$joined",
    "D": "$joined"
  }},
  // Unwind both arrays
  { "$unwind": "$C" },
  { "$unwind": "$D" },
  // Join the copies and test if they are the same
  { "$project": {
    "joined": { "$concat": ["$C", "$D"] },
    "same": { "$eq": ["$C", "$D"] }
  }},
  // Discard the "same" elements and search for the required string
  { "$match": {
    "same": false,
    "joined": { "$regex": "ef" }
  }},
  // Project the original form of the matching documents
  { "$project": {
    "_id": "$_id._id",
    "A": "$_id.A"
  }}
])
So apart from the horrible $regex matching, there are a few hoops to jump through to get the fields "joined" so you can search the string in sequence. Also note the reverse join that is possible here, which could produce a false positive. Currently there is no simple way to avoid that reverse join or otherwise filter it, so there is that to consider.
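On MongoDB 4.2 or newer (which postdates this answer), $reduce with $regexMatch can build the joined string in document order inside a single $match, avoiding both the unwinds and the reverse-join problem. A sketch, assuming "C" and "D" are always present and are strings:
db.collection.aggregate([
  { "$match": {
    "$expr": {
      "$regexMatch": {
        "input": {
          // Concatenate every C and D value in document order
          "$reduce": {
            "input": "$A",
            "initialValue": "",
            "in": { "$concat": ["$$value", "$$this.C", "$$this.D"] }
          }
        },
        "regex": "ef"
      }
    }
  }}
])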
Another approach is to basically run everything through arbitrary JavaScript. The mapReduce method can be your vehicle for this. Here you can be a bit looser with the types of data that can be contained in "A" and try to tie in some more conditional matching to attempt to reduce the set of documents you are working on:
db.collection.mapReduce(
  function () {
    var joined = "";
    if (Object.prototype.toString.call(this.A) === '[object Array]') {
      this.A.forEach(function(doc) {
        for (var k in doc) {
          joined += doc[k];
        }
      });
    } else {
      joined = this.A; // presuming this is just a string
    }
    var id = this._id;
    delete this["_id"];
    if (joined.match(/ef/))
      emit(id, this);
  },
  function(){}, // will not reduce
  {
    "query": {
      "$or": [
        { "A": /ef/ },
        { "$and": [
          { "$or": [
            { "A.C": /e/ },
            { "A.D": /e/ }
          ]},
          { "$or": [
            { "A.C": /f/ },
            { "A.D": /f/ }
          ]}
        ]}
      ]
    },
    "out": { "inline": 1 }
  }
);
So you can use that with whatever arbitrary logic to search the contained objects. This one just differentiates between arrays and otherwise presumes a string, which allows the query portion to first search for a matching "string" element as a "short circuit" evaluation.
But really, at the end of the day, the best approach is to simply have the joined data present in your document, and maintain it yourself as you update the document contents:
{
  "A": [
    { "C": "abc", "D": "de" },
    { "C": "fg", "D": "hi" }
  ],
  "search": "abcdefghi"
}
That still invokes a horrible usage of $regex-type queries, but at least it avoids (or rather shifts to the document write) the overhead of "joining" the elements in order to search for your desired string.
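On MongoDB 4.2 or newer you could also maintain that "search" field with a pipeline-style update, reusing the same $reduce expression shown earlier instead of application code. A sketch:
db.collection.updateMany({}, [
  { "$set": {
    "search": {
      "$reduce": {
        "input": "$A",
        "initialValue": "",
        "in": { "$concat": ["$$value", "$$this.C", "$$this.D"] }
      }
    }
  }}
]);
// Note: an index on "search" only helps left-anchored patterns (e.g. /^abc/), not /ef/
db.collection.createIndex({ "search": 1 });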
Where this eventually leads is that a "full blown" text search solution (and that means an external one at this time, as opposed to the text search facilities in MongoDB) is probably going to be your best performance option.
Either use the "pre-stored" approach to create your "joined" field, or, where supported (Solr is one solution that can do this), have a "computed field" in the text index that is created when indexing document content.
At any rate, those are the approaches and the general point of the problem. This is not XPath searching, nor is there some "XPath like" view of an entire collection in this sense, so you are best suited to structuring your data towards the methods that are going to give you the best performance.
With all of that said, your sample here is a fairly contrived example, and if you had an actual use case for something "like" this, then that actual case may make a very interesting question indeed. Actual cases generally have different solutions than the contrived ones. But now you have something to consider.