Querying the first N and last M documents in Mongo - mongodb

Everything is in the question: I can't find an efficient way to return the first and the last two documents of my query...
I pass as a parameter an array containing the required _id values, which I select with { parentPost: { $in: myArray } }, but I can't figure out how to get both the first document and the last two in the same query...
The goal would be to return a collection of documents containing all the wanted posts of each thread (identified by parentPost) in a single query that I can return in one subscription.
Could someone light my path?
Thank you
David

I assume each post is in a separate doc, and that there are other fields in the doc upon which we can filter, like name.
Let's start with a two-query example just to "get it done." Assuming something in the docs is sortable, like a post number or a datetime field, this clearly works:
c = db.foo.find({"name": "matt"}).sort({"post":1}).limit(N);
c.forEach(function(r) { printjson(r); });
c = db.foo.find({"name": "matt"}).sort({"post":-1}).limit(M);
c.forEach(function(r) { printjson(r); });
Of course, if you don't have a sortable thing, just go based on natural on-disk ordering:
c = db.foo.find({"name": "matt"}).sort({"$natural":1}).limit(N);
c.forEach(function(r) { printjson(r); });
c = db.foo.find({"name": "matt"}).sort({"$natural":-1}).limit(M);
c.forEach(function(r) { printjson(r); });
Here is a variation that assumes post is a sequential number from 1 to n. It is likely faster because it avoids sorting twice only to restrict the output to just a few documents:
n = db.foo.find({"name": "matt"}).count();
c = db.foo.find({
    "name": "matt",
    "$or": [
        {"post": {$lte: N}},
        {"post": {$gte: n-M}}
    ]
});
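To get the first post and the last two per thread (parentPost) in a single round trip, an aggregation along these lines may work. This is only a rough sketch: it assumes a posts collection, a sortable createdAt field, and MongoDB 3.2+ for $slice inside $project.
db.posts.aggregate([
    { $match: { parentPost: { $in: myArray } } },
    { $sort: { createdAt: 1 } },                 // oldest first within each thread
    { $group: {
        _id: "$parentPost",
        first: { $first: "$$ROOT" },             // first post of the thread
        all: { $push: "$$ROOT" }                 // keep the rest for slicing
    } },
    { $project: {
        first: 1,
        lastTwo: { $slice: ["$all", -2] }        // last two posts of the thread
    } }
]);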

Related

MongoDB: Match with element in an array

I am working on a collection called Publications. Each publication has an array of objectives, which are ids. I also have a custom, hand-written array of objectives. Now I want to select all the publications whose objectives contain at least one element of the custom objectives array. How can I do that?
I've been trying to make this work with '$setIntersection' and then '$count', verifying that the count is greater than 0, but I don't know how to implement it.
Example:
publication_1: {
    '_id': ObjectId("sdfsdf46543"),
    'objectives': [ObjectId("1654351456341"), ObjectId("123456789")]
}
publication_2: {
    '_id': ObjectId("sdfs216546543"),
    'objectives': [ObjectId("1654351456341"), ObjectId("46531132")]
}
custom_array = [ObjectId("123456789"), ObjectId("2416315463")]
The mongo query should return publication_1.
You can do it like the following:
db.publications.find({
"objectives": {
"$in": [
ObjectId("123456789"),
ObjectId("2416315463")
]
}
})
Notice: "123456789" is not a valid ObjectId so the query itself may not work. Here is the working example
Mongodb playground link: https://mongoplayground.net/p/MbZK99Pd5YR
objectives is an array, so I guess you can just query that field directly:
let custom_array = [ObjectId("123456789"), ObjectId("2416315463")];
// You can search the array with $in property.
let result = await Model.find({ objectives: {$in : custom_array} })
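If you specifically want the $setIntersection approach mentioned in the question, a rough sketch (assuming MongoDB 3.6+ for $expr) could look like this; it keeps publications whose objectives share at least one id with custom_array:
db.publications.find({
    $expr: {
        $gt: [
            // $ifNull guards against documents that have no objectives field
            { $size: { $setIntersection: [ { $ifNull: ["$objectives", []] }, custom_array ] } },
            0
        ]
    }
})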

Mongo Shell Querying one collection with the results of another

I have 2 different collections and I am trying to query the first collection and use the output of that as input to a query on the second collection.
var mycursor = db.item.find({"itemId":NumberLong(123)},{"_id":0})
var outp = "";
while(mycursor.hasNext()){
var rec = mycursor.next()
outp = outp + rec.eventId;
}
This query works fine and returns a list of eventIds.
I have another collection named users, which has an eventId field in it. An eventId can appear in multiple users. So for each eventId I get from the above query, I want to get the list of users too.
My query against the second collection would be something like this:
db.users.find({"eventId": ObjectId("each eventId from above query")},{"_id":0})
My final result would be a unique list of users.
Well, this should basically work (to a point, that is):
db.users.find({
    "eventId": { "$in": db.item.find({
        "itemId": NumberLong(123)
    }).map(function(doc) { return doc.eventId }) }
})
Or even a bit better:
db.users.find({
    "eventId": { "$in": db.item.distinct("eventId", {
        "itemId": NumberLong(123) }) }
})
The reason this works is that the "inner query" is evaluated before the outer query is sent, so you get an array of arguments for use with $in.
That is basically the same as doing the following, which translates better outside of the shell:
var events = db.item.distinct("eventId",{ "itemId":NumberLong(123) });
db.events.find({ "eventId": { "$in": events } })
If the results are "too large", however, then your best approach is to loop over the initial results as you have already done and build up an array of arguments. Once it reaches a certain size, run the same $in query several times in "pages" to get the results, as sketched below.
But looking for "distinct" eventId values via .distinct() or .aggregate() will help.
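A rough sketch of that paging approach (the batch size of 1000 is an arbitrary choice here):
var eventIds = db.item.distinct("eventId", { "itemId": NumberLong(123) });
var batchSize = 1000;
for (var i = 0; i < eventIds.length; i += batchSize) {
    // query the users collection one "page" of eventIds at a time
    var batch = eventIds.slice(i, i + batchSize);
    db.users.find({ "eventId": { "$in": batch } }, { "_id": 0 }).forEach(printjson);
}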

Publish all fields in document but just part of an array in the document

I have a mongo collection in which the documents have a field that is an array. I want to be able to publish everything in the documents except for the elements in the array that were created more than a day ago. I suspect the answer will be somewhat similar to this question.
Meteor publication: Hiding certain fields in an array document field?
Instead of limiting fields in the array, I just want to limit the elements in the array being published.
Thanks in advance for any responses!
EDIT
Here is an example document:
{
_id: 123456,
name: "Unit 1",
createdAt: (datetime object),
settings: *some stuff*,
packets: [
{
_id: 32412312,
temperature: 70,
createdAt: *datetime object from today*
},
{
_id: 32412312,
temperature: 70,
createdAt: *datetime from yesterday*
}
]
}
I want to get everything in this document except for the part of the array that was created more than 24 hours ago. I know I can accomplish this by moving the packets into their own collection and tying them together with keys as in a relational database, but if what I am asking for is possible, it would be simpler and require less code.
You could do something like this in your publish method:
Meteor.publish("pubName", function() {
var collection = Collection.find().fetch(); //change this to return your data
_.each(collection, function(collectionItem) {
_.each(collectionItem.packets, function(packet, index) {
var deadline = Date.now() - 86400000 //should equal 24 hrs ago
if (packet.createdAt < deadline) {
collectionItem.packets.splice(index, 1);
}
}
}
return collection;
}
Though you might be better off storing the last 24 hours worth of packets as a separate array in your document. Would probably be less taxing on the server, not sure.
Also, code above is untested. Good luck.
You can use the $elemMatch projection:
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/
So in your case, it would be:
var today = new Date();
var yesterday = new Date(today);
yesterday.setDate(today.getDate() - 1);
collection.find({}, //find anything or specific
{
    fields: {
        'packets': {
            $elemMatch: { createdAt: { $gt: yesterday /* or some new Date() */ } }
        }
    }
});
However, $elemMatch only returns the FIRST element matching your condition. To return more than 1 element, you need to use the aggregation framework, which will be more efficient than _.each or forEach, particularly if you have a large array to loop through.
collection.rawCollection().aggregate([
    {
        $match: {}
    },
    {
        $redact: {
            $cond: {
                if: { $or: [ { $gt: [ "$createdAt", yesterday ] }, "$packets" ] },
                then: "$$DESCEND",
                else: "$$PRUNE"
            }
        }
    }
], function (error, result) {
});
You specify the $match in a way similar to find({}). Then all the documents that match your conditions get piped into the $redact stage, which is controlled by the $cond.
$redact scans the document from top level to bottom. At the top level, you have _id, name, createdAt, settings, packets; hence {$or: [***,"$packets"]}
The presence of $packets in the $or allows the $redact to scan the second level, which contains the _id, temperature and createdAt; hence {$gt: ["$createdAt",yesterday]}
This is async; you can use Meteor.wrapAsync to wrap the function.
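For example, a minimal sketch (assuming the driver's callback-style aggregate shown above, and that the pipeline is stored in a pipeline variable):
var rawCollection = collection.rawCollection();
// Bind aggregate to the raw collection so `this` is correct inside the call.
var aggregateSync = Meteor.wrapAsync(rawCollection.aggregate, rawCollection);
var result = aggregateSync(pipeline);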
Hope this helps.

MongoDB - Aggregation on referenced field

I've got a question on the design of documents in order to be able to perform aggregation efficiently. I will take a dummy example of a document:
{
product: "Name of the product",
description: "A new product",
comments: [ObjectId(xxxxx), ObjectId(yyyy),....]
}
As you can see, I have a simple document which describes a product and wraps some comments on it. Imagine this product is very popular, so that it gathers millions of comments. A comment is a simple document with a date, a text and possibly some other fields. The problem is that such a product can easily grow larger than 16 MB, so I cannot embed the comments in the product; they have to go in a separate collection.
What I would like to do now is to perform aggregation on the product collection. A first step could be, for example, to select various products and sort the comments by date. It is quite an easy operation with embedded documents, but how can I do it with such a design? I only have the ObjectId of the comments, not their content. Of course, I'd like to perform this aggregation in a single operation, i.e. I don't want to have to perform the first part of the aggregation, then query the results and perform another aggregation.
I don't know if that's clear enough? ^^
I would go about it this way: create a temp collection that is the exact copy of the product collection with the only exception being the change in the schema on the comments array, which would be modified to include a comment object instead of the object id. The comment object will only have the _id and the date field. The above can be done in one step:
db.product.find().forEach( function (doc){
    var comments = []; // reset per product so comments don't accumulate across docs
    doc.comments.forEach( function(x) {
        var obj = {"_id": x };
        var comment = db.comment.findOne(obj);
        obj["date"] = comment.date;
        comments.push(obj);
    });
    doc.comments = comments;
    db.temp.insert(doc);
});
You can then run your aggregation query against the temp collection:
db.temp.aggregate([
    {
        $match: {
            // your match query
        }
    },
    {
        $unwind: "$comments"
    },
    {
        $sort: { "comments.date": 1 } // sort the pipeline by comments date
    }
]);
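On a reasonably recent MongoDB (3.4+ to be safe), you may be able to skip the temp collection entirely and join the referenced comments with $lookup in a single aggregation. A sketch, reusing the product and comment collection names from above:
db.product.aggregate([
    { $match: { /* your match query */ } },
    { $lookup: {
        from: "comment",            // the separate comments collection
        localField: "comments",     // array of comment ObjectIds on the product
        foreignField: "_id",
        as: "comments"              // replaces the ids with the full comment docs
    } },
    { $unwind: "$comments" },
    { $sort: { "comments.date": 1 } }
]);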

$unwind an object in aggregation framework

In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (i.e. an embedded document rather than an array). It doesn't look like this is possible; is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation. Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
{
name: 'variant1',
genotypes: {
person1: 2,
person2: 5,
person3: 7,
}
},
{
name: 'variant2',
genotypes: {
person1: 3,
person2: 3,
person3: 2,
}
}
]
It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
A simple example is a variant of what you suggest: a ratings field added to the example article collection, but as an array rather than a map from user to rating, like this:
{ title: "title of article", ...
  ratings: [
    { voter: "user1", score: 5 },
    { voter: "user2", score: 8 },
    { voter: "user3", score: 7 }
  ]
}
Now you can aggregate this with:
[ {$unwind: "$ratings"},
{$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } }
]
But this example, structured as you describe it, would look like this:
{ title: "title of article", ...
  ratings: {
    user1: 5,
    user2: 8,
    user3: 7
  }
}
or even this:
{ title: "title of article", ...
  ratings: [
    { user1: 5 },
    { user2: 8 },
    { user3: 7 }
  ]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
CREATE TABLE T (
    user1 integer,
    user2 integer,
    user3 integer
    ...
);
That's not what would be done; instead we would do this:
CREATE TABLE T (
    username varchar(32),
    score integer
);
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future - the ability to project values to keys and vice versa. Meanwhile, there is always map/reduce.
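As a rough illustration of the map/reduce route (a sketch only, assuming the articles collection with the ratings-as-object shape above): emit one sum/count pair per user and compute the average in a finalize step, so the reduce function stays associative.
db.articles.mapReduce(
    function () {
        // one emit per user key in the ratings object
        for (var user in (this.ratings || {})) {
            emit(user, { sum: this.ratings[user], count: 1 });
        }
    },
    function (key, values) {
        var out = { sum: 0, count: 0 };
        values.forEach(function (v) { out.sum += v.sum; out.count += v.count; });
        return out;
    },
    {
        finalize: function (key, value) { return value.sum / value.count; },
        out: { inline: 1 }
    }
);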
[*] There is a complicated way to do this if you know all the unique keys (you can find all unique keys with a method similar to this), but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX, which will return all their ratings; you can sum and average them simply enough, rather than doing the very complex projection the aggregation framework would require.
Since MongoDB 3.4.4, you can transform an object into an array using $objectToArray.
See:
https://docs.mongodb.com/manual/reference/operator/aggregation/objectToArray/
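For example, the per-user average from the original question could be computed along these lines (a sketch assuming the ratings-as-object shape shown earlier and MongoDB 3.4.4+):
db.articles.aggregate([
    { $project: { ratings: { $objectToArray: "$ratings" } } },  // -> [{k: user, v: score}, ...]
    { $unwind: "$ratings" },
    { $group: { _id: "$ratings.k", averageScore: { $avg: "$ratings.v" } } }
]);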
This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
db.Opportunity.aggregate(
    { $project: {
        Field1: 1, Field2: 1, Field3: 1,
        DummyUnwindField: { $ifNull: [null, [1.0]] }
    } },
    { $unwind: "$DummyUnwindField" }
);
This will produce one row per document, regardless of whether or not the value exists. You may be able to tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds (sort of like emit() in map/reduce), but alas, the last $unwind wins, or they combine as an intersection rather than a union, which makes it impossible to achieve the results I was looking for. I am sadly disappointed with the aggregation framework's functionality, as it doesn't fit the one use case I was hoping to use it for (and, strangely, what a lot of the questions on StackOverflow in this area seem to be asking): ordering results based on match rate. Improving the poor map reduce performance would have made this entire feature unnecessary.
This is what I found & extended.
Let's create an experimental database in mongo:
db.copyDatabase('livedb', 'experimentdb')
Now use experimentdb and convert the object into an array in your experimentcollection:
db.getCollection('experimentcollection').find({}).forEach(function(e){
    if(e.ratings){ // only touch docs that have the object to be converted, eg: ratings
        e.ratings = [e.ratings]; // wrap the object in an array
        db.experimentcollection.save(e);
    }
})
Some nerdy JS code to convert the JSON to flat objects:
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
    var flatObject = {};
    for (var prop in data[index]) {
        var value = data[index][prop];
        if (Array.isArray(value) && prop === 'ratings') {
            for (var i = 0; i < value.length; i++) {
                for (var inProp in value[i]) {
                    flatObject[inProp] = value[i][inProp];
                }
            }
        } else {
            flatObject[prop] = value;
        }
    }
    flatArray.push(flatObject);
}
printjson(flatArray);