I have an aggregation query that I am running on MongoDB 4.4 and I am getting a weird sort order. When two documents have the same sort value, they come back in a random order. Ideally, documents with the same sort value should fall back to natural order. The same query runs fine on MongoDB 3.6.
db.getCollection('job').aggregate([
{"$match":{"$text":{"$search":"\"Cleaner\""}}},
{"$match":{"active":true}},
{"$match":{"status":"OPEN"}},
{"$project":
{"id":1,"source":1,"feed":1,"cardType":1,"groupCategory":1,"isPremium":1,"premiumTillDate":1,"createdDate":1,"title":1,"featuredImageUrl":1,"companyName":1,"salaryType":1,"contractType":1,"jobDescription":1,"location":1,"scope":1,"microRole":1,"address":1,"minimumSalary":1,"hiringManagerName":1,"hiringManagerImageUrl":1,"createdBy":1,"perks":1,"showMapView":1,"distance":1,"startDate":1,"link":"$externalJobDetail.link","publishDateTime":"$externalJobDetail.publishDateTime","salaryDescription":"$externalJobDetail.salaryDescription","companyJobLogoURL":"$externalJobDetail.companyJobLogoURL","monetisation":"$monetisation.value","order":"$monetisation.value"}},
{"$sort":{"order":-1}},
{"$skip":0},
{"$limit":29}]
Add the _id field to the sort query to achieve a stable sort.
db.getCollection('job').aggregate(
[
// pipeline stages
{ $sort : { order : -1, _id: -1 } }
]
)
From the docs:
If a stable sort is desired, include at least one field in your sort that contains exclusively unique values. The easiest way to guarantee this is to include the _id field in your sort query.
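Applied to the pipeline from the question, the fix is just an extra tiebreaker field in the $sort stage (a sketch; the $project stage is unchanged and omitted here):
db.getCollection('job').aggregate([
  {"$match": {"$text": {"$search": "\"Cleaner\""}}},
  {"$match": {"active": true}},
  {"$match": {"status": "OPEN"}},
  // ... same $project stage as in the question ...
  {"$sort": {"order": -1, "_id": -1}},   // _id is unique, so ties on "order" now break deterministically
  {"$skip": 0},
  {"$limit": 29}
])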
I am trying to use MongoDB to sort a collection based on ID. However, there are multiple records with the same ID, and I would like a MongoDB query that gives me only the most recently updated record for each ID. I do not want to remove the duplicates from the database; I just want the query to return the most recently updated record. Is that possible?
Try it this way:
db.test.aggregate(
[
{ $sort: { _id: 1, updatedOn: -1 } }
]
);
1 indicates ascending order and -1 indicates descending order.
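If the goal is to end up with only the most recently updated document per ID, one common pattern is to sort by the timestamp first and then take the first document in each group. A minimal sketch, assuming the grouping field is literally called ID and the timestamp field updatedOn; $replaceRoot needs MongoDB 3.4 or newer:
db.test.aggregate([
  { $sort: { updatedOn: -1 } },                 // newest first
  { $group: {
      _id: "$ID",                               // one bucket per ID
      doc: { $first: "$$ROOT" }                 // keep the most recently updated document
  }},
  { $replaceRoot: { newRoot: "$doc" } }         // unwrap back to the original document shape
])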
I have a collection with over 10 million records. I need to match on a particular field and get the distinct _ids of the matching records.
After the $match stage the result set is under 5 million documents.
If I then group by ITEM_ID to get the unique ids, the execution time on my local environment is over 20 seconds.
db.getCollection('viewscounts').aggregate(
[
{
$match: {
MODULE_ID: 4,
}
},
{
$group: {
_id: '$ITEM_ID',
}
}
], { allowDiskUse: true })
If I get rid of either $match or $group and run a pipeline with only one stage, the execution time is less than 0.1 seconds.
I'm okay with limiting the _ids, but they should be unique.
Can anyone suggest a better way to get the results faster?
You have already implemented the best aggregation pipeline possible for the desired output.
The reason the query is faster when you run only one of the stages is that it returns partial output instead of processing the entire 5 million matching records; when you add both stages, the entire output of the $match stage has to be processed by the $group stage, which takes more time.
The only way to optimize the aggregation is to apply indexes on the MODULE_ID and ITEM_ID keys:
db.viewscounts.createIndex({MODULE_ID: 1}, { sparse: true })
db.viewscounts.createIndex({ITEM_ID: 1})
It should be faster after you create the above two indexes on your viewscounts collection.
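To confirm the indexes are actually being used, you can ask the shell for the query plan; this is just the standard explain helper, not a change to your pipeline:
db.getCollection('viewscounts').explain().aggregate(
  [
    { $match: { MODULE_ID: 4 } },
    { $group: { _id: '$ITEM_ID' } }
  ],
  { allowDiskUse: true }
)
// Look for an IXSCAN on the MODULE_ID index in the winning plan instead of a COLLSCAN.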
Additionally, you can get your desired output from MongoDB's distinct command. Give the query below a try and see if it helps:
db.getCollection('viewscounts').distinct("ITEM_ID", {"MODULE_ID": 4})
Note: the above query returns an array of unique values rather than the documents returned by the aggregation query.
Hope this helps
I have the following query, which first sorts the documents and then skips and limits to 10 records:
db.getCollection('jobpostings').aggregate([
{"$match":{
"expireDate":{"$gte": ISODate("2018-08-12T00:00:00.000Z")},
"publishDate":{"$lt": ISODate("2018-08-13T00:00:00.000Z")},
"isPublished":true,
"isDrafted":false,
"deletedAt":{"$eq":null},
"deleted":false,
"blocked":{"$exists":false}
}},
{"$lookup":{"from":"companies","localField":"company.id","foreignField":"_id","as":"companyDetails"}},
{"$match":{"companyDetails":{"$ne":[]}}},
{"$sort":{
"isFeatured":-1,
"refreshes.refreshAt":-1,
"publishDate":-1
}},
{"$skip":0},
{"$limit":10},
{"$project":{
"position":1,"summary":1,"company":1,"publishDate":1,
"expireDate":{"$dateToString":{"format":"%Y-%m-%d","date":"$expireDate"}},
"locations":1,"minimumEducation":1,"workType":1,"skills":1,"contractType":1,
"isExtensible":1,"salary":1,"gender":1,"yearsOfExperience":1,"canApplyOnline":1,"number":1,
"isFeatured":1,"viewsCount":1,
"status":{"$cond":{
"if":{"$and":[
{"$lt":["$publishDate", ISODate("2018-08-13T00:00:00.000Z")]},
{"$gt":["$publishDate", ISODate("2018-08-11T00:00:00.000Z")]}]},"then":"New",
"else":{"$cond":{
"if":{"$lt":["$publishDate",ISODate("2018-08-12T00:00:00.000Z")]},"then":"Old","else":"Future"}}}},
"companyDetails.profilePic":1,"companyDetails.businessUnits":1,"companyDetails.totalRatingAverage":1,
"expiringDuration":{"$floor":{"$divide":[{"$subtract":["$expireDate",ISODate("2018-08-12T00:00:00.000Z")]},
86400000]}},
"companyDetails.totalReviews":{"$size":{"$ifNull":[{"$let":{"vars":{
"companyDetailOne":{"$arrayElemAt":["$companyDetails",0]}},"in":"$$companyDetailOne.reviews"}},[]]}}}}
])
If I comment out the $skip and $limit stages I get the full, ordered result set, and with skip = 0, limit = 10 I get the expected first page.
But when I compare the first page with the results for skip = 10, limit = 10, some documents from the first page appear again on the second page.
The same thing happens on other pages, with other documents.
It looks like the three fields you're sorting by are not unique, so documents that tie can come back in a different order on subsequent executions. To fix that, add an additional field to your $sort. Since _id is always unique, it is a good candidate. Try:
{"$sort":{
"isFeatured":-1,
"refreshes.refreshAt":-1,
"publishDate":-1,
"_id": -1
}}
All,
I have a MongoDB collection with the fields below.
_ID
Title
Description
Tags (array)
I have created two indexes, on the _id and Tags fields. The indexes are there so people can search the content with the help of keywords.
I created the Tags index with tags: -1, expecting the most recently inserted records to come first, but results are still returned in ascending _id order.
How do I create the index on the Tags field so that the most recently inserted documents are shown first, while keyword searches on the Tags field stay fast?
If the _id field is the default ObjectId, which reflects insertion order, and you want to query all documents with a specific Tags value in descending insertion order, you can use a query like this:
find({ Tags : $value }).sort({ _id : -1 })
For this query, you can create a compound index on { Tags : 1, _id : -1 }. All the documents with the same Tags will be sorted in descending insertion order and this index should work well for this query.
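A minimal sketch of that index and the query it supports; the collection name posts is just a placeholder for whatever yours is called:
db.posts.createIndex({ Tags: 1, _id: -1 })
// Equality match on Tags, newest (highest _id) first -- fully supported by the index above:
db.posts.find({ Tags: "mongodb" }).sort({ _id: -1 })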
Please note that if you are doing range query on Tags, like:
find({ Tags : { $in : [ $value1, $value2 ] }}).sort({ _id : -1 })
find({ Tags : { $gt : $value}}).sort({ _id : -1})
It wouldn’t be able to use the index to sort the result documents, and will need to sort the results in memory. You can run the query with .explain(true) to check the query plan. If scanAndOrder is true, it means the query cannot use the order of documents in the index for returning sorted results.
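For example, with the same placeholder collection name as above, you could check one of the range queries like this (on newer servers the in-memory sort shows up as a SORT stage in the plan rather than a scanAndOrder flag):
db.posts.find({ Tags: { $in: ["mongodb", "indexing"] } }).sort({ _id: -1 }).explain(true)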
There are also some docs and blog posts related to indexes that I'd recommend reading:
http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/
http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
http://blog.mongolab.com/2012/06/cardinal-ins/
I'm trying to use the aggregation framework to group a lot of strings together to identify the unique ones. I must also keep some information about the rest of the fields. This would be analogous to using the * operator in MySQL with a GROUP BY statement:
SELECT *
FROM my_table
GROUP BY field1
I have tried using the aggregation framework, and it works fine just to get unique fields.
db.mycollection.aggregate({
$group : { _id : "$field1"}
})
What if I want the other fields that go with it? MySQL would only give me the first one that appeared in the group (which I'm fine with). That's what I thought the $first operator did:
db.mycollection.aggregate({
$group : {
_id : "$field1",
another_field : {$first : "$field2"}
}})
This way it groups by field1 but still gives me back the other fields attached to the document. When I try this I get:
exception: aggregation result exceeds maximum document size (16MB)
I have a feeling this is because the whole aggregation result is being returned as one document. Can I return it as a JSON array instead?
thanks in advance
You're doing the aggregation correctly, but as the error message indicates, the full result of the aggregate call cannot be larger than 16 MB.
Work-arounds would be to either add a filter to reduce the size of the result or use map-reduce instead and output the result to another collection.
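For the second work-around you don't necessarily need map-reduce: on MongoDB 2.6 or newer the $out stage also writes the grouped output to another collection instead of returning it inline. A sketch, reusing the field names from the question (the target collection name is made up):
db.mycollection.aggregate([
  { $group: {
      _id: "$field1",
      another_field: { $first: "$field2" }
  }},
  { $out: "grouped_by_field1" }   // results are written to this collection instead of the reply
])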
If the number of unique values in the result does not exceed 2000, you could use the group() function, like:
db.mycollection.group({ key: { field1: 1, field2: 1 }, reduce: function(curr, result) {}, initial: {} })
The last option would be map-reduce:
db.mycollection.mapReduce( function() { emit({ field1: this.field1, field2: this.field2 }, 1); }, function(key, values) { return 1; }, { out: { replace: "unique_field1_field2" } })
and your results would be in the "unique_field1_field2" collection.
Another alternative is to use the distinct function:
db.mycollection.distinct('field1')
This function accepts a second argument, a query, with which you can filter the documents.
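For example (the filter value here is made up):
// Unique field1 values, but only across documents where field2 equals "someValue":
db.mycollection.distinct('field1', { field2: 'someValue' })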