Sorting MongoDB collection by same key with different values

I am trying to query my Mongo database to display all values from a certain collection, sorted by all the values of a certain key. For example, I have the collection:
{
"id":"1235432",
"name":"John Smith",
"occupation":"janitor",
"salary":"30000"
},
{
"id":"23412312",
"name":"Mathew Colins",
"occupation":"janitor",
"salary":"32000"
},
{
"id":"7353452",
"name":"Depp Jefferson",
"occupation":"janitor",
"salary":"33000"
},
{
"id":"342434212",
"name":"Clara Smith",
"occupation":"Accountant",
"salary":"45000"
},
{
"id":"794563452",
"name":"Jonathan Drako",
"occupation":"Accountant",
"salary":"46000"
},
{
"id":"8383747",
"name":"Simon Well",
"occupation":"Accountant",
"salary":"41000"
}
I am trying to display only the top 2 highest salaries for each occupation. My query currently looks something like this:
Stats.find({occupation:{$exists:true}}).populate('name').sort({salary:1}).limit(2)
but that only returns 1 result instead of one from each occupation.
How can I change my query to display the top 2 of each occupation by salary range?

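You can use the aggregate() method as shown below:
db.collectionName.aggregate([
  { $match: { occupation: { $exists: true } } },
  // salary is stored as a string, so this sort is lexicographic
  // (it works here only because every sample salary has 5 digits)
  { $sort: { "salary": -1 } },
  { $limit: 2 },
  { $project: { "name": 1, "salary": 1, _id: 0 } }
])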
Output JSON:
{"name" : "Jonathan Drako",
"salary" : "46000"},
{"name" : "Clara Smith",
"salary" : "45000"}
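The pipeline above sorts every document together, so it returns the two highest salaries overall (both accountants in the sample) rather than the top 2 per occupation. A per-occupation result needs a grouping step, for example $sort followed by $group with $push and then $slice, or the $topN accumulator on MongoDB 5.2+. As a rough, server-free illustration of that logic, the plain JavaScript below does the same sort-then-bucket work on the sample documents. topNPerGroup is a hypothetical helper, not a MongoDB API, and salaries are converted with Number() because the sample stores them as strings.

```javascript
// Hypothetical helper (not a MongoDB API): emulates
// $sort { salary: -1 } -> $group by occupation -> keep the first n per group.
function topNPerGroup(docs, groupKey, valueKey, n) {
  // Sort a copy numerically, highest first; the sample stores salary as a string.
  const sorted = [...docs].sort((a, b) => Number(b[valueKey]) - Number(a[valueKey]));
  const groups = new Map();
  for (const doc of sorted) {
    const key = doc[groupKey];
    if (!groups.has(key)) groups.set(key, []);
    const bucket = groups.get(key);
    // Documents arrive in descending order, so the first n seen are the n largest.
    if (bucket.length < n) bucket.push({ name: doc.name, [valueKey]: doc[valueKey] });
  }
  return Object.fromEntries(groups);
}

// The six sample documents from the question.
const stats = [
  { id: "1235432", name: "John Smith", occupation: "janitor", salary: "30000" },
  { id: "23412312", name: "Mathew Colins", occupation: "janitor", salary: "32000" },
  { id: "7353452", name: "Depp Jefferson", occupation: "janitor", salary: "33000" },
  { id: "342434212", name: "Clara Smith", occupation: "Accountant", salary: "45000" },
  { id: "794563452", name: "Jonathan Drako", occupation: "Accountant", salary: "46000" },
  { id: "8383747", name: "Simon Well", occupation: "Accountant", salary: "41000" },
];

const top2 = topNPerGroup(stats, "occupation", "salary", 2);
console.log(JSON.stringify(top2, null, 2));
```

Because the documents are visited in descending salary order, each occupation's bucket simply stops accepting entries once it holds n of them.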

Related

Update a nested field with an unknown index and without affecting other entries

I have a collection with a layout that looks something like this:
student1 = {
"First_Name": "John",
"Last_Name": "Doe",
"Courses": [
{
"Course_Id": 123,
"Course_Name": "Computer Science",
"Has_Chosen_Modules": false
},
{
"Course_Id": 284,
"Course_Name": "Mathematics",
"Has_Chosen_Modules": false
}
]
};
I also have the following update query:
db.Collection_Student.update(
{
$and: [
{First_Name: "John"},
{Last_Name: "Doe"}
]
},
{
$set : { "Courses.0.Has_Chosen_Modules" : true }
}
);
This code will currently update the Computer Science Has_Chosen_Modules value to true since the index is hardcoded. However, what if I wanted to update the value of Has_Chosen_Modules via the Course_Id instead (as the course might not necessarily be at the same index every time)? How would I achieve this without it affecting the other courses that a given student is taking?
You can target an element of the Courses array by querying one of its fields with dot notation ('Courses.Course_Id') and then update just that element with the positional $ operator:
db.Collection_Student.update(
{
First_Name: "John",
Last_Name: "Doe",
'Courses.Course_Id': 123
},
{
$set : { "Courses.$.Has_Chosen_Modules" : true }
}
);
Conditions in a query filter are implicitly combined with AND, so you don't need to write $and explicitly for this simple query.
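The positional $ operator updates only the first element of Courses that matched the 'Courses.Course_Id' condition in the filter. The plain JavaScript below is a rough model of that behavior on an in-memory copy of student1; setFirstMatching is a hypothetical helper, not part of any MongoDB driver. If several elements could match and all of them should change, MongoDB 3.6+ provides the filtered positional operator $[<identifier>] together with the arrayFilters option instead.

```javascript
// Hypothetical helper (not a driver API): a rough model of
// { $set: { "Courses.$.Has_Chosen_Modules": true } } with a filter on "Courses.Course_Id".
// Only the FIRST element whose matchField equals matchValue is changed.
function setFirstMatching(doc, arrayField, matchField, matchValue, setField, setValue) {
  const arr = doc[arrayField];
  if (!Array.isArray(arr)) return false;
  const idx = arr.findIndex((el) => el[matchField] === matchValue);
  if (idx === -1) return false; // nothing matched, so nothing is updated
  arr[idx][setField] = setValue;
  return true;
}

const student1 = {
  First_Name: "John",
  Last_Name: "Doe",
  Courses: [
    { Course_Id: 123, Course_Name: "Computer Science", Has_Chosen_Modules: false },
    { Course_Id: 284, Course_Name: "Mathematics", Has_Chosen_Modules: false },
  ],
};

// Update by Course_Id, regardless of the course's position in the array.
setFirstMatching(student1, "Courses", "Course_Id", 284, "Has_Chosen_Modules", true);
console.log(student1.Courses);
```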

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is that "source" is polymorphic: it can have different $ref values such as "webpage", "call_center", etc., which also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate( { $match : { sold_at : { $gte : start, $lt : end } } },
{ $project : { source : 1, total_price : 1 } },
{ $group : {
_id : { "source.$ref" : "$source.$ref" },
count : { $sum : "$total_price" }
} } );
The problem is that you get a path error when you use a field name starting with $, whether you try to group by it or transform it with expressions in $project.
Is there any way to do this? I am actually trying to push this data via aggregation into a subcollection and operate on it there, to avoid a large cursor operation over millions of records just to transform the data so I can aggregate it.
With MongoDB 4, I solved this issue in the following way.
Given this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use the "owner.$id" field, but because its name starts with "$", I was unable to use it in aggregation.
I transformed "owner.$id" -> "owner" using the following snippet:
db.activities.aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
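The two $addFields stages can be hard to picture, so here is a plain JavaScript re-enactment on in-memory documents, assuming each owner is stored as { $ref, $id } in that field order. objectToArray mirrors $objectToArray by producing { k, v } pairs, element 1 is the $id pair, and a Map plays the role of the $group and $sort stages. The helper names and the sample activities are hypothetical. Picking the pair whose k is "$id" would be more robust than relying on position 1.

```javascript
// Mirrors { $objectToArray: "$owner" }: [{ k: "$ref", v: ... }, { k: "$id", v: ... }]
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Mirrors the two $addFields stages: take element 1 (the $id pair) and keep its value.
function extractRefId(doc) {
  const pair = objectToArray(doc.owner)[1];
  return { ...doc, owner: pair.v };
}

// Mirrors { $group: { _id: "$owner", count: { $sum: 1 } } } then { $sort: { count: -1 } }.
function countByOwner(docs) {
  const counts = new Map();
  for (const doc of docs.map(extractRefId)) {
    counts.set(doc.owner, (counts.get(doc.owner) || 0) + 1);
  }
  return [...counts]
    .map(([id, count]) => ({ _id: id, count }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical sample activities with DBRef-shaped owners.
const activities = [
  { _id: 1, owner: { $ref: "person", $id: 10 } },
  { _id: 2, owner: { $ref: "person", $id: 10 } },
  { _id: 3, owner: { $ref: "person", $id: 7 } },
];
console.log(countByOwner(activities));
```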
In MongoDB 2.6 you cannot use DBRef values with the aggregation framework. Instead you need to use the JavaScript processing of mapReduce in order to access the field names that DBRefs use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. Its use is basically deprecated now, and if you feel you need external references you should reference documents "manually" in your own code, or use some other library that supports this in a better way.
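The map and reduce functions above can be tried without a server using a simplified stand-in for mapReduce, sketched below. Assumptions: runMapReduce is a hypothetical helper, emit is passed to the map function explicitly instead of being a global, reduce runs once per key (the real server may re-reduce partial results, which summing tolerates), and values are summed with Array.prototype.reduce because Array.sum is a mongo shell helper. The transactions are illustrative data.

```javascript
// Hypothetical, simplified stand-in for
// db.coll.mapReduce(map, reduce, { query, out: { inline: 1 } }).
function runMapReduce(docs, mapFn, reduceFn, query) {
  const buckets = new Map();
  const emit = (key, value) => {
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  };
  for (const doc of docs.filter(query)) mapFn.call(doc, emit);
  // One reduce call per key; the real server may also re-reduce partial results.
  return [...buckets].map(([key, values]) => ({ _id: key, value: reduceFn(key, values) }));
}

// Same logic as the answer's map and reduce, with emit passed in explicitly.
function mapSourceTotal(emit) {
  emit(this.source.$ref, this.total_price);
}
function reduceSum(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Illustrative data: DBRefs modeled as plain { $ref, $id } objects.
const start = new Date("2015-07-01T00:00:00Z");
const end = new Date("2015-08-01T00:00:00Z");
const transactions = [
  { source: { $ref: "webpage", $id: 1 }, total_price: 120, sold_at: new Date("2015-07-08T16:09:40Z") },
  { source: { $ref: "webpage", $id: 2 }, total_price: 40, sold_at: new Date("2015-07-10T10:00:00Z") },
  { source: { $ref: "call_center", $id: 3 }, total_price: 60, sold_at: new Date("2015-07-12T12:00:00Z") },
  { source: { $ref: "webpage", $id: 1 }, total_price: 500, sold_at: new Date("2015-09-01T00:00:00Z") }, // outside the range
];

const totalsBySource = runMapReduce(
  transactions,
  mapSourceTotal,
  reduceSum,
  (d) => d.sold_at >= start && d.sold_at < end
);
console.log(totalsBySource);
```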

mongodb fast tags query

I have a very large collection (more than 800k documents) and I need to implement a query for auto-complete functionality based on tags (matching word beginnings only). My documents look like this:
{
"_id": "theid",
"somefield": "some value",
"tags": [
{
"name": "abc tag1",
"vote": 5
},
{
"name": "hij tag2",
"vote": 22
},
{
"name": "abc tag3",
"vote": 5
},
{
"name": "hij tag4",
"vote": 77
}
]
}
If, for example, my query were for all tags that start with "ab" and have a "somefield" equal to "some value", the result would be "abc tag1", "abc tag3" (names only).
I care about the speed of the queries much more than the speed of the inserts and updates.
I assume that the aggregation framework would be the right way to go here, but what would be the best pipeline and indexes for very fast querying?
The documents are not 'tag' documents; they represent client objects and contain many more fields that I left out for simplicity. Each client has several tags and another field (I changed its name so it won't be confused with the tags array). I need to get a set, without duplicates, of all the tags that a group of clients have.
Your document structure doesn't make sense; I'm assuming tags is an array and not an object. Try a query like this:
db.tags.find({ "somefield" : "some value", "tags.name" : /^abc/ })
with an index on { "somefield" : 1, "tags.name" : 1 }. MongoDB optimizes left-anchored regex queries into range queries, which can be served efficiently by an index (see the $regex docs).
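Why a left-anchored regex can use the index: every string that matches /^ab/ sorts inside the half-open range ["ab", "ac"), so the server can scan just that slice of the index instead of every key. The sketch below computes such a range by incrementing the last character of the prefix and uses binary search over a sorted array as a stand-in for an index scan. prefixRange, lowerBound and rangeScan are hypothetical helpers; the code assumes simple binary string ordering (no collation) and a non-empty prefix whose last character is below U+FFFF.

```javascript
// Hypothetical helper: the half-open range [lower, upper) covered by /^prefix/.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

// First index i in the sorted array with arr[i] >= target (binary search).
function lowerBound(arr, target) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Stand-in for an index scan: only the keys inside the range are visited.
function rangeScan(sortedKeys, prefix) {
  const [lower, upper] = prefixRange(prefix);
  return sortedKeys.slice(lowerBound(sortedKeys, lower), lowerBound(sortedKeys, upper));
}

const tagIndex = ["hij tag2", "abc tag1", "hij tag4", "abc tag3"].sort();
console.log(prefixRange("ab"));
console.log(rangeScan(tagIndex, "ab"));
```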
You can get just the tags from this document structure using an aggregation pipeline:
db.tags.aggregate([
{ "$match" : { "somefield" : "some value", "tags.name" : /^abc/ } },
{ "$unwind" : "$tags" },
{ "$match" : { "tags.name" : /^abc/ } },
{ "$project" : { "_id" : 0, "tag_name" : "$tags.name" } }
])
The index only helps the first $match, so the pipeline needs the same index as the query.
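The question also asks for tag names without duplicates, which the pipeline above does not guarantee: a tag shared by several clients is output once per client. One way to de-duplicate is to append a stage such as { $group: { _id: null, tags: { $addToSet: "$tags.name" } } }. The plain JavaScript below sketches the whole pipeline in memory (first $match, $unwind, second $match, then a Set in place of $addToSet); distinctTagsWithPrefix and the extra client documents are hypothetical.

```javascript
// Hypothetical helper: in-memory sketch of $match -> $unwind -> $match -> $addToSet.
function distinctTagsWithPrefix(clients, somefieldValue, prefix) {
  const names = new Set();
  for (const client of clients) {
    if (client.somefield !== somefieldValue) continue; // first $match
    for (const tag of client.tags) { // $unwind: "$tags"
      // second $match on /^prefix/, then add to the set (duplicates are ignored)
      if (tag.name.startsWith(prefix)) names.add(tag.name);
    }
  }
  // $addToSet does not promise an order, so sort for a stable result.
  return [...names].sort();
}

const clients = [
  { _id: "theid", somefield: "some value", tags: [
    { name: "abc tag1", vote: 5 }, { name: "hij tag2", vote: 22 },
    { name: "abc tag3", vote: 5 }, { name: "hij tag4", vote: 77 },
  ] },
  { _id: "client2", somefield: "some value", tags: [
    { name: "abc tag1", vote: 1 }, { name: "abx tag9", vote: 3 },
  ] },
  { _id: "client3", somefield: "other value", tags: [{ name: "abz tag0", vote: 2 }] },
];

console.log(distinctTagsWithPrefix(clients, "some value", "ab"));
```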

Struggling to get ordered results from the last retrieved article, given array of elements to search in

I have a collection of documents with a structure like this:
{
"_id" : ObjectId("5233a700bc7b9f31580a9de0"),
"id" : "3df7ce4cc2586c37607a8266093617da",
"published_at" : ISODate("2013-09-13T23:59:59Z"),
...
"topic_id" : [
284,
9741
],
...
"date" : NumberLong("1379116800055")
}
I'm trying to use the following query:
db.collection.find({"topic_id": { $in: [ 9723, 9953, 9558, 9982, 9833, 301, ... 9356, 9990, 9497, 9724] }, "date": { $gte: 1378944001000, $lte: 1378954799000 }, "_id": { $gt: ObjectId('523104ddbc7b9f023700193c') }}).sort({ "_id": 1 }).limit(1000)
The above query uses the { topic_id, date } index, but then the results are not returned in _id order.
Forcing it to use hint({_id:1}) makes the results ordered, but then nscanned is 1 million documents even though limit(1000) is specified.
What am I missing?

Query grouped by two swap fields

I have collection messages with the following documents
{
"_id" : ObjectId("5164218f359f109fd4000012"),
"receiver_id" : ObjectId("5164211e359f109fd4000004"),
"sender_id" : ObjectId("5162de8a359f10cbf700000c"),
"body" : "Hello Billy!!!",
"readed" : false,
"updated_at" : ISODate("2013-04-09T14:11:27.17Z"),
"created_at" : ISODate("2013-04-09T14:11:27.17Z")
}
I need a query that returns the latest messages (whether received or sent) for a given user, grouped by the receiver_id+sender_id fields and sorted by created_at.
To better explain the question, an example of how I did it in SQL:
SELECT DISTINCT ON (sender_id+receiver_id) * FROM messages
WHERE sender_id = given_user OR receiver_id = given_user
ORDER BY (sender_id+receiver_id), created_at DESC
I don't understand how to solve this problem with MongoDB.
The Aggregation Framework in MongoDB 2.2+ provides the most obvious translation of your query. The MongoDB manual includes an SQL to Aggregation Framework Mapping Chart as a general guide, although there are definite differences in the two approaches.
Here's a commented example you can try in the mongo shell:
var given_user = ObjectId("5162de8a359f10cbf700000c");
db.messages.aggregate(
// match: WHERE sender_id = given_user or receiver_id = given_user
// NB: do the match first, because it can take advantage of an available index
{ $match: {
$or:[
{ sender_id: given_user },
{ receiver_id: given_user },
]
}},
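{ $group: {
// DISTINCT ON (sender_id+receiver_id)
_id: { sender_id: "$sender_id", receiver_id: "$receiver_id" }
}},
// ORDER by (sender_id+receiver_id): after $group these fields live under _id.
// created_at is not carried through $group, so sorting on it requires adding it there first.
{ $sort: {
"_id.sender_id": 1,
"_id.receiver_id": 1
}}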
)
Sample result:
{
"result" : [
{
"_id" : {
"sender_id" : ObjectId("5162de8a359f10cbf700000c"),
"receiver_id" : ObjectId("5164211e359f109fd4000004")
}
}
],
"ok" : 1
}
You may want to add more fields to the grouping, such as the latest created_at (via $max) or a count of messages received.
If you actually want to combine the sender_id+receiver_id into a single field, you can use the $concat operator in MongoDB 2.4+.
There is no explicit way to do so. Let's review workarounds:
Way 1:
Use a plain find, then do the distinct in application code:
db.messages.find({$or:[{sender_id:?}, {receiver_id:?}]})
Way 2: Using the aggregation framework:
db.messages.aggregate([
  { $match: { $or: [ { sender_id: ? }, { receiver_id: ? } ] } },
  { $group: { _id: { sender: "$sender_id", receiver: "$receiver_id" },
              other: { ... } } },
  { $sort: { sender_id: 1, receiver_id: 1, ... } }
])
With this approach the problem appears at the sort stage, since sorting by sender_id, receiver_id is not the same as sorting by sender_id+receiver_id.
Way 3: Introduce a surrogate field holding sender_id+receiver_id, then use find or even distinct, per Stennie's hint.
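Way 1 and Way 3 both come down to treating sender_id+receiver_id as one order-independent key, so that A->B and B->A count as the same conversation. The sketch below follows Way 1 in plain JavaScript: after the $or filter it sorts the two ids to build that key, keeps the newest message per key, and returns conversations newest first, matching the intent of the SQL DISTINCT ON query. latestPerConversation is a hypothetical helper, and the ids are short strings standing in for ObjectIds.

```javascript
// Hypothetical helper: Way 1 done in application code after a find() with the $or filter.
function latestPerConversation(messages, userId) {
  const latest = new Map();
  for (const m of messages) {
    // Same condition as { $or: [ { sender_id: userId }, { receiver_id: userId } ] }
    if (m.sender_id !== userId && m.receiver_id !== userId) continue;
    // Order-independent key: A->B and B->A map to the same conversation.
    const key = [m.sender_id, m.receiver_id].sort().join("|");
    const current = latest.get(key);
    if (!current || m.created_at > current.created_at) latest.set(key, m);
  }
  // Newest conversation first (created_at DESC).
  return [...latest.values()].sort((x, y) => y.created_at - x.created_at);
}

const messages = [
  { sender_id: "u1", receiver_id: "u2", body: "Hello Billy!!!", created_at: new Date("2013-04-09T14:11:27.170Z") },
  { sender_id: "u2", receiver_id: "u1", body: "Hi back", created_at: new Date("2013-04-09T15:00:00Z") },
  { sender_id: "u3", receiver_id: "u1", body: "Ping", created_at: new Date("2013-04-08T09:00:00Z") },
  { sender_id: "u4", receiver_id: "u5", body: "Unrelated", created_at: new Date("2013-04-10T08:00:00Z") },
];

console.log(latestPerConversation(messages, "u1").map((m) => m.body));
```

The same pair key, stored on each document as a surrogate field, is what Way 3 would index and query on.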